CloudFlow: Catching up to the Digital Archive Deluge

Jim Katz (https://www.linkedin.com/in/jimkatzarkiv/)

The Buddhist Digital Resource Center, founded in 1999, digitizes and publishes Buddhist texts from many cultures in Asia. Starting with a physical collection of texts, we built an ontology, a trilingual search engine, and a digital viewer, which together have served the Buddhist research community well. Initially the focus was on Tibetan Buddhism, but we've expanded into other Asian literatures.

This talk starts with a description of our initial processes, which grew out of an in-house digitization workflow, and then covers how we are adapting them as the scale doubles and doubles again.

In addition to the scale doubling, we:
- Receive more of our material in finished batches from partners in Asia
- Perform more processing operations (resizing, split-cropping n-up images into n 1-up images)
- Distribute sets to archival partners (Harvard University, Internet Archive)
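
To illustrate the split-cropping operation mentioned in the list above, here is a minimal sketch in Python. It assumes the simplest possible layout: an n-up scan whose pages sit side by side in one row with equal widths. The function name `split_boxes` is hypothetical, and real scans would likely need gutter or edge detection rather than an even split.

```python
def split_boxes(width, height, n):
    """Return n crop boxes (left, upper, right, lower), ordered left to right.

    Assumes the n pages are laid out in a single row with equal widths.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Integer division on i * width keeps the edges exact, so the boxes
    # tile the full image with no dropped or duplicated pixel columns.
    edges = [(i * width) // n for i in range(n + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(n)]


# A 2-up scan, 3000 x 2000 pixels, split into two 1-up boxes.
for box in split_boxes(3000, 2000, 2):
    print(box)
```

The boxes use the (left, upper, right, lower) ordering that Pillow's `Image.crop` accepts, so each one could be passed to an imaging library to produce the 1-up image.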

I will describe the current state of our processing complex, and how we plan to evolve into CloudFlow: a more effective use of a hybrid AWS cloud to increase our throughput and flexibility:
- Moving from Mac platforms to Debian servers
- Moving from Bash to Python and Java
- AWS SNS, SQS, and Workflow for messaging and control
- AWS EBS for temporary file access
- AWS CloudWatch as the monitoring plane, driving both reporting and monitoring
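
As a rough illustration of queue-based messaging and control, the sketch below shows what a processing request might look like as a JSON message body. It is not the actual CloudFlow design: the field names (`work_id`, `operation`, `source_path`), the example values, and both function names are hypothetical, chosen only to show the idea of a self-describing work item that a worker can validate before acting on it.

```python
import json

# Hypothetical set of fields a worker needs before it can act on a request.
REQUIRED_FIELDS = {"work_id", "operation", "source_path"}


def make_job_message(work_id, operation, source_path):
    """Serialize one processing request as a JSON string."""
    return json.dumps(
        {"work_id": work_id, "operation": operation, "source_path": source_path},
        sort_keys=True,
    )


def parse_job_message(body):
    """Decode a message body and reject it if any required field is missing."""
    message = json.loads(body)
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return message


body = make_job_message("W0000", "split-crop", "batch/0001.tif")
print(parse_job_message(body)["operation"])
```

A string like `body` could be sent as the `MessageBody` of an SQS `send_message` call, with the worker running the parsing step on each message it receives.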
