Yes, it’s possible: Migrating website content into Drupal without using a database or content export

Andrew Cox

Sometimes a client needs to migrate a website to Drupal, but the website's database is inaccessible or the website itself is too complex to set up locally. When the Migrate API on its own isn't a practical option, an alternative is to migrate content with a website crawler built within Drupal, using a custom module and the Guzzle library instead of, or in addition to, the Migrate API.

The session will start by sharing information about open source tools outside of Drupal that are available to crawl public web content, as well as edge cases where, even when the website database is accessible, the Migrate API is not the best option. Then there will be a demonstration of a custom module that will show in real time how a website can still be migrated smoothly, given only the publicly accessible domain of the website and the tools provided by a basic Drupal installation. Beyond crawling content, the demonstration will show how to migrate rich text, taxonomy reference, and image media fields.

With minimal additional configuration or coding, website page elements can supply both the list of pages to crawl and the content fields to migrate. Code examples will be shared and discussed during part of the demonstration, but the session aims to ensure that non-developers learn something as well.
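To make the idea of using page elements concrete, here is a minimal, language-neutral sketch. It is not the session's module, which is a Drupal (PHP) module using Guzzle; it uses only the Python standard library. The class name `PageScraper`, the function `scrape`, and the choice of the first `<h1>` as a "title" field are assumptions made purely for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScraper(HTMLParser):
    """Collect link targets and the text of the first <h1> from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []           # candidate pages to crawl next
        self.title = ""           # value for a hypothetical "title" field
        self._in_h1 = False
        self._title_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and de-duplicate.
                url = urljoin(self.base_url, href)
                if url not in self.links:
                    self.links.append(url)
        elif tag == "h1" and not self._title_done:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._title_done = True

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def scrape(html, base_url):
    """Return the extracted field value and the list of links to crawl."""
    parser = PageScraper(base_url)
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}
```

In a real migration, the returned links would feed the crawl queue, and each extracted value would be mapped to a field on the destination content type.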
