Videos

Databricks Auto Loader Demo: Scalable Ingestion with Schema Evolution

In this full demo, Kiran walks through how to use Databricks Auto Loader with PySpark to build scalable, resilient file ingestion pipelines in a lakehouse architecture.

You’ll learn how to:

  • Ingest files from cloud storage (ADLS Gen2) without reprocessing old data
  • Avoid missing newly arriving files using file event mode
  • Configure streaming ingestion with spark.readStream
  • Use checkpoint and schema locations correctly
  • Implement schema evolution in two ways
  • Write ingested data to Delta tables
  • Move or manage processed files
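
The steps above can be sketched in PySpark. This is a minimal illustration, not the exact code from the demo: the storage account, container, paths, and table name are placeholders, and the file format is assumed to be JSON.

```python
# Sketch: Auto Loader streaming ingest from ADLS Gen2 into a Delta table.
# All paths, names, and the source format below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source_path = "abfss://raw@mystorageacct.dfs.core.windows.net/landing/orders/"
schema_path = "abfss://meta@mystorageacct.dfs.core.windows.net/autoloader/orders/_schema"
checkpoint_path = "abfss://meta@mystorageacct.dfs.core.windows.net/autoloader/orders/_checkpoint"

df = (
    spark.readStream
    .format("cloudFiles")                               # Auto Loader source
    .option("cloudFiles.format", "json")                # format of incoming files (assumed)
    .option("cloudFiles.schemaLocation", schema_path)   # where the inferred schema is tracked
    .load(source_path)
)

(
    df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)      # progress tracking for exactly-once ingest
    .trigger(availableNow=True)                         # process all pending files, then stop
    .toTable("bronze.orders")                           # target Delta table (placeholder name)
)
```

The checkpoint is what prevents reprocessing: files already recorded there are skipped on every restart, so only newly arriving files are ingested.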

The demo covers setup, configuration, streaming concepts, and schema tracking using Auto Loader’s schema location and checkpointing.


This walkthrough is ideal for:

  • Data engineers building lakehouse ingestion pipelines
  • Architects designing scalable cloud-native data platforms
  • Teams migrating from batch-based ingestion to streaming patterns

If you’re working with Azure Databricks, ADLS Gen2, or Delta Lake, this end-to-end example demonstrates how Auto Loader simplifies incremental ingestion and schema management.

Watch the full demo below.