Videos
In this full demo, Kiran walks through how to use Databricks Autoloader with PySpark to build scalable, resilient file ingestion pipelines in a lakehouse architecture.
You’ll learn how to:
- Ingest files from cloud storage (ADLS Gen2) without reprocessing old data
- Avoid missing newly arriving files by using file event mode
- Configure streaming ingestion with spark.readStream
- Use checkpoint and schema locations correctly
- Implement schema evolution in two ways
- Write ingested data to Delta tables
- Move or manage processed files
The demo covers setup, configuration, streaming concepts, and schema tracking using Autoloader’s schema location and checkpointing.
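For orientation before watching, here is a minimal sketch of the ingestion pattern the demo describes. It assumes a Databricks notebook environment (where `spark` is already defined), hypothetical ADLS Gen2 paths, and a hypothetical `bronze.events` target table; the `cloudFiles` options shown are standard Autoloader options for schema location, schema evolution, and file notifications, and the exact values used in the demo may differ.

```python
# Hypothetical ADLS Gen2 locations for source files, schema tracking, and checkpointing
source_path = "abfss://raw@mystorageaccount.dfs.core.windows.net/events/"
schema_path = "abfss://meta@mystorageaccount.dfs.core.windows.net/schemas/events"
checkpoint_path = "abfss://meta@mystorageaccount.dfs.core.windows.net/checkpoints/events"

df = (
    spark.readStream
        .format("cloudFiles")                                   # Autoloader source
        .option("cloudFiles.format", "json")                    # format of the incoming files
        .option("cloudFiles.schemaLocation", schema_path)       # persists the inferred schema across runs
        # Schema evolution can be handled in two ways:
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # 1) add new columns, then restart the stream
        # .option("cloudFiles.schemaEvolutionMode", "rescue")       # 2) route unexpected fields to _rescued_data
        # File event / notification mode (instead of directory listing), so newly
        # arriving files are picked up without rescanning the whole path:
        # .option("cloudFiles.useNotifications", "true")
        .load(source_path)
)

(
    df.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)  # tracks processed files for incremental, exactly-once ingestion
        .option("mergeSchema", "true")                  # allow newly added columns to flow into the Delta table
        .trigger(availableNow=True)                     # process all available files, then stop (batch-style run)
        .toTable("bronze.events")                       # hypothetical target Delta table
)
```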
This walkthrough is ideal for:
- Data engineers building lakehouse ingestion pipelines
- Architects designing scalable cloud-native data platforms
- Teams migrating from batch-based ingestion to streaming patterns
If you’re working with Azure Databricks, ADLS Gen2, or Delta Lake, this end-to-end example demonstrates how Autoloader simplifies incremental ingestion and schema management.
Watch the full demo below.