Snowflake table types: Key characteristics and use cases

It goes without saying, but I’ll say it anyway: In the Snowflake Data Cloud, as in most data platforms, tables are a primary component. A fundamental understanding and proper usage of them are critical. For many applications, using the default, permanent table is sufficient. However, understanding when and why to leverage other types can aid in the areas of security, performance, organization, and cost management.

While reading about and working with tables in Snowflake, I found myself wanting a definitive list of table types. Why are there multiple types, and what’s the difference? It was difficult to find a concise source. What follows is a general overview of the table types in Snowflake, with some basic guidelines about table usage, configuration, and use cases.

Permanent Tables

First and foremost is the Permanent Table, sometimes referred to as a user table. This is the default type and the primary means of storing and managing structured data. Permanent tables generally have a defined schema and live in Snowflake’s compressed columnar format. Snowflake optimizes this data for you—and generally performs wonderfully without much user modification. Below are a few key characteristics of Permanent Tables:

  • Data is stored persistently and retained until explicitly dropped/truncated.
  • Applicable for storing long-term data that may need to be accessed over long periods of time.
  • Supports the loading of structured data, as well as semi-structured data such as JSON, Avro, or Parquet, via the VARIANT data type. Note that in this case you do not need a predefined schema.
  • Statistics and micro-partition metadata are managed by Snowflake and leveraged to optimize queries (which means less tuning for us); there are no indexes to maintain.
  • Created by default (unless it’s created in the context of a transient database).
  • Tables are concurrent—users can read and write from the same tables without interfering with each other.
  • Permanent tables have a Fail-safe period (seven days); backups are managed by Snowflake.
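As a quick sketch (all table and column names here are illustrative), a permanent table with a VARIANT column for semi-structured data might look like this:

```sql
-- Default (permanent) table; no special keyword needed
CREATE TABLE customer_events (
    event_id   NUMBER IDENTITY,
    event_time TIMESTAMP_NTZ,
    payload    VARIANT            -- holds semi-structured JSON/Avro/Parquet records
);

-- Traverse the semi-structured payload with path notation and casts
SELECT payload:user_id::STRING        AS user_id,
       payload:items[0].sku::STRING   AS first_sku
FROM customer_events
WHERE payload:event_type::STRING = 'purchase';
```

No schema needs to be declared for the contents of the VARIANT column; its structure is discovered at query time.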

Transient Tables

Transient tables are unique to Snowflake and share characteristics of both temporary and permanent tables.

  • Transient tables persist until explicitly dropped but are typically used for intermediate storage.
  • For data that needs to be kept longer than a session but does not require data recovery.
  • Useful for performing complex data manipulations when a permanent table is not required.
  • Transient tables have NO fail-safe period, and a limited time travel period.
  • Can be accessed by others who have the needed permissions.
  • Leveraging these tables provides an opportunity for cost savings.
  • A Time Travel retention period (0 or 1 day) can be set via the DATA_RETENTION_TIME_IN_DAYS parameter.
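As a minimal sketch (names are illustrative), a transient staging table is created with the TRANSIENT keyword, optionally disabling Time Travel to squeeze out further storage savings:

```sql
-- Transient tables consume no Fail-safe storage;
-- setting retention to 0 also disables Time Travel.
CREATE TRANSIENT TABLE staging_orders (
    order_id NUMBER,
    amount   NUMBER(10, 2)
)
DATA_RETENTION_TIME_IN_DAYS = 0;
```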

Considerations on Transient Tables Use

Transient tables in Snowflake are a flexible option for many use cases, but there are certain situations where they might not be the best choice. Here’s what to consider when deciding whether transient tables are the right fit:

  • Long-Term Data Storage: Transient tables are not suitable for long-term data storage due to their temporary nature and limited time travel capabilities.
  • Critical Data That Requires High Durability and Availability: For critical business data, transient tables are risky because they do not have a fail-safe period. This means that if the data is lost or corrupted, recovery options are limited.
  • Compliance and Regulatory Requirements: If your data storage needs to comply with specific regulatory requirements that mandate data recovery capabilities or longer retention periods, transient tables may not meet these requirements.
  • Data That is Difficult or Impossible to Reproduce: If the data stored cannot be easily reloaded or regenerated (e.g., historical data that is no longer available from the source), using transient tables is risky due to the lack of fail-safe mechanisms.
  • Auditing and Tracking Changes Over Time: If your use case requires tracking changes over a more extended period for auditing purposes, transient tables, with their limited time travel period, are not ideal.
  • Data with Uncertain Usage Patterns: If you are unsure about how long the data will be needed or the frequency of its access, opting for a more permanent solution might be safer to avoid accidental data loss.

Transient tables in Snowflake are best used for temporary or intermediate data storage where the data is not critically important, easily reproducible, and does not require long-term retention or compliance with strict data recovery regulations.

Temporary Tables

  • Exist only within the session or transaction that created them.
  • Cannot be accessed by other users or sessions.
  • Cannot be cloned.
  • Useful for intermediate results; automatically dropped at the end of the session or transaction.
  • Snowflake optimizes temporary tables like permanent tables, so they perform well.
  • A temporary table can have the same name as an existing table; best practice is to use a unique name to avoid confusion.
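A minimal sketch (names are illustrative): the table below is visible only to the current session and is dropped automatically when the session ends:

```sql
-- Session-scoped scratch table built from a query (CTAS)
CREATE TEMPORARY TABLE tmp_regional_totals AS
SELECT region, SUM(amount) AS total_amount
FROM orders
GROUP BY region;
```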

Transient and Temporary tables are quite similar. The key differences lie in their purpose, scope, and duration.

Key differences between Transient and Temporary Tables

Transient and temporary tables in Snowflake serve different purposes based on their persistence, data recovery options, and access controls. Transient tables, unlike temporary tables, persist beyond the session or transaction that created them and are not automatically dropped at the end of the session. This makes them suitable for storing data that needs to be accessible over a longer period but does not require the permanence and fail-safe features of permanent tables. Additionally, they can be accessed by other users who have the necessary permissions, making them useful for collaborative or ongoing projects that extend beyond a single session.

Temporary tables, on the other hand, are designed for short-lived data needs. They exist only within the scope of the session or transaction that creates them and are automatically dropped at its conclusion, ensuring data privacy and security for session-specific operations. These tables are ideal for scenarios requiring data isolation, such as temporary data manipulations, where data should not be visible or accessible to other users. The ephemeral nature of temporary tables, coupled with their session-bound privacy, distinguishes them significantly from transient tables, which offer a longer-term, yet still not permanent, data storage solution.

External Tables

  • Allow you to access data from files stored in external stages, such as Amazon S3, Azure Blob Storage, or Google Cloud Storage.
  • Allow you to process data in various formats, including Delta Lake/Parquet, Apache Iceberg, JSON, and CSV.
  • Enable Snowflake to understand the format and layout of external data files so they can be queried or processed.
  • A schema CAN be defined for external data.
  • Queries read the data in place; it is not loaded into Snowflake.
  • Security: external tables use Snowflake’s RBAC; privileges are granted to users and roles to read from them.
  • External tables are read-only; however, data can be unloaded from Snowflake to external stage locations (e.g., via COPY INTO).
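As a hedged sketch of the syntax (the stage, table, and column names are illustrative, and the stage’s authentication, e.g. a storage integration, is assumed to be configured separately):

```sql
-- Assumes an external stage @sales_stage already points at the cloud storage location
CREATE EXTERNAL TABLE ext_sales (
    sale_date DATE   AS (VALUE:sale_date::DATE),
    amount    NUMBER AS (VALUE:amount::NUMBER)
)
LOCATION = @sales_stage
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = TRUE;   -- pick up new files as they land in the stage

-- Queried like any other table; the data itself stays in external storage
SELECT sale_date, SUM(amount)
FROM ext_sales
GROUP BY sale_date;
```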

Considerations when using external tables

External tables in Snowflake offer significant advantages in accessing and querying data stored in external storage like Amazon S3, Azure Blob Storage, or Google Cloud Storage. However, there are use cases where they might not be the ideal choice:

  • Data Modification Requirements: External tables are designed primarily for reading data, not for scenarios where data needs to be frequently written back or updated.
  • Performance Considerations: For operations requiring high-performance data processing, external tables might not be the best fit. Because data is queried in place rather than loaded into Snowflake, performance may suffer compared to Snowflake’s internal storage, particularly for complex queries or large datasets.
  • Consistency and Transactional Integrity: If the use case requires strong consistency and transactional integrity, external tables may not be the best choice. They are more suited to read-only and analytical operations rather than transactional workloads.
  • Data Integration and Transformation: When extensive data integration and transformation are required, using external tables can be limiting. Since the data is not stored within Snowflake, it might complicate the process of integrating and transforming data across different systems and formats.
  • Cost and Resource Optimization: If cost and resource optimization is a priority, especially for large-scale data processing, external tables may incur additional costs and resource utilization due to the nature of accessing data from external sources.
  • Real-Time Analytics on Frequently Changing Data: For real-time analytics where the underlying data changes frequently, external tables might not provide the most up-to-date view of the data, as they are more suited for stable, read-only datasets.

While external tables are useful for certain types of data access and analysis, they are not the best fit for scenarios requiring data modification, high-performance processing, stringent security and compliance adherence, transactional integrity, complex data integration, cost optimization, or real-time analytics on frequently changing data.

Dynamic Tables

Dynamic Tables are currently in public preview. Some of their key characteristics include:

  • Query results are materialized into the defined target table, so transformations are applied before the data is stored.
  • Data is automatically refreshed as new data streams in.
  • You cannot insert, update, or delete rows directly (no DML); the table’s contents are determined entirely by its defining query.

Use cases for Dynamic Tables

Per Snowflake, use scenarios for dynamic tables include when:

  • You don’t want to write code to track data dependencies and manage data refresh.
  • You don’t need, or want to avoid, the complexity of streams and tasks.
  • You do need to materialize the results of a query of multiple base tables.
  • You need to build multiple tables to transform data via an ETL pipeline.
  • You don’t need fine-grained refresh schedule control and you just want to specify the target data freshness for your pipelines.
  • You don’t need to use query constructs that dynamic tables don’t support (such as stored procedures, external functions, or non-deterministic functions beyond those supported in dynamic tables), and your sources are not external tables, streams, or materialized views.
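The scenarios above reduce to a single declarative statement: you provide the transformation query plus a target freshness, and Snowflake manages the refresh. A sketch (table, warehouse, and lag values are illustrative):

```sql
CREATE DYNAMIC TABLE daily_revenue
  TARGET_LAG = '1 hour'        -- desired data freshness, not a fixed schedule
  WAREHOUSE  = transform_wh    -- warehouse used for automatic refreshes
AS
SELECT order_date, SUM(amount) AS revenue
FROM raw_orders
GROUP BY order_date;
```

Note that TARGET_LAG expresses how stale the results are allowed to get, rather than a cron-style refresh schedule.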

Hybrid Tables

Hybrid tables are a new table type currently in private preview. They are designed to support OLTP-style relational workloads, with data stored in rows rather than columns. This feature would make Snowflake a one-stop data shop for applications and analytics.

  • Optimized tables to support transactional and analytical workloads concurrently.
  • Support ACID transactions.
  • Row-based storage enables faster reads and writes.
  • Hybrid tables will require primary keys.
  • Referential integrity constraints, such as foreign keys, are enforced (unlike on standard Snowflake tables, where such constraints are informational only).
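Based on the preview documentation (syntax may evolve, and the names here are illustrative), a hybrid table is declared with the HYBRID keyword and must include a primary key:

```sql
CREATE HYBRID TABLE orders (
    order_id    NUMBER PRIMARY KEY,   -- primary key is required
    customer_id NUMBER,
    status      VARCHAR,
    -- enforced foreign key; assumes a hybrid table CUSTOMERS exists
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
```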

The best use cases for Hybrid Tables

Hybrid tables present unique advantages for specific use cases. Here are some of the best scenarios for utilizing hybrid tables:

  • Transactional Workloads: Hybrid tables are ideal for applications that require frequent, short, and fast read-and-write operations, typical in transactional systems like e-commerce platforms, booking systems, or financial transaction processing.
  • Concurrent Analytical and Transactional Processing: If your business requires running analytical queries alongside transactional processes without compromising performance, hybrid tables offer the necessary optimization to handle both workloads concurrently.
  • Real-Time Data Processing: For scenarios where real-time data processing and immediate insights are crucial, such as in inventory management or real-time monitoring systems, the faster reads and writes of hybrid tables are highly beneficial.
  • Applications Requiring ACID Compliance: Hybrid tables support ACID transactions, making them suitable for applications where the integrity of each transaction is critical, such as in banking systems or any application where data consistency and reliability are paramount.
  • Systems with Complex Relationships: Given that hybrid tables require primary keys and mostly enforce referential integrity, they are well-suited for databases with complex relationships and dependencies among data entities.
  • High-Performance Requirements: For applications where high performance is a priority, especially in terms of rapid data retrieval and updates, the row-based storage of hybrid tables can provide the necessary speed and efficiency.
  • Data Integration Platforms: In scenarios where transactional data needs to be integrated with analytical systems for real-time analytics, hybrid tables can serve as a bridge, facilitating seamless data flow between transactional and analytical environments.
  • Legacy System Modernization: Businesses looking to modernize their legacy transactional systems to take advantage of cloud scalability and performance without losing the benefits of traditional row-based relational databases might find hybrid tables an ideal solution.

Hybrid tables in Snowflake are a versatile solution for a range of use cases, particularly where the convergence of transactional and analytical processing is required, along with the need for speed, ACID compliance, and handling complex data relationships.

Green Leaf Consulting Group stands out as a distinctive technology consultancy that provides substantial value across a diverse array of technologies, including Snowflake. We specialize in targeted solutions within data analytics, software development services, and technical strategy. Connect with us to discover how we can assist you in extracting maximum value from your investments in the Snowflake Data Cloud.