Which design yields better partition pruning when loading weather data with a city field?

Multiple Choice

Which design yields better partition pruning when loading weather data with a city field?

Explanation:
Partition pruning in Snowflake relies on the metadata kept for each micro-partition, including per-column statistics such as the range of values (min/max) and the number of distinct values. When city is stored in its own column, Snowflake can compare a city predicate against that column’s range in each micro-partition and skip the micro-partitions that cannot contain matching rows. Loading city into a scalar column makes these statistics directly applicable, so pruning is decided from metadata before any row data is read. The benefit is largest when rows for the same city are stored close together, because each micro-partition then covers a narrow range of city values.
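The min/max idea can be illustrated with a toy model. The sketch below is not Snowflake's implementation: the `MicroPartition` class, the `build_partitions` and `prune` helpers, and the sample rows are all hypothetical, and real micro-partitions are far larger and carry richer metadata. It only shows how a per-column range lets a scan skip chunks without reading their rows.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Toy stand-in for a micro-partition: rows plus min/max metadata for city."""
    rows: list
    city_min: str
    city_max: str


def build_partitions(rows, size):
    """Split rows into fixed-size chunks and record each chunk's city range."""
    parts = []
    for i in range(0, len(rows), size):
        chunk = rows[i:i + size]
        cities = [r["city"] for r in chunk]
        parts.append(MicroPartition(chunk, min(cities), max(cities)))
    return parts


def prune(parts, city):
    """Keep only the partitions whose [min, max] range could contain city."""
    return [p for p in parts if p.city_min <= city <= p.city_max]


# Hypothetical sample data, already sorted by city so the ranges do not overlap.
rows = [
    {"city": "Austin", "temp_c": 31},
    {"city": "Boston", "temp_c": 22},
    {"city": "Chicago", "temp_c": 19},
    {"city": "Denver", "temp_c": 25},
    {"city": "Miami", "temp_c": 33},
    {"city": "Seattle", "temp_c": 17},
]

parts = build_partitions(rows, size=2)
candidates = prune(parts, "Denver")
print(f"scanned {len(candidates)} of {len(parts)} partitions")
```

With the rows sorted by city, only the chunk whose range spans "Chicago" to "Denver" survives pruning. If the same rows were shuffled, the ranges would overlap and fewer chunks could be skipped.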

If the city value is embedded inside a VARIANT column, Snowflake must extract it from the semi-structured data to evaluate a predicate. Snowflake can keep statistics for some fields inside a VARIANT value, but this is not guaranteed for every field, so pruning on the inner field is less dependable than pruning on a dedicated column. As a result, more micro-partitions may be scanned.
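As a rough illustration of the worst case, the sketch below models partitions that hold only raw JSON text with no per-field statistics. This is an assumption made for the example, not a description of how Snowflake stores VARIANT data; the `scan_without_stats` helper and the sample records are hypothetical.

```python
import json

# Each toy partition holds raw JSON strings; no min/max for city is recorded.
partitions = [
    ['{"city": "Austin", "temp_c": 31}', '{"city": "Boston", "temp_c": 22}'],
    ['{"city": "Chicago", "temp_c": 19}', '{"city": "Denver", "temp_c": 25}'],
    ['{"city": "Miami", "temp_c": 33}', '{"city": "Seattle", "temp_c": 17}'],
]


def scan_without_stats(partitions, city):
    """Parse every row in every partition, since no metadata allows skipping."""
    scanned = 0
    matches = []
    for part in partitions:
        scanned += 1
        for raw in part:
            record = json.loads(raw)
            if record.get("city") == city:
                matches.append(record)
    return scanned, matches


scanned, matches = scan_without_stats(partitions, "Denver")
print(f"scanned {scanned} of {len(partitions)} partitions, {len(matches)} match")
```

Every partition is opened and every row parsed, even though only one row matches.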

So, creating a separate CITY column and populating it during load yields better partition pruning: the field becomes an ordinary column with its own micro-partition statistics that the optimizer can prune on.
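The load-time extraction can be sketched in the same toy style. In Snowflake this is typically expressed as a transformation in the load statement itself; the Python below is only an analogy. The `load_with_city_column` function and the `v` field name are hypothetical.

```python
import json


def load_with_city_column(raw_lines):
    """Parse each JSON line and promote city to its own field next to the payload."""
    table = []
    for line in raw_lines:
        payload = json.loads(line)
        # city becomes a plain scalar field; the full record is kept under "v".
        table.append({"city": payload.get("city"), "v": payload})
    return table


# Hypothetical input: one JSON document per line, as a weather feed might provide.
raw_lines = [
    '{"city": "Denver", "temp_c": 25}',
    '{"city": "Miami", "temp_c": 33}',
]

table = load_with_city_column(raw_lines)
cities = [row["city"] for row in table]
print(cities)
```

Keeping the original payload alongside the extracted column means the remaining fields stay queryable, while predicates on city can target the scalar column.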
