Availability: Pre-aggregates are a Beta feature available on Enterprise plans only.
Getting started
Define pre-aggregates in your dbt project and configure scheduling.
Monitoring and debugging
Track materialization status, debug query matching, and view hit/miss stats.
CLI audit
Inspect dashboard coverage from the terminal and gate CI on hit rates.
How it works
Pre-aggregates follow a four-step cycle:
- Define — You add a pre_aggregates block to your dbt model YAML, specifying which dimensions and metrics to include.
- Materialize — Lightdash runs the aggregation query against your warehouse and stores the results. This happens automatically on compile, on a cron schedule you define, or when you trigger it manually.
- Match — When a user runs a query, Lightdash checks if every requested dimension, metric, and filter is covered by a pre-aggregate.
- Serve — If a match is found, the query is served from the materialized data instead of hitting your warehouse.
Example
Suppose you have an orders table with thousands of rows, and you define a pre-aggregate with the dimension status and the metrics total_amount (sum) and order_count (count), with day granularity on order_date.
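Expressed in dbt YAML, that definition could look like the sketch below. This is a minimal illustration, assuming the block lives under the model's meta and uses these field names; check the Lightdash reference docs for the exact schema.

```yaml
# Illustrative sketch only; field names and placement are assumptions,
# not the confirmed pre_aggregates schema.
models:
  - name: orders
    meta:
      pre_aggregates:
        - name: orders_daily
          dimensions:
            - status
            - order_date        # time dimension, materialized at day grain
          granularity: day
          metrics:
            - total_amount      # sum
            - order_count       # count
```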
Your warehouse data:
| order_date | status | customer | amount |
|---|---|---|---|
| 2024-01-15 | shipped | Alice | $100 |
| 2024-01-15 | shipped | Bob | $50 |
| 2024-01-15 | pending | Charlie | $75 |
| 2024-01-16 | shipped | Alice | $200 |
| 2024-01-16 | pending | Charlie | $30 |
| … | … | … | … |
The materialized pre-aggregate, grouped by day and status:
| order_date_day | status | total_amount | order_count |
|---|---|---|---|
| 2024-01-15 | shipped | $150 | 2 |
| 2024-01-15 | pending | $75 | 1 |
| 2024-01-16 | shipped | $200 | 1 |
| 2024-01-16 | pending | $30 | 1 |
A monthly query is served by re-aggregating the daily rows:
| order_date_month | status | total_amount |
|---|---|---|
| January 2024 | shipped | $350 |
| January 2024 | pending | $105 |
sum can be re-aggregated — summing daily sums gives the correct monthly sum.
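The roll-up can be sketched in a few lines of Python. This is illustrative only; in practice Lightdash performs the re-aggregation with DuckDB workers at query time.

```python
# Daily pre-aggregate rows: (order_date_day, status, total_amount, order_count)
daily = [
    ("2024-01-15", "shipped", 150, 2),
    ("2024-01-15", "pending", 75, 1),
    ("2024-01-16", "shipped", 200, 1),
    ("2024-01-16", "pending", 30, 1),
]

# Roll daily rows up to month: summing sums (and counts) stays correct.
monthly = {}
for day, status, amount, count in daily:
    month = day[:7]  # "YYYY-MM"
    total, n = monthly.get((month, status), (0, 0))
    monthly[(month, status)] = (total + amount, n + count)

print(monthly[("2024-01", "shipped")])  # (350, 3)
print(monthly[("2024-01", "pending")])  # (105, 2)
```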
Query matching
When a user runs a query, Lightdash automatically checks if a pre-aggregate can serve the results. A pre-aggregate matches when all of the following are true:
- Every dimension in the query is included in the pre-aggregate
- Every metric in the query is included in the pre-aggregate
- Every dimension used in filters is included in the pre-aggregate
- If the pre-aggregate itself defines filters, the query must include an equivalent or narrower filter
- All metrics use supported metric types
- The query does not contain raw SQL table calculations
- The query does not use Parameters, sql_filter, or user attributes inside SQL
- If the query uses a time dimension, the requested granularity must be equal to or coarser than the pre-aggregate’s granularity
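The matching rules above can be condensed into a small predicate. The sketch below is a deliberate simplification for illustration, not Lightdash's actual implementation; the dictionary shapes are assumptions.

```python
GRAIN_ORDER = ["day", "week", "month", "quarter", "year"]  # fine to coarse

def matches(preagg, query):
    """Illustrative check of whether a pre-aggregate can serve a query."""
    return (
        set(query["dimensions"]) <= set(preagg["dimensions"])
        and set(query["metrics"]) <= set(preagg["metrics"])
        and set(query["filter_dimensions"]) <= set(preagg["dimensions"])
        # Requested granularity must be equal to or coarser than the pre-aggregate's
        and GRAIN_ORDER.index(query["granularity"]) >= GRAIN_ORDER.index(preagg["granularity"])
    )

preagg = {
    "dimensions": ["status", "order_date"],
    "metrics": ["total_amount", "order_count"],
    "granularity": "day",
}

# Month query over a subset of dimensions: served from the pre-aggregate.
q1 = {"dimensions": ["status"], "metrics": ["total_amount"],
      "filter_dimensions": [], "granularity": "month"}
# Query with an uncovered dimension: falls back to the warehouse.
q2 = {"dimensions": ["customer"], "metrics": ["total_amount"],
      "filter_dimensions": [], "granularity": "day"}

print(matches(preagg, q1))  # True
print(matches(preagg, q2))  # False
```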
Filtered pre-aggregates
Pre-aggregates can define static filters in their YAML definition. This lets you materialize a smaller slice of data for a common query pattern, such as status = completed or order_date: inThePast 52 weeks.
When a pre-aggregate has definition filters:
- Matching queries must include the same filter or a narrower one
- Queries without the filter, or with a broader or incompatible filter, fall back to another pre-aggregate or the warehouse
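A filtered variant might be declared like the sketch below. This is hypothetical: the filters field name and filter syntax are assumptions for illustration, so consult the reference docs for the real schema.

```yaml
# Hypothetical sketch; the filter syntax is an assumption, not the confirmed schema.
pre_aggregates:
  - name: completed_orders_daily
    dimensions: [status, order_date]
    granularity: day
    metrics: [total_amount]
    filters:
      - status: completed    # only completed orders are materialized
```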
Dimensions from joined tables
Pre-aggregates support dimensions from joined tables. Reference them by their full name (for example, customers.first_name) in the dimensions list.
Supported metric types
Pre-aggregates support metrics that can be re-aggregated from pre-computed results:
- sum
- count
- min
- max
- average
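One subtlety with average: you cannot roll it up by averaging the stored averages, because days with more rows must weigh more. The correct roll-up divides the total sum by the total count, which is why sum-and-count style inputs are what make average re-aggregatable. A quick Python illustration (not Lightdash code):

```python
# Daily rows: (sum_amount, row_count, avg_amount)
daily = [(150, 2, 75.0), (200, 1, 200.0)]

# Averaging the averages is wrong: it ignores row counts.
naive = sum(avg for _, _, avg in daily) / len(daily)
print(naive)  # 137.5

# Re-aggregating from sums and counts is correct.
correct = sum(s for s, _, _ in daily) / sum(n for _, n, _ in daily)
print(correct)  # 350 / 3 ≈ 116.67
```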
Current limitations
Pre-aggregates support a narrower subset of the Lightdash semantic layer than regular warehouse queries.
Not supported
Pre-aggregates do not support:
- Personal warehouse connections. Materialization always runs under a single user’s credentials, so warehouse-level access rules are not applied per viewer. If you rely on personal warehouse connections to enforce data access, use results caching instead.
- Parameters — parameter values are picked at query time, so they cannot be resolved during materialization. Queries that use parameters fall back to the warehouse.
- User attributes when referenced from SQL. required_attributes and any_attributes are still supported through materialization_role.
- Custom metrics created in the Explorer
- Custom SQL dimensions created in the Explorer (Custom bin dimensions are supported)
- SQL table calculations (Formula table calculations are supported)
SQL compatibility
sql_filter (and its alias sql_where) runs both at materialization time and at query time on top of the materialized data.
- At materialization time, the filter is evaluated against your warehouse. If the SQL references Parameters or user attributes, the values injected come from the materialization context — you can pin this to a fixed identity or attribute set with materialization_role so the materialization captures the rows you need.
- At query time, the same filter is re-applied against the materialized data, which is served by DuckDB. If the sql_filter SQL uses warehouse-specific syntax that DuckDB doesn’t understand, the query will fail to run against the pre-aggregate and fall back to the warehouse.
Metrics that can’t be pre-aggregated
Pre-aggregates do not support metric types that cannot be re-aggregated from pre-computed results. For example, consider count_distinct on a daily pre-aggregate. If the pre-aggregate stores “2 distinct customers on 2024-01-15” and “1 distinct customer on 2024-01-16”, you cannot sum those daily values to get the monthly distinct count, because the same customer can appear on multiple days.
| order_date_day | status | distinct_customers |
|---|---|---|
| 2024-01-15 | shipped | 2 (Alice, Bob) |
| 2024-01-16 | shipped | 1 (Alice) |
2 + 1 = 3, but the correct monthly answer is 2 (Alice, Bob). The pre-aggregate no longer knows which customers were counted.
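The failure mode is easy to reproduce with Python sets, standing in for the per-day customer sets that the pre-aggregate no longer retains:

```python
# The daily pre-aggregate can only store the count, not who was counted.
day1 = {"Alice", "Bob"}   # 2 distinct customers on 2024-01-15
day2 = {"Alice"}          # 1 distinct customer on 2024-01-16

print(len(day1) + len(day2))  # 3 — wrong: naive sum of daily counts
print(len(day1 | day2))       # 2 — right: distinct over the union
```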
We’re investigating supporting count_distinct through approximation algorithms. Follow this issue for updates.
For similar reasons, the following metric types are also not supported:
- sum_distinct, average_distinct
- median, percentile
- percent_of_total, percent_of_previous
- running_total
- Custom SQL / post-calculation metrics (including many number metrics) — Follow this issue
- number, string, date, timestamp, boolean
Pre-aggregates vs results caching
Lightdash has two independent systems for speeding up queries: results caching and pre-aggregates. They work differently and are designed to be used together, not as replacements for each other.
Results caching
Results caching stores the exact result of any query that runs through Lightdash, keyed by a hash of the generated SQL. The first time a query runs, Lightdash executes it against your warehouse and caches the result in S3. Subsequent identical queries are served from the cache until it expires (24 hours by default). Any change to the query — a different filter, column, limit, or user attribute — produces a new SQL hash, a new cache entry, and another warehouse query. Results caching covers every query shape, including custom metrics, table calculations, and SQL runner queries. See the caching guide for details.
Pre-aggregates
Pre-aggregates are summary tables you define in your dbt YAML. Lightdash materializes them on a schedule (or on compile, or manually) and stores the results in S3. When a user query matches the pre-aggregate’s dimensions, metrics, filters, and granularity, Lightdash serves the query from the materialized data using in-memory DuckDB workers. The warehouse is not touched at query time, even on the first query. A single pre-aggregate can serve many different queries. A daily pre-aggregate with five dimensions can answer day, week, month, quarter, and year queries across any subset of those dimensions and with any narrower filter. Results caching, in contrast, needs one cache entry per unique SQL.
Key differences
| | Results caching | Pre-aggregates |
|---|---|---|
| Configuration | Automatic once enabled for your instance | Defined in dbt YAML |
| Trigger | First query runs against warehouse, then cached | Materialized on compile, cron, or manual refresh |
| Storage | Query result (row set) | Pre-computed summary table |
| Query execution | Exact cached result is returned | DuckDB workers re-aggregate at query time |
| Warehouse hit on first query? | Yes | No — only materialization hits the warehouse, not query-time serving |
| Coverage | All metric types, all query shapes | Only re-aggregatable metrics (sum, count, min, max, average) |
| Scope | One cache entry per unique SQL | One pre-aggregate can serve many query shapes |
| Availability | Cloud Pro+ or self-hosted with license | Enterprise (Beta) |
When to use which
Use pre-aggregates when:
- You have high-traffic dashboards with predictable query patterns
- You want to reduce warehouse cost or improve latency on the first query, not just repeat visits
- The metrics are re-aggregatable (sum, count, min, max, average)
- You’re willing to design and schedule the materializations
Use results caching when:
- Query patterns are ad-hoc or unpredictable
- You need unsupported features listed above, such as count_distinct, Parameters, sql_filter, or raw SQL table calculations
- You’re using the SQL runner
- You don’t want upfront configuration work