No Detail Too Small

By Margy Ross
October 30, 2003

Atomic fact tables are the core foundation of any analytic environment. Business analysts thrive on atomic details because they can be easily rolled up “any which way” by grouping on one or more dimension attributes. The robust dimensionality of atomic data is extremely powerful as it supports a nearly endless combination of inquiries. However, business analysts can’t always live happily ever after on atomic details alone.

We’ve allocated significant space in this column to stress the importance of designing atomic fact tables with the following characteristics, as captured in our four-step dimensional design technique:

Business process. Focused on a single business process or event. In most organizations, a single operational source system collects or generates each set of process- or event-centric metrics.
Granularity. Atomic. In other words, the lowest level of detail captured during the business process.
Dimensions. Foreign keys to conformed dimension tables densely populated with descriptive attributes.
Facts. Performance metrics, typically numeric, consistent with the stated grain.

Accumulating the Atoms

In addition to atomic fact tables, you’ll probably also build aggregated dimensional models. Aggregations and indexes are the most common tools for improving query performance. Summary aggregations may be structured as an OLAP cube or another relational star schema. Because the granularity is no longer atomic, you’ll need a different fact table for the aggregated data, which typically exhibits the following characteristics:

Business process. Focused on a single business process or event, just like the atomic fact tables.
Granularity. Specified as a roll-up of the atomic facts.
Dimensions. Detail and/or summary conformed dimension foreign keys corresponding to the level of detail. When atomic data is aggregated, some dimensions may be completely eliminated. In other cases, the relevant dimension tables are shrunken, subset versions of the more detailed dimension tables.
Facts. Aggregated performance metrics consistent with the stated granularity.

Consolidating Across Processes

In addition to aggregated fact tables that roll-up facts from a single atomic fact table, we sometimes construct fact tables that combine data from multiple atomic fact tables. These cross-process or cross-event tables are referred to as second-level or consolidated fact tables. The consolidated fact tables are identified as enterprise data warehouse bus matrix rows, but are typically listed beneath the single-process or first-level matrix rows they are dependent upon. Consolidated fact tables exhibit slightly different characteristic patterns:

Business process. As the name implies, these fact tables look across discrete business processes or events.
Granularity. Represents the lowest level of detail common to all the processes. In the case of consolidated accumulating snapshots, the grain is typically one row per “object” moving through a pipeline of discrete processes.
Dimensions. Often multiple dates are corresponding to the major events or milestones, plus “least common denominator” conformed dimensions and degenerate dimensions from each underlying process.
Facts. Key metrics from each of the individual business process fact tables. Calculated metrics such as differences or lag metrics are also common.

More Performance, Less Dimensionality

In “The Soul of the Data Warehouse, Part Two: Drilling Across” (April 5, 2003), Ralph discussed drilling across fact tables. Facts from distinct fact tables are grouped based on common dimension attributes as row headers in a multi-pass request using a robust query or reporting tool. Although joining on common row header values is powerful (assuming your query tool has multi-pass capabilities), it may not deliver the ease-of-use or query performance demanded by the business users.

Wouldn’t it be even easier if users just pointed their query tool at a single fact table that already combined the metrics? Similarly, if the metrics are frequently compared to one another, rather than repeatedly drilling across and comparing on the fly, it likely makes more sense to physically combine the data into a single fact table once during the staging process.

Of course, you make trade-offs with this technique. When we bring facts together in a consolidated table, we sometimes lose dimensionality. All the facts in the consolidated table must live at the same granularity. The consolidated fact table grain is the “least common denominator” of shared dimensions across the discrete, atomic fact tables. Another consolidation cost is the data staging resource burden as the effort to extract, transform, load, and maintain data from multiple sources may be significant.

Consolidated Fact Table Example

Fact tables supporting actual to forecast comparisons are a common example of consolidated fact tables. Most organizations forecast business performance at a more summarized level than transactions are processed. Sales projections may be generated monthly by sales rep and product group. Meanwhile, sales transactions identify each unique product sold by a sales rep to a given customer. We could roll-up the transactions into an aggregated fact table with one row per month, sales rep, and product group, and then compare it to the projected forecast.

However, if this side-by-side comparison is frequently requested, physically creating a fact table that brings together both actual and forecast facts at a common grain will likely deliver more timely results with fewer analytic back flips. Of course, the business user’s next question will be, “What’s the difference between the forecasted and actual performance?” Because that’s a simple intra-row computation, we could easily make it available as a calculated fact via a view.

Profitability fact tables, which combine revenue and all the elements of cost, are another classic, albeit nontrivial, example of consolidated fact tables.

Accumulating Snapshot Example

As you likely recall from earlier columns, fact tables come in three standard flavors. The most common are transaction fact tables with grains such as one row per transaction line. Periodic snapshot fact tables are also frequently encountered where a stack of fact rows is appended to the fact table following each regularly scheduled snapshot.

Less frequently encountered fact tables are accumulating snapshots. They behave very differently from the other two flavors of fact tables. Accumulating snapshots capture the results from the key events in a related series of processes. Fact table rows are loaded when the first event or milestone occurs. Unlike other fact tables, accumulating snapshot fact tables are revisited; we update existing fact table rows to reflect the current or accumulated results of each event.

Perhaps the business users want to analyze their procurement pipelines. They already have individual atomic fact tables capturing the rich details associated with each transactional event in the pipeline, such as submitting purchase requisitions, issuing purchase orders, receiving deliveries, receiving invoices, and issuing payments. Although each of these events shares many common dimensions, such as product, vendor, and requester, the individual fact tables have unique dimensionality and metrics.

Suppose someone in the business wants to know how quickly purchase orders are issued after requisitions are submitted. Or what’s the discrepancy between the quantities ordered vs. what was received? Or what’s the duration or time lag between invoice receipt and payment? The accumulating snapshot comes to the rescue, as illustrated in Listing 1.

We have five date keys in the procurement accumulating snapshot corresponding to the key milestones in the pipeline. The other dimensions would be limited to those that are common to all the underlying processes, plus the degenerate dimensions generated along the way. Accumulating snapshots often have a status dimension to easily determine the current “state.” Finally, there’s a series of quantities, amounts, and lag or velocity calculations, again capturing the core cross-process performance metrics.

As we describe the accumulating snapshot, it’s obvious that we’re really talking about another variation of a consolidated fact table. Imagine how much easier it would be to answer the questions posed earlier if you had a consolidated accumulating snapshot, rather than trying to put together the procurement puzzle pieces by drilling across five fact tables.

Details Always Come First

Because consolidated fact tables deliver ease-of-use and query performance, perhaps you’re thinking that you’ll start there. It’s especially tempting if you’re chartered with creating a flashy scorecard or dashboard for the executive team. However, don’t be lured onto this supposedly easy street. You need to focus on the atomic details before pursuing either aggregated or consolidated dimensional models. If you start at the more macro level without the detailed foundation, there’s nothing to drill into when a business user wants to probe into an exceptional condition or anomaly in the consolidated data. Remember, you can always roll-up, but you can only drill down if the lower level details are available.