Best practices are precision tools that should be wielded precisely and skillfully. This article describes five best practices drawn from the Kimball Method that often are described incorrectly.
Data warehousing is a mature discipline with well-established best practices. But these best practices are useless or even harmful if they are described inaccurately or incompletely. One such set of best practices, based on dimensional modeling, known specifically as the Kimball Method, is a very specific set of design guidelines and procedures, often described down to the level of individual database fields and detailed, sequential steps.
In the “Kimball University” series in Intelligent Enterprise, we have published more than 100 articles describing various aspects of the Kimball Method. We are greatly indebted to Intelligent Enterprise for providing this resource to the data warehouse community. Yet every year or two we encounter serious misrepresentations made especially by speakers at training organizations where hundreds of students are given misleading information about our approach. These speakers recommend a homogeneous, raise-no-objections approach that combines many different techniques, leaving the designer to wonder how to make actual design choices.
This article addresses major points of misunderstanding and vagueness by providing guidelines that business intelligence/data warehouse (BI/DW) professionals can tuck into their project notebooks and refer to as unassailable facts and best practices of the Kimball Method.
1. Take an Enterprise Approach
The Kimball Method is specifically intended to deliver large-scale enterprise business intelligence/data warehouse solutions. Occasionally it has been described as a “bottoms-up approach,” but it’s more accurately described as a blended approach starting with an enterprisewide, top-down view. At the same time, it’s tempered with the bottoms-up realities of real data sources.
We teach an enterprise point of view, starting with horizontal, cross-department gathering of requirements. This involves the executive team, senior managers and data analysts identifying and prioritizing high-value needs. The next step is to create the “enterprise bus matrix,” a pivotal design document and powerful tool for understanding and creating the appropriate enterprise data architecture to support the business requirements. As we’ve said many times, real data sources in their atomic form are the “data marts” of the enterprise, a definition that differs from other designers who define data marts only as aggregated releases from a centralized data store. When we then say (correctly) that the enterprise data warehouse is the sum of these data marts, other observers sometimes miss the point of our architecture (see The Bottom-Up Misnomer for more details).
2. Embrace Business Intelligence
Business Intelligence is a term that has emerged and evolved over the past few years and is now often used to describe all the systems and processes an enterprise uses to gather, process, provide access to and analyze business information. The term “data warehouse” is now used to mean the platform for all forms of business intelligence.
Since we have been writing on this topic for more than fifteen years, we are beholden to our legacy of books, articles and design tips. In fact, “data warehouse” is included in the title of all of our books! Nonetheless, changing industry vernacular does not change the core concepts and methodologies described by the Kimball Method. Our approach has always embraced the entire, end-to-end process as critical to an organization’s success.
3. Incorporate Dimensional Schemas
The Kimball Method is predicated on the principle that all business-user access to data is supported via dimensional schemas. Thus, what we call the “presentation area” of the overall business intelligence solution is comprised of a number of granular, atomic fact tables decorated with a rich set of descriptive, conformed dimension tables. We specifically avoid summarized, departmental data marts supported by a large, normalized data warehouse providing access to atomic data. We believe that such a centralized and normalized view of a data warehouse is responsible for many of the failures of data warehousing to support business intelligence applications.
Dimensional modeling is a design discipline focused on optimizing the business intelligence platform for business users’ ease of use and query performance. To achieve the goals of being simple and fast we describe a set of very specific design recommendations:
* Conformed master dimensions form the bedrock of the enterprise BI/DW system and by themselves address the central issues of integration
* Fact tables are derived directly from measurement processes found in familiar transaction-based applications. A fact table should never be departmentally or functionally bound but rather depends only on the “physics” of the original measurement process.
* Fact tables should always be populated at the most atomic level possible for maximum flexibility. Atomic data lets business users ask constantly changing, far-ranging and very precise questions. It also assures the extensibility of additional attributes, metrics or dimensions without disrupting existing reports and queries.
* Exposing snowflaked or normalized dimension tables directly to end users is strongly discouraged. We have shown repeatedly that properly designed denormalized (flat) dimension tables contain precisely the same information content as normalized schemas. The only difference is complexity, as experienced by the end users. We embrace (and teach) normalized designs in the extract/transformation/load (ETL) phases; however, we avoid normalization in the user-accessible presentation area.
Our Differences of Opinion and Facts and Fables articles describe these concepts in greater detail.
4. Use Conformed Dimensions for Integration
Data integration and consistency are key goals of any enterprise business intelligence effort. Data integration requires organizational consensus to establish and administer common labels and measures enterprisewide. In the Kimball Method, these labels and measures reside in conformed dimensions and conformed facts, respectively. Conformed dimensions are typically built and maintained as centralized, persistent master data during ETL, then reused across dimensional models to enable data integration and ensure consistency.
We enthusiastically support the recent Master Data Management (MDM) and Customer Data Integration (CDI) trends, as they are very consistent with the “conformed approach.” For more insight, read The Matrix: Revisited and Integration for Real People articles at Intelligence Enterprise.
5. Carefully Plan the ETL Architecture
Our approach describes a formal data-staging area, much like the kitchen in a restaurant, with detailed ETL processes required to bridge the gap between the production system source data and the presentation area dimensional schema. The approach further defines cleaning and conforming activities as part of the transformation process.
Let there be no doubt, the ETL effort is hard work. The ETL system is often estimated to consume 70 percent of the time and effort of building a business intelligence environment. Too often, little thought goes into architecting a robust ETL system, and it ends up as an uncoordinated, spaghetti-mess of tables, modules, processes, scripts, triggers, alerts and job schedules. This sort of design approach has unmistakably derailed many business intelligence efforts.
The Kimball Method describes a comprehensive set of ETL subsystems that comprise a robust set of ETL best practices, including those required to support real-time requirements (see The Subsystems of ETL) .
Be wary of any approach that suggests that ETL is no longer a required architectural component. Some architects believe that a simple intermediate data structure or an integration software layer is all that’s needed to perform translation on the fly. Unfortunately, true data integration can only succeed if the textual descriptors in each separate source are physically altered so they have the same label (column name) and content (data domain values). If it sounds too easy, it is (see Beware the Objection Removers for more on these kinds of misleading claims).
This article has highlighted five best practices drawn from the Kimball Method which we recommend designers study carefully in order to avoid the misrepresentations sometimes heard in various teaching and writing venues. As a designer, you are free to choose any approach you are comfortable with, but we want you to think critically when you are making these choices.