As the data warehouse market matures, the cause of data warehouse “pain” (otherwise known as vendor growth opportunity) within the IT organization is bound to evolve. Vendors promote centralization as a miracle elixir to treat data warehouse ailments. They claim it spins independent, disparate data marts into gold by reducing administrative costs and improving performance. Physical centralization may deliver some efficiencies; however, you can’t afford to bypass the larger, more important issues of integration and consistency. That’s the focus of this column, part 6 of the Fundamentals series.
If your data warehouse environment has been developed without an overall architecture or strategy, you’re probably dealing with multiple, independent islands of data with the following characteristics:
- Multiple, uncoordinated extracts from the same operational sources
- Multiple variations of similar information using inconsistent naming conventions and business rules
- Multiple business analyses delivering inconsistent metrics
Some have tried to implicate data marts as the root cause of these problems. That’s a generalization that fails to acknowledge the benefits many organizations have realized with properly designed data marts. The problems I listed result from a nonexistent, poorly defined, or inappropriately executed strategy and can crop up with any architectural approach, including the enterprise data warehouse, the hub-and-spoke data warehouse, and distributed or federated data marts.
All That Glitters Is Not Gold
We can all agree that independent, isolated sets of data warrant attention, because they’re inefficient and incapable of delivering on the business promise of data warehousing. These stand-alone databases may be easier to implement initially, but without a higher-level enterprise integration strategy, they’re dead ends that perpetuate incompatible views of the organization. Merely moving these renegade data islands onto a bigger, better centralized platform to give the appearance of centralization is no silver bullet: Data integration and consistency are the true targets. Any approach that aims elsewhere treats the symptoms rather than the disease. While it may be simpler to just brush integration and consistency under the carpet to avoid the political or organizational challenges they pose, doing so will keep you from realizing the true business benefit of the data warehouse.
I can’t stress enough the importance of logical centralization and integration in the data warehouse, regardless of the physical implementation. In the vernacular of dimensional modeling, pursuing this objective means focusing on the enterprise data warehouse bus architecture and conformed dimensions and facts. As Ralph Kimball has described many times in this column, the enterprise data warehouse bus architecture is a tool to establish and enforce the overall data integration strategy for the warehouse. It provides the framework for integrating the analytic information in your organization. The result is a powerful, logically centralized architecture that you can implement either as a distributed system on multiple hardware platforms and technologies or on a single, physically centralized technology. The enterprise data warehouse bus architecture is nondenominational and technology-independent.
The enterprise’s bus architecture is documented and communicated via the data warehouse bus matrix. (See Figure 1 for an example.) The matrix rows represent the core business events or processes of the organization, while the columns reflect the common, conformed dimensions. Conformed dimensions are the means for consistently describing the core characteristics of your business. They’re the integration points between the disparate processes of the organization, ensuring semantic consistency. There may be valid business reasons for not conforming dimensions; for example, your organization may be a diversified conglomerate whose subsidiaries sell unique products to unique customers through unique channels. However, for most organizations, the key to integrating disparate data is organizational commitment to the creation and use of conformed dimensions throughout the warehouse architecture, regardless of whether data is physically centralized or distributed.
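To make the matrix idea concrete, here is a minimal Python sketch of a bus matrix held as a plain dictionary. The process and dimension names are hypothetical and are not taken from Figure 1; the helper functions (`dimension_usage`, `shared_dimensions`, `render`) are likewise illustrative names, not part of any tool.

```python
# Hypothetical bus matrix: rows are business processes, columns are dimensions.
# All process and dimension names below are assumptions made for illustration.
BUS_MATRIX = {
    "retail_sales": {"date", "product", "store", "promotion"},
    "inventory": {"date", "product", "store"},
    "purchase_orders": {"date", "product", "vendor"},
}


def dimension_usage(matrix):
    """Map each dimension to the sorted list of processes that reference it."""
    usage = {}
    for process, dimensions in matrix.items():
        for dimension in dimensions:
            usage.setdefault(dimension, []).append(process)
    return {dim: sorted(procs) for dim, procs in usage.items()}


def shared_dimensions(matrix, min_processes=2):
    """Return dimensions used by at least min_processes processes.

    These are the candidates that must be conformed, because they are
    the integration points between processes.
    """
    usage = dimension_usage(matrix)
    return sorted(dim for dim, procs in usage.items() if len(procs) >= min_processes)


def render(matrix):
    """Return the matrix as plain text, with X marking each dimension a process uses."""
    columns = sorted({dim for dims in matrix.values() for dim in dims})
    width = max(len(process) for process in matrix)
    lines = [" " * width + "  " + "  ".join(columns)]
    for process, dims in matrix.items():
        cells = ["X".center(len(col)) if col in dims else " " * len(col) for col in columns]
        lines.append(process.ljust(width) + "  " + "  ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(BUS_MATRIX))
    print("Conform first:", shared_dimensions(BUS_MATRIX))
```

Sorting the dimensions by how many processes share them is one reasonable way to decide which to conform first, though business priority may well override a simple count.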
As I warned earlier, physical centralization without integration may only throw more fuel on the fire of preexisting problems. Management may be convinced that buying a new platform to house the myriad existing data marts and warehouses will deliver operational efficiency and performance enhancements. Depending on the budget, these largely IT-centric benefits might be realized. However, they’re insignificant compared to the business potential from truly integrated data. Physical centralization without data integration and semantic consistency will distract an organization from focusing on the real crux of the problem. Inconsistent data will continue to flummox the organization’s decision-making ability.
Be Not Afraid Of Greatness
Moving to an enterprise data warehouse bus architecture will of course require organizational willpower and the allocation of scarce resources. No one said it would be easy. The issues brought to the surface when establishing a bus architecture are the generic, unavoidable issues all organizations face when trying to build an integrated view of their data.
Let’s examine some of the typical activities involved in migrating disparate data to a bus architecture with conformed dimensions. Of course, since each organization’s preexisting environment varies, you’ll need to modify these steps to reflect your specific scenario.
Step 1: Identify the existing data marts/warehouses in your organization, as well as those under development. You’ll probably be surprised by the sheer number lurking in nooks and crannies. (And don’t forget the data cubes sitting on your analysts’ desktops.) Note the level of detail (grain) for the data in each of these existing data warehouse deliverables, as well as the inevitable data overlaps. Overlaps in the descriptions of entities will drive the design of conformed dimensions, while overlaps in the calculation of metrics will drive the design of conformed facts.
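The inventory from Step 1 can be sketched as a small data structure that records each deliverable’s grain, descriptive attributes, and metric definitions. The following Python sketch is only an illustration: the mart names, attributes, and formulas are invented, and the comparison assumes that attributes and metrics with the same name are meant to describe the same thing, which in practice you would have to confirm with the business.

```python
from collections import defaultdict

# Hypothetical inventory of existing deliverables; every name and formula
# here is an assumption made for illustration.
INVENTORY = [
    {"mart": "sales_mart", "grain": "order line",
     "attributes": {"customer_name", "product_category"},
     "metrics": {"revenue": "sum(extended_price)"}},
    {"mart": "finance_cube", "grain": "monthly account balance",
     "attributes": {"customer_name", "cost_center"},
     "metrics": {"revenue": "sum(extended_price) - sum(discount)"}},
    {"mart": "marketing_mart", "grain": "campaign response",
     "attributes": {"product_category", "campaign"},
     "metrics": {"responses": "count(*)"}},
]


def attribute_overlaps(inventory):
    """Attributes described in more than one mart: input to conformed dimension design."""
    found = defaultdict(list)
    for entry in inventory:
        for attribute in entry["attributes"]:
            found[attribute].append(entry["mart"])
    return {attr: sorted(marts) for attr, marts in found.items() if len(marts) > 1}


def metric_conflicts(inventory):
    """Metrics that share a name but are calculated differently: input to conformed fact design."""
    definitions = defaultdict(dict)
    for entry in inventory:
        for name, formula in entry["metrics"].items():
            definitions[name][entry["mart"]] = formula
    return {name: by_mart for name, by_mart in definitions.items()
            if len(set(by_mart.values())) > 1}


if __name__ == "__main__":
    print(attribute_overlaps(INVENTORY))
    print(metric_conflicts(INVENTORY))
```

Matching on exact names will miss overlaps hidden behind inconsistent naming conventions, so treat the output as a starting list for discussion rather than a complete audit.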
Step 2: Understand the organization’s unmet business requirements for the data warehouse at a high level. Although the enterprise bus architecture needs to keep an eye on the outer boundaries of future data requirements in your organization, the initial implementation must practically focus on the most urgently needed data.
Step 3: Gather key stakeholders to develop a preliminary enterprise data warehouse bus matrix for your organization. These stakeholders include backroom DBAs and source system experts, as well as front-room business analysts. The first stakeholder meeting should be kicked off by a senior executive of the organization who stresses the business importance of reaching agreement on the conformed dimensions and facts. (Then the executive can leave!) Senior-level business commitment is critical to moving beyond the inevitable organizational obstacles.
Step 4: Identify a dimension authority or stewardship committee for each dimension to be conformed and subsequently released to the community. Design the core conformed dimensions by integrating and reconciling the existing, disparate dimension attributes. Realistically, it may be overwhelming to get everyone to agree on every attribute, but don’t let that bring this process to a crashing halt. You’ve got to start walking down the path toward integration in order to gain organizationwide agreement and final sign-off on the master conformed dimensions.
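One way to picture the reconciliation in Step 4 is as a steward-approved mapping from each mart’s local attribute names to the agreed conformed names, with anything not yet agreed set aside rather than blocking progress. The Python sketch below assumes such a mapping exists; the mart names, attribute names, and the `conform_row` helper are all hypothetical.

```python
# Hypothetical mapping maintained by the dimension stewards:
# (source mart, local attribute name) -> conformed attribute name.
CONFORMED_NAMES = {
    ("sales_mart", "cust_nm"): "customer_name",
    ("finance_cube", "CustomerName"): "customer_name",
    ("sales_mart", "cust_seg"): "customer_segment",
}


def conform_row(mart, row, mapping=CONFORMED_NAMES):
    """Split a source dimension row into conformed and unresolved attributes.

    Attributes with an agreed mapping are renamed to the conformed name.
    Attributes without one are returned separately so the stewards can
    review them later without halting the rest of the work.
    """
    conformed = {}
    unresolved = {}
    for local_name, value in row.items():
        target = mapping.get((mart, local_name))
        if target is None:
            unresolved[local_name] = value
        else:
            conformed[target] = value
    return conformed, unresolved


if __name__ == "__main__":
    conformed, unresolved = conform_row(
        "finance_cube", {"CustomerName": "Acme", "Region": "West"}
    )
    print(conformed)
    print(unresolved)
```

Renaming is only the simplest part of conforming; reconciling differing values and business rules for the same attribute still requires agreement among the stewards.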
Step 5: Devise realistic, incremental development and administration plans for implementing and deploying or converting to the new conformed dimensions. Ultimately, the conformed dimensions should be used across all data sources to which they connect; however, you can’t expect to get there in one fell swoop.
All’s Well That Ends Well
These steps focus on the true, core issues of achieving logical integration across your data warehouse. Formulating the bus architecture and deploying conformed dimensions will result in a comprehensive data warehouse for your organization that’s integrated, consistent, legible, and well performing. You’ll be able to add data naturally, with confidence that it will integrate with existing data.
Of course, you have the option to implement either a physically distributed system or a classic hardware-centralized system. In both cases, using the enterprise bus architecture and conformed dimensions, you’ll deliver integrated business results to your users, which is the whole point of a data warehouse. Your organization’s decision-making capabilities will be turbo-charged with consistent data, and inordinate attention will no longer be diverted to data inconsistencies and reconciliations.