A very large database that stores historical and up-to-date information from a variety of sources and is optimized for fast query answering.
It is involved in three continuous processes:
1) At regular intervals, it extracts data from its information sources, loads it into auxiliary tables, and subsequently cleans and transforms the loaded data in order to make it suitable for the data warehouse schema;
2) It processes queries from users and from data analysis applications; and
3) It archives the data that is no longer needed by means of tertiary storage technology.
Most enterprises today employ computer-based information systems for financial accounting, purchase, sales and inventory management, production planning and control. In order to efficiently use the vast amount of information that these operational systems have been collecting over the years for planning and decision making purposes, the various kinds of information from all relevant sources have to be merged and consolidated in a data warehouse.
While an operational database is mainly accessed by OLTP applications that update its content, a data warehouse is mainly accessed by ad hoc user queries and by special data analysis programs, also called Online Analytical Processing (OLAP) applications. For instance, in a banking environment, there may be an OLTP application for controlling the banks’s automated teller machines (ATMs). This application performs frequent updates to tables storing current account information in a detailed format. On the other hand, there may be an OLAP application for analyzing the behavior of bank customers. A typical query that could be answered by such a system would be to calculate the average amount that customers of a certain age withdraw from their account by using ATMs in a certain region. In order to attain quick response times for such complex queries, the bank would maintain a data warehouse into which all the relevant information (including historical account data) from other databases is loaded and suitably aggregated.
Typically, queries in data warehouses refer to business events, such as sales transactions or online shop visits, that are recorded in event history tables (also called `fact tables’) with designated columns for storing the time point and the location at which the event occurred. Usually, an event record has certain numerical parameters such as an amount, a quantity, or a duration, and certain additional parameters such as references to the agents and objects involved in the event.
While the numerical parameters are the basis for forming statistical queries, the time, the location and certain reference parameters are used as the dimensions of the requested statistics. There are special data management techniques, also
called multidimensional databases, for representing and processing this type of multidimensional data.