Q. What is a data warehouse?

Ans:

A data warehouse is a huge store of data accumulated from a broad range of sources within an organization and used to guide business decisions.

Q. What is Data Mining?

Ans:

Data mining is the phase of analyzing data from several perspectives and summarizing it into useful data.

 Q. What is Business Intelligence?

Ans:

Business Intelligence defines the technologies, functions, and systems for the collection, integration, analysis, and demonstration of business data and sometimes to the data itself. The purpose of business intelligence is to provide better business decision making. Thus, BI is also defined as a decision support system (DSS).

Q. What is Dimension Table?

Ans:

Dimension table is a table which contain attributes of measurements stored in fact tables. This table consists of hierarchies, categories and logic that can be used to traverse in nodes.

Q. What are the stages of Data warehousing?

Ans:

There are four stages of Data warehousing:

  1. Offline Operational Database
  2. Offline Data Warehouse
  3. Real Time Datawarehouse
  4. Integrated Datawarehouse

Q. What is OLTP?

Ans:

OLTP is abbreviated as On-Line Transaction Processing, and it is an application that modifies the data whenever it received and has large number of simultaneous users.

Q. What is OLAP?

Ans:

OLAP is abbreviated as Online Analytical Processing, and it is set to be a system which collects, manages, processes multi-dimensional data for analysis and management purposes.

Q. What is ODS?

Ans:

ODS is abbreviated as Operational Data Store and it is a repository of real time operational data rather than long term trend data.

Q. What is the difference between View and Materialized View?

Ans:

 

Ø A view is nothing but a virtual table which takes the output of the query and it can be used in place of tables.

Ø A materialized view is nothing but an indirect access to the table data by storing the results of a query in a separate schema.

Q. What is ETL?

Ans:

ETL is abbreviated as Extract, Transform and Load. ETL is a software which is used to reads the data from the specified data source and extracts a desired subset of data. Next, it transform the data using rules and lookup tables and convert it to a desired state.

Then, load function is used to load the resulting data to the target database.

Q. What is Fact Table?

Ans:

Ø Fact table contains the measurement of business processes, and it contains foreign keys for the dimension tables.

Ø Example – If the business process is manufacturing of bricks

Ø Average number of bricks produced by one person/machine – measure of the business process

Q. What is VLDB?

Ans:

VLDB is abbreviated as Very Large Database and its size is set to be more than one terabyte database. These are decision support systems which is used to server large number of users.

Q. What is real-time data warehousing?

Ans:

Real-time data warehousing captures the business data whenever it occurs. When there is business activity gets completed, that data will be available in the flow and become available for use instantly.

Q. What are the different types of Datawarehouse?

Ans:

Following are the different types of Data warehousing:

  1. Enterprise Data warehousing
  2. Operational Data Store
  3. Data Mart

Q. What needs to be done while starting the database?

Ans:

Following need to be done to start the database:

1.     Start an Instance

2.     Mount the database

3.     Open the database

Q. What needs to be done when the database is shutdown?

Ans:

Following needs to be done when the database is shutdown:

1.     Close the database

2.     Dismount the database

3.     Shutdown the Instance

Q. What is the difference between metadata and data dictionary?


Ans:

Metadata is defined as data about the data. But, Data dictionary contain the information about the project information, graphs, abinito commands and server information.

 Q. What is OLTP?

Ans:

OLTP stands for online transaction processing. This system is a function that modifies data the instance it receives and has a huge number of concurrent users.

 Q. What is OLAP?

Ans:

OLAP stands for online analytical processing. This system is a function that collects, manages, processes, and presents multidimensional data for analysis and management process.

 Q. What is the difference between OLTP and OLAP?

Ans:

Basic

OLTP

OLAP

Meaning

OLTP stands for online transaction processing.

OLAP stands for online analytical processing.

Data Source

Operational data is initial data source of data.

Consolidation data is from different sources.

Process Goal

Snapshot of business processes which does essential business tasks

Multidimensional view of business events of planning and decision making

Queries and Process scripts

Simple quick running queries ran by customers.

Complex long-running queries by a scheme to update the aggregate data.

Database Design

Normalize small databases. Speed will not be an issue due to the smaller database, and normalization will not degrade performance. This adopts an entity-relationship (ER) model and function-oriented database design.

De-normalized large databases. Speed is an issue due to the larger databases, and de-normalizing will improve performance as there will be lesser tables to scan while performing tasks. This adopts star, snowflake, or fact constellation mode of subject-oriented database design.

Back-up and system administration

Regular database back-up and system administration can do the job.

Reloading the OLTP data is well treated as a good backup option.

 

 Q. What is a dimensional table?

Ans:

Dimensional tables include textual attributes of measurement saved in the fact tables. A dimensional table is a group of hierarchies, categories, and logic which can be used for the customer to traverse in hierarchy nodes.

Q. What are Aggregate tables?

Ans:

Aggregate tables are the tables which contain the existing warehouse data which has been grouped to certain level of dimensions. It is easy to retrieve data from the aggregated tables than the original table which has more number of records.

This table reduces the load in the database server and increases the performance of the query.

Q. What is fact less fact tables?

Ans:

A fact less fact tables are the fact table which doesn’t contain numeric fact column in the fact table.

Q. How can we load the time dimension?

Ans:

Time dimensions are usually loaded through all possible dates in a year and it can be done through a program. Here, 100 years can be represented with one row per day.

Q. What are non-additive facts?

Ans:

Non-Addictive facts are said to be facts that cannot be summed up for any of the dimensions present in the fact table. If there are changes in the dimensions, same facts can be useful.

Q. What is conformed fact?

Ans:

Conformed fact is a table which can be used across multiple data marts in combined with the multiple fact tables.

Q. What is Datamart?

Ans:

A Datamart is a specialized version of Data warehousing and it contains a snapshot of operational data that helps the business people to decide with the analysis of past trends and experiences. A data mart helps to emphasizes on easy access to relevant information.

Q. What is Active Data warehousing?

Ans:

An active Datawarehouse is a Datawarehouse that enables decision makers within a company or organization to manage customer relationships effectively and efficiently.

Q. What are the key columns in Fact and dimension tables?

Ans:

Foreign keys of dimension tables are primary keys of entity tables. Foreign keys of fact tables are the primary keys of the dimension tables.

Q. What is SCD?

Ans:

SCD is defined as slowly changing dimensions, and it applies to the cases where record changes over time.

Q. What is BUS Schema?

Ans:

BUS schema consists of suite of confirmed dimension and standardized definition if there is a fact tables.

Q. What is Star Schema?

Ans:

Star schema is nothing but a type of organizing the tables in such a way that result can be retrieved from the database quickly in the data warehouse environment.

Q. What is Snowflake Schema?

Ans:

Snowflake schema which has primary dimension table to which one or more dimensions can be joined. The primary dimension table is the only table that can be joined with the fact table.

Q. What is a core dimension?

Ans:

Core dimension is nothing but a Dimension table which is used as dedicated for single fact table or DataMart.

Q. What is called data cleaning?

Ans:

Name itself implies that it is a self-explanatory term. Cleaning of Orphan records, Data breaching business rules, Inconsistent data and missing information in a database.

Q. What is Metadata?

Ans:

Metadata is defined as data about the data. The metadata contains information like number of columns used, fix width and limited width, ordering of fields and data types of the fields.

Q. What are loops in Data warehousing?

Ans:

In data warehousing, loops are existing between the tables. If there is a loop between the tables, then the query generation will take more time and it creates ambiguity. It is advised to avoid loop between the tables.

Q. What are the types of Dimensional Modeling?

Ans:

There are three types of Dimensional Modeling and they are as follows:

Ø  Conceptual Modeling

Ø  Logical Modeling

Ø  Physical Modeling

Q. What is surrogate key?

Ans:

Surrogate key is nothing but a substitute for the natural primary key. It is set to be a unique identifier for each row that can be used for the primary key to a table.

Q. What is the difference between ER Modeling and Dimensional Modeling?

Ans:

Ø ER modeling will have logical and physical model but Dimensional modeling will have only Physical model.

Ø ER Modeling is used for normalizing the OLTP database design whereas Dimensional Modeling is used for de-normalizing the ROLAP and MOLAP design.

Q. What are the steps to build the Datawarehouse?

Ans:

Following are the steps to be followed to build the Datawarehouse:

Ø  Gathering business requirements

Ø  Identifying the necessary sources

Ø  Identifying the facts

Ø  Defining the dimensions

Ø  Defining the attributes

Ø  Redefine the dimensions and attributes if required

Ø  Organize the Attribute hierarchy

Ø  Define Relationships

Ø  Assign unique Identifiers

Q. What is a fact table?

Ans:

The fact table includes the measurement of the business process. Fact table includes the foreign keys for the dimension tables.

Example: 

If we are business phase is "paper production," "normal production of paper by one device, "or "weekly production of paper" will be treated as a measurement of the business process.

 Q. What are the different methods of loading dimension tables?

Ans:

There are two different methods to load data in dimension tables:

  • Conventional (slow): All the constraints and keys are validated against the information before, it is loaded, and this method data integrity is maintained.
  • Direct (fast): All the constraints and keys are disabled before the information is loaded. Once the information is loaded, it is validated against all the constraints and keys. If the data is found invalid, it is not contained in the index, and all the future processes are skipped in this data.

 Q. Describe the foreign key columns in fact tables and dimension tables?

Ans:

Foreign keys of dimension tables are the primary key of entity tables.

Foreign keys of fact tables are the primary key of dimension tables.

 Q. What is ODS?

Ans:

ODS stands for Operational data store. A database architecture that is a repository for near real-time operational records rather than long term trend data. The ODS may further become the enterprise shared an operational database, allowing operational functions that are being re-engineered to use the ODS as there operational databases.

 Q. What is ETL?

Ans:

ETL stands for extraction, transformation, and loading process. ETL is software that allows the business to develop their disparate records while moving it from place to place, and it doesn't really matter that data is in several forms or formats. The data can come from any source. ETL is powerful enough to manage such data disparities.

First, the extract function reads data from a particular source database and extracts a desired subset of data.

Second, the transform function works with the acquired record using rules or lookup tables, or creating a combination with other records to convert it to the desired state.

Finally, the load function is used to write the resulting information to a target database.

 Q. What is VLDB?

Ans:

VLDB stands for a Very large database. A one terabyte database would generally be considered to be a VLDB. Typically, there are decision support applications or transaction processing applications serving a huge number of users.

 Q. What is real-time data warehousing?

Ans:

Data warehousing capture business event data. Real-time data warehousing capture business event data as it occurs. As soon as the business event is complete, and there is data about it, the completed event data flows into the data warehouse and becomes feasible instantly.

 Q. What are conformed dimensions?

Ans:

Conformed dimension defines the exact same thing with every possible fact table to which they are joined. They are simple to the cubes.

 Q. What are non-additive facts?

Ans:

Non-additive facts are the facts that cannot be examined for any of the dimensions present in the fact table. They are not treated as useless. If there is a transformation in dimensions, the same facts can be useful.

 Q. What is Star Schema?

Ans:

Star schema is a type of organizing the tables such that we can fetch the result from the database instantly in the warehouse environment.

 Q. What is a Snowflake Schema?

Ans:

Any dimension with extended dimensions is called snowflake schema, the dimension may be interlinked or may have one too many relationships with other tables. This schema is normalized and outcome in complex join and very complex queries as well as slower results.

 Q. What is a surrogate key?

Ans:

A surrogate key is a substitution for the essential primary key. It is just a unique identifier or statistic for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. It is useful because the essential primary key can change, and this makes updates more difficult. Surrogate keys are always integer or numeric.

 Q. What is a junk dimension?

Ans:

A number of very small dimension may be lumped together to form a single dimension, a junk dimension is the attributes are not closely related. Grouping of random flags and text attributes in dimensions and changing them to a separate subdimension is called the junk dimension.

 Q. What is dimensional modeling?

Ans:

Dimensional data model concept contains two types of tables, and it is different from the third normal form. This concept uses a fact table, which includes the measurement of the business and Dimensional tables, which includes the context (dimension of the calculations) of the dimensions.

 Q. What is BUS Schema?

Ans:

BUS schema is collected from a master suite of confirmed size and a standardized description of facts.

 Q. What is active data warehousing?

Ans:

An active data warehouses provide data that allow decision-makers within an organization to handle customer relationships efficiently and proactively.

 Q. What is the difference between data warehousing and Business Intelligence?

Ans:

Data warehousing handle with all methods of managing the development, implementation and applications of a data warehouse or data mart containing metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational documenting, analytical documenting, security management, backup/recovery planning, etc.

Business Intelligence is a set of software tools that allows an organization to analyse measurable methods of their business, such as sales performance, profitability, operational efficiency, effectiveness, of marketing campaigns, market penetration among certain user groups, cost trends, anomalies and exceptions, etc. Business Intelligence is used to encompass OLAP, data visualization, data mining, and query document tools.

 Q. Which one is faster, Multidimensional OLAP, or Relational OLAP?

Ans:

Multidimensional OLAP (MOLAP) is faster than Relational OLAP (ROLAP).

  • MOLAP: Here, data is saved in a multidimensional cube. The storage is not in the relational database but in a proprietary plan (one example is PowerOLAP's .olp file). MOLAP products are compatible with Excel, which can make record interactions easy to learn.
  • ROLAP: ROLAP products approach a relational database by using SQL (structured query language), which is the standard language that is used to describe and manipulate data in an RDBMS. Subsequent processing may occur in the RDBMS or within a middle-tier server, which accepts requests from users, translates them into SQL statements, and passes them on to the RDBMS.