Data Warehouse DataStage Questions Answers

DataStage Interview Questions and Answers Collection.

Questions and Solutions for Performance Tuning.

What do you mean by a conformed dimension in Datastage?
Answer : A conformed dimension is a single, coherent view of the same piece of data throughout the organization. The same dimension is used in all subsequent star schemas defined. This thereby enables reporting across the complete data warehouse in a simple format.

Why we find fact table in normal form?
Answer : The fact table consists of the Index keys of the dimension/look up tables and the measures. As such, whenever we have the keys in a table, it itself implies that the table is in the normal form.

Explain what do you mean by a linked cube in Datastage?
Answer : A cube can be stored on a single analysis server and then defined as a linked cube on other Analysis servers. The end users connected to any of these analysis servers can then access the cube. This kind of arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple analysis servers. linked cubes can be connected using TCP/IP or HTTP. To the end users, a linked cube looks like a regular cube.

What do you mean by degenerate dimension table?
Answer : The values of dimension which is stored in fact table is called degenerate dimensions. These dimensions doesn't have its own dimensions.

What is ODS in Datastage?
Answer : ODS is the abbreviated form for Online Data Storage.

Explain general purpose scheduling tool?
Answer : The basic purpose of the scheduling tool in a Datawarehouse Application is to stream line the flow of data from Source To Target at specific time or based on some condition.

What is the significance of a surrogate key, why primary key is not used as a surrogate key?
Answer : Surrogate Key is an artificial identifier for an entity. In surrogate key values are generated by the system sequentially(Like Identity property in SQL Server and Sequence in Oracle). They do not describe anything. Primary Key is a natural identifier for a particular entity. In Primary keys, all the values are entered manually by the user which are uniquely identified and hence there is no repetition of data.

Tell me what is a Hash file stage and explain its usage?
Answer : Hash file stage is used for Look-ups. It acts like a reference table. It is also used in-place of ODBC, OCI tables for improved performance.

Name the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
Answer : Try using crontab utility along with d***ecute() function along with appropriate parameters passed.

Explain what do you mean by OCI and its usage in the ETL Tools?
Answer : OCI means orabulk data which used client having bulk data its retrieval time is much higher.

Why are OConv () and Iconv () functions used in Datastage?IConv() - Converts a string to an internal storage format
Answer : OConv() - Converts an expression to an output format.

What do you mean by a Fact table in Datastage?
Answer : Fact Table contains the measurements or metrics or facts of business process. If your business process is "Sales" , then a measurement of this business process such as "monthly sales number" is captured in the Fact table. Fact table also contains the foreign keys for the dimension tables.

Describe the steps In Building the Data Model.
Answer : While ER model lists and defines the constructs required to build a data model, however there is no standard process for doing so. Some methodologies, such as IDEFIX, specify a bottom-up approach.

Explain in brief what is Dimensional Modelling in Datastage?
Answer : Dimensional Modelling is a design concept used by many data warehouse designers to build their datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e, the dimensions on which the facts are calculated.

What is the usual Indexing mechanism we need to use for a typical datawarehouse?
Answer : On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.

Define Normalization?
Answer : Normalization can be defined as segregating of table into two different tables, so as to avoid duplication of values.

Is it feasible to develop a Data Mart using an ODS?
Answer : Yes, it is correct to develop a Data Mart using an ODS. As because ODS which is used to store transaction data and less historical data, this is what datamart is required so it is correct to develop datamart using ODS.

Explain in detail what are the Performance tuning steps and mechanisms you have applied in your project to boost the performance of slowly running jobs in Datastage?
Answer : Cited below are some of the ways to improve performance.

We need to stage the data coming from ODBC/OCI/DB2UDB stages or any database on the server using Hash/Sequential files for optimum performance also for data recovery in case job aborts.
Need to tune the OCI stage for 'Array Size' and 'Rows per Transaction' numerical values for faster inserts, updates and selects.
There is a need for tuning the 'Project Tunables' in Administrator for better performance.
Need to use sorted data for Aggregator.
Wee need to sort the data as much as possible in DB and reduced the use of DS-Sort for better performance of jobs.
We need to remove the data that is not used from the source as early as possible in the job.
Ensure that you work with DB-admin to create appropriate Indexes on tables for better performance of DS queries
Need to convert some of the complex joins/business in DS to Stored Procedures on DS for faster execution of the jobs.
If there is an input file having an excessive number of rows and can be split-up then use standard logic to run jobs in parallel.
Before you proceed on to write a routine or a transform, ensure that there is not the functionality required in one of the standard routines supplied in the sdk or ds utilities categories.
Need for tuning should occur on a job-by-job basis.
Try to use the power of DBMS as much as possible.
Ensure not to use a sort stage when you can use an ORDER BY clause in the database.
Try to use a constraint to filter a record set is much slower than performing a SELECT … WHERE….
Ensure that you make an effort to use the bulk loader for your particular database. Bulk loaders are generally much faster than using ODBC or OLE thus resulting in improved performance.

Data Warehouse DataStage Questions Answers

DataStage Interview Questions and Answers Collection.

Questions and Solutions for Performance Tuning.

No comments :