Check Knowledge Modules (CKMs) play a critical role in ensuring data integrity and quality within the ETL process. They verify that data conforms to predefined constraints, either in data that already exists in a datastore or in incoming data before it is loaded into the target system, so that only valid, compliant records pass through the ETL pipeline.
Overview of CKM:
The CKM is responsible for checking the consistency of records against defined constraints, which are rules or conditions set to ensure the accuracy, validity, and integrity of the data. This module ensures that records meet the expected standards before they are processed or loaded into the target datastore.
There are two main ways the CKM is used:
- STATIC_CONTROL Mode: The CKM checks the consistency of existing data in a datastore.
- FLOW_CONTROL Mode: The CKM checks the consistency of incoming data before it is loaded into the target datastore.
CKM Modes of Operation:
- STATIC_CONTROL Mode:
- Purpose: In this mode, the CKM runs consistency checks on data that already exists in a datastore.
- Process:
- The CKM reads the constraints defined on the datastore being checked (e.g., primary keys, foreign keys, mandatory columns, conditions).
- It then checks the existing data in the datastore against those constraints.
- Any records that violate the constraints are flagged as errors and are written to an "E$" error table.
- This allows the ETL process to track and resolve inconsistencies in the data before performing further operations.
- Use Case: When you need to validate and clean up data that is already loaded in the target system, for example during a data migration.
- Flow in STATIC_CONTROL Mode:
§ The CKM reads the constraints of the table being checked.
§ It checks the data against those constraints.
§ Any records violating constraints are written to an "E$" error table in the staging area for further inspection.
- FLOW_CONTROL Mode:
- Purpose: In this mode, the CKM checks the consistency of incoming data before it is loaded into the target datastore. This ensures that only valid data is passed on to the target system.
- Process:
- The CKM reads the constraints of the target table defined in the mapping.
- It checks the data in the "I$" flow table (a temporary table generated by the IKM in the staging area) against those constraints.
- Records that violate the constraints are written to the "E$" error table.
- This mode helps ensure that the data being loaded meets the required conditions before it affects the target datastore.
- Use Case: When you want to validate the data before loading it into the target system, ensuring that the incoming data adheres to the business rules and constraints.
- Flow in FLOW_CONTROL Mode:
§ The CKM reads the constraints from the target table of the mapping.
§ It checks the incoming data from the "I$" flow table.
§ The CKM isolates erroneous records that violate constraints.
§ These erroneous records are written to the "E$" error table for further review and resolution.
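The two modes can be sketched in Python with the standard-library `sqlite3` module. This is only an illustration of the idea, not how a real CKM is implemented: the `CUSTOMER` tables, the `ERR_MESS` column, the `run_check` helper, and the two constraints are all hypothetical, and each constraint is written as a SQL predicate that is true for violating rows. Two separate error tables are used here only to keep the results of the two runs apart.

```python
import sqlite3

def run_check(conn, checked_table, error_table, constraints):
    """Copy rows of `checked_table` that violate a constraint into `error_table`.

    `constraints` is a list of (message, violation_predicate) pairs, where the
    predicate is a SQL condition that is true for rows breaking the rule.
    """
    for message, violation_predicate in constraints:
        conn.execute(
            f'INSERT INTO "{error_table}" (ERR_MESS, CUST_ID, NAME, AGE) '
            f'SELECT ?, CUST_ID, NAME, AGE FROM "{checked_table}" '
            f"WHERE {violation_predicate}",
            (message,),
        )
    rows = conn.execute(
        f'SELECT ERR_MESS, CUST_ID FROM "{error_table}" ORDER BY CUST_ID'
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE CUSTOMER (CUST_ID INTEGER, NAME TEXT, AGE INTEGER);
    INSERT INTO CUSTOMER VALUES (1, 'Ann', 34), (2, NULL, 41);
    CREATE TABLE "I$_CUSTOMER" (CUST_ID INTEGER, NAME TEXT, AGE INTEGER);
    INSERT INTO "I$_CUSTOMER" VALUES (3, 'Bob', -5), (4, 'Eve', 29);
    CREATE TABLE "E$_STATIC" (ERR_MESS TEXT, CUST_ID INTEGER, NAME TEXT, AGE INTEGER);
    CREATE TABLE "E$_FLOW" (ERR_MESS TEXT, CUST_ID INTEGER, NAME TEXT, AGE INTEGER);
''')

# Hypothetical constraints of the CUSTOMER datastore.
constraints = [
    ("NAME is mandatory", "NAME IS NULL"),
    ("AGE must be >= 0", "AGE < 0"),
]

# STATIC_CONTROL: check the data already stored in the datastore.
static_errors = run_check(conn, "CUSTOMER", "E$_STATIC", constraints)

# FLOW_CONTROL: check the incoming rows in the flow table before any load.
flow_errors = run_check(conn, "I$_CUSTOMER", "E$_FLOW", constraints)

print(static_errors)
print(flow_errors)
```

In this sketch the only difference between the modes is which table is read: the stored data for STATIC_CONTROL, the flow table for FLOW_CONTROL. Neither run modifies the table it checks; violating rows are only copied to the error table.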
CKM Error Handling:
- Error Table (E$):
- Purpose: The CKM creates an "E$" error table in the staging area to hold all the records that fail the consistency checks.
- Structure of the E$ Table:
- The E$ table contains the same columns as the original datastore but also includes additional metadata to help trace the errors. This additional information can include:
- Error Messages (explanation of the violation)
- Check Origin (where the check was run from, such as the datastore or mapping being checked)
- Check Date (when the check was run)
- This helps keep track of issues and provides insight into why certain records failed the validation checks.
- Error Isolation:
- The CKM isolates the erroneous records based on the constraints defined for the datastore. This includes checking:
- Primary Keys
- Alternate Keys
- Foreign Keys
- Mandatory Columns
- Business Conditions
- The CKM ensures that each violation is captured and stored in the E$ table for analysis and rectification.
- Removing Erroneous Records:
- If required, the CKM can remove erroneous records from the data that was checked.
- This ensures that only valid, constraint-compliant records move forward in the ETL process. Where data cleaning or validation is critical, the CKM can filter out data that doesn't meet the business rules.
- In FLOW_CONTROL mode, this happens before the data is loaded into the target datastore, which keeps invalid rows out of the target.
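The error table layout and the optional removal step can be sketched as follows, again using Python's standard-library `sqlite3`. Everything named here is an assumption for illustration: the trace column names (`ERR_MESS`, `ORIGIN`, `CHECK_DATE`), the `ORDERS` columns, the helper functions, and the choice to match rows by `ORDER_ID` are hypothetical and do not reproduce what a real CKM generates.

```python
import sqlite3

# Illustrative trace columns; a real CKM defines its own names and types.
TRACE_COLUMNS = [("ERR_MESS", "TEXT"), ("ORIGIN", "TEXT"), ("CHECK_DATE", "TEXT")]

def error_table_ddl(datastore, columns):
    """Build CREATE TABLE for the error table: trace columns + datastore columns."""
    all_columns = TRACE_COLUMNS + list(columns)
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in all_columns)
    return f'CREATE TABLE "E$_{datastore}" ({column_sql})'

def remove_erroneous_rows(conn, checked_table, error_table, key):
    """Delete from `checked_table` every row whose key appears in `error_table`."""
    cursor = conn.execute(
        f'DELETE FROM "{checked_table}" '
        f'WHERE {key} IN (SELECT {key} FROM "{error_table}")'
    )
    return cursor.rowcount

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "I$_ORDERS" (ORDER_ID INTEGER, AMOUNT INTEGER)')
conn.execute('INSERT INTO "I$_ORDERS" VALUES (10, 50), (11, -20), (12, 75)')
conn.execute(error_table_ddl("ORDERS", [("ORDER_ID", "INTEGER"), ("AMOUNT", "INTEGER")]))

# Isolate rows that break the hypothetical condition AMOUNT >= 0.
conn.execute(
    'INSERT INTO "E$_ORDERS" (ERR_MESS, ORIGIN, CHECK_DATE, ORDER_ID, AMOUNT) '
    "SELECT 'AMOUNT must be >= 0', 'I$_ORDERS', datetime('now'), ORDER_ID, AMOUNT "
    'FROM "I$_ORDERS" WHERE AMOUNT < 0'
)

# Optional step: drop the isolated rows from the checked flow table.
removed = remove_erroneous_rows(conn, "I$_ORDERS", "E$_ORDERS", "ORDER_ID")
remaining = [
    row[0]
    for row in conn.execute('SELECT ORDER_ID FROM "I$_ORDERS" ORDER BY ORDER_ID')
]
error_columns = [row[1] for row in conn.execute('PRAGMA table_info("E$_ORDERS")')]

print(error_columns, removed, remaining)
```

Because the rejected row stays in the error table with its trace columns, it can still be inspected and corrected after it has been removed from the flow.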
Key Tasks Performed by the CKM:
In both STATIC_CONTROL and FLOW_CONTROL modes, the CKM typically performs the following tasks:
- Create the "E$" Error Table:
- The CKM generates an error table (E$) in the staging area to log and store any rejected records.
- Check Data Consistency:
- The CKM compares data against the defined constraints for:
- Primary Keys
- Alternate Keys
- Foreign Keys
- Mandatory Columns
- Conditions (business rules)
- Any records that don't meet these constraints are considered errors and are logged in the E$ table.
- Error Isolation:
- The CKM isolates the erroneous records for each constraint that is checked and logs them in the E$ table.
- Optionally Remove Erroneous Records:
- If configured, the CKM can remove erroneous records from the flow before proceeding with further ETL steps.
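As a rough illustration of the consistency check, each kind of constraint can be turned into a SQL predicate that selects the violating rows. The sketch below uses Python's `sqlite3` with hypothetical `EMP` and `DEPT` tables, a hypothetical `CONS_TYPE` column, and hand-written predicates; a real CKM derives its checks from the constraints declared on the datastore rather than from a dictionary like this one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE DEPT (DEPT_ID INTEGER);
    INSERT INTO DEPT VALUES (10), (20);
    CREATE TABLE EMP (EMP_ID INTEGER, EMAIL TEXT, DEPT_ID INTEGER, SALARY INTEGER);
    INSERT INTO EMP VALUES
        (1, 'a@x.com', 10, 1000),
        (1, 'b@x.com', 10, 2000),
        (2, 'b@x.com', 20, 1500),
        (3, NULL,      99, -1);
    CREATE TABLE "E$_EMP" (
        CONS_TYPE TEXT, EMP_ID INTEGER, EMAIL TEXT, DEPT_ID INTEGER, SALARY INTEGER
    );
''')

# One predicate per constraint type; each is true for rows that violate it.
violation_predicates = {
    # Key values must be unique, so any value seen more than once is an error.
    "primary key": "EMP_ID IN (SELECT EMP_ID FROM EMP GROUP BY EMP_ID HAVING COUNT(*) > 1)",
    "alternate key": "EMAIL IN (SELECT EMAIL FROM EMP GROUP BY EMAIL HAVING COUNT(*) > 1)",
    # A non-null reference must exist in the parent table.
    "foreign key": "DEPT_ID IS NOT NULL AND DEPT_ID NOT IN (SELECT DEPT_ID FROM DEPT)",
    "mandatory column": "EMAIL IS NULL",
    # A condition is violated when its expression is false.
    "condition": "NOT (SALARY >= 0)",
}

for cons_type, predicate in violation_predicates.items():
    conn.execute(
        'INSERT INTO "E$_EMP" '
        "SELECT ?, EMP_ID, EMAIL, DEPT_ID, SALARY FROM EMP WHERE " + predicate,
        (cons_type,),
    )

violations_by_type = dict(
    conn.execute('SELECT CONS_TYPE, COUNT(*) FROM "E$_EMP" GROUP BY CONS_TYPE')
)
print(violations_by_type)
```

A row that breaks several constraints is logged once per violation in this sketch, so the last `EMP` row appears three times in the error table.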
Summary of CKM Functions:
- Consistency Checking: The CKM checks the consistency of data against constraints defined for the datastore, either during data processing (FLOW_CONTROL) or for already-loaded data (STATIC_CONTROL).
- Error Logging: The CKM creates an "E$" error table to log records that fail the consistency checks.
- Error Handling: It isolates erroneous records based on primary key, foreign key, or other conditions and can remove them if necessary.
- Ensuring Data Integrity: By ensuring that only compliant records pass through the ETL pipeline, the CKM helps maintain data integrity and supports quality control initiatives in the ETL process.
Conclusion:
Check Knowledge Modules (CKMs) are essential for maintaining data quality and integrity in the ETL process. They verify that data complies with predefined constraints, whether it already sits in a datastore or is about to be loaded into the target. Through the STATIC_CONTROL and FLOW_CONTROL modes, the CKM validates both existing and incoming datasets, logs the failing records in the E$ error table, and can remove them so that the issues can be reviewed and resolved.