Core Data Warehouse Concepts
Data Warehouse (DWH): A central repository of integrated, historical data designed for query and analysis. Its primary purpose is to support business intelligence.
Data Mart: A subset of a data warehouse tailored to serve a specific business line (e.g., Sales, Finance). It contains a focused collection of data.
Operational Data Store (ODS): A database for integrating data from multiple sources for operational reporting. It is more current and volatile, often acting as a staging area.
Data Lake: A vast repository holding massive amounts of raw data in its native format (structured and unstructured) until needed. No predefined schema required.
Data Lakehouse: A modern architecture combining the flexibility of data lakes with the data management and ACID transactions of data warehouses.
Staging Area: A temporary storage area for data extraction, cleansing, and transformation before loading into the warehouse. Not typically queried by end-users.
Business Intelligence (BI): Technologies and practices for collecting, analyzing, and presenting business information to support decision-making.
Data Modeling & Architecture
Schema: The logical description of the entire database, including table structures and relationships.
⭐ Star Schema: The simplest schema consisting of a central fact table connected to multiple dimension tables in a star shape.
❄️ Snowflake Schema: A variation of the star schema where dimension tables are normalized into multiple related tables. Reduces redundancy but increases complexity.
Galaxy Schema (Fact Constellation): A complex schema with multiple fact tables sharing dimension tables.
Data Vault Modeling: A hybrid method for long-term historical storage, composed of Hubs, Links, and Satellites. Resilient to change and highly scalable.
Dimension: A category of information (the "who, what, where, when"). Provides context to facts (e.g., Customer, Product).
Fact: A measurement or metric, typically numerical (e.g., Sales Amount, Quantity).
Fact Table: The central table containing measurements (facts) and foreign keys to the dimension tables.
Dimension Table: A table storing descriptive attributes related to a business dimension (e.g., Customer Name, City).
Grain (Granularity): The level of detail in a fact table (e.g., "one row per line item").
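To make the star schema vocabulary concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, dim_date, fact_sales), their columns, and the sample rows are all hypothetical, chosen only for illustration; the assumed grain is one row per order line item.

```python
import sqlite3

# In-memory database; every table and row below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT
);
-- Fact table. Grain: one row per order line item.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    sales_amount REAL
);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Hardware'),
    (2, 'Mouse', 'Hardware'),
    (3, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES
    (20240101, '2024-01-01', '2024-01'),
    (20240201, '2024-02-01', '2024-02');
INSERT INTO fact_sales VALUES
    (20240101, 1, 1, 1000.0),
    (20240101, 2, 2, 40.0),
    (20240201, 3, 5, 150.0),
    (20240201, 1, 1, 1000.0);
""")

# A typical star-schema query: join the fact table to a dimension
# and aggregate a fact by one of the dimension's attributes.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(sales_by_category)  # [('Hardware', 2040.0), ('Software', 150.0)]
```

The fact table holds only keys and numbers, while the descriptive text used for grouping lives in the dimension table.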
Surrogate Key: A system-generated unique identifier (typically an integer) used as a primary key, independent of the source system.
Natural Key (Business Key): An identifier from the operational source system (e.g., CustomerID).
Slowly Changing Dimension (SCD): Techniques for managing changes to dimension attributes over time.
Type 1: Overwrite old value (No history).
Type 2: Add new row (Preserve history).
Type 3: Add new column (Limited history).
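The difference between Type 1 and Type 2 can be sketched in a few lines of Python. This is only an illustration under assumed names: the customer dimension, its columns (customer_key as surrogate key, customer_id as natural key, valid_from, valid_to, is_current), and both helper functions are hypothetical, not a standard API.

```python
from datetime import date

def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place, so history is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city

def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: expire the current row and append a new version."""
    current = next(
        r for r in rows
        if r["customer_id"] == customer_id and r["is_current"]
    )
    if current["city"] == new_city:
        return  # nothing changed, keep the current version
    current["valid_to"] = change_date
    current["is_current"] = False
    rows.append({
        # New surrogate key; the natural key stays the same.
        "customer_key": max(r["customer_key"] for r in rows) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })

dim_customer = [{
    "customer_key": 1,
    "customer_id": "C-100",
    "city": "Boston",
    "valid_from": date(2020, 1, 1),
    "valid_to": None,
    "is_current": True,
}]

apply_type2(dim_customer, "C-100", "Denver", date(2024, 6, 1))
for row in dim_customer:
    print(row["customer_key"], row["city"], row["is_current"])
```

After the Type 2 change the dimension holds two rows for the same natural key, one expired and one current, which is why facts reference the surrogate key rather than the natural key.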
Conformed Dimension: A dimension that has the same meaning and content across different fact tables or data marts (e.g., Date).
ETL & Data Integration
ETL (Extract, Transform, Load): The process of moving data from source systems to the warehouse.
Extract: Reading data from the source systems.
Transform: Cleaning and structuring the data.
Load: Writing the data to the target.
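The three steps can be sketched as three small functions. This is a toy example under stated assumptions: the inline CSV text, the orders table, and the cleaning rules (trim and title-case the name, drop rows with an unparseable amount) are all made up for illustration, and sqlite3 stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read files or query a database.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,carol,7.25
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean names, convert types, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)  # [(1, 'Alice', 10.5), (3, 'Carol', 7.25)]
```

In an ELT setup the same raw rows would be loaded first and the cleaning would run inside the target system, typically as SQL.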
☁️ ELT (Extract, Load, Transform): Loading data into the target system before transformation. Common in modern cloud platforms (Snowflake, BigQuery).
Data Pipeline: A system that moves data from one place to another; it may or may not involve heavy transformation.
Change Data Capture (CDC): Identifying and capturing changes (inserts, updates, deletes) in a source database to apply them to the warehouse, often in near real-time.
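As a rough illustration of the idea, the sketch below detects inserts, updates, and deletes by comparing two snapshots of a source table keyed by primary key. Note the assumption: this is snapshot comparison, a simpler stand-in technique; production CDC tools more commonly read the database's transaction log. The function names and sample rows are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots (dicts keyed by primary key) and list the changes."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes

def apply_changes(target, changes):
    """Replay captured changes against a target copy of the table."""
    for operation, key, row in changes:
        if operation == "delete":
            target.pop(key, None)
        else:  # insert and update both set the latest row
            target[key] = row

previous_snapshot = {
    1: {"name": "Alice", "city": "Boston"},
    2: {"name": "Bob", "city": "Austin"},
}
current_snapshot = {
    1: {"name": "Alice", "city": "Denver"},   # updated
    3: {"name": "Carol", "city": "Miami"},    # inserted
}                                              # key 2 was deleted

changes = capture_changes(previous_snapshot, current_snapshot)
warehouse_copy = dict(previous_snapshot)
apply_changes(warehouse_copy, changes)
print([(operation, key) for operation, key, _ in changes])
```

Only the three changed rows travel to the target, rather than a full reload of the table.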
Data Cleansing: Detecting and correcting corrupt or inaccurate records.
Data Profiling: Examining source data to collect statistics and assess quality.
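A profiling pass can be as simple as counting rows, missing values, and distinct values per column. The sketch below assumes rows arrive as dictionaries and treats None and the empty string as missing; the function name, that convention, and the sample data are illustrative choices only.

```python
def profile_column(rows, column):
    """Collect basic statistics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Assumption: None and "" both count as missing values.
    present = [v for v in values if v not in (None, "")]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

customers = [
    {"id": 1, "city": "Boston"},
    {"id": 2, "city": ""},
    {"id": 3, "city": None},
    {"id": 4, "city": "Denver"},
    {"id": 5, "city": "Boston"},
]

city_profile = profile_column(customers, "city")
print(city_profile)
```

A high null count or an unexpectedly low distinct count is the kind of signal that would then feed the cleansing step.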
Data Governance: Managing data availability, usability, integrity, and security across an enterprise.
Key Performance Indicators & Metrics
KPI (Key Performance Indicator): A measurable value demonstrating how effectively a company achieves objectives.
Measure (Metric): A numerical value that can be aggregated.
➕ Additive Measure: Can be summed across all dimensions (e.g., Sales Amount).
Semi-Additive Measure: Can be summed across some dimensions but not all (e.g., Account Balance, which can be summed across accounts but not across time).
Non-Additive Measure: Cannot be meaningfully summed across any dimension (e.g., Ratios, Percentages).
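A small worked example shows why the three kinds of measure behave differently. All figures and field names below are invented for illustration.

```python
# Semi-additive: month-end account balances.
balances = [
    {"date": "2024-01-31", "account": "A", "balance": 100.0},
    {"date": "2024-01-31", "account": "B", "balance": 50.0},
    {"date": "2024-02-29", "account": "A", "balance": 120.0},
    {"date": "2024-02-29", "account": "B", "balance": 40.0},
]

# Summing across accounts for a single date is meaningful.
total_by_date = {}
for row in balances:
    total_by_date[row["date"]] = total_by_date.get(row["date"], 0.0) + row["balance"]

# Summing across dates is not; use the closing (latest) balance instead.
closing_total = total_by_date[max(total_by_date)]

# Additive: revenue can be summed across every dimension.
sales = [
    {"revenue": 200.0, "cost": 150.0},
    {"revenue": 800.0, "cost": 400.0},
]
total_revenue = sum(s["revenue"] for s in sales)
total_cost = sum(s["cost"] for s in sales)

# Non-additive: a margin ratio must be recomputed from additive parts.
overall_margin = (total_revenue - total_cost) / total_revenue
# Averaging the per-row ratios gives a different, misleading number.
average_of_margins = sum(
    (s["revenue"] - s["cost"]) / s["revenue"] for s in sales
) / len(sales)

print(total_by_date, closing_total)
print(overall_margin, average_of_margins)
```

The overall margin here is 45%, while the naive average of the two row-level margins is 37.5%, which is why ratios are usually stored as their additive components.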
OLAP & Querying
OLAP (Online Analytical Processing): Technology for interactive, multidimensional data analysis.
OLTP (Online Transaction Processing): Systems managing transaction-oriented applications (e.g., ERP, CRM).
Cube: A multi-dimensional array of pre-aggregated data for fast querying.
↕️ Drill Down / Roll Up: Navigating data hierarchy from summary to detail (Drill Down) or detail to summary (Roll Up).
Slice and Dice: Viewing data from different perspectives by selecting subsets.
Pivot: Changing the dimensional orientation of a report.
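These navigation operations can be illustrated on a tiny in-memory data set. The sketch below uses plain Python rather than an OLAP engine, and the tuple layout (year, quarter, region, amount) and the roll_up helper are assumptions made for the example.

```python
from collections import defaultdict

# Detail rows: (year, quarter, region, amount). Values are illustrative.
sales = [
    ("2024", "Q1", "East", 100),
    ("2024", "Q1", "West", 80),
    ("2024", "Q2", "East", 120),
    ("2024", "Q2", "West", 90),
]

def roll_up(rows, key_fn):
    """Aggregate amounts to the level chosen by key_fn."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)

# Roll up from region detail to quarter, then to year.
# Reading these in the opposite order is a drill down.
by_quarter = roll_up(sales, lambda r: (r[0], r[1]))
by_year = roll_up(sales, lambda r: r[0])

# Slice: fix one dimension to a single value.
east_only = [row for row in sales if row[2] == "East"]

# Pivot: quarters become rows and regions become columns.
pivot = defaultdict(dict)
for year, quarter, region, amount in sales:
    pivot[quarter][region] = amount

print(by_year)      # {'2024': 390}
print(by_quarter)   # {('2024', 'Q1'): 180, ('2024', 'Q2'): 210}
print(dict(pivot))
```

A cube engine performs the same aggregations but stores them ahead of time, so each navigation step is a lookup rather than a scan.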
⌨️ MDX / DAX: Query languages for OLAP cubes (MDX) and Power BI/Analysis Services (DAX).
☁️ Modern Cloud & Big Data Terminology
Data Mesh: Decentralized architecture organizing data by business domains, treating data as a product.
Data Fabric: Architecture providing a unified layer for data management across disparate environments.
Data Warehouse Appliances: Pre-configured hardware/software bundles (e.g., Teradata, Netezza).
⚡ MPP (Massively Parallel Processing): Multiple processors working simultaneously on a task; used by cloud DWHs like Snowflake.
Data Swamp: A deteriorated, unmanaged data lake with little value.
Serverless: Cloud execution model where the provider manages machine resources dynamically (e.g., BigQuery).
General & Administrative Terms
Metadata: "Data about data." Describes structure, source, and characteristics.
Data Lineage: A record, often shown visually, of data's origin and its movement and transformation through systems.
Data Catalog: Centralized inventory of data assets helping users find and understand data.
Master Data Management (MDM): Managing critical data (customer, product) for a single point of reference.
✅ Data Quality: The accuracy, completeness, consistency, and timeliness of data.
BI Tool: Software for creating reports/dashboards (Tableau, Power BI).
❓ Ad-hoc Query: A non-standard, one-time query created by a user.
Materialized View: A pre-computed view stored physically to improve performance for complex queries.
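The trade-off behind a materialized view, fast reads in exchange for possibly stale data until the next refresh, can be sketched as follows. SQLite has no materialized view statement, so this example emulates one with CREATE TABLE ... AS SELECT plus a manual refresh; the table names and the refresh_sales_by_region helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('East', 100.0), ('West', 80.0), ('East', 20.0);
""")

def refresh_sales_by_region(conn):
    """Rebuild the stored aggregate, emulating a materialized view refresh."""
    conn.executescript("""
    DROP TABLE IF EXISTS mv_sales_by_region;
    CREATE TABLE mv_sales_by_region AS
        SELECT region, SUM(amount) AS total_amount
        FROM sales
        GROUP BY region;
    """)

def west_total(conn):
    """Read the pre-computed total without touching the base table."""
    return conn.execute(
        "SELECT total_amount FROM mv_sales_by_region WHERE region = 'West'"
    ).fetchone()[0]

refresh_sales_by_region(conn)

# New data arrives in the base table after the last refresh.
conn.execute("INSERT INTO sales VALUES ('West', 50.0)")
stale = west_total(conn)   # still reflects the old aggregate

refresh_sales_by_region(conn)
fresh = west_total(conn)   # now includes the new row

print(stale, fresh)  # 80.0 130.0
```

Databases with native materialized views handle the storage and refresh themselves, but the staleness window between refreshes is the same design consideration.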