I. Core Concepts & Processes
Data Analysis: The process of inspecting, cleaning, transforming, and modeling data to discover useful information, inform conclusions, and support decision-making.
Data Analytics: A broader term covering the whole discipline of working with data: the analysis itself plus the tools, methods, and processes used to collect, manage, and interpret it.
Data Science: An interdisciplinary field that uses scientific methods, algorithms, and systems to extract insights from data. It often involves advanced programming and statistical modeling.
Business Intelligence (BI): The technologies and practices for collecting, storing, and analyzing business data to support better decision-making.
Data-Driven Decision Making (DDDM): Basing decisions on data analysis rather than purely on intuition or anecdotal observation.
II. Types of Data
Structured Data: Organized in a predefined format (e.g., SQL database tables, Excel rows and columns).
Unstructured Data: No predefined format (e.g., emails, videos, social media posts).
Semi-structured Data: Uses tags or keys to organize content but does not follow a rigid, table-like schema (e.g., JSON, XML).
Quantitative Data: Numerical data that can be measured or counted (e.g., height, sales figures).
Qualitative Data: Descriptive, non-numerical data (e.g., interview transcripts, colors).
Big Data: Extremely large or complex datasets, commonly characterized by the 3 V's (Volume, Velocity, Variety), often extended to 5 V's with Veracity and Value:
Volume: The sheer amount of data.
Velocity: The speed at which data is generated and processed.
Variety: The different types and formats of data.
Veracity: The quality and accuracy of the data.
Value: The usefulness of the data.
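To make the structured versus semi-structured distinction concrete, here is a minimal Python sketch using only the standard library. The records and field names (`name`, `age`, `tags`, `address`, `city`) are invented for illustration; the point is only that JSON records carry keys but need not share the same shape, while a table has fixed columns.

```python
import json

# Hypothetical semi-structured records: each has keys, but not the same ones.
raw = '''
[
  {"name": "Ada", "tags": ["vip"], "address": {"city": "London"}},
  {"name": "Grace", "age": 45}
]
'''

records = json.loads(raw)

# Flatten into a fixed set of columns (a structured, table-like shape).
columns = ["name", "age", "city"]
rows = [
    (
        rec.get("name"),
        rec.get("age"),                      # None when the key is absent
        rec.get("address", {}).get("city"),  # nested field, may be missing
    )
    for rec in records
]

for row in rows:
    print(dict(zip(columns, row)))
```

Fields that a record lacks become `None` in the flattened rows, which is one common way missing data enters a structured dataset.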
III. Data Management & Storage
Database: An organized collection of data stored electronically.
SQL (Structured Query Language): The standard language for querying and managing data in relational databases.
NoSQL: Non-relational databases with flexible schemas, suited to unstructured or semi-structured data (e.g., MongoDB).
Data Warehouse: A central repository of integrated, cleaned data used for reporting and analysis (e.g., Snowflake, BigQuery).
Data Lake: A vast pool of raw data stored in its native format.
ETL (Extract, Transform, Load): Extracting data from source systems, transforming it, and then loading it into a warehouse.
ELT (Extract, Load, Transform): Loading raw data first, then transforming it within the target system.
Data Mart: A subset of a data warehouse dedicated to a specific team or business function.
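The ETL steps and a basic SQL query can be sketched in a few lines of Python. This is an illustrative toy, not a production pipeline: an in-memory CSV string stands in for the source system, an in-memory SQLite database stands in for the warehouse, and the `sales` table with its `region` and `amount` columns is hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a source (an in-memory CSV stands in for a file).
source = io.StringIO("region,amount\nnorth, 120.50\nsouth,80\nnorth,not_a_number\n")
raw_rows = list(csv.DictReader(source))

# Transform: trim whitespace, convert types, and drop rows that fail to parse.
clean_rows = []
for row in raw_rows:
    try:
        clean_rows.append((row["region"].strip(), float(row["amount"])))
    except ValueError:
        continue  # skip unparseable amounts

# Load: write the cleaned rows into a table (in-memory SQLite as the "warehouse").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", clean_rows)

# Query with SQL: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

An ELT variant would load the raw rows into the database first and perform the cleaning with SQL inside the target system.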
IV. Data Cleaning & Preparation
Data Cleansing/Wrangling: Detecting and correcting corrupt, inconsistent, or inaccurate records and reshaping data for analysis. Often the most time-consuming step.
Missing Data: Values that were not recorded or are otherwise absent from the dataset.
Imputation: Replacing missing data with substituted values (e.g., the mean or median of the observed values).
Outlier: A data point that differs significantly from the others (it can be an error or a genuine finding).
Normalization: Scaling numerical data to a standard range (e.g., 0 to 1), also known as min-max scaling.
Standardization: Rescaling data to have a mean of 0 and a standard deviation of 1, also known as z-score scaling.
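Imputation, normalization, and standardization are easiest to compare side by side. The sketch below uses Python's standard library on a made-up list of measurements; mean imputation is just one of several possible strategies and is chosen here only for brevity.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
values = [10.0, 12.0, None, 14.0, 18.0]

# Imputation: replace missing entries with the mean of the observed values.
observed = [v for v in values if v is not None]
fill = statistics.mean(observed)
imputed = [fill if v is None else v for v in values]

# Normalization (min-max): rescale to the range 0 to 1.
lo, hi = min(imputed), max(imputed)
normalized = [(v - lo) / (hi - lo) for v in imputed]

# Standardization (z-score): subtract the mean, divide by the standard deviation.
mu = statistics.mean(imputed)
sigma = statistics.pstdev(imputed)  # population standard deviation
standardized = [(v - mu) / sigma for v in imputed]

print(imputed)
print([round(v, 3) for v in normalized])
print([round(v, 3) for v in standardized])
```

Both rescaling steps divide by a spread measure, so a column where every value is identical would need special handling to avoid division by zero.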
V. Statistics & Mathematics
Descriptive Statistics: Summarizes the main features of a dataset. Common measures:
Mean: The arithmetic average (the sum of the values divided by their count).
Median: The middle value when the data is sorted.
Mode: The most frequent value.
Standard Deviation: A measure of how spread out the values are around the mean.
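The four descriptive measures above can be computed directly with Python's `statistics` module. The `orders` list is a hypothetical sample chosen to include one unusually large value.

```python
import statistics

# Hypothetical sample of daily order counts.
orders = [3, 5, 5, 6, 8, 9, 20]

mean = statistics.mean(orders)      # arithmetic average
median = statistics.median(orders)  # middle value of the sorted data
mode = statistics.mode(orders)      # most frequent value
stdev = statistics.stdev(orders)    # sample standard deviation

print(mean, median, mode, round(stdev, 2))
```

Here the mean (8) sits above the median (6) because the single large value of 20 pulls the average upward, which is why the median is often preferred for skewed data.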
Inferential Statistics: Using a sample to make inferences about a population.
Population: The entire set of items or individuals of interest.
Sample: A subset of the population.
Hypothesis Testing: Using sample data to assess whether there is enough evidence to reject a stated assumption (the null hypothesis) about a population.
P-value: The probability of observing results at least as extreme as those in the sample, assuming the null hypothesis is true. A value ≤ 0.05 is conventionally treated as statistically significant.
Correlation: A measure of the strength and direction of the relationship between two variables (Correlation ≠ Causation).
Regression Analysis: Estimating how one variable changes with one or more others (e.g., Linear, Logistic).
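Correlation and simple linear regression share most of their arithmetic, as the following sketch shows. It computes the Pearson correlation coefficient and an ordinary least squares line by hand; the paired values in `x` and `y` are invented for illustration.

```python
import math

# Hypothetical paired observations: advertising spend (x) and sales (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-products.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation coefficient (ranges from -1 to 1).
r = sxy / math.sqrt(sxx * syy)

# Ordinary least squares fit of y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(round(r, 4), round(slope, 2), round(intercept, 2))
```

A coefficient close to 1 indicates a strong positive linear relationship in this sample; it says nothing on its own about whether `x` causes `y`.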
VI. Data Analysis & Modeling
Exploratory Data Analysis (EDA): Analyzing datasets to summarize their main characteristics, often visually.
Diagnostic Analysis: Understanding why events happened.
Predictive Analysis: Using algorithms and historical data to estimate the likelihood of future outcomes.
Prescriptive Analysis: Recommending actions to affect outcomes.
Machine Learning (ML): Algorithms that learn patterns from data rather than following explicitly programmed rules.
Supervised Learning: Models trained on labeled data to predict a known target.
Unsupervised Learning: Finding patterns in unlabeled data.
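The difference between supervised and unsupervised learning can be shown with a deliberately tiny example. The sketch below uses a 1-nearest-neighbor classifier for the supervised case and a largest-gap split as a toy stand-in for clustering in the unsupervised case; the data, the labels, and the `predict` helper are all hypothetical.

```python
# Supervised learning: labeled training examples (feature, label).
labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]

def predict(value):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - value))
    return nearest[1]

# Unsupervised learning: the same features without labels, split into two
# groups at the largest gap between neighboring sorted values.
unlabeled = sorted(feature for feature, _ in labeled)
gaps = [b - a for a, b in zip(unlabeled, unlabeled[1:])]
cut = gaps.index(max(gaps)) + 1
clusters = [unlabeled[:cut], unlabeled[cut:]]

print(predict(2.0), predict(7.0))
print(clusters)
```

The supervised model needs the labels to answer "which class is this?", while the unsupervised step only discovers that the values fall into two groups without knowing what the groups mean.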
VII. Data Visualization
Dashboard: A visual display of key information on a single screen.
Metric: A quantifiable measure used to track performance or activity.
KPI (Key Performance Indicator): A metric tied to a specific business objective, used to judge progress toward it.
Charts & Graphs:
Bar Chart: Compares values across categories.
Line Chart: Shows trends over time.
Histogram: Shows the distribution of a single numerical variable.
Scatter Plot: Shows the relationship between two variables.
Box Plot: Summarizes a distribution using its quartiles.
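Histograms and box plots are both built from simple summaries of the data, which can be computed without any plotting library. The sketch below counts values per bin and derives the five-number summary a box plot is drawn from; the `data` values and the bin width of 2 are arbitrary choices for illustration.

```python
import statistics
from collections import Counter

# Hypothetical response times in seconds.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

# Histogram: count how many values fall into each bin of width 2.
bin_width = 2
bins = Counter((v // bin_width) * bin_width for v in data)
for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")

# Box plot: the five-number summary the plot is drawn from.
q1, q2, q3 = statistics.quantiles(data, n=4)
summary = (min(data), q1, q2, q3, max(data))
print(summary)
```

Only non-empty bins are printed here. Note also that quartile values depend on the interpolation method, so other tools may report slightly different `q1` and `q3` for the same data.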
VIII. Roles & Responsibilities
Data Analyst: Interprets data to help make decisions (reporting, visualization, descriptive analysis).
Business Analyst: Bridges the gap between IT and business needs.
Data Scientist: Uses advanced machine learning and statistics to build predictive models.
Data Engineer: Builds and maintains the infrastructure (pipelines, warehouses) that analysts and scientists rely on.
BI Developer: Specializes in designing dashboards and BI tools.
IX. Tools & Technologies
Programming: Python (Pandas, NumPy), R.
Spreadsheets: Excel, Google Sheets.
BI Tools: Tableau, Power BI, Looker.
Databases: PostgreSQL, MySQL (relational databases queried with SQL).
Big Data: Hadoop, Spark, Kafka.