Translate

Thursday, 10 October 2024

What is star schema and explain with example

Star Schema: A Fundamental Data Warehouse Design

A star schema is a common data warehouse design pattern where a central fact table is surrounded by multiple dimension tables. This structure facilitates efficient data analysis and reporting by organizing data into a hierarchical format. 

Key Components:

  • Fact Table: Stores quantitative measurements (facts) related to a business process.

  • Dimension Tables: Provide context and descriptive information about the facts, such as time, products, customers, or locations.

  • Relationships: The fact table is connected to the dimension tables through foreign keys, creating a star-like structure.

Example: A Retail Sales Data Warehouse

Fact Table: Sales

SalesID

ProductID

CustomerID

DateID

Quantity

Price

Total

1

101

1001

20231001

2

29.99

59.98

2

102

1002

20231002

1

19.95

19.95

...

...

...

...

...

...

...

Dimension Tables:

  • Products | ProductID | ProductName | Category | Price | |---|---|---|---| | 101 | Laptop | Electronics | 29.99 | | 102 | Book | Books | 19.95 | | ... | ... | ... | ... |

  • Customers | CustomerID | CustomerName | City | State | |---|---|---|---| | 1001 | John Doe | New York | NY | | 1002 | Jane Smith | Los Angeles | CA | | ... | ... | ... | ... |

  • Time | DateID | Date | Year | Quarter | Month | Day | |---|---|---|---|---|---| | 20231001 | 2023-10-01 | 2023 | Q3 | 10 | 1 | | 20231002 | 2023-10-02 | 2023 | Q3 | 10 | 2 | | ... | ... | ... | ... | ... | ... |

Visual Representation:

+--------------------+| Sales |+--------------------+| SalesID | ProductID | CustomerID | DateID | Quantity | Price | Total ||---|---|---|---|---|---|---|| ... | ... | ... | ... | ... | ... | ... |+--------------------+ | |+-------+ +-------+ +-------+| Products | | Customers | | Time |+-------+ +-------+ +-------+| ProductID | ProductName | CustomerID | CustomerName | DateID | Date | ... ||---|---|---|---|---|---|---|| ... | ... | ... | ... | ... | ... | ... |+-------+ +-------+ +-------+

Benefits of Star Schema:

  • Simplicity: Easy to understand and implement.

  • Performance: Optimized for analytical queries, especially when using OLAP cubes.

  • Scalability: Can handle large datasets and complex analyses.

  • Flexibility: Allows for adding or removing dimensions as business requirements change.

Star schemas are a widely used and effective data warehouse design pattern, providing a solid foundation for data analysis and reporting.

Sample Data for a Sales Fact Table

Note: These are sample values. You can customize them based on your specific business requirements and data.

SalesID

ProductID

CustomerID

DateID

Quantity

Price

Total

1001

101

1001

20231001

2

29.99

59.98

1002

102

1002

20231002

1

19.95

19.95

1003

101

1003

20231003

3

29.99

89.97

1004

103

1001

20231004

2

15.99

31.98

1005

102

1003

20231005

4

19.95

79.80

1006

104

1002

20231006

3

24.99

74.97

1007

101

1004

20231007

1

29.99

29.99

1008

103

1003

20231008

5

15.99

79.95

1009

102

1004

20231009

2

19.95

39.90

1010

105

1001

20231010

3

34.99

104.97

1011

103

1002

20231011

2

15.99

31.98

1012

104

1003

20231012

4

24.99

99.96

1013

101

1004

20231013

1

29.99

29.99

1014

105

1002

20231014

3

34.99

104.97

1015

102

1001

20231015

2

19.95

39.90

1016

103

1004

20231016

5

15.99

79.95

1017

104

1002

20231017

3

24.99

74.97

1018

101

1003

20231018

1

29.99

29.99

1019

105

1001

20231019

3

34.99

104.97

1020

102

1004

20231020

2

19.95

39.90

Sample Data for the Products Dimension Table

Note: The specific columns and data in a Products dimension table will depend on your business requirements. Here's a basic example:

ProductID

ProductName

Category

Subcategory

Price

101

Laptop

Electronics

Computers

29.99

102

Book

Books

Fiction

19.95

103

Smartphone

Electronics

Mobile Devices

499.99

104

T-Shirt

Clothing

Casual

19.99

105

Jeans

Clothing

Denim

39.99

106

Coffee Maker

Appliances

Kitchen

99.99

107

Headphones

Electronics

Audio

79.99

108

Backpack

Accessories

Bags

24.99

109

Watch

Accessories

Watches

149.99

110

Shoes

Footwear

Sneakers

59.99

111

Toy Car

Toys

Vehicles

12.99

112

Board Game

Games

Board Games

24.99

113

Blender

Appliances

Kitchen

59.99

114

Camera

Electronics

Photography

399.99

115

Dress

Clothing

Formal

79.99

116

Gaming Console

Electronics

Gaming

299.99

117

Cookware Set

Appliances

Kitchen

129.99

118

Belt

Accessories

Belts

19.99

119

Sunglasses

Accessories

Eyewear

29.99

120

Mug

Kitchenware

Drinkware

9.99

Explanation:

  • ProductID: A unique identifier for each product.

  • ProductName: The name of the product.

  • Category: The broader category to which the product belongs.

  • Subcategory: A more specific category within the main category.

  • Price: The unit price of the product.

You can customize these columns and add or remove dimensions based on your specific business requirements. For example, you might include additional dimensions like brand, color, or size.

Sample Data for the Customers Dimension Table

CustomerID

CustomerName

City

State

Country

1001

John Doe

New York

NY

USA

1002

Jane Smith

Los Angeles

CA

USA

1003

David Lee

Chicago

IL

USA

1004

Emily Brown

Houston

TX

USA

1005

Michael Johnson

Phoenix

AZ

USA

1006

Sarah Miller

Dallas

TX

USA

1007

Thomas Wilson

Atlanta

GA

USA

1008

Olivia Taylor

Seattle

WA

USA

1009

James Clark

Boston

MA

USA

1010

Jennifer Davis

San Francisco

CA

USA

1011

William Hill

Detroit

MI

USA

1012

Victoria Harris

Philadelphia

PA

USA

1013

Daniel Carter

San Antonio

TX

USA

1014

Emma Jones

Indianapolis

IN

USA

1015

Christopher Lee

Jacksonville

FL

USA

1016

Ashley Williams

San Diego

CA

USA

1017

Benjamin Scott

Austin

TX

USA

1018

Abigail Baker

Columbus

OH

USA

1019

Ethan Hall

Las Vegas

NV

USA

1020

Sophia King

Memphis

TN

USA

Note: You can customize this table to include additional customer attributes as needed, such as email address, phone number, age, gender, and purchase history.

Sample Data for the Time Dimension Table

Note: This sample data includes common time attributes. You can customize it based on your specific reporting requirements.

DateID

Date

Year

Quarter

Month

DayOfWeek

WeekOfYear

20231001

2023-10-01

2023

4

10

Sunday

40

20231002

2023-10-02

2023

4

10

Monday

40

20231003

2023-10-03

2023

4

10

Tuesday

40

20231004

2023-10-04

2023

4

10

Wednesday

40

20231005

2023-10-05

2023

4

10

Thursday

40

20231006

2023-10-06

2023

4

10

Friday

40

20231007

2023-10-07

2023

4

10

Saturday

40

20231008

2023-10-08

2023

4

10

Sunday

41

20231009

2023-10-09

2023

4

10

Monday

41

20231010

2023-10-10

2023

4

10

Tuesday

41

20231011

2023-10-11

2023

4

10

Wednesday

41

20231012

2023-10-12

2023

4

10

Thursday

41

20231013

2023-10-13

2023

4

10

Friday

41

20231014

2023-10-14

2023

4

10

Saturday

41

20231015

2023-10-15

2023

4

10

Sunday

42

20231016

2023-10-16

2023

4

10

Monday

42

20231017

2023-10-17

2023

4

10

Tuesday

42

20231018

2023-10-18

2023

4

10

Wednesday

42

20231019

2023-10-19

2023

4

10

Thursday

42

20231020

2023-10-20

2023

4

10

Friday

42

Explanation:

  • DateID: A unique identifier for the date.

  • Date: The actual date in YYYY-MM-DD format.

  • Year: The year of the date.

  • Quarter: The quarter of the year (1, 2, 3, or 4).

  • Month: The month of the year (1-12).

  • DayOfWeek: The day of the week (Sunday, Monday, etc.).

  • WeekOfYear: The week of the year.

This sample data provides various time-related attributes that can be used for analysis and reporting. You can customize the dimensions and levels of detail based on your specific requirements.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.