Synthetic Beverage Sales Data Analysis

Home // Β AI-Powered Data Analysis

πŸ“‚ Table of Content

πŸ“‹ Project Overview​

This project aims to analyze sales transactions from a SQL Server database to uncover key insights related to customer purchasing behavior, revenue trends, and regional performance. The final insights will be presented through a Power BI dashboard to help stakeholders make data-driven business decisions.

Tech Stack:

  • SQL Server – Data storage, querying, and transformation.
  • Power BI – Data visualization and interactive dashboards.
  • DAX (Power BI) – Advanced calculations and performance metrics.

πŸ“Š Summary of the Dataset

Total Records: 8,999,910

Total Columns: 11

Β Key Columns:

  • Order_ID – Unique identifier for each order.
  • Customer_ID – Unique identifier for each customer.
  • Customer_Type – B2B (business) or B2C (individual).
  • Product – The name of the product sold.
  • Category – Product category (Water, Soft Drinks, Juices, Alcoholic Beverages, etc.).
  • Unit_Price – Price per unit of the product.
  • Quantity – Number of units sold.
  • Discount – Percentage discount applied.
  • Total_Price – Total revenue after discount.
  • Region – Geographic location of the customer.
  • Order_Date – Date when the order was placed.

πŸ”Ž Key Analysis Areas:

  • Customer Revenue Analysis – Identify top customers (B2B & B2C) by revenue and order frequency.
  • Product Performance – Determine the best-selling and least profitable products.
  • Regional Sales Insights – Analyze how different regions contribute to overall revenue.
  • Discount Impact Analysis – Evaluate how discounts affect sales and profitability.
  • Sales Trend Forecasting – Use SQL analytics to predict future sales trends based on historical data.

πŸ’Ύ Data Acquisition & Preparation

Create the Table and Run BULK INSERT
				
					CREATE TABLE Sales (
    Order_ID VARCHAR(50),
    Customer_ID VARCHAR(50),
    Customer_Type VARCHAR(10),
    Product VARCHAR(255),
    Category VARCHAR(255),
    Unit_Price DECIMAL(10,2),
    Quantity INT,
    Discount DECIMAL(5,2),
    Total_Price DECIMAL(10,2),
    Region VARCHAR(255),
    Order_Date DATE
);

BULK INSERT Sales
FROM 'D:\roy\roy files\Data\synthetic_beverage_sales_data.csv' 
    FORMAT = 'CSV',
    FIRSTROW = 2,  
    FIELDTERMINATOR = ',', 
    ROWTERMINATOR = '\n',
    TABLOCK
);

				
			
				
					--Check for Exact Duplicate Rows
SELECT Order_ID, Customer_ID, Product, Category, Unit_Price, Quantity, Discount, Total_Price, Region, Order_Date, COUNT(*)
FROM Sales
GROUP BY Order_ID, Customer_ID, Product, Category, Unit_Price, Quantity, Discount, Total_Price, Region, Order_Date
HAVING COUNT(*) > 1;
				
			
				
					--Check for NULL Values
SELECT 
    SUM(CASE WHEN Order_ID IS NULL THEN 1 ELSE 0 END) AS Null_Order_ID,
    SUM(CASE WHEN Customer_ID IS NULL THEN 1 ELSE 0 END) AS Null_Customer_ID,
    SUM(CASE WHEN Product IS NULL THEN 1 ELSE 0 END) AS Null_Product,
    SUM(CASE WHEN Category IS NULL THEN 1 ELSE 0 END) AS Null_Category,
    SUM(CASE WHEN Unit_Price IS NULL THEN 1 ELSE 0 END) AS Null_Unit_Price,
    SUM(CASE WHEN Quantity IS NULL THEN 1 ELSE 0 END) AS Null_Quantity,
    SUM(CASE WHEN Discount IS NULL THEN 1 ELSE 0 END) AS Null_Discount,
    SUM(CASE WHEN Total_Price IS NULL THEN 1 ELSE 0 END) AS Null_Total_Price,
    SUM(CASE WHEN Region IS NULL THEN 1 ELSE 0 END) AS Null_Region,
    SUM(CASE WHEN Order_Date IS NULL THEN 1 ELSE 0 END) AS Null_Order_Date
FROM Sales;
				
			
Null_Order_IDNull_Customer_IDNull_ProductNull_CategoryNull_Unit_PriceNull_QuantityNull_DiscountNull_Total_PriceNull_RegionNull_Order_Date
0000000000
				
					--Check for Invalid Data
SELECT * 
FROM Sales 
WHERE Unit_Price <= 0 
   OR Quantity <= 0 
   OR Total_Price < 0;
				
			
				
					--Check Data Types
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Sales';
				
			
COLUMN_NAMEDATA_TYPECHARACTER_MAXIMUM_LENGTH
Order_IDvarchar50
Customer_IDvarchar50
Customer_Typevarchar10
Productvarchar255
Categoryvarchar255
Unit_PricedecimalNULL
QuantityintNULL
DiscountdecimalNULL
Total_PricedecimalNULL
Regionvarchar255
Order_DatedateNULL
				
					--Apply Optimizations
ALTER TABLE Sales ALTER COLUMN Order_ID VARCHAR(20);
ALTER TABLE Sales ALTER COLUMN Customer_ID VARCHAR(20);
ALTER TABLE Sales ALTER COLUMN Customer_Type CHAR(3);
ALTER TABLE Sales ALTER COLUMN Product VARCHAR(100);
ALTER TABLE Sales ALTER COLUMN Category VARCHAR(50);
ALTER TABLE Sales ALTER COLUMN Unit_Price DECIMAL(10,2);
ALTER TABLE Sales ALTER COLUMN Quantity SMALLINT;
ALTER TABLE Sales ALTER COLUMN Discount DECIMAL(5,2);
ALTER TABLE Sales ALTER COLUMN Total_Price DECIMAL(10,2);
				
			
				
					-- Index for time-based queries
CREATE INDEX idx_sales_orderdate ON Sales (Order_Date);

-- Index for customer-based lookups
CREATE INDEX idx_sales_customer ON Sales (Customer_ID);

-- Index for product-based analysis
CREATE INDEX idx_sales_product ON Sales (Product);

-- Index for regional sales queries
CREATE INDEX idx_sales_region ON Sales (Region);

-- Index for fast revenue calculations
CREATE INDEX idx_sales_revenue ON Sales (Total_Price DESC);
				
			
				
					--Create a Partition Function
CREATE PARTITION FUNCTION pf_OrderDate (DATE)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01');

--Create a Partition Scheme
CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

--Create a Partitioned Table
CREATE TABLE Sales_Partitioned (
    Order_ID VARCHAR(20),
    Customer_ID VARCHAR(15),
    Customer_Type CHAR(3),
    Product VARCHAR(100),
    Category VARCHAR(50),
    Unit_Price DECIMAL(10,2),
    Quantity SMALLINT,
    Discount DECIMAL(5,2),
    Total_Price DECIMAL(10,2),
    Region VARCHAR(50),
    Order_Date DATE
)
ON ps_OrderDate(Order_Date);  -- Apply partitioning

--Insert Data into Partitioned Table
INSERT INTO Sales_Partitioned
SELECT * FROM Sales;

--Rename the partitioned table to Sales and Drop the old one
DROP TABLE Sales;  -- Drop old table
EXEC sp_rename 'Sales_Partitioned', 'Sales'; 

--Check the row distribution per partition
SELECT $PARTITION.pf_OrderDate(Order_Date) AS Partition_Number, COUNT(*)
FROM Sales
GROUP BY $PARTITION.pf_OrderDate(Order_Date);
				
			
Partition_Number(No column name)
12997956
23007297
32994657

πŸ“ˆ Advanced SQL Analysis

Customer Revenue Analysis

πŸ’‘ Business Question:

  • Identify the top 5 customers (B2B & B2C separately) who have contributed the highest revenue in the last 12 months, considering discounts and order frequency.
				
					WITH CustomerRevenue AS (
    SELECT 
        Customer_ID,
        Customer_Type,
        COUNT(DISTINCT Order_ID) AS Total_Orders,
        SUM((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Total_Revenue
    FROM Sales
    WHERE Order_Date >= DATEADD(YEAR, -1, '2023-01-01') 
    GROUP BY Customer_ID, Customer_Type
), 
RankedCustomers AS (
    SELECT 
        Customer_ID,
        Customer_Type,
        Total_Orders,
        Total_Revenue,
        RANK() OVER (PARTITION BY Customer_Type ORDER BY Total_Revenue DESC) AS RevenueRank
    FROM CustomerRevenue
)
SELECT 
    Customer_ID,
    Customer_Type,
    Total_Orders,
    Total_Revenue
FROM RankedCustomers
WHERE RevenueRank <= 5
ORDER BY Customer_Type, RevenueRank;
				
			
Customer_IDCustomer_TypeTotal_OrdersTotal_Revenue
CUS1256B2B243273172.4015
CUS9241B2B210261596.7225
CUS2662B2B235260785.9145
CUS9868B2B220260193.9800
CUS4834B2B222256911.5470
CUS9683B2C22852350.0500
CUS2694B2C22846306.2200
CUS5597B2C23445370.0300
CUS5210B2C21644884.7500
CUS2091B2C21944870.5500

πŸ” Key Observations

  • B2B customers generate much higher revenue than B2C customers (5x difference!).
  • Total Orders do not always correlate with Total Revenue (higher-value orders play a role).
  • B2C customers are closer in revenue, while B2B has larger revenue gaps between top customers.

πŸ’‘ Business Question:

  • Which products are the best-selling and least profitable, considering revenue, discounts, and sales volume?
  • How do product sales and profitability trends change over time (monthly, quarterly, yearly)?
				
					WITH ProductSales AS (
    SELECT 
        Product,
        Category,
        SUM(Quantity) AS Total_Quantity_Sold,
        SUM((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Total_Revenue
    FROM Sales 
    GROUP BY Product, Category
),
RankedProducts AS (
    SELECT 
        Product,
        Category,
        Total_Quantity_Sold,
        Total_Revenue,
        RANK() OVER (ORDER BY Total_Quantity_Sold DESC) AS Best_Selling_Rank,
        RANK() OVER (ORDER BY Total_Revenue ASC) AS Least_Profitable_Rank
    FROM ProductSales
)
SELECT 
    Product,
    Category,
    Total_Quantity_Sold,
    Total_Revenue,
    Best_Selling_Rank,
    Least_Profitable_Rank
FROM RankedProducts
WHERE Best_Selling_Rank <= 5 OR Least_Profitable_Rank <= 5
ORDER BY Best_Selling_Rank, Least_Profitable_Rank;

				
			
ProductCategoryTotal_Quantity_SoldTotal_RevenueBest_Selling_RankLeast_Profitable_Rank
Hohes C OrangeJuices779418915048221.7680130
Tomato JuiceJuices739532919758651.7635232
Granini AppleJuices739321212671654.7465329
Cranberry JuiceJuices738084123044834.0420439
Passion Fruit JuiceJuices737232022143149.2490536
VittelWater59210863791982.841094
VolvicWater57384793571393.6015152
Mezzo MixSoft Drinks41298093721743.4680273
KrombacherAlcoholic Beverages27226133897180.6365345
WarsteinerAlcoholic Beverages27052253548409.3180361

πŸ” Key Observations

  • Juices dominate sales volume – The top 5 best-selling products are all juices.
  • Water & Alcoholic Beverages rank high in least profitability – Products like Warsteiner (#1 least profitable) and Vittel (#4 least profitable) generate low revenue compared to sales.
  • Soft Drinks (Mezzo Mix) are neither top-selling nor high-revenue generators – Ranking #27 in best-selling but #3 in least profitable, indicating it might be a low-margin product.
				
					WITH ProductSales AS (
    SELECT 
        Product,
        Category,
        YEAR(Order_Date) AS Sales_Year,
        SUM(Quantity) AS Total_Quantity_Sold,
        SUM((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Total_Revenue
    FROM Sales WITH (NOLOCK)  
    GROUP BY Product, Category, YEAR(Order_Date)
),
RankedProducts AS (
    SELECT 
        Product,
        Category,
        Sales_Year,
        Total_Quantity_Sold,
        Total_Revenue,
        RANK() OVER (PARTITION BY Sales_Year ORDER BY Total_Quantity_Sold DESC) AS Best_Selling_Rank,
        RANK() OVER (PARTITION BY Sales_Year ORDER BY Total_Revenue ASC) AS Least_Profitable_Rank
    FROM ProductSales
)
SELECT 
    Sales_Year,
    Product,
    Category,
    Total_Quantity_Sold,
    Total_Revenue,
    Best_Selling_Rank,
    Least_Profitable_Rank
FROM RankedProducts
WHERE Best_Selling_Rank <= 5 OR Least_Profitable_Rank <= 5
ORDER BY Sales_Year DESC, Best_Selling_Rank, Least_Profitable_Rank;
				
			
Sales_YearProductCategoryTotal_Quantity_SoldTotal_RevenueBest_Selling_RankLeast_Profitable_Rank
2023Hohes C OrangeJuices25919045151293.2435130
2023Granini AppleJuices24650644355558.2705229
2023Passion Fruit JuiceJuices24549387596955.2960336
2023Rauch MultivitaminJuices24501706201396.3390431
2023Mango JuiceJuices24494977689154.2705537
2023VittelWater19655351292822.030094
2023VolvicWater19063511220434.5830152
2023Mezzo MixSoft Drinks13711631272048.0975283
2023KrombacherAlcoholic Beverages9093401337450.7410355
2023WarsteinerAlcoholic Beverages9010951216468.2340361
2022Hohes C OrangeJuices25988415018407.0090130
2022Tomato JuiceJuices24870246641356.8905232
2022Cranberry JuiceJuices24709747704555.9210339
2022Passion Fruit JuiceJuices24702817412354.2290436
2022Granini AppleJuices24679754225958.7500529
2022VittelWater19998451282619.389084
2022VolvicWater19386891207813.8385102
2022Mezzo MixSoft Drinks13898411253588.7695243
2022KrombacherAlcoholic Beverages9147291311510.7020345
2022WarsteinerAlcoholic Beverages9020191183765.6715361
2021Hohes C OrangeJuices26034444878521.5155130
2021Tomato JuiceJuices24662616391263.6040232
2021Cranberry JuiceJuices24607227461822.0065339
2021Granini AppleJuices24601734090137.7260429
2021Passion Fruit JuiceJuices24471017133839.7240536
2021VittelWater19557061216541.422094
2021VolvicWater18934391143145.1800151
2021Mezzo MixSoft Drinks13688051196106.6010283
2021WarsteinerAlcoholic Beverages9021111148175.4125342
2021KrombacherAlcoholic Beverages8985441248219.1935365

πŸ” Key Observations

  • Juices dominate the best-selling category across all years.
  • Water & Alcoholic Beverages consistently rank among the least profitable – Warsteiner, Volvic, and Vittel struggle in revenue every year.
  • Year-over-year product trends remain stable, indicating consistent customer preferences.
				
					WITH MonthlySales AS (
    SELECT 
        YEAR(Order_Date) AS Sales_Year,
        MONTH(Order_Date) AS Sales_Month,
        SUM((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Total_Revenue
    FROM Sales WITH (NOLOCK)
    GROUP BY YEAR(Order_Date), MONTH(Order_Date)
),
MovingAverage AS (
    SELECT 
        Sales_Year,
        Sales_Month,
        Total_Revenue,
        AVG(Total_Revenue) OVER (ORDER BY Sales_Year, Sales_Month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Moving_Avg_3Months
    FROM MonthlySales
)
SELECT * FROM MovingAverage
ORDER BY Sales_Year DESC, Sales_Month DESC;

				
			

πŸ’‘ Business Question:

  • Which regions contribute the most to total revenue, and how do sales trends vary across regions?
				
					WITH RegionalSales AS (
    SELECT 
        Region,
        COUNT(DISTINCT Order_ID) AS Total_Orders,
        SUM((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Total_Revenue
    FROM Sales WITH (NOLOCK)  
    GROUP BY Region
)
SELECT 
    Region,
    Total_Orders,
    Total_Revenue,
    RANK() OVER (ORDER BY Total_Revenue DESC) AS Revenue_Rank
FROM RegionalSales
ORDER BY Total_Revenue DESC;
				
			
RegionTotal_OrdersTotal_RevenueRevenue_Rank
Hamburg20145782470780.22851
Hessen18234378400118.31552
Saarland19127378390596.47253
Rheinland-Pfalz19268775838685.71704
Mecklenburg-Vorpommern18163775517255.08455
Thηœ‰ringen18785975324875.77256
Berlin18256974567936.48607
Bayern18535272825405.83208
Niedersachsen19229871959057.65809
Sachsen19081371947001.666510
Baden-Wηœ‰rttemberg18421371594846.246011
Nordrhein-Westfalen18971971393809.710012
Brandenburg17761671349037.905013
Schleswig-Holstein18868570317836.279014
Sachsen-Anhalt18209169765550.408015
Bremen18938865018497.414016

πŸ” Key Observations

  • Hamburg leads in total revenue – Highest revenue ($82M+) despite having similar order volume to other regions.
  • Some regions generate lower revenue despite high order count – Example: Bremen ranks #16 in revenue but has 189k+ orders.
  • Revenue distribution is competitive – The top 10 regions have revenues above $70M, showing balanced regional sales.

πŸ’‘ Business Question:

  • How do discounts impact total revenue, profit margins, and sales volume? Are discounts helping or hurting profitability?
				
					WITH DiscountAnalysis AS (
    SELECT 
        CASE 
            WHEN Discount = 0 THEN 'No Discount'
            WHEN Discount BETWEEN 0.01 AND 0.05 THEN 'Low Discount (1-5%)'
            WHEN Discount BETWEEN 0.06 AND 0.10 THEN 'Medium Discount (6-10%)'
            ELSE 'High Discount (>10%)'
        END AS Discount_Category,
        COUNT(DISTINCT Order_ID) AS Total_Orders,
        SUM(Quantity) AS Total_Quantity_Sold,
        SUM((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Total_Revenue,
        AVG((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Avg_Revenue_Per_Order
    FROM Sales WITH (NOLOCK)
    GROUP BY 
        CASE 
            WHEN Discount = 0 THEN 'No Discount'
            WHEN Discount BETWEEN 0.01 AND 0.05 THEN 'Low Discount (1-5%)'
            WHEN Discount BETWEEN 0.06 AND 0.10 THEN 'Medium Discount (6-10%)'
            ELSE 'High Discount (>10%)'
        END
)
SELECT 
    Discount_Category,
    Total_Orders,
    Total_Quantity_Sold,
    Total_Revenue,
    Avg_Revenue_Per_Order
FROM DiscountAnalysis
ORDER BY Total_Revenue DESC;

				
			
Discount_CategoryTotal_OrdersTotal_Quantity_SoldTotal_RevenueAvg_Revenue_Per_Order
Medium Discount (6-10%)80421785916100492967690.5150356.039005
No Discount193119246360564275061551.080047.462006
Low Discount (1-5%)81895648815077246116475.5090171.044997
High Discount (>10%)32569027149380162535574.0910426.583522

πŸ” Key Observations

  • Medium Discount (6-10%) generates the highest total revenue – Contributing $492M with a strong order volume (804K+ orders).
  • High Discounts (>10%) drive the highest revenue per order ($426), but lowest sales volume – Fewer orders, but customers spend more.
  • No Discount orders are the highest (1.9M orders), but the lowest revenue per order ($47.46) – Customers still buy, but average spending is lower.
  • Low Discount (1-5%) doesn’t boost revenue significantly compared to medium discounts – May indicate 1-5% discounts aren’t very effective.

πŸ’‘ Business Question:

  • How can we predict future sales trends based on historical data?
				
					WITH MonthlySales AS (
    SELECT 
        YEAR(Order_Date) AS Sales_Year,
        MONTH(Order_Date) AS Sales_Month,
        SUM((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Total_Revenue
    FROM Sales WITH (NOLOCK)
    GROUP BY YEAR(Order_Date), MONTH(Order_Date)
),
MovingAverage AS (
    SELECT 
        Sales_Year,
        Sales_Month,
        Total_Revenue,
        AVG(Total_Revenue) OVER (ORDER BY Sales_Year, Sales_Month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Moving_Avg_3Months
    FROM MonthlySales
)
SELECT * FROM MovingAverage
ORDER BY Sales_Year DESC, Sales_Month DESC;

				
			
				
					WITH YearlySales AS (
    SELECT 
        YEAR(Order_Date) AS Sales_Year,
        SUM((Unit_Price * Quantity) - (Discount * Unit_Price * Quantity)) AS Total_Revenue
    FROM Sales WITH (NOLOCK)
    GROUP BY YEAR(Order_Date)
),
GrowthAnalysis AS (
    SELECT 
        Sales_Year,
        Total_Revenue,
        LAG(Total_Revenue) OVER (ORDER BY Sales_Year) AS Previous_Year_Revenue,
        (Total_Revenue - LAG(Total_Revenue) OVER (ORDER BY Sales_Year)) / NULLIF(LAG(Total_Revenue) OVER (ORDER BY Sales_Year), 0) * 100 AS Yearly_Growth_Percentage
    FROM YearlySales
)
SELECT * FROM GrowthAnalysis
ORDER BY Sales_Year DESC;

				
			
Sales_YearSales_MonthTotal_RevenueMoving_Avg_3Months
20231233766725.513533896828.012666
20231133712861.610033635810.899666
20231034210896.914533893958.163333
2023932983674.174533909750.330000
2023834487303.401033981349.161166
2023734258273.414533659205.189000
2023633198470.668033027256.722833
2023533520871.484533202760.579333
2023432362428.016032554577.952500
2023333724982.237533091092.853500
2023231576323.604033334820.229833
2023133971972.719033602645.149500
20221234456164.366533457520.148000
20221132379798.363032806685.140333
20221033536597.714533185142.770500
2022932503659.343533140252.686833
2022833515171.253533155876.733000
2022733401927.463533073539.599000
2022632550531.482032734477.748000
2022533268159.851533152078.151500
2022432384741.910532137655.949666
2022333803332.692532287689.647500
2022230224893.246032167348.150500
2022132834843.004032418190.189500
20211233442308.201532095819.657666
20211130977419.363031360941.113166
20211031867731.408531677002.079833
2021931237672.568031758446.030833
2021831925602.263031676985.205000
2021732112063.261531653615.765333
2021630993290.090531330281.203000
2021531855493.944031984997.909166
2021431142059.574531069890.608666
2021332957440.209031497682.024166
2021229110172.042530767802.931750
2021132425433.821032425433.821000
Sales_YearTotal_RevenuePrevious_Year_RevenueYearly_Growth_Percentage
2023401774783.7570394859820.69101.751200
2022394859820.6910380046686.74703.897700
2021380046686.7470NULLNULL

πŸ” Key Observations

  • Sales have consistently grown year-over-year, but the growth rate is slowing down (3.89% in 2022 vs. 1.75% in 2023).
  • 2023’s growth is lower than 2022’s, suggesting potential market saturation or lower demand.
  • If this trend continues, 2024’s projected growth might be even lower.
  • Sales fluctuate seasonally, but the 3-month moving average smooths the trend.
  • Revenue dips slightly in some months (e.g., Sept 2023 – $32.98M), showing possible seasonal variation.
  • Peak months (e.g., Aug 2023 – $34.48M) show potential high-demand periods (promotions, events?).

🎨 Power BI Dashboard