Walmart Sales & Marketing Analytics

🔎 Introduction

This project explores Walmart’s sales trends, promotional effectiveness, and customer behavior using real transaction data. The goal is to uncover data-driven insights that help optimize revenue, improve marketing strategies, and enhance inventory management. By analyzing key factors such as best-selling products, promotion impact, customer spending patterns, and demand forecasting accuracy, we aim to provide actionable recommendations for Walmart’s business growth.

📋 Project Overview

This analysis focuses on three key areas:

📊 Sales Performance

  • Identify top-selling and underperforming products.
  • Analyze regional store performance and revenue trends.

🎯 Marketing & Promotions Impact

  • Measure the effectiveness of different promotions on sales.
  • Understand customer response to discounts and loyalty programs.

📦 Demand Forecasting & Inventory Management

  • Compare forecasted vs. actual demand to assess prediction accuracy.
  • Identify stockout trends and recommend inventory optimizations.

Key Deliverables:

  • Best & worst-selling product insights
  • Promotion effectiveness analysis
  • Customer segmentation insights
  • Demand forecasting accuracy assessment

📊 Summary of the Dataset

📂 Key Dataset Attributes

  • Total Records: 5,000 transactions
  • Total Columns: 28 features
  • Time Period Covered: January 1 to September 16, 2024 (roughly eight and a half months of sales)

🛒 Sales & Transaction Data

  • Transaction ID: Unique identifier for each sale
  • Product Details: Product ID, Name, Category, Unit Price, Quantity Sold
  • Store Information: Store ID, Store Location

👥 Customer Demographics

  • Customer ID: Unique identifier per customer
  • Age & Gender: Customer’s age and gender distribution
  • Income Level: Customer’s financial profile
  • Loyalty Level: Bronze, Silver, Gold, Platinum

📣 Marketing & Promotions

  • Promotions Applied: Whether a discount was used
  • Promotion Type: Percentage discount, Buy 1 Get 1, etc.
  • Payment Method: Cash, Credit Card, Digital Wallet

📦 Inventory & Demand Forecasting

  • Inventory Level: Stock availability before a transaction
  • Stockout Indicator: Whether an item was out of stock
  • Forecasted vs. Actual Demand: Accuracy of sales predictions

🌦 External Influences

  • Weather Conditions: Rainy, Sunny, Stormy, etc.
  • Holiday Indicator: Whether the transaction occurred during a holiday
  • Weekday: Day of the transaction

🔧 Phase 1: Data Cleaning & Preparation

Importing Libraries
				
# Data Manipulation
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical Analysis
from scipy import stats

# Display settings for better readability
pd.set_option('display.float_format', '{:.2f}'.format)



				
			
Loading the Dataset & Basic Data Check
				
df = pd.read_csv(filepath_or_buffer=r'D:\roy\roy files\data\data project\Walmart.csv')
df.head()
				
			
(Output: the first five rows of the dataframe across all 28 columns — transaction, product, and store details; inventory and supplier fields; customer demographics; promotion, weather, holiday, and weekday flags; and forecasted vs. actual demand.)
				
pd.set_option('display.max_columns', None)
df.info()
				
			
				
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 28 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   transaction_id          5000 non-null   int64  
 1   customer_id             5000 non-null   int64  
 2   product_id              5000 non-null   int64  
 3   product_name            5000 non-null   object 
 4   category                5000 non-null   object 
 5   quantity_sold           5000 non-null   int64  
 6   unit_price              5000 non-null   float64
 7   transaction_date        5000 non-null   object 
 8   store_id                5000 non-null   int64  
 9   store_location          5000 non-null   object 
 10  inventory_level         5000 non-null   int64  
 11  reorder_point           5000 non-null   int64  
 12  reorder_quantity        5000 non-null   int64  
 13  supplier_id             5000 non-null   int64  
 14  supplier_lead_time      5000 non-null   int64  
 15  customer_age            5000 non-null   int64  
 16  customer_gender         5000 non-null   object 
 17  customer_income         5000 non-null   float64
 18  customer_loyalty_level  5000 non-null   object 
 19  payment_method          5000 non-null   object 
 20  promotion_applied       5000 non-null   bool   
 21  promotion_type          1593 non-null   object 
 22  weather_conditions      5000 non-null   object 
 23  holiday_indicator       5000 non-null   bool   
 24  weekday                 5000 non-null   object 
 25  stockout_indicator      5000 non-null   bool   
 26  forecasted_demand       5000 non-null   int64  
 27  actual_demand           5000 non-null   int64  
dtypes: bool(3), float64(2), int64(13), object(10)
memory usage: 991.3+ KB
				
			
Data Cleaning
  • Check for Missing Values
  • Check & Remove Duplicates
  • Fix Data Types (Dates, Categories, Numbers, Booleans)
				
# Checking for missing values in each column
df.isna().sum()
				
			
				
transaction_id               0
customer_id                  0
product_id                   0
product_name                 0
category                     0
quantity_sold                0
unit_price                   0
transaction_date             0
store_id                     0
store_location               0
inventory_level              0
reorder_point                0
reorder_quantity             0
supplier_id                  0
supplier_lead_time           0
customer_age                 0
customer_gender              0
customer_income              0
customer_loyalty_level       0
payment_method               0
promotion_applied            0
promotion_type            3407
weather_conditions           0
holiday_indicator            0
weekday                      0
stockout_indicator           0
forecasted_demand            0
actual_demand                0
dtype: int64
				
			
				
# Replace NaN in promotion_type with "No Promotion" (these rows simply had no promotion)
df['promotion_type'] = df['promotion_type'].fillna('No Promotion')
				
			
				
# Count duplicate rows
df.duplicated().sum()
				
			
				
np.int64(0)
				
			
				
# Convert transaction_date to datetime format
df['transaction_date'] = pd.to_datetime(df['transaction_date'],errors='coerce')

# Convert categorical columns to category type
categorical_cols = ['category', 'store_location', 'customer_loyalty_level', 'payment_method', 
                    'promotion_type', 'weather_conditions', 'customer_gender', 'weekday']
df[categorical_cols] = df[categorical_cols].astype('category')
				
			
Validating and Ensuring Data Consistency
  • Check for Unexpected Values in Categorical Columns
  • Check for Negative or Illogical Numeric Values
  • Verify Date Ranges
  • Check for Unusual or Duplicated Entries in Key Columns
				
# Display unique values in categorical columns
for col in categorical_cols:
    print(f"Unique values in '{col}':\n", df[col].unique(), "\n")

				
			
				
Unique values in 'category':
 ['Electronics', 'Appliances']
Categories (2, object): ['Appliances', 'Electronics'] 

Unique values in 'store_location':
 ['Miami, FL', 'Dallas, TX', 'Los Angeles, CA', 'Chicago, IL', 'New York, NY']
Categories (5, object): ['Chicago, IL', 'Dallas, TX', 'Los Angeles, CA', 'Miami, FL', 'New York, NY'] 

Unique values in 'customer_loyalty_level':
 ['Silver', 'Gold', 'Platinum', 'Bronze']
Categories (4, object): ['Bronze', 'Gold', 'Platinum', 'Silver'] 

Unique values in 'payment_method':
 ['Credit Card', 'Cash', 'Digital Wallet', 'Debit Card']
Categories (4, object): ['Cash', 'Credit Card', 'Debit Card', 'Digital Wallet'] 

Unique values in 'promotion_type':
 ['No Promotion', 'Percentage Discount', 'BOGO']
Categories (3, object): ['BOGO', 'No Promotion', 'Percentage Discount'] 

Unique values in 'weather_conditions':
 ['Stormy', 'Rainy', 'Sunny', 'Cloudy']
Categories (4, object): ['Cloudy', 'Rainy', 'Stormy', 'Sunny'] 

Unique values in 'customer_gender':
 ['Other', 'Female', 'Male']
Categories (3, object): ['Female', 'Male', 'Other'] 

Unique values in 'weekday':
 ['Friday', 'Monday', 'Tuesday', 'Sunday', 'Thursday', 'Saturday', 'Wednesday']
Categories (7, object): ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday']
				
			
				
# List of numeric columns to check
numeric_cols_check = ['unit_price', 'quantity_sold', 'customer_income', 'forecasted_demand', 'actual_demand']

# Check for negative values
for col in numeric_cols_check:
    invalid_values = df[df[col] < 0]
    if not invalid_values.empty:
        print(f"⚠️ Warning: Negative values found in '{col}':")
        print(invalid_values[[col]].head(), "\n")
    else:
        print(f"✅ '{col}' has no negative values.")

				
			
				
✅ 'unit_price' has no negative values.
✅ 'quantity_sold' has no negative values.
✅ 'customer_income' has no negative values.
✅ 'forecasted_demand' has no negative values.
✅ 'actual_demand' has no negative values.
				
			
				
# Display the earliest and latest transaction dates
min_date = df['transaction_date'].min()
max_date = df['transaction_date'].max()

print(f"📅 Earliest Date: {min_date}")
print(f"📅 Latest Date: {max_date}")

# Check for any future dates beyond today
from datetime import datetime

today = datetime.today()
future_dates = df[df['transaction_date'] > today]

if not future_dates.empty:
    print("Warning: There are transactions with future dates!")
    print(future_dates[['transaction_date']].head())
else:
    print("No future dates found in dataset.")

				
			
				
📅 Earliest Date: 2024-01-01 00:31:00
📅 Latest Date: 2024-09-16 20:22:00
No future dates found in dataset.
				
			
				
# Check for duplicate product entries within the same transaction (if applicable)
duplicate_products = df[df.duplicated(subset=['transaction_id', 'product_id'],keep=False)]
print(f"Duplicate Product Entries Within Transactions: {duplicate_products.shape[0]}")

				
			
				
Duplicate Product Entries Within Transactions: 0
				
			
Summary of Data Cleaning & Preparation

Step | Task | Status
1. Importing Libraries | Loaded Pandas, NumPy, Matplotlib, etc. | ✅ Completed
2. Loading the Dataset | Read the dataset and checked structure | ✅ Completed
3. Data Cleaning | Fixed missing values, handled duplicates | ✅ Completed
4. Fixing Data Types | Converted dates, numerical, categorical fields | ✅ Completed
5. Validating Data Consistency | Checked for anomalies, duplicate records, incorrect values | ✅ Completed

📈 Phase 2: Exploratory Data Analysis (EDA)

Data Distribution
				
# Summary statistics for numerical columns
df.describe().round(2)
				
			
Column | count | mean | std | min | 25% | 50% | 75% | max
transaction_id | 5000 | 2500.50 | 1443.52 | 1.00 | 1250.75 | 2500.50 | 3750.25 | 5000.00
customer_id | 5000 | 5542.50 | 2582.13 | 1001.00 | 3279.00 | 5558.00 | 7767.25 | 9998.00
product_id | 5000 | 551.23 | 258.83 | 100.00 | 322.00 | 559.00 | 776.00 | 999.00
quantity_sold | 5000 | 2.98 | 1.42 | 1.00 | 2.00 | 3.00 | 4.00 | 5.00
unit_price | 5000 | 1023.47 | 559.61 | 50.10 | 537.78 | 1029.18 | 1506.31 | 1999.85
transaction_date | 5000 | 2024-05-11 05:10:22 | NaN | 2024-01-01 00:31 | 2024-03-07 19:21 | 2024-05-11 20:02 | 2024-07-16 08:58 | 2024-09-16 20:22
store_id | 5000 | 10.52 | 5.79 | 1.00 | 5.00 | 11.00 | 16.00 | 20.00
inventory_level | 5000 | 253.12 | 142.89 | 0.00 | 130.00 | 253.00 | 377.25 | 500.00
reorder_point | 5000 | 99.79 | 29.13 | 50.00 | 75.00 | 100.00 | 125.00 | 150.00
reorder_quantity | 5000 | 200.52 | 58.26 | 100.00 | 150.75 | 200.50 | 251.00 | 300.00
supplier_id | 5000 | 300.13 | 116.39 | 100.00 | 199.00 | 299.00 | 405.00 | 500.00
supplier_lead_time | 5000 | 5.52 | 2.86 | 1.00 | 3.00 | 6.00 | 8.00 | 10.00
customer_age | 5000 | 44.12 | 15.33 | 18.00 | 31.00 | 44.00 | 58.00 | 70.00
customer_income | 5000 | 70041.63 | 29053.37 | 20005.34 | 44865.42 | 70188.29 | 95395.87 | 119999.78
forecasted_demand | 5000 | 297.13 | 115.57 | 100.00 | 195.00 | 297.50 | 395.00 | 500.00
actual_demand | 5000 | 299.09 | 121.68 | 90.00 | 194.00 | 299.00 | 404.00 | 510.00
				
# Summary of categorical columns
df.describe(include=['category'])
				
			
Column | count | unique | top | freq
category | 5000 | 2 | Electronics | 2585
store_location | 5000 | 5 | Los Angeles, CA | 1038
customer_gender | 5000 | 3 | Male | 1682
customer_loyalty_level | 5000 | 4 | Platinum | 1299
payment_method | 5000 | 4 | Credit Card | 1281
promotion_type | 5000 | 3 | No Promotion | 3407
weather_conditions | 5000 | 4 | Cloudy | 1281
weekday | 5000 | 7 | Monday | 775
Outlier Detection & Treatment
  • Detect Outliers – Boxplot Visualization
  • Z-Score & IQR Methods
				
import matplotlib.pyplot as plt
import seaborn as sns

# List of numerical columns to check for outliers
numeric_cols = ['unit_price', 'quantity_sold', 'customer_income', 'forecasted_demand', 'actual_demand']

# Apply log transformation to adjust scaling
df_log = df.copy()
df_log[numeric_cols] = np.log1p(df[numeric_cols])

# Create a new boxplot with adjusted scaling
plt.figure(figsize=(12, 6))
sns.boxplot(data=df_log[numeric_cols])
plt.xticks(rotation=45)
plt.title("Boxplot for Outlier Detection (Log-Scaled)")
plt.show()
				
			

🔍 Key Observations:
✅ Customer Income appears well-balanced after the log transform.
✅ Unit Price shows some lower-end outliers (small dots below the whiskers).
✅ The other variables (quantity_sold, forecasted_demand, actual_demand) show no major outliers.

 

				
from scipy import stats

# Define threshold for Z-score method (values beyond ±3 standard deviations are considered outliers)
z_threshold = 3

# Identify outliers in 'unit_price' using Z-score
z_scores_unit_price = np.abs(stats.zscore(df['unit_price']))  # Compute Z-score
outliers_z_unit_price = df[z_scores_unit_price > z_threshold]  # Filter outliers

if not outliers_z_unit_price.empty:
    print("There are some extreme outliers in unit_price based on the Z-score method.")
else:
    print("There are no extreme outliers in unit_price based on the Z-score method.")
				
			
				
There are no extreme outliers in unit_price based on the Z-score method.
				
			
				
# Interquartile Range (IQR) method for outlier detection
Q1 = df['unit_price'].quantile(0.25)
Q3 = df['unit_price'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Identify outliers in 'unit_price' using IQR
outliers_iqr_unit_price = df[(df['unit_price'] < lower_bound) | (df['unit_price'] > upper_bound)]

if not outliers_iqr_unit_price.empty:
    print("There are some extreme outliers in unit_price based on the IQR method.")
else:
    print("There are no extreme outliers in unit_price based on the IQR method.")

				
			
				
There are no extreme outliers in unit_price based on the IQR method.
				
			

📢 Conclusion:

✅ No extreme outliers detected in any numerical variables, confirming that the dataset is clean for further analysis.
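
The Z-score and IQR code above only tests unit_price; to back the "any numerical variables" claim, the same two tests can be looped over the remaining numeric columns. A minimal sketch reusing numeric_cols_check and z_threshold from earlier:

# Apply both outlier tests to every numeric column checked earlier
for col in numeric_cols_check:
    # Z-score test: values beyond ±3 standard deviations
    z_outliers = df[np.abs(stats.zscore(df[col])) > z_threshold]

    # IQR test: values beyond 1.5 * IQR outside the quartiles
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    iqr_outliers = df[(df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)]

    print(f"{col}: {len(z_outliers)} Z-score outliers, {len(iqr_outliers)} IQR outliers")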

 

Customer Analysis
  • Customer Demographics
  • Customer Loyalty Trends
  • Payment Method Preferences
  • Customer Spending Behavior
				
'''Customer Demographics Analysis'''

# Plot Age Distribution
plt.figure(figsize=(8, 5))
sns.histplot(df['customer_age'], bins=20, kde=True, color='royalblue')
plt.title("Customer Age Distribution")
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()
				
			
				
# Plot Gender Distribution
plt.figure(figsize=(6, 5))
bars = sns.countplot(data=df, x='customer_gender', palette='Set2')
plt.title("Customer Gender Distribution")
plt.xlabel("Gender")
plt.ylabel("Count")

# Add annotations
for bar in bars.patches:  
    yval = bar.get_height()
    plt.text(
        bar.get_x() + bar.get_width()/2, yval, 
        f'{yval:.0f} ',  
        ha='center', va='bottom', fontsize=10
    )

plt.show()
				
			
				
# Plot Income Distribution
plt.figure(figsize=(8, 5))
sns.histplot(df['customer_income'], bins=20, kde=True, color='seagreen')  # histplot takes a single color; 'palette' only applies with hue
plt.title("Customer Income Distribution")
plt.xlabel("Income ($)")
plt.ylabel("Count")
plt.show()
				
			

📢 Key Insights:

✅ Age Distribution:

    • Customers are evenly distributed across different age groups, ranging from early 20s to late 60s.
    • No dominant age group, indicating a broad customer base.

✅ Gender Distribution:

    • The dataset has a balanced representation of Male, Female, and Other customers.
    • No significant gender bias, meaning marketing strategies should target all genders equally.

✅ Income Distribution:

    • Customers have a wide range of incomes, from $20,000 to $120,000.
    • The distribution is relatively even, meaning Walmart serves both low-income and high-income segments.
				
'''Customer Loyalty Level Analysis'''

# Get loyalty level distribution counts
loyalty_counts = df['customer_loyalty_level'].value_counts()

# Create Pie Chart
plt.figure(figsize=(7, 7))
plt.pie(loyalty_counts, labels=loyalty_counts.index, autopct='%1.1f%%', 
        colors=['#66c2a5', '#2ACCE7', '#8da0cb', '#e78ac3'], 
        startangle=140, wedgeprops={'edgecolor': 'grey'})

# Set title
plt.title("Customer Loyalty Level Distribution")
plt.show()
				
			

📢 Key Insights:

✅ Platinum (26%) & Silver (25.8%) customers dominate, indicating strong engagement.
✅ Bronze (25.1%) customers have potential for upgrades with better incentives.
✅ Gold (23.2%) is the least preferred tier, suggesting a need for improved benefits.

 

				
'''Payment Method Preferences Analysis'''

# Plot Payment Method Distribution
plt.figure(figsize=(7, 5))
bars = sns.countplot(data=df, x='payment_method', order=df['payment_method'].value_counts().index, palette='pastel')
plt.title("Customer Payment Method Preferences")
plt.xlabel("Payment Method")
plt.ylabel("Count")

# Add annotations to bars
for bar in bars.patches:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, f'{yval:.0f}', ha='center', va='bottom', fontsize=10)

plt.show()
				
			

📢 Key Insights:

✅ Credit Card (1,281) is the most preferred payment method, showing that many customers rely on credit-based transactions.
✅ Digital Wallets (1,263) are nearly as popular, indicating a shift towards cashless and mobile payments.
✅ Cash (1,246) remains a strong option, meaning traditional payment methods are still widely used.
✅ Debit Card (1,210) is the least used method, but the difference is minimal, showing all payment options are well-balanced.

 

				
'''Customer Spending Behavior Analysis'''

# Create a new column for total spending
df['total_price'] = df['unit_price'] * df['quantity_sold']

# Create a boxplot for total spending based on loyalty level
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x='customer_loyalty_level', y='total_price', palette='coolwarm')
plt.title("Customer Spending Behavior by Loyalty Level")
plt.xlabel("Loyalty Level")
plt.ylabel("Total Spending ($)")
plt.show()

				
			

📢 Key Insights:

✅ All loyalty levels have similar median spending (~$2,000).

    • This suggests that spending habits are relatively consistent across customer tiers.

✅ Wide range of spending within each group.

    • Some customers spend as low as $0, while others exceed $10,000, indicating a mix of low- and high-spending customers in all tiers.

✅ Presence of high-spending outliers.

    • A few customers in each loyalty tier make very large purchases ($10,000+), likely bulk or high-value item buyers.

Business Recommendations:

  • Segment marketing campaigns based on income, age, and loyalty level (see the sketch below).
  • Enhance Gold-tier benefits to increase participation.
  • Encourage debit card usage through cashback or loyalty perks.
  • Target low-spending customers with promotional incentives to boost spending.
  • Create personalized offers for high-spending customers to retain premium buyers.
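
As a concrete starting point for the segmentation recommendation, below is a minimal sketch using only columns already in the dataset; the income band edges are illustrative assumptions, not derived thresholds:

# Illustrative income bands (the bin edges are assumptions chosen for demonstration)
df['income_band'] = pd.cut(df['customer_income'],
                           bins=[0, 40_000, 80_000, 120_000],
                           labels=['Low', 'Mid', 'High'])

# Average spend and size of each income x loyalty segment
segment_spend = (df.groupby(['income_band', 'customer_loyalty_level'], observed=True)
                   ['total_price'].agg(['mean', 'count']).round(2))
print(segment_spend)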

📈 Phase 3: Business Insights & Reporting

Sales Trends Analysis
				
'''Sales Trends Analysis'''

# Aggregate total sales per week
sales_trend = df.groupby(pd.Grouper(key='transaction_date', freq='W'))['total_price'].sum()

plt.figure(figsize=(12, 5))

# Plot the weekly sales trend
sns.lineplot(x=sales_trend.index, y=sales_trend.values, marker='o', color='royalblue', label="Weekly Sales")

# Add a trend line using numpy polyfit 
x_values = np.arange(len(sales_trend))  # Convert index to numerical values
y_values = sales_trend.values
z = np.polyfit(x_values, y_values, 1) 
p = np.poly1d(z)

# Plot the trend line
plt.plot(sales_trend.index, p(x_values), color='red', linestyle='dashed', label="Trend Line")
plt.title("Weekly Sales Trends Over Time with Trend Line")
plt.xlabel("Date")
plt.ylabel("Total Sales ($)")
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)

plt.show()
				
			

📢 Key Findings:

✅ Weekly sales fluctuate significantly, with peaks exceeding $500,000 and dips below $300,000.
✅ A sharp drop at the end of the period, indicating a potential sales issue (e.g., seasonal drop, inventory shortage, or data error).
✅ The red trend line shows a slight downward trend, meaning overall sales might be declining over time.
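
Before treating the final-week drop as a real sales issue, note that the data ends on 2024-09-16, so the last weekly bucket produced by the 'W' grouper covers only part of a week. A quick check (sketch):

# Transactions per weekly bucket; a short final bucket would explain the end-of-period dip
weekly_counts = df.groupby(pd.Grouper(key='transaction_date', freq='W'))['transaction_id'].count()
print(weekly_counts.tail())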

 

Best-Selling Products Analysis
				
'''Best-Selling Products Analysis'''

# Aggregate total sales per product
top_products = df.groupby('product_name')['total_price'].sum().sort_values(ascending=False).head(10)

# Plot Best-Selling Products
plt.figure(figsize=(10, 5))
sns.barplot(x=top_products.values, y=top_products.index, palette='viridis')
plt.title("Top 10 Best-Selling Products")
plt.xlabel("Total Sales ($)")
plt.ylabel("Product Name")

# Add values on bars
for index, value in enumerate(top_products.values):
    plt.text(value, index, f"${value:,.0f}", va='center', fontsize=10)

plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()

				
			

📢 Key Findings:

✅ TVs are the top-selling product, generating over $2 million in total sales.
✅ Tablets, Fridges, and Smartphones also drive high revenue, indicating strong demand.
✅ Washing Machines and Cameras are in the mid-range but still contribute significantly.
✅ Laptops rank the lowest among the top 8, but still exceed $1.7 million in sales.

 

Regional Store Performance Analysis
				
'''Regional Store Performance Analysis'''

# Aggregate total sales per store location
store_performance = df.groupby('store_location')['total_price'].sum().sort_values(ascending=False).head(10)

# Plot Top 10 Best-Performing Stores
plt.figure(figsize=(10, 5))
sns.barplot(x=store_performance.values, y=store_performance.index, order=store_performance.index, palette='magma')
plt.title("Top 10 Best-Performing Store Locations")
plt.xlabel("Total Sales ($)")
plt.ylabel("Store Location")

# Add values on bars
for index, value in enumerate(store_performance.values):
    plt.text(value, index, f"${value:,.0f}", va='center', fontsize=10)

plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()

				
			

📢 Key Findings:

✅ Los Angeles, CA is the top-performing store with $3.27 million in total sales.
✅ Chicago, IL follows closely at $3.15 million, showing strong sales demand.
✅ New York, Miami, and Dallas are all in the $2.9 million range, contributing significantly.
✅ These five cities are the only store locations in the dataset; the spread between Los Angeles and the lowest performer points to meaningfully higher footfall and demand in the leading regions.

 

Promotions Impact Analysis
				
'''Promotions Impact Analysis'''

# Compare total sales between promoted and non-promoted transactions
promotion_impact = df.groupby('promotion_applied')['total_price'].sum()

# Plot Promotion Impact
plt.figure(figsize=(7, 5))
sns.barplot(x=promotion_impact.index, y=promotion_impact.values, palette='cool')

# Formatting
plt.title("Impact of Promotions on Total Sales")
plt.xlabel("Promotion Applied")
plt.ylabel("Total Sales ($)")
plt.xticks(ticks=[0, 1], labels=['No Promotion', 'Promotion Applied'])

# Add values on bars
for index, value in enumerate(promotion_impact.values):
    plt.text(index, value, f"${value:,.0f}", ha='center', va='bottom', fontsize=10)

plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

				
			

📢 Key Findings:

✅ Promotions drive higher total sales, reaching $8.06 million compared to $7.2 million without promotions.
✅ Sales increased by approximately 12% when promotions were applied.
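
The ~12% figure is simple arithmetic on the two bars: ($8.06M - $7.20M) / $7.20M ≈ 0.12. It can also be computed directly from the grouped series:

# Relative lift in total sales when promotions were applied
lift = (promotion_impact[True] - promotion_impact[False]) / promotion_impact[False]
print(f"Sales lift from promotions: {lift:.1%}")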

 

Demand Forecasting Accuracy Analysis
				
'''Demand Forecasting Accuracy Analysis'''

# Calculate absolute error between forecasted and actual demand
df['demand_error'] = abs(df['forecasted_demand'] - df['actual_demand'])

# Calculate Mean Absolute Error (MAE)
mae = df['demand_error'].mean()

# Plot Forecasted vs Actual Demand
plt.figure(figsize=(10, 5))
sns.scatterplot(x=df['forecasted_demand'], y=df['actual_demand'], alpha=0.6)
plt.plot([df['forecasted_demand'].min(), df['forecasted_demand'].max()],
         [df['forecasted_demand'].min(), df['forecasted_demand'].max()], 'r--', label="Perfect Prediction")

plt.title(f"Forecasted vs Actual Demand (MAE: {mae:.2f})")
plt.xlabel("Forecasted Demand")
plt.ylabel("Actual Demand")
plt.legend()
plt.grid(alpha=0.5)

plt.show()

				
			

📢 Key Findings:

✅ Mean Absolute Error (MAE) is 137.52, meaning that on average, the forecasted demand deviates by 137 units from actual demand.
✅ The scatter plot shows high variability, with actual demand widely dispersed around the red dashed “Perfect Prediction” line.
✅ Some predictions are fairly close to the actual values, but many data points indicate over- or under-estimations.
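
For scale: with mean actual demand around 299 units (per the summary statistics), an MAE of 137.52 is roughly 46% of average demand. A short sketch that adds MAPE as a scale-free companion metric (actual_demand has no zeros here, so the division is safe):

# Put the MAE on a relative scale
relative_error = mae / df['actual_demand'].mean()
mape = (df['demand_error'] / df['actual_demand']).mean()

print(f"MAE as a share of mean actual demand: {relative_error:.1%}")
print(f"Mean Absolute Percentage Error (MAPE): {mape:.1%}")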

Summary of Business Insights & Reporting

🎯 Goal: Transform exploratory findings into actionable insights to drive business decisions.


✅ Sales Trends Analysis: Identified fluctuations in weekly sales, highlighting seasonality and demand shifts.
✅ Best-Selling Products Analysis: TVs, Tablets, and Fridges lead in revenue generation.
✅ Regional Store Performance: Los Angeles, Chicago, and New York stores outperform other locations.
✅ Promotions Impact Analysis: Sales increased by ~12% with promotions, proving their effectiveness.
✅ Demand Forecasting Accuracy: Forecasting error (MAE: 137.52) indicates inconsistencies, suggesting room for improvement.


🔍 Key Takeaways:

✅ Targeted marketing & promotions can further boost sales.
✅ Inventory adjustments needed to match demand patterns.
✅ Enhancing demand forecasting models can reduce stockouts and overstocking.

💡 Recommendations

📊 Sales & Revenue Growth Strategies

Optimize Inventory for Best-Selling Products

      • TVs, Tablets, and Fridges drive the most revenue—maintain priority stock levels.
      • Use predictive analytics to prevent stockouts & excess inventory.

Regional Store Strategy

      • Los Angeles, Chicago, & New York lead in sales—expand marketing efforts & premium product placements in these stores.
      • Tailor promotions regionally based on demand patterns.

Price & Demand Optimization

      • Implement dynamic pricing strategies to adjust unit prices based on real-time demand trends.
      • Use customer segmentation to personalize discounts for high-value customers.

📣 Marketing & Customer Engagement

Expand Promotions Based on ROI

      • Sales increased +12% when promotions were applied—strategically increase promo campaigns during peak demand periods.
      • Focus on seasonal discounts, flash sales, and loyalty-based discounts.

Enhance Customer Loyalty Programs

      • Platinum & Silver customers have higher spending patterns—introduce exclusive membership perks, personalized discounts, and early access deals.
      • Use AI-driven segmentation to target high-value customers with personalized offers.

Leverage Payment Preferences for More Conversions

      • Customers prefer Credit Cards & Digital Wallets—collaborate with banks & fintech companies for cashback rewards & flexible financing options.
      • Promote “Buy Now, Pay Later” (BNPL) services to encourage larger purchases.

📦 Inventory & Demand Forecasting Optimization

Strengthen Demand Forecasting Models

      • Current forecasting shows an MAE of 137.52, indicating a gap in prediction accuracy.
      • Implement machine learning models to refine demand forecasts & reduce errors.
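
As a minimal sketch of that direction (not a production model; the feature list below is an illustrative assumption, not a tuned selection), a tree-based regressor trained on columns already in the dataset can be benchmarked against the existing forecast's MAE of 137.52:

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Illustrative feature set drawn from existing columns (an assumption for demonstration)
features = ['unit_price', 'inventory_level', 'customer_income', 'holiday_indicator']
X = df[features].astype(float)
y = df['actual_demand']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Compare against the MAE of the existing forecast
model_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Random forest MAE on held-out data: {model_mae:.2f}")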

Reduce Stockouts & Overstock Situations

      • Align supply chain planning with actual demand trends.
      • Use real-time monitoring to adjust reorder levels dynamically.
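
With the columns already in this dataset, a first pass at reorder monitoring could flag transactions where stock is at or below the reorder point and cross-check them against recorded stockouts (a sketch):

# Flag rows where inventory is at or below the reorder point at transaction time
below_reorder = df[df['inventory_level'] <= df['reorder_point']]
print(f"Transactions at or below reorder point: {len(below_reorder)} "
      f"({len(below_reorder) / len(df):.1%} of all transactions)")

# How often do these low-stock situations coincide with a recorded stockout?
print(below_reorder['stockout_indicator'].value_counts())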

Weather-Driven Stock Adjustments

      • Consider weather trends in inventory planning (e.g., more air conditioners in summer, heaters in winter).
      • Automate supply chain alerts based on environmental & seasonal data.
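
Before automating weather-based alerts, it is worth confirming that weather actually moves sales in this data; a quick check using the existing weather_conditions column (sketch):

# Average transaction value and volume by weather condition
weather_sales = (df.groupby('weather_conditions', observed=True)['total_price']
                   .agg(['mean', 'count']).round(2))
print(weather_sales.sort_values('mean', ascending=False))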