Walmart Sales & Marketing Analytics
🔎 Introduction
This project explores Walmart’s sales trends, promotional effectiveness, and customer behavior using transaction-level data. The goal is to uncover data-driven insights that help optimize revenue, improve marketing strategies, and enhance inventory management. By analyzing key factors such as best-selling products, promotion impact, customer spending patterns, and demand forecasting accuracy, this analysis aims to provide actionable recommendations for Walmart’s business growth.
📋 Project Overview
This analysis focuses on three key areas:
📊 Sales Performance
- Identify top-selling and underperforming products.
- Analyze regional store performance and revenue trends.
🎯 Marketing & Promotions Impact
- Measure the effectiveness of different promotions on sales.
- Understand customer response to discounts and loyalty programs.
📦 Demand Forecasting & Inventory Management
- Compare forecasted vs. actual demand to assess prediction accuracy.
- Identify stockout trends and recommend inventory optimizations.
✅ Key Deliverables:
- Best & worst-selling product insights
- Promotion effectiveness analysis
- Customer segmentation insights
- Demand forecasting accuracy assessment
📊 Summary of the Dataset
📂 Key Dataset Attributes
- Total Records: 5,000 transactions
- Total Columns: 28 features
- Time Period Covered: January 1 to September 16, 2024 (per the transaction dates in the data)
🛒 Sales & Transaction Data
- Transaction ID: Unique identifier for each sale
- Product Details: Product ID, Name, Category, Unit Price, Quantity Sold
- Store Information: Store ID, Store Location
👥 Customer Demographics
- Customer ID: Unique identifier per customer
- Age & Gender: Customer’s age and gender distribution
- Income Level: Customer’s financial profile
- Loyalty Level: Bronze, Silver, Gold, Platinum
📣 Marketing & Promotions
- Promotions Applied: Whether a discount was used
- Promotion Type: Percentage discount, Buy 1 Get 1, etc.
- Payment Method: Cash, Credit Card, Digital Wallet
📦 Inventory & Demand Forecasting
- Inventory Level: Stock availability before a transaction
- Stockout Indicator: Whether an item was out of stock
- Forecasted vs. Actual Demand: Accuracy of sales predictions
🌦 External Influences
- Weather Conditions: Rainy, Sunny, Stormy, etc.
- Holiday Indicator: Whether the transaction occurred during a holiday
- Weekday: Day of the transaction
🔧 Phase 1: Data Cleaning & Preparation
Importing Libraries
# Data Manipulation
import pandas as pd
import numpy as np
# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Statistical Analysis
from scipy import stats
# Display settings for better readability
pd.set_option('display.float_format', '{:.2f}'.format)
Loading the Dataset & Basic Data Check
df = pd.read_csv(filepath_or_buffer=r'D:\roy\roy files\data\data project\Walmart.csv')
df.head()
transaction_id | customer_id | product_id | product_name | category | quantity_sold | unit_price | transaction_date | store_id | store_location | inventory_level | reorder_point | reorder_quantity | supplier_id | supplier_lead_time | customer_age | customer_gender | customer_income | customer_loyalty_level | payment_method | promotion_applied | promotion_type | weather_conditions | holiday_indicator | weekday | stockout_indicator | forecasted_demand | actual_demand |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2824 | 843 | Fridge | Electronics | 3 | 188.46 | 3/31/2024 21:46 | 3 | Miami, FL | 246 | 116 | 170 | 474 | 8 | 29 | Other | 98760.83 | Silver | Credit Card | True | NaN | Stormy | False | Friday | True | 172 | 179
2 | 1409 | 135 | TV | Electronics | 4 | 1912.04 | 7/28/2024 12:45 | 5 | Dallas, TX | 43 | 70 | 212 | 135 | 4 | 34 | Other | 69781.93 | Gold | Cash | True | Percentage Discount | Rainy | False | Monday | True | 109 | 484 |
3 | 5506 | 391 | Fridge | Electronics | 4 | 1377.75 | 6/10/2024 4:55 | 1 | Los Angeles, CA | 411 | 94 | 252 | 475 | 2 | 69 | Other | 77373.1 | Platinum | Cash | False | NaN | Sunny | False | Tuesday | True | 289 | 416
4 | 5012 | 710 | Smartphone | Electronics | 5 | 182.31 | 8/15/2024 1:03 | 5 | Miami, FL | 452 | 87 | 112 | 118 | 7 | 47 | Other | 33383.04 | Silver | Cash | True | Percentage Discount | Sunny | True | Sunday | False | 174 | 446 |
5 | 4657 | 116 | Laptop | Electronics | 3 | 499.28 | 9/13/2024 0:45 | 6 | Chicago, IL | 412 | 99 | 177 | 155 | 4 | 70 | Female | 108999.41 | Bronze | Digital Wallet | False | Sunny | False | Thursday | True | 287 | 469 |
pd.set_option('display.max_columns', None)
df.info()
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 28 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 transaction_id 5000 non-null int64
1 customer_id 5000 non-null int64
2 product_id 5000 non-null int64
3 product_name 5000 non-null object
4 category 5000 non-null object
5 quantity_sold 5000 non-null int64
6 unit_price 5000 non-null float64
7 transaction_date 5000 non-null object
8 store_id 5000 non-null int64
9 store_location 5000 non-null object
10 inventory_level 5000 non-null int64
11 reorder_point 5000 non-null int64
12 reorder_quantity 5000 non-null int64
13 supplier_id 5000 non-null int64
14 supplier_lead_time 5000 non-null int64
15 customer_age 5000 non-null int64
16 customer_gender 5000 non-null object
17 customer_income 5000 non-null float64
18 customer_loyalty_level 5000 non-null object
19 payment_method 5000 non-null object
20 promotion_applied 5000 non-null bool
21 promotion_type 1593 non-null object
22 weather_conditions 5000 non-null object
23 holiday_indicator 5000 non-null bool
24 weekday 5000 non-null object
25 stockout_indicator 5000 non-null bool
26 forecasted_demand 5000 non-null int64
27 actual_demand 5000 non-null int64
dtypes: bool(3), float64(2), int64(13), object(10)
memory usage: 991.3+ KB
Data Cleaning
- Check for Missing Values
- Check & Remove Duplicates
- Fix Data Types (Dates, Categories, Numbers, Booleans)
# Checking for missing values in each column
df.isna().sum()
transaction_id 0
customer_id 0
product_id 0
product_name 0
category 0
quantity_sold 0
unit_price 0
transaction_date 0
store_id 0
store_location 0
inventory_level 0
reorder_point 0
reorder_quantity 0
supplier_id 0
supplier_lead_time 0
customer_age 0
customer_gender 0
customer_income 0
customer_loyalty_level 0
payment_method 0
promotion_applied 0
promotion_type 3407
weather_conditions 0
holiday_indicator 0
weekday 0
stockout_indicator 0
forecasted_demand 0
actual_demand 0
dtype: int64
# Replace missing promotion_type values with "No Promotion"
df['promotion_type'] = df['promotion_type'].fillna('No Promotion')
# Count duplicate rows
df.duplicated().sum()
np.int64(0)
# Convert transaction_date to datetime format
df['transaction_date'] = pd.to_datetime(df['transaction_date'],errors='coerce')
# Convert categorical columns to category type
categorical_cols = ['category', 'store_location', 'customer_loyalty_level', 'payment_method',
'promotion_type', 'weather_conditions', 'customer_gender', 'weekday']
df[categorical_cols] = df[categorical_cols].astype('category')
Validating and Ensuring Data Consistency
- Check for Unexpected Values in Categorical Columns
- Check for Negative or Illogical Numeric Values
- Verify Date Ranges
- Check for Unusual or Duplicated Entries in Key Columns
# Display unique values in categorical columns
for col in categorical_cols:
    print(f"Unique values in '{col}':\n", df[col].unique(), "\n")
Unique values in 'category':
['Electronics', 'Appliances']
Categories (2, object): ['Appliances', 'Electronics']
Unique values in 'store_location':
['Miami, FL', 'Dallas, TX', 'Los Angeles, CA', 'Chicago, IL', 'New York, NY']
Categories (5, object): ['Chicago, IL', 'Dallas, TX', 'Los Angeles, CA', 'Miami, FL', 'New York, NY']
Unique values in 'customer_loyalty_level':
['Silver', 'Gold', 'Platinum', 'Bronze']
Categories (4, object): ['Bronze', 'Gold', 'Platinum', 'Silver']
Unique values in 'payment_method':
['Credit Card', 'Cash', 'Digital Wallet', 'Debit Card']
Categories (4, object): ['Cash', 'Credit Card', 'Debit Card', 'Digital Wallet']
Unique values in 'promotion_type':
['No Promotion', 'Percentage Discount', 'BOGO']
Categories (3, object): ['BOGO', 'No Promotion', 'Percentage Discount']
Unique values in 'weather_conditions':
['Stormy', 'Rainy', 'Sunny', 'Cloudy']
Categories (4, object): ['Cloudy', 'Rainy', 'Stormy', 'Sunny']
Unique values in 'customer_gender':
['Other', 'Female', 'Male']
Categories (3, object): ['Female', 'Male', 'Other']
Unique values in 'weekday':
['Friday', 'Monday', 'Tuesday', 'Sunday', 'Thursday', 'Saturday', 'Wednesday']
Categories (7, object): ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday']
# List of numeric columns to check
numeric_cols_check = ['unit_price', 'quantity_sold', 'customer_income', 'forecasted_demand', 'actual_demand']
# Check for negative values
for col in numeric_cols_check:
    invalid_values = df[df[col] < 0]
    if not invalid_values.empty:
        print(f"⚠️ Warning: Negative values found in '{col}':")
        print(invalid_values[[col]].head(), "\n")
    else:
        print(f"✅ '{col}' has no negative values.")
✅ 'unit_price' has no negative values.
✅ 'quantity_sold' has no negative values.
✅ 'customer_income' has no negative values.
✅ 'forecasted_demand' has no negative values.
✅ 'actual_demand' has no negative values.
# Display the earliest and latest transaction dates
min_date = df['transaction_date'].min()
max_date = df['transaction_date'].max()
print(f"📅 Earliest Date: {min_date}")
print(f"📅 Latest Date: {max_date}")
# Check for any future dates beyond today
from datetime import datetime
today = datetime.today()
future_dates = df[df['transaction_date'] > today]
if not future_dates.empty:
    print("⚠️ Warning: There are transactions with future dates!")
    print(future_dates[['transaction_date']].head())
else:
    print("No future dates found in dataset.")
📅 Earliest Date: 2024-01-01 00:31:00
📅 Latest Date: 2024-09-16 20:22:00
No future dates found in dataset.
# Check for duplicate product entries within the same transaction (if applicable)
duplicate_products = df[df.duplicated(subset=['transaction_id', 'product_id'],keep=False)]
print(f"Duplicate Product Entries Within Transactions: {duplicate_products.shape[0]}")
Duplicate Product Entries Within Transactions: 0
Summary of Data Cleaning & Preparation
Step | Task | Status |
---|---|---|
1. Importing Libraries | Loaded Pandas, NumPy, Matplotlib, etc. | ✅ |
2. Loading the Dataset | Read the dataset and checked structure | ✅ |
3. Data Cleaning | Fixed missing values, handled duplicates | ✅ |
4. Fixing Data Types | Converted dates, numerical, categorical fields | ✅ |
5. Validating Data Consistency | Checked for anomalies, duplicate records, incorrect values | ✅ |
📈 Phase 2: Exploratory Data Analysis (EDA)
Data Distribution
# Summary statistics for numerical columns
df.describe().round(2)
 | transaction_id | customer_id | product_id | quantity_sold | unit_price | transaction_date | store_id | inventory_level | reorder_point | reorder_quantity | supplier_id | supplier_lead_time | customer_age | customer_income | forecasted_demand | actual_demand |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 |
mean | 2500.5 | 5542.5 | 551.23 | 2.98 | 1023.47 | 2024-05-11 05:10:22.740000 | 10.52 | 253.12 | 99.79 | 200.52 | 300.13 | 5.52 | 44.12 | 70041.63 | 297.13 | 299.09 |
min | 1.0 | 1001.0 | 100.0 | 1.0 | 50.1 | 2024-01-01 00:31:00 | 1.0 | 0.0 | 50.0 | 100.0 | 100.0 | 1.0 | 18.0 | 20005.34 | 100.0 | 90.0 |
25% | 1250.75 | 3279.0 | 322.0 | 2.0 | 537.78 | 2024-03-07 19:21:15 | 5.0 | 130.0 | 75.0 | 150.75 | 199.0 | 3.0 | 31.0 | 44865.42 | 195.0 | 194.0 |
50% | 2500.5 | 5558.0 | 559.0 | 3.0 | 1029.18 | 2024-05-11 20:02:30 | 11.0 | 253.0 | 100.0 | 200.5 | 299.0 | 6.0 | 44.0 | 70188.29 | 297.5 | 299.0 |
75% | 3750.25 | 7767.25 | 776.0 | 4.0 | 1506.31 | 2024-07-16 08:58:15 | 16.0 | 377.25 | 125.0 | 251.0 | 405.0 | 8.0 | 58.0 | 95395.87 | 395.0 | 404.0 |
max | 5000.0 | 9998.0 | 999.0 | 5.0 | 1999.85 | 2024-09-16 20:22:00 | 20.0 | 500.0 | 150.0 | 300.0 | 500.0 | 10.0 | 70.0 | 119999.78 | 500.0 | 510.0 |
std | 1443.52 | 2582.13 | 258.83 | 1.42 | 559.61 | NaN | 5.79 | 142.89 | 29.13 | 58.26 | 116.39 | 2.86 | 15.33 | 29053.37 | 115.57 | 121.68 |
# Summary of categorical columns
df.describe(include=['category'])
 | category | store_location | customer_gender | customer_loyalty_level | payment_method | promotion_type | weather_conditions | weekday |
---|---|---|---|---|---|---|---|---|
count | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 |
unique | 2 | 5 | 3 | 4 | 4 | 3 | 4 | 7 |
top | Electronics | Los Angeles, CA | Male | Platinum | Credit Card | No Promotion | Cloudy | Monday |
freq | 2585 | 1038 | 1682 | 1299 | 1281 | 3407 | 1281 | 775 |
Outlier Detection & Treatment
- Detect Outliers – Boxplot Visualization
- Z-Score & IQR Methods
import matplotlib.pyplot as plt
import seaborn as sns
# List of numerical columns to check for outliers
numeric_cols = ['unit_price', 'quantity_sold', 'customer_income', 'forecasted_demand', 'actual_demand']
# Apply log transformation to adjust scaling
df_log = df.copy()
df_log[numeric_cols] = np.log1p(df[numeric_cols])
# Create a new boxplot with adjusted scaling
plt.figure(figsize=(12, 6))
sns.boxplot(data=df_log[numeric_cols])
plt.xticks(rotation=45)
plt.title("Boxplot for Outlier Detection (Log-Scaled)")
plt.show()

🔍 Key Observations:
✅ Customer Income is now well-balanced (log-transformed).
✅ Unit Price shows some lower-end outliers (small dots below whiskers).
✅ Other variables (quantity_sold, forecasted_demand, actual_demand) seem normally distributed (no major outliers).
from scipy import stats
# Define threshold for Z-score method (values beyond ±3 standard deviations are considered outliers)
z_threshold = 3
# Identify outliers in 'unit_price' using Z-score
z_scores_unit_price = np.abs(stats.zscore(df['unit_price'])) # Compute Z-score
outliers_z_unit_price = df[z_scores_unit_price > z_threshold] # Filter outliers
if not outliers_z_unit_price.empty:
    print("There are some extreme outliers in unit_price based on the Z-score method.")
else:
    print("There are no extreme outliers in unit_price based on the Z-score method.")
There are no extreme outliers in unit_price based on the Z-score method.
# Interquartile Range (IQR) method for outlier detection
Q1 = df['unit_price'].quantile(0.25)
Q3 = df['unit_price'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Identify outliers in 'unit_price' using IQR
outliers_iqr_unit_price = df[(df['unit_price'] < lower_bound) | (df['unit_price'] > upper_bound)]
if not outliers_iqr_unit_price.empty:
    print("There are some extreme outliers in unit_price based on the IQR method.")
else:
    print("There are no extreme outliers in unit_price based on the IQR method.")
There are no extreme outliers in unit_price based on the IQR method.
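To back up the conclusion below for all numeric columns rather than unit_price alone, the same IQR screen can be repeated in a short loop (a minimal sketch reusing numeric_cols_check from above):
# Sketch: repeat the IQR screen for every numeric column of interest (reuses numeric_cols_check from above)
for col in numeric_cols_check:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    outlier_mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
    print(f"'{col}': {outlier_mask.sum()} IQR outliers")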
📢 Conclusion:
✅ No extreme outliers detected in any numerical variables, confirming that the dataset is clean for further analysis.
Customer Analysis
- Customer Demographics
- Customer Loyalty Trends
- Payment Method Preferences
- Customer Spending Behavior
'''Customer Demographics Analysis'''
# Plot Age Distribution
plt.figure(figsize=(8, 5))
sns.histplot(df['customer_age'], bins=20, kde=True, color='royalblue')
plt.title("Customer Age Distribution")
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()

# Plot Gender Distribution
plt.figure(figsize=(6, 5))
bars = sns.countplot(data=df, x='customer_gender', palette='Set2')
plt.title("Customer Gender Distribution")
plt.xlabel("Gender")
plt.ylabel("Count")
# Add annotations
for bar in bars.patches:
    yval = bar.get_height()
    plt.text(
        bar.get_x() + bar.get_width() / 2, yval,
        f'{yval:.0f}',
        ha='center', va='bottom', fontsize=10
    )
plt.show()

# Plot Income Distribution
plt.figure(figsize=(8, 5))
sns.histplot(df['customer_income'], bins=20, kde=True, color='seagreen')
plt.title("Customer Income Distribution")
plt.xlabel("Income ($)")
plt.ylabel("Count")
plt.show()

📢 Key Insights:
✅Age Distribution:
- Customers are evenly distributed across different age groups, ranging from early 20s to late 60s.
- No dominant age group, indicating a broad customer base.
✅Gender Distribution:
- The dataset has a balanced representation of Male, Female, and Other customers.
- No significant gender bias, meaning marketing strategies should target all genders equally.
✅Income Distribution:
- Customers have a wide range of incomes, from $20,000 to $120,000.
- The distribution is relatively even, meaning Walmart serves both low-income and high-income segments.
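One way to build on the income observation above is to bucket customers into income bands for segment-level comparisons. The sketch below is illustrative only; the band edges are assumptions and are not part of the original analysis.
# Sketch: illustrative income bands (edges are assumptions) for segment-level analysis
income_bands = pd.cut(df['customer_income'],
                      bins=[20000, 45000, 70000, 95000, 120000],
                      labels=['Low', 'Lower-Mid', 'Upper-Mid', 'High'])
print(income_bands.value_counts().sort_index())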
'''Customer Loyalty Level Analysis'''
# Get loyalty level distribution counts
loyalty_counts = df['customer_loyalty_level'].value_counts()
# Create Pie Chart
plt.figure(figsize=(7, 7))
plt.pie(loyalty_counts, labels=loyalty_counts.index, autopct='%1.1f%%',
colors=['#66c2a5', '#2ACCE7', '#8da0cb', '#e78ac3'],
startangle=140, wedgeprops={'edgecolor': 'grey'})
# Set title
plt.title("Customer Loyalty Level Distribution")
plt.show()

📢 Key Insights:
✅ Platinum (26%) & Silver (25.8%) customers dominate, indicating strong engagement.
✅ Bronze (25.1%) customers have potential for upgrades with better incentives.
✅ Gold (23.2%) is the least preferred tier, suggesting a need for improved benefits.
'''Payment Method Preferences Analysis'''
# Plot Payment Method Distribution
plt.figure(figsize=(7, 5))
bars = sns.countplot(data=df, x='payment_method', order=df['payment_method'].value_counts().index, palette='pastel')
plt.title("Customer Payment Method Preferences")
plt.xlabel("Payment Method")
plt.ylabel("Count")
# Add annotations to bars
for bar in bars.patches:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2, yval, f'{yval:.0f}', ha='center', va='bottom', fontsize=10)
plt.show()

📢 Key Insights:
✅ Credit Card (1,281) is the most preferred payment method, showing that many customers rely on credit-based transactions.
✅ Digital Wallets (1,263) are nearly as popular, indicating a shift towards cashless and mobile payments.
✅ Cash (1,246) remains a strong option, meaning traditional payment methods are still widely used.
✅ Debit Card (1,210) is the least used method, but the difference is minimal, showing all payment options are well-balanced.
'''Customer Spending Behavior Analysis'''
# Create a new column for total spending
df['total_price'] = df['unit_price'] * df['quantity_sold']
# Create a boxplot for total spending based on loyalty level
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x='customer_loyalty_level', y='total_price', palette='coolwarm')
plt.title("Customer Spending Behavior by Loyalty Level")
plt.xlabel("Loyalty Level")
plt.ylabel("Total Spending ($)")
plt.show()

📢 Key Insights:
✅ All loyalty levels have similar median spending (~$2,000).
- This suggests that spending habits are relatively consistent across customer tiers.
✅ Wide range of spending within each group.
- Basket values run from under $100 up to nearly $10,000, indicating a mix of low- and high-spending customers in all tiers (see the quick check below).
✅ Presence of high-spending outliers.
- A few customers in each loyalty tier make very large purchases approaching $10,000, likely bulk or high-value item buyers.
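A quick distribution check of the per-transaction basket value (the total_price column created above) helps ground these ranges (a minimal sketch):
# Sketch: summarize basket value and peek at the largest baskets per loyalty tier
print(df['total_price'].describe().round(2))
print(df.nlargest(5, 'total_price')[['customer_loyalty_level', 'total_price']])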
Business Recommendations:
✅ Segment marketing campaigns based on income, age, and loyalty level.
✅ Enhance Gold-tier benefits to increase participation.
✅ Encourage debit card usage through cashback or loyalty perks.
✅ Target low-spending customers with promotional incentives to boost spending.
✅ Create personalized offers for high-spending customers to retain premium buyers.
📈 Phase 3: Business Insights & Reporting
Sales Trends Analysis
'''Sales Trends Analysis'''
# Aggregate total sales per week
sales_trend = df.groupby(pd.Grouper(key='transaction_date', freq='W'))['total_price'].sum()
plt.figure(figsize=(12, 5))
# Plot the weekly sales trend
sns.lineplot(x=sales_trend.index, y=sales_trend.values, marker='o', color='royalblue', label="Weekly Sales")
# Add a trend line using numpy polyfit
x_values = np.arange(len(sales_trend)) # Convert index to numerical values
y_values = sales_trend.values
z = np.polyfit(x_values, y_values, 1)
p = np.poly1d(z)
# Plot the trend line
plt.plot(sales_trend.index, p(x_values), color='red', linestyle='dashed', label="Trend Line")
plt.title("Weekly Sales Trends Over Time with Trend Line")
plt.xlabel("Date")
plt.ylabel("Total Sales ($)")
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)
plt.show()

📢 Key Findings:
✅ Weekly sales fluctuate significantly, with peaks exceeding $500,000 and dips below $300,000.
✅ A sharp drop appears at the end of the period; because the data ends on 16 September 2024, the final weekly bucket is only partially covered, so this dip is more likely a data-coverage artifact than a seasonal drop or inventory shortage (see the quick check below).
✅ The red trend line slopes slightly downward, suggesting overall sales may be declining over time.
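A quick check of transaction counts per weekly bucket (a minimal sketch) shows whether the final point covers a full week or only the last day or two of data:
# Sketch: transaction counts per weekly bucket; a much smaller final count indicates a partial week
weekly_counts = df.groupby(pd.Grouper(key='transaction_date', freq='W'))['transaction_id'].count()
print(weekly_counts.tail())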
Best-Selling Products Analysis
'''Best-Selling Products Analysis'''
# Aggregate total sales per product
top_products = df.groupby('product_name')['total_price'].sum().sort_values(ascending=False).head(10)
# Plot Best-Selling Products
plt.figure(figsize=(10, 5))
sns.barplot(x=top_products.values, y=top_products.index, palette='viridis')
plt.title("Top 10 Best-Selling Products")
plt.xlabel("Total Sales ($)")
plt.ylabel("Product Name")
# Add values on bars
for index, value in enumerate(top_products.values):
    plt.text(value, index, f"${value:,.0f}", va='center', fontsize=10)
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()

📢 Key Findings:
✅ TVs are the top-selling product, generating over $2 million in total sales.
✅ Tablets, Fridges, and Smartphones also drive high revenue, indicating strong demand.
✅ Washing Machines and Cameras are in the mid-range but still contribute significantly.
✅ Laptops rank the lowest among the top 8, but still exceed $1.7 million in sales.
Regional Store Performance Analysis
'''Regional Store Performance Analysis'''
# Aggregate total sales per store location
store_performance = df.groupby('store_location')['total_price'].sum().sort_values(ascending=False)
# Plot store locations ranked by total sales (the dataset contains five locations)
plt.figure(figsize=(10, 5))
sns.barplot(x=store_performance.values, y=store_performance.index, order=store_performance.index, palette='magma')
plt.title("Store Locations Ranked by Total Sales")
plt.xlabel("Total Sales ($)")
plt.ylabel("Store Location")
# Add values on bars
for index, value in enumerate(store_performance.values):
    plt.text(value, index, f"${value:,.0f}", va='center', fontsize=10)
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()

📢 Key Findings:
✅ Los Angeles, CA is the top-performing location with $3.27 million in total sales.
✅ Chicago, IL follows closely at $3.15 million, showing strong sales demand.
✅ New York, Miami, and Dallas are all in the $2.9 million range, contributing significantly.
✅ Revenue is spread fairly evenly across the five store locations, with Los Angeles and Chicago slightly ahead, indicating strong footfall and demand in every region.
Promotions Impact Analysis
'''Promotions Impact Analysis'''
# Compare total sales between promoted and non-promoted transactions
promotion_impact = df.groupby('promotion_applied')['total_price'].sum()
# Plot Promotion Impact
plt.figure(figsize=(7, 5))
sns.barplot(x=promotion_impact.index, y=promotion_impact.values, palette='cool')
# Formatting
plt.title("Impact of Promotions on Total Sales")
plt.xlabel("Promotion Applied")
plt.ylabel("Total Sales ($)")
plt.xticks(ticks=[0, 1], labels=['No Promotion', 'Promotion Applied'])
# Add values on bars
for index, value in enumerate(promotion_impact.values):
    plt.text(index, value, f"${value:,.0f}", ha='center', va='bottom', fontsize=10)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

📢 Key Findings:
✅ Promotions drive higher total sales, reaching $8.06 million compared to $7.2 million without promotions.
✅ Sales increased by approximately 12% when promotions were applied.
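The ~12% figure can be reproduced directly from the promotion_impact totals computed above (a minimal sketch):
# Sketch: percentage uplift of promoted vs. non-promoted total sales (reuses promotion_impact from above)
no_promo_total = promotion_impact.loc[False]
promo_total = promotion_impact.loc[True]
print(f"Promotion uplift: {(promo_total - no_promo_total) / no_promo_total * 100:.1f}%")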
Demand Forecasting Accuracy Analysis
'''Demand Forecasting Accuracy Analysis'''
# Calculate absolute error between forecasted and actual demand
df['demand_error'] = abs(df['forecasted_demand'] - df['actual_demand'])
# Calculate Mean Absolute Error (MAE)
mae = df['demand_error'].mean()
# Plot Forecasted vs Actual Demand
plt.figure(figsize=(10, 5))
sns.scatterplot(x=df['forecasted_demand'], y=df['actual_demand'], alpha=0.6)
plt.plot([df['forecasted_demand'].min(), df['forecasted_demand'].max()],
[df['forecasted_demand'].min(), df['forecasted_demand'].max()], 'r--', label="Perfect Prediction")
plt.title(f"Forecasted vs Actual Demand (MAE: {mae:.2f})")
plt.xlabel("Forecasted Demand")
plt.ylabel("Actual Demand")
plt.legend()
plt.grid(alpha=0.5)
plt.show()

📢 Key Findings:
✅ Mean Absolute Error (MAE) is 137.52, meaning that on average, the forecasted demand deviates by 137 units from actual demand.
✅ The scatter plot shows high variability, with actual demand widely dispersed around the red dashed “Perfect Prediction” line.
✅ Some predictions are fairly close to the actual values, but many data points indicate over- or under-estimations.
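To put the MAE of 137.52 in context, the error can also be expressed relative to demand levels (a minimal sketch; MAPE is an additional metric not computed in the original analysis):
# Sketch: relative error metrics for the demand forecast (reuses demand_error and mae from above)
mape = (df['demand_error'] / df['actual_demand']).mean() * 100      # mean absolute percentage error
relative_mae = mae / df['actual_demand'].mean() * 100               # MAE as a share of average demand
print(f"MAPE: {mape:.1f}% | MAE / mean actual demand: {relative_mae:.1f}%")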
Summary of Business Insights & Reporting
Goal: Transform exploratory findings into actionable insights to drive business decisions.
- Sales Trends Analysis: Identified fluctuations in weekly sales, highlighting seasonality and demand shifts.
- Best-Selling Products Analysis: TVs, Tablets, and Fridges lead in revenue generation.
- Regional Store Performance: Los Angeles, Chicago, and New York stores outperform other locations.
- Promotions Impact Analysis: Sales increased by ~12% with promotions, indicating their effectiveness.
- Demand Forecasting Accuracy: Forecasting error (MAE: 137.52) indicates inconsistencies, suggesting room for improvement.
Key Takeaways:
- Targeted marketing & promotions can further boost sales.
- Inventory adjustments are needed to match demand patterns.
- Enhancing demand forecasting models can reduce stockouts and overstocking.
💡 Recommendations
📊 Sales & Revenue Growth Strategies
✅ Optimize Inventory for Best-Selling Products
- TVs, Tablets, and Fridges drive the most revenue; maintain priority stock levels.
- Use predictive analytics to prevent stockouts & excess inventory.
✅ Regional Store Strategy
- Los Angeles, Chicago, & New York lead in sales; expand marketing efforts & premium product placements in these stores.
- Tailor promotions regionally based on demand patterns.
✅ Price & Demand Optimization
- Implement dynamic pricing strategies to adjust unit prices based on real-time demand trends.
- Use customer segmentation to personalize discounts for high-value customers.
📣 Marketing & Customer Engagement
✅ Expand Promotions Based on ROI
- Sales increased by ~12% when promotions were applied; strategically increase promo campaigns during peak demand periods.
- Focus on seasonal discounts, flash sales, and loyalty-based discounts.
✅ Enhance Customer Loyalty Programs
- Platinum & Silver customers make up the largest share of the customer base; introduce exclusive membership perks, personalized discounts, and early access deals.
- Use AI-driven segmentation to target high-value customers with personalized offers.
✅ Leverage Payment Preferences for More Conversions
- Customers prefer Credit Cards & Digital Wallets; collaborate with banks & fintech companies for cashback rewards & flexible financing options.
- Promote “Buy Now, Pay Later” (BNPL) services to encourage larger purchases.
📦 Inventory & Demand Forecasting Optimization
✅ Strengthen Demand Forecasting Models
- Current forecasting shows an MAE of 137.52, indicating a gap in prediction accuracy.
- Implement machine learning models to refine demand forecasts & reduce errors (a baseline sketch follows below).
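As a starting point for that work, a simple regression baseline can be benchmarked against the current forecast. This is a minimal sketch only: it assumes scikit-learn is available, and the feature set is illustrative, not a recommended model.
# Sketch: simple regression baseline vs. the existing forecast (assumes scikit-learn; feature choice is illustrative only)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

features = df[['inventory_level', 'reorder_point', 'unit_price', 'quantity_sold']]
X_train, X_test, y_train, y_test = train_test_split(features, df['actual_demand'],
                                                    test_size=0.2, random_state=42)
baseline = LinearRegression().fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
print(f"Baseline regression MAE: {baseline_mae:.2f} (current forecast MAE: ~137.5)")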
✅ Reduce Stockouts & Overstock Situations
- Align supply chain planning with actual demand trends.
- Use real-time monitoring to adjust reorder levels dynamically.
✅ Weather-Driven Stock Adjustments
- Consider weather trends in inventory planning (e.g., more air conditioners in summer, heaters in winter).
- Automate supply chain alerts based on environmental & seasonal data.