Azure Data Pipeline
Churn Analysis & Power BI Visualization
🔎 Introduction
This project represents a full end-to-end data pipeline that ingests customer churn data, processes it in Azure SQL, and visualizes insights using Power BI.
📌 Problem Statement:
Customer churn is a critical challenge for subscription-based businesses. Understanding why customers leave and identifying patterns in churn behavior can help businesses reduce churn, enhance customer retention, and increase revenue.
🔎 Objective:
- Extract and process customer data from Azure Data Lake Storage
- Load and transform the data into Azure SQL Database
- Conduct exploratory data analysis (EDA) and statistical insights
- Develop interactive Power BI dashboards to track churn trends and key customer attributes
📋 Project Overview
- Data Source: CSV file (Customer Churn Data)
- Storage: Azure Data Lake Storage Gen2
- Processing: Azure Data Factory (ETL process)
- Database: Azure SQL Database
- Data Analysis: Query execution via VS Code
- Visualization: Power BI Dashboard with live SQL connectivity
🔒 Key Features
- Automated Data Pipeline → Ingests raw churn data from Data Lake to Azure SQL.
- Data Transformation & Cleaning → Handles missing values, data types, and duplicates.
- Query Execution & Insights → Runs SQL queries in VS Code for data analysis.
- Seamless Power BI Integration → Real-time visualization & business insights.
- Scalable & Secure → Uses cloud-based resources with firewall and authentication setup.
🛠 Tools Used
- Azure Data Factory – ETL Pipeline
- Azure Data Lake Storage – Cloud Storage
- Azure SQL Database – Data Processing & Storage
- VS Code (MSSQL Extension) – Query Execution
- Power BI – Visualization & Reporting
📊 Summary of the Dataset
🔑 Key Dataset Attributes
- Total Records: Number of customer transactions or records in the dataset.
- Total Columns: 30+ features related to customer demographics, services, and churn.
- Data Coverage: Includes customer subscription data, services used, billing, and churn details.
👤 Customer Demographics
- Customer ID: Unique identifier for each customer.
- Gender & Age: Customer’s demographic details.
- Marital Status: Whether the customer is married or not.
- State: The location of the customer.
📡 Service & Subscription Details
- Number of Referrals: Number of customers referred by the user.
- Tenure in Months: Duration of customer subscription.
- Value Deal: Special deals or discounts availed by customers.
- Phone & Internet Services: Includes details on phone service, multiple lines, and internet plans.
- Internet Type: Fiber optic, DSL, or other types of internet services.
- Additional Services: Includes online security, device protection, premium support, streaming services, and unlimited data.
💳 Billing & Payment Information
- Contract Type: Monthly, yearly, or other contract agreements.
- Paperless Billing: Indicates if the customer opts for digital invoicing.
- Payment Method: Different payment types like credit card, electronic check, or bank transfer.
- Monthly & Total Charges: Amount paid monthly and overall billing history.
- Total Refunds & Extra Charges: Records of refunds, extra data charges, and long-distance call costs.
📉 Churn & Customer Retention
- Customer Status: Indicates whether a customer is active, churned, or inactive.
- Churn Category: The reason for customer churn, such as service issues, pricing, or competition.
- Churn Reason: Specific reasons why customers decided to leave, helping in retention analysis.
🔑 Key questions
Customer Demographics & Segmentation
- What is the distribution of customers by gender and age group?
- Does marital status impact customer churn?
- Are certain states experiencing a higher churn rate than others?
- How does the number of referrals correlate with customer retention?
Subscription & Service Analysis
- What is the most common internet type among churned customers?
- Are customers with phone service more likely to retain than those with only internet services?
- How does having multiple lines impact churn rates?
- Do customers with premium support have lower churn rates compared to those without?
Churn Behavior & Reasons
- What are the top churn reasons among customers?
- Which contract type (e.g., monthly, annual) is most associated with churn?
- Do customers with paperless billing churn at a higher or lower rate than others?
- How many customers left due to service-related reasons vs. pricing-related reasons?
Financial & Revenue Impact
- What is the average monthly charge of churned vs. non-churned customers?
- How does total revenue compare between retained and churned customers?
- Do customers with higher total charges tend to stay longer?
- Are there any patterns in total refunds for churned customers?
- How much revenue was lost due to churned customers’ total long-distance charges?
Predictive Insights & Retention Opportunities
- Can we predict churn based on tenure in months?
- Do customers with unlimited data plans churn less often?
- Are there specific combinations of services that lead to lower churn rates?
⚙ Phase 1: Extract, Load & Connect (ETL Process)
ETL & Data Integration Flow (chart): Azure Data Factory & SQL
In this phase, data is extracted from an external source (CSV), loaded into Azure Data Lake Storage, processed using Azure Data Factory, and finally stored in Azure SQL Database for further analysis.
-
- Data Extraction: Uploading raw CSV to Azure Data Lake Storage Gen2
- Data Transformation: Using Azure Data Factory to clean and map schemas
- Data Loading: Ingesting data into Azure SQL Database
- Connectivity: Establishing SQL Server connections for querying
📊 Phase 2: Exploratory & Statistical Data Analysis
Data Cleaning & Preprocessing
- Identify & handle missing values
- Convert data types
- Remove duplicates
- Standardize categorical values
- Save as SQL Table (Direct to Azure SQL)
# Check for missing values
missing_values = df.isnull().sum()
# Display only columns with missing values
missing_values = missing_values[missing_values > 0]
print(missing_values)
value_deal 3297
internet_type 1223
dtype: int64
# Fill missing values
df.loc[df['internet_service'] == 'No', 'internet_type'] = 'No Internet'
df['internet_type'].fillna('Unknown', inplace=True) # Fill remaining missing values
df['value_deal'].fillna('No Deal', inplace=True)
# Convert value_deal to numeric format
import numpy as np
# Convert 'No Deal' to 0 and extract numbers from 'Deal X'
df['value_deal'] = df['value_deal'].replace('No Deal', '0') # Replace 'No Deal' with '0'
df['value_deal'] = df['value_deal'].str.extract('(\d+)').fillna(0).astype(int) # Extract numbers
df['value_deal'].unique()
array([0, 5, 4, 1, 2, 3])
# Convert Yes/No columns to Boolean (True/False)
bool_columns = [
'phone_service', 'multiple_lines', 'internet_service', 'online_security',
'online_backup', 'device_protection_plan', 'premium_support', 'streaming_tv',
'streaming_movies', 'streaming_music', 'unlimited_data', 'paperless_billing'
]
df[bool_columns] = df[bool_columns].applymap(lambda x: True if x == 'Yes' else False)
# Convert categorical columns to 'category' type
category_columns = [
'gender', 'married', 'state', 'contract', 'payment_method',
'customer_status', 'churn_category', 'churn_reason'
]
df[category_columns] = df[category_columns].astype('category')
df.dtypes
customer_id object
gender category
age int64
married category
state category
number_of_referrals int64
tenure_in_months int64
value_deal int64
phone_service bool
multiple_lines bool
internet_service bool
internet_type object
online_security bool
online_backup bool
device_protection_plan bool
premium_support bool
streaming_tv bool
streaming_movies bool
streaming_music bool
unlimited_data bool
contract category
paperless_billing bool
payment_method category
monthly_charge float64
total_charges float64
total_refunds float64
total_extra_data_charges int64
total_long_distance_charges float64
total_revenue float64
customer_status category
churn_category category
churn_reason category
dtype: object
# Check for duplicate rows
duplicate_rows = df.duplicated().sum()
print(f"Total duplicate rows: {duplicate_rows}")
Total duplicate rows: 0
'''Standardize categorical values '''
# Identify categorical columns (excluding customer_id)
categorical_columns = df.select_dtypes(include=['category']).columns.tolist()
# Display unique values in categorical columns
for col in categorical_columns:
print(f"Unique values in '{col}':\n", df[col].unique(), "\n")
# Replace occurrences of "Others" with "Other"
df['churn_category'] = df['churn_category'].str.replace('Others', 'Other')
''' Save as SQL Table (Direct to Azure SQL)'''
!pip install pyodbc
from sqlalchemy import create_engine
# Replace with your Azure SQL connection details
server = "churndata2025.database.windows.net"
database = "churndb"
username = "adminuser"
password = "wMjian12"
driver = "ODBC Driver 17 for SQL Server"
# Create connection string
engine = create_engine(f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver}")
# Save to Azure SQL Table
df.to_sql("CustomerChurn_Cleaned", engine, if_exists="replace", index=False)
SQL Queries
-- 1. What is the distribution of customers by gender and age group?
SELECT
gender,
CASE
WHEN age < 25 THEN '18-24'
WHEN age BETWEEN 25 AND 34 THEN '25-34'
WHEN age BETWEEN 35 AND 44 THEN '35-44'
WHEN age BETWEEN 45 AND 54 THEN '45-54'
WHEN age BETWEEN 55 AND 64 THEN '55-64'
ELSE '65+'
END AS age_group,
COUNT(*) AS total_customers
FROM CustomerChurn_Cleaned
GROUP BY gender,
CASE
WHEN age < 25 THEN '18-24'
WHEN age BETWEEN 25 AND 34 THEN '25-34'
WHEN age BETWEEN 35 AND 44 THEN '35-44'
WHEN age BETWEEN 45 AND 54 THEN '45-54'
WHEN age BETWEEN 55 AND 64 THEN '55-64'
ELSE '65+'
END
ORDER BY age_group, gender;
gender | age_group | total_customers |
---|---|---|
Female | 18-24 | 411 |
Male | 18-24 | 134 |
Female | 25-34 | 659 |
Male | 25-34 | 436 |
Female | 35-44 | 731 |
Male | 35-44 | 411 |
Female | 45-54 | 659 |
Male | 45-54 | 419 |
Female | 55-64 | 642 |
Male | 55-64 | 405 |
Female | 65+ | 677 |
Male | 65+ | 423 |
-- 2. Does marital status impact customer churn?
SELECT
married,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST((SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0) / COUNT(*) AS DECIMAL(5,2)) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY married;
married | total_customers | churned_customers | churn_rate |
---|---|---|---|
Yes | 2995 | 849 | 28.35 |
No | 3012 | 883 | 29.32 |
-- 3. Are certain states experiencing a higher churn rate than others?
SELECT
state,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST((SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0) / COUNT(*) AS DECIMAL(5,2)) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY state
ORDER BY churn_rate DESC
state | total_customers | churned_customers | churn_rate |
---|---|---|---|
Jammu & Kashmir | 313 | 183 | 58.47 |
Assam | 129 | 53 | 41.09 |
Jharkhand | 105 | 39 | 37.14 |
Chhattisgarh | 55 | 18 | 32.73 |
Delhi | 119 | 38 | 31.93 |
Odisha | 143 | 44 | 30.77 |
Puducherry | 38 | 11 | 28.95 |
Rajasthan | 246 | 70 | 28.46 |
Kerala | 194 | 55 | 28.35 |
Bihar | 308 | 87 | 28.25 |
Haryana | 371 | 104 | 28.03 |
West Bengal | 345 | 96 | 27.83 |
Andhra Pradesh | 370 | 101 | 27.30 |
Tamil Nadu | 561 | 153 | 27.27 |
Uttar Pradesh | 581 | 157 | 27.02 |
Telangana | 260 | 70 | 26.92 |
Maharashtra | 463 | 115 | 24.84 |
Karnataka | 439 | 107 | 24.37 |
Punjab | 322 | 78 | 24.22 |
Madhya Pradesh | 270 | 65 | 24.07 |
Gujarat | 315 | 74 | 23.49 |
Uttarakhand | 60 | 14 | 23.33 |
-- 4. How does the number of referrals correlate with customer retention?
SELECT
number_of_referrals,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST(
(SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0) / COUNT(*) AS DECIMAL(5,2)
) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY number_of_referrals
ORDER BY number_of_referrals ASC;
number_of_referrals | total_customers | churned_customers | churn_rate |
---|---|---|---|
0 | 372 | 101 | 27.15 |
1 | 405 | 110 | 27.16 |
2 | 406 | 120 | 29.56 |
3 | 375 | 111 | 29.60 |
4 | 356 | 103 | 28.93 |
5 | 346 | 107 | 30.92 |
6 | 389 | 115 | 29.56 |
7 | 377 | 121 | 32.10 |
8 | 367 | 106 | 28.88 |
9 | 399 | 125 | 31.33 |
10 | 381 | 107 | 28.08 |
11 | 355 | 91 | 25.63 |
12 | 356 | 100 | 28.09 |
13 | 382 | 113 | 29.58 |
14 | 373 | 106 | 28.42 |
15 | 368 | 96 | 26.09 |
-- 5. What is the most common internet type among churned customers?
SELECT
internet_type,
COUNT(*) AS churned_count,
CAST(
(COUNT(*) * 100.0) / (SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned')
AS DECIMAL(5,2)
) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY internet_type
ORDER BY churned_count DESC;
internet_type | churned_count | churn_percentage |
---|---|---|
Fiber Optic | 1136 | 65.59 |
DSL | 291 | 16.80 |
Cable | 196 | 11.32 |
No Internet | 109 | 6.29 |
-- 6. Are customers with phone service more likely to retain than those with only internet services?
SELECT
phone_service,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST(
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
COUNT(*) AS DECIMAL(10,2)
) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY phone_service;
phone_service | total_customers | churned_customers | churn_rate |
---|---|---|---|
0 | 590 | 163 | 27.63 |
1 | 5417 | 1569 | 28.96 |
-- 7. How does having multiple lines impact churn rates?
SELECT
multiple_lines,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST(
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
COUNT(*) AS DECIMAL(10,2)
) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY multiple_lines;
multiple_lines | total_customers | churned_customers | churn_rate |
---|---|---|---|
0 | 3335 | 949 | 28.46 |
1 | 2672 | 783 | 29.30 |
-- 8. Do customers with premium support have lower churn rates compared to those without?
SELECT
premium_support,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST(
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
COUNT(*) AS DECIMAL(10,2)
) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY premium_support;
premium_support | total_customers | churned_customers | churn_rate |
---|---|---|---|
0 | 4194 | 1446 | 34.48 |
1 | 1813 | 286 | 15.77 |
-- 9. What are the top churn reasons among customers?
SELECT
churn_reason,
COUNT(*) AS churn_count,
CAST(
COUNT(*) * 100.0 /
(SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned')
AS DECIMAL(10,5)
) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY churn_reason
ORDER BY churn_count DESC;
churn_reason | churn_count | churn_percentage |
---|---|---|
Competitor had better devices | 289 | 16.68591 |
Competitor made better offer | 274 | 15.81986 |
Attitude of support person | 208 | 12.00924 |
Don't know | 124 | 7.15935 |
Competitor offered more data | 106 | 6.12009 |
Attitude of service provider | 93 | 5.36952 |
Competitor offered higher download speeds | 92 | 5.31178 |
Price too high | 72 | 4.15704 |
Product dissatisfaction | 71 | 4.09931 |
Network reliability | 66 | 3.81062 |
Long distance charges | 62 | 3.57968 |
Service dissatisfaction | 61 | 3.52194 |
Moved | 45 | 2.59815 |
Extra data charges | 34 | 1.96305 |
Limited range of services | 33 | 1.90531 |
Poor expertise of online support | 30 | 1.73210 |
Lack of affordable download/upload speed | 28 | 1.61663 |
Lack of self-service on Website | 27 | 1.55889 |
Poor expertise of phone support | 12 | 0.69284 |
Deceased | 5 | 0.28868 |
-- 10. Which contract type (e.g., monthly, annual) is most associated with churn?
SELECT
contract,
COUNT(*) AS churn_count,
CAST(
COUNT(*) * 100.0 /
(SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned')
AS DECIMAL(10,5)
) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY contract
ORDER BY churn_count DESC;
contract | churn_count | churn_percentage |
---|---|---|
Month-to-Month | 1529 | 88.27945 |
One Year | 156 | 9.00693 |
Two Year | 47 | 2.71363 |
-- 11. Do customers with paperless billing churn at a higher or lower rate than others?
SELECT
paperless_billing,
COUNT(*) AS churn_count,
CAST(
COUNT(*) * 100.0 /
(SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned')
AS DECIMAL(10,5)
) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY paperless_billing
ORDER BY churn_percentage DESC;
paperless_billing | churn_count | churn_percentage |
---|---|---|
1 | 1292 | 74.59584 |
0 | 440 | 25.40416 |
-- 12. How many customers left due to service-related reasons vs. pricing-related reasons?
SELECT
churn_category,
COUNT(*) AS churn_count,
CAST(
COUNT(*) * 100.0 /
(SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned')
AS DECIMAL(10,5)
) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY churn_category
ORDER BY churn_percentage DESC;
churn_category | churn_count | churn_percentage |
---|---|---|
Competitor | 761 | 43.93764 |
Attitude | 301 | 17.37875 |
Dissatisfaction | 300 | 17.32102 |
Price | 196 | 11.31640 |
Other | 174 | 10.04619 |
-- 13. What is the average monthly charge of churned vs. non-churned customers?
SELECT
customer_status,
CAST(AVG(monthly_charge) AS DECIMAL(8,2)) AS avg_monthly_charge
FROM CustomerChurn_Cleaned
GROUP BY customer_status;
customer_status | avg_monthly_charge |
---|---|
Churned | 73.10 |
Stayed | 61.84 |
-- 14. How does total revenue compare between retained and churned customers?
SELECT
customer_status,
CAST(SUM(total_revenue) AS DECIMAL(10,2)) AS total_revenue
FROM CustomerChurn_Cleaned
GROUP BY customer_status;
customer_status | total_revenue |
---|---|
Churned | 3411960.58 |
Stayed | 16010148.27 |
-- 15. Do customers with higher total charges tend to stay longer?
SELECT
CASE
WHEN total_charges < 1000 THEN 'Low (<$1000)'
WHEN total_charges BETWEEN 1000 AND 3000 THEN 'Medium ($1000-$3000)'
ELSE 'High (>$3000)'
END AS charge_category,
AVG(tenure_in_months) AS avg_tenure,
COUNT(*) AS customer_count
FROM CustomerChurn_Cleaned
GROUP BY
CASE
WHEN total_charges < 1000 THEN 'Low (<$1000)'
WHEN total_charges BETWEEN 1000 AND 3000 THEN 'Medium ($1000-$3000)'
ELSE 'High (>$3000)'
END
ORDER BY avg_tenure DESC;
charge_category | avg_tenure | customer_count |
---|---|---|
Low (<$1000) | 17 | 2228 |
High (>$3000) | 17 | 2009 |
Medium ($1000-$3000) | 17 | 1770 |
-- 16. Are there any patterns in total refunds for churned customers?
SELECT
customer_status,
AVG(total_refunds) AS avg_refunds,
MAX(total_refunds) AS max_refunds,
SUM(total_refunds) AS total_refunds
FROM CustomerChurn_Cleaned
GROUP BY customer_status;
customer_status | avg_refunds | max_refunds | total_refunds |
---|---|---|---|
Churned | 1.46 | 49.57 | 2532.03 |
Stayed | 2.27 | 49.79 | 9713.91 |
-- 17. How much revenue was lost due to churned customers' total long-distance charges?
SELECT
SUM(total_long_distance_charges) AS lost_revenue_from_long_distance
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned';
lost_revenue_from_long_distance |
---|
744689.81 |
-- 18. Can we predict churn based on tenure in months?
SELECT
CASE
WHEN tenure_in_months < 12 THEN 'Short-Term (<1 year)'
WHEN tenure_in_months BETWEEN 12 AND 24 THEN 'Mid-Term (1-2 years)'
ELSE 'Long-Term (>2 years)'
END AS tenure_category,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST(
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
COUNT(*) AS DECIMAL(10,5)
) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY
CASE
WHEN tenure_in_months < 12 THEN 'Short-Term (<1 year)'
WHEN tenure_in_months BETWEEN 12 AND 24 THEN 'Mid-Term (1-2 years)'
ELSE 'Long-Term (>2 years)'
END
ORDER BY churn_rate DESC;
tenure_category | total_customers | churned_customers | churn_rate |
---|---|---|---|
Long-Term (>2 years) | 1834 | 532 | 29.00763 |
Short-Term (<1 year) | 2190 | 631 | 28.81279 |
Mid-Term (1-2 years) | 1983 | 569 | 28.69390 |
-- 19. Do customers with unlimited data plans churn less often?
SELECT
unlimited_data,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST(
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
COUNT(*) AS DECIMAL(10,5)
) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY unlimited_data
ORDER BY churn_rate DESC;
unlimited_data | total_customers | churned_customers | churn_rate |
---|---|---|---|
1 | 4134 | 1387 | 33.55104 |
0 | 1873 | 345 | 18.41965 |
-- 20. Are there specific combinations of services that lead to lower churn rates?
SELECT
phone_service,
internet_service,
multiple_lines,
COUNT(*) AS total_customers,
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
CAST(
SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
COUNT(*) AS DECIMAL(10,5)
) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY phone_service, internet_service, multiple_lines
ORDER BY churn_rate ASC;
phone_service | internet_service | multiple_lines | total_customers | churned_customers | churn_rate |
---|---|---|---|---|---|
1 | 0 | 1 | 305 | 10 | 3.27869 |
1 | 0 | 0 | 918 | 99 | 10.78431 |
0 | 1 | 0 | 590 | 163 | 27.62712 |
1 | 1 | 1 | 2367 | 773 | 32.65737 |
1 | 1 | 0 | 1827 | 687 | 37.60263 |
Key Insights & Findings:
Our customer churn analysis revealed key demographic, service-related, financial, and behavioral factors influencing churn.
🔎 Key Findings:
- High-risk groups include unmarried customers, Fiber Optic users, and those on month-to-month contracts.
- Regional concerns exist, with Jammu & Kashmir (58.47% churn) and Assam (41.09%) showing the highest rates.
- Competitor influence is strong, with better devices (16.69%) and better offers (15.82%) being the top churn reasons.
- Premium support reduces churn significantly (15.77% vs. 34.48%), making it a key retention tool.
- High monthly charges ($74 vs. $60) are linked to higher churn, suggesting affordability concerns.
- Bundled services & security features improve retention, emphasizing cross-selling opportunities.
🔧 Actionable Strategies:
- Retention Focus: Target high-risk groups with discounted long-term contracts, bundled services, and premium support.
- Regional Intervention: Improve service quality and support in high-churn states.
- Competitive Adjustments: Enhance pricing models, promotions, and device offerings.
- Customer Engagement: Strengthen customer service, offer unlimited data, and promote streaming services.
By implementing these strategies, we can reduce churn, improve customer satisfaction, and maximize revenue retention.
Load Data
from azure.storage.blob import BlobServiceClient
import pandas as pd
import io
# 🔹 Security + networking → Shared access signature (SAS) → Blob/Container/Object → Generate SAS and connection string
STORAGE_ACCOUNT_NAME = "churndataanalysisstorage"
CONTAINER_NAME = "churndata-analysis"
BLOB_NAME = "ChurnData.csv"
SAS_TOKEN = "sv=2022-11-02&ss=bfqt&srt=co&sp=rwdlacupyx&se=2026-02-33209:31:41Z&st=2025-03-07T01:31:41Z&spr=https&sig=nzGQvp%2FwvujSsEaHkzXUfcdT3ONGwJ3dNhGMy82kxbI%3D" # Get this from Azure Portal
# 🔹 Create the Blob Service Client
blob_service_client = BlobServiceClient(
account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
credential=SAS_TOKEN
)
# 🔹 Get the blob client
blob_client = blob_service_client.get_blob_client(container=CONTAINER_NAME, blob=BLOB_NAME)
# 🔹 Download & Read as CSV
blob_data = blob_client.download_blob().readall()
df = pd.read_csv(io.BytesIO(blob_data))
df.head()
from azure.storage.blob import BlobServiceClient
import pandas as pd
import io
# 🔹 Security + networking → Shared access signature (SAS) → Blob/Container/Object → Generate SAS and connection string
STORAGE_ACCOUNT_NAME = "churndataanalysisstorage"
CONTAINER_NAME = "churndata-analysis"
BLOB_NAME = "ChurnData.csv"
SAS_TOKEN = "sv=2022-11-02&ss=bfqt&srt=co&sp=rwdlacupyx&se=2026-02-33209:31:41Z&st=2025-03-07T01:31:41Z&spr=https&sig=nzGQvp%2FwvujSsEaHkzXUfcdT3ONGwJ3dNhGMy82kxbI%3D" # Get this from Azure Portal
# 🔹 Create the Blob Service Client
blob_service_client = BlobServiceClient(
account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
credential=SAS_TOKEN
)
# 🔹 Get the blob client
blob_client = blob_service_client.get_blob_client(container=CONTAINER_NAME, blob=BLOB_NAME)
# 🔹 Download & Read as CSV
blob_data = blob_client.download_blob().readall()
df = pd.read_csv(io.BytesIO(blob_data))
df
- Load the data from Azure Blob Storage into a Pandas DataFrame in Python.
customer_id | gender | age | married | state | number_of_referrals | tenure_in_months | value_deal | phone_service | multiple_lines | internet_service | internet_type | online_security | online_backup | device_protection_plan | premium_support | streaming_tv | streaming_movies | streaming_music | unlimited_data | contract | paperless_billing | payment_method | monthly_charge | total_charges | total_refunds | total_extra_data_charges | total_long_distance_charges | total_revenue | customer_status | churn_category | churn_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
19877-DEL | Male | 35 | No | Delhi | 7 | 27 | Yes | No | Yes | Cable | No | Yes | No | Yes | Yes | No | No | Yes | One Year | Yes | Credit Card | 65.6 | 593.3 | 0.0 | 0 | 381.51 | 974.81 | Stayed | Others | Others | |
58353-MAH | Female | 45 | Yes | Maharashtra | 14 | 13 | Yes | Yes | Yes | Cable | No | No | No | No | No | Yes | Yes | No | Month-to-Month | No | Credit Card | -4.0 | 542.4 | 38.33 | 10 | 96.21 | 610.28 | Stayed | Others | Others | |
25063-WES | Male | 51 | No | West Bengal | 4 | 35 | Deal 5 | Yes | No | Yes | Fiber Optic | No | No | Yes | No | No | No | No | Yes | Month-to-Month | Yes | Bank Withdrawal | 73.9 | 280.85 | 0.0 | 0 | 134.6 | 415.45 | Churned | Competitor | Competitor had better devices |
59787-KAR | Male | 79 | No | Karnataka | 3 | 21 | Deal 4 | Yes | No | Yes | Fiber Optic | No | Yes | Yes | No | Yes | Yes | No | Yes | Month-to-Month | Yes | Bank Withdrawal | 98.0 | 1237.85 | 0.0 | 0 | 361.66 | 1599.51 | Churned | Dissatisfaction | Product dissatisfaction |
28544-TAM | Female | 80 | No | Tamil Nadu | 3 | 8 | Yes | No | Yes | Fiber Optic | No | No | No | Yes | Yes | No | No | Yes | Month-to-Month | Yes | Credit Card | 83.9 | 267.4 | 0.0 | 0 | 22.14 | 289.54 | Churned | Dissatisfaction | Network reliability |