Azure Data Pipeline
Churn Analysis & Power BI Visualization
Home / Projects – Azure Data Pipeline for Churn Analysis & Power BI Visualization

📂 Table of Content

🔎 Introduction​

This project represents a full end-to-end data pipeline that ingests customer churn data, processes it in Azure SQL, and visualizes insights using Power BI.

 

📌 Problem Statement:

Customer churn is a critical challenge for subscription-based businesses. Understanding why customers leave and identifying patterns in churn behavior can help businesses reduce churn, enhance customer retention, and increase revenue.

 

🔎 Objective:

  • Extract and process customer data from Azure Data Lake Storage
  • Load and transform the data into Azure SQL Database
  • Conduct exploratory data analysis (EDA) and statistical insights
  • Develop interactive Power BI dashboards to track churn trends and key customer attributes

📋 Project Overview​

  • Data Source: CSV file (Customer Churn Data)
  • Storage: Azure Data Lake Storage Gen2
  • Processing: Azure Data Factory (ETL process)
  • Database: Azure SQL Database
  • Data Analysis: Query execution via VS Code
  • Visualization: Power BI Dashboard with live SQL connectivity

🔒 Key Features

  • Automated Data Pipeline → Ingests raw churn data from Data Lake to Azure SQL.
  • Data Transformation & Cleaning → Handles missing values, data types, and duplicates.
  • Query Execution & Insights → Runs SQL queries in VS Code for data analysis.
  • Seamless Power BI Integration → Real-time visualization & business insights.
  • Scalable & Secure → Uses cloud-based resources with firewall and authentication setup.

🛠 Tools Used

  • Azure Data Factory – ETL Pipeline
  • Azure Data Lake Storage – Cloud Storage
  • Azure SQL Database – Data Processing & Storage
  • VS Code (MSSQL Extension) – Query Execution
  • Power BI – Visualization & Reporting

📊 Summary of the Dataset​

🔑 Key Dataset Attributes

    • Total Records: Number of customer transactions or records in the dataset.
    • Total Columns: 30+ features related to customer demographics, services, and churn.
    • Data Coverage: Includes customer subscription data, services used, billing, and churn details.

👤 Customer Demographics

    • Customer ID: Unique identifier for each customer.
    • Gender & Age: Customer’s demographic details.
    • Marital Status: Whether the customer is married or not.
    • State: The location of the customer.

📡 Service & Subscription Details

    • Number of Referrals: Number of customers referred by the user.
    • Tenure in Months: Duration of customer subscription.
    • Value Deal: Special deals or discounts availed by customers.
    • Phone & Internet Services: Includes details on phone service, multiple lines, and internet plans.
    • Internet Type: Fiber optic, DSL, or other types of internet services.
    • Additional Services: Includes online security, device protection, premium support, streaming services, and unlimited data.

💳 Billing & Payment Information

    • Contract Type: Monthly, yearly, or other contract agreements.
    • Paperless Billing: Indicates if the customer opts for digital invoicing.
    • Payment Method: Different payment types like credit card, electronic check, or bank transfer.
    • Monthly & Total Charges: Amount paid monthly and overall billing history.
    • Total Refunds & Extra Charges: Records of refunds, extra data charges, and long-distance call costs.

📉 Churn & Customer Retention

    • Customer Status: Indicates whether a customer is active, churned, or inactive.
    • Churn Category: The reason for customer churn, such as service issues, pricing, or competition.
    • Churn Reason: Specific reasons why customers decided to leave, helping in retention analysis.

🔑 Key questions

❓ Customer Demographics & Segmentation

      1. What is the distribution of customers by gender and age group?
      2. Does marital status impact customer churn?
      3. Are certain states experiencing a higher churn rate than others?
      4. How does the number of referrals correlate with customer retention?

❓ Subscription & Service Analysis

      1. What is the most common internet type among churned customers?
      2. Are customers with phone service more likely to retain than those with only internet services?
      3. How does having multiple lines impact churn rates?
      4. Do customers with premium support have lower churn rates compared to those without?

❓ Churn Behavior & Reasons

      1. What are the top churn reasons among customers?
      2. Which contract type (e.g., monthly, annual) is most associated with churn?
      3. Do customers with paperless billing churn at a higher or lower rate than others?
      4. How many customers left due to service-related reasons vs. pricing-related reasons?

❓ Financial & Revenue Impact

      1. What is the average monthly charge of churned vs. non-churned customers?
      2. How does total revenue compare between retained and churned customers?
      3. Do customers with higher total charges tend to stay longer?
      4. Are there any patterns in total refunds for churned customers?
      5. How much revenue was lost due to churned customers’ total long-distance charges?

❓ Predictive Insights & Retention Opportunities

      1. Can we predict churn based on tenure in months?
      2. Do customers with unlimited data plans churn less often?
      3. Are there specific combinations of services that lead to lower churn rates?

⚙ Phase 1: Extract, Load & Connect (ETL Process)

ETL & Data Integration Flow (chart): Azure Data Factory & SQL

In this phase, data is extracted from an external source (CSV), loaded into Azure Data Lake Storage, processed using Azure Data Factory, and finally stored in Azure SQL Database for further analysis.


    • Data Extraction: Uploading raw CSV to Azure Data Lake Storage Gen2
    • Data Transformation: Using Azure Data Factory to clean and map schemas
    • Data Loading: Ingesting data into Azure SQL Database
    • Connectivity: Establishing SQL Server connections for querying

📊 Phase 2: Exploratory & Statistical Data Analysis

Data Cleaning & Preprocessing
  • Identify & handle missing values
  • Convert data types 
  • Remove duplicates 
  • Standardize categorical values 
  • Save as SQL Table (Direct to Azure SQL)
				
					# Check for missing values
missing_values = df.isnull().sum()

# Display only columns with missing values
missing_values = missing_values[missing_values > 0]
print(missing_values)

				
			
				
					value_deal       3297
internet_type    1223
dtype: int64
				
			
				
					# Fill missing values
df.loc[df['internet_service'] == 'No', 'internet_type'] = 'No Internet'
df['internet_type'].fillna('Unknown', inplace=True)  # Fill remaining missing values
df['value_deal'].fillna('No Deal', inplace=True)
				
			
				
					# Convert value_deal to numeric format
import numpy as np
# Convert 'No Deal' to 0 and extract numbers from 'Deal X'
df['value_deal'] = df['value_deal'].replace('No Deal', '0')  # Replace 'No Deal' with '0'
df['value_deal'] = df['value_deal'].str.extract('(\d+)').fillna(0).astype(int)  # Extract numbers

df['value_deal'].unique()
				
			
				
					array([0, 5, 4, 1, 2, 3])
				
			
				
					# Convert Yes/No columns to Boolean (True/False)
bool_columns = [
    'phone_service', 'multiple_lines', 'internet_service', 'online_security', 
    'online_backup', 'device_protection_plan', 'premium_support', 'streaming_tv', 
    'streaming_movies', 'streaming_music', 'unlimited_data', 'paperless_billing'
]

df[bool_columns] = df[bool_columns].applymap(lambda x: True if x == 'Yes' else False)

# Convert categorical columns to 'category' type
category_columns = [
    'gender', 'married', 'state', 'contract', 'payment_method', 
    'customer_status', 'churn_category', 'churn_reason'
]

df[category_columns] = df[category_columns].astype('category')

df.dtypes
				
			
				
					customer_id                      object
gender                         category
age                               int64
married                        category
state                          category
number_of_referrals               int64
tenure_in_months                  int64
value_deal                        int64
phone_service                      bool
multiple_lines                     bool
internet_service                   bool
internet_type                    object
online_security                    bool
online_backup                      bool
device_protection_plan             bool
premium_support                    bool
streaming_tv                       bool
streaming_movies                   bool
streaming_music                    bool
unlimited_data                     bool
contract                       category
paperless_billing                  bool
payment_method                 category
monthly_charge                  float64
total_charges                   float64
total_refunds                   float64
total_extra_data_charges          int64
total_long_distance_charges     float64
total_revenue                   float64
customer_status                category
churn_category                 category
churn_reason                   category
dtype: object
				
			
				
					# Check for duplicate rows
duplicate_rows = df.duplicated().sum()
print(f"Total duplicate rows: {duplicate_rows}")
				
			
				
					Total duplicate rows: 0
				
			
				
					'''Standardize categorical values '''

# Identify categorical columns (excluding customer_id)
categorical_columns = df.select_dtypes(include=['category']).columns.tolist()

# Display unique values in categorical columns
for col in categorical_columns:
    print(f"Unique values in '{col}':\n", df[col].unique(), "\n")

# Replace occurrences of "Others" with "Other"
df['churn_category'] = df['churn_category'].str.replace('Others', 'Other')
				
			
				
					''' Save as SQL Table (Direct to Azure SQL)'''

!pip install pyodbc
from sqlalchemy import create_engine

# Replace with your Azure SQL connection details
server = "churndata2025.database.windows.net"
database = "churndb"
username = "adminuser"
password = "wMjian12"
driver = "ODBC Driver 17 for SQL Server"

# Create connection string
engine = create_engine(f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver}")

# Save to Azure SQL Table
df.to_sql("CustomerChurn_Cleaned", engine, if_exists="replace", index=False)
				
			
SQL Queries
				
					-- 1. What is the distribution of customers by gender and age group?
SELECT 
    gender,
    CASE 
        WHEN age < 25 THEN '18-24'
        WHEN age BETWEEN 25 AND 34 THEN '25-34'
        WHEN age BETWEEN 35 AND 44 THEN '35-44'
        WHEN age BETWEEN 45 AND 54 THEN '45-54'
        WHEN age BETWEEN 55 AND 64 THEN '55-64'
        ELSE '65+'
    END AS age_group,
    COUNT(*) AS total_customers
FROM CustomerChurn_Cleaned
GROUP BY gender, 
    CASE 
        WHEN age < 25 THEN '18-24'
        WHEN age BETWEEN 25 AND 34 THEN '25-34'
        WHEN age BETWEEN 35 AND 44 THEN '35-44'
        WHEN age BETWEEN 45 AND 54 THEN '45-54'
        WHEN age BETWEEN 55 AND 64 THEN '55-64'
        ELSE '65+'
    END
ORDER BY age_group, gender;
				
			
genderage_grouptotal_customers
Female18-24411
Male18-24134
Female25-34659
Male25-34436
Female35-44731
Male35-44411
Female45-54659
Male45-54419
Female55-64642
Male55-64405
Female65+677
Male65+423
				
					-- 2. Does marital status impact customer churn?
SELECT 
    married,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST((SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0) / COUNT(*) AS DECIMAL(5,2)) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY married;
				
			
marriedtotal_customerschurned_customerschurn_rate
Yes299584928.35
No301288329.32
				
					-- 3. Are certain states experiencing a higher churn rate than others?
SELECT 
    state,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST((SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0) / COUNT(*) AS DECIMAL(5,2)) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY state
ORDER BY churn_rate DESC
				
			
statetotal_customerschurned_customerschurn_rate
Jammu & Kashmir31318358.47
Assam1295341.09
Jharkhand1053937.14
Chhattisgarh551832.73
Delhi1193831.93
Odisha1434430.77
Puducherry381128.95
Rajasthan2467028.46
Kerala1945528.35
Bihar3088728.25
Haryana37110428.03
West Bengal3459627.83
Andhra Pradesh37010127.30
Tamil Nadu56115327.27
Uttar Pradesh58115727.02
Telangana2607026.92
Maharashtra46311524.84
Karnataka43910724.37
Punjab3227824.22
Madhya Pradesh2706524.07
Gujarat3157423.49
Uttarakhand601423.33
				
					-- 4. How does the number of referrals correlate with customer retention?
SELECT 
    number_of_referrals,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST(
            (SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0) / COUNT(*) AS DECIMAL(5,2)
    ) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY number_of_referrals
ORDER BY number_of_referrals ASC;
				
			
number_of_referralstotal_customerschurned_customerschurn_rate
037210127.15
140511027.16
240612029.56
337511129.60
435610328.93
534610730.92
638911529.56
737712132.10
836710628.88
939912531.33
1038110728.08
113559125.63
1235610028.09
1338211329.58
1437310628.42
153689626.09
				
					-- 5. What is the most common internet type among churned customers?
SELECT 
    internet_type, 
    COUNT(*) AS churned_count,
    CAST(
            (COUNT(*) * 100.0) / (SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned')
            AS DECIMAL(5,2)
    ) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY internet_type
ORDER BY churned_count DESC;

				
			
internet_typechurned_countchurn_percentage
Fiber Optic113665.59
DSL29116.80
Cable19611.32
No Internet1096.29
				
					-- 6. Are customers with phone service more likely to retain than those with only internet services?
SELECT 
    phone_service,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST(
        SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
        COUNT(*) AS DECIMAL(10,2)
    ) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY phone_service;


				
			
phone_servicetotal_customerschurned_customerschurn_rate
059016327.63
15417156928.96
				
					-- 7. How does having multiple lines impact churn rates?
SELECT 
    multiple_lines,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST(
        SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
        COUNT(*) AS DECIMAL(10,2)
    ) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY multiple_lines;



				
			
multiple_linestotal_customerschurned_customerschurn_rate
0333594928.46
1267278329.30
				
					-- 8. Do customers with premium support have lower churn rates compared to those without?
SELECT 
    premium_support,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST(
        SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
        COUNT(*) AS DECIMAL(10,2)
    ) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY premium_support;




				
			
premium_supporttotal_customerschurned_customerschurn_rate
04194144634.48
1181328615.77
				
					-- 9. What are the top churn reasons among customers?
SELECT 
    churn_reason,
    COUNT(*) AS churn_count,
    CAST(
        COUNT(*) * 100.0 / 
        (SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned') 
        AS DECIMAL(10,5)
    ) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY churn_reason
ORDER BY churn_count DESC;





				
			
churn_reasonchurn_countchurn_percentage
Competitor had better devices28916.68591
Competitor made better offer27415.81986
Attitude of support person20812.00924
Don't know1247.15935
Competitor offered more data1066.12009
Attitude of service provider935.36952
Competitor offered higher download speeds925.31178
Price too high724.15704
Product dissatisfaction714.09931
Network reliability663.81062
Long distance charges623.57968
Service dissatisfaction613.52194
Moved452.59815
Extra data charges341.96305
Limited range of services331.90531
Poor expertise of online support301.73210
Lack of affordable download/upload speed281.61663
Lack of self-service on Website271.55889
Poor expertise of phone support120.69284
Deceased50.28868
				
					-- 10. Which contract type (e.g., monthly, annual) is most associated with churn?
SELECT 
    contract,
    COUNT(*) AS churn_count,
    CAST(
        COUNT(*) * 100.0 / 
        (SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned') 
        AS DECIMAL(10,5)
    ) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY contract
ORDER BY churn_count DESC;






				
			
contractchurn_countchurn_percentage
Month-to-Month152988.27945
One Year1569.00693
Two Year472.71363
				
					-- 11. Do customers with paperless billing churn at a higher or lower rate than others?
SELECT 
    paperless_billing,
    COUNT(*) AS churn_count,
    CAST(
        COUNT(*) * 100.0 / 
        (SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned') 
        AS DECIMAL(10,5)
    ) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY paperless_billing
ORDER BY churn_percentage DESC;







				
			
paperless_billingchurn_countchurn_percentage
1129274.59584
044025.40416
				
					-- 12. How many customers left due to service-related reasons vs. pricing-related reasons?
SELECT 
    churn_category,
    COUNT(*) AS churn_count,
    CAST(
        COUNT(*) * 100.0 / 
        (SELECT COUNT(*) FROM CustomerChurn_Cleaned WHERE customer_status = 'Churned') 
        AS DECIMAL(10,5)
    ) AS churn_percentage
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned'
GROUP BY churn_category
ORDER BY churn_percentage DESC;








				
			
churn_categorychurn_countchurn_percentage
Competitor76143.93764
Attitude30117.37875
Dissatisfaction30017.32102
Price19611.31640
Other17410.04619
				
					-- 13. What is the average monthly charge of churned vs. non-churned customers?
SELECT 
    customer_status,
    CAST(AVG(monthly_charge) AS DECIMAL(8,2)) AS avg_monthly_charge
FROM CustomerChurn_Cleaned
GROUP BY customer_status;









				
			
customer_statusavg_monthly_charge
Churned73.10
Stayed61.84
				
					-- 14. How does total revenue compare between retained and churned customers?
SELECT 
    customer_status,
    CAST(SUM(total_revenue) AS DECIMAL(10,2)) AS total_revenue
FROM CustomerChurn_Cleaned
GROUP BY customer_status;










				
			
customer_statustotal_revenue
Churned3411960.58
Stayed16010148.27
				
					-- 15. Do customers with higher total charges tend to stay longer?
SELECT 
    CASE 
        WHEN total_charges < 1000 THEN 'Low (<$1000)'
        WHEN total_charges BETWEEN 1000 AND 3000 THEN 'Medium ($1000-$3000)'
        ELSE 'High (>$3000)'
    END AS charge_category,
    AVG(tenure_in_months) AS avg_tenure,
    COUNT(*) AS customer_count
FROM CustomerChurn_Cleaned
GROUP BY 
    CASE 
        WHEN total_charges < 1000 THEN 'Low (<$1000)'
        WHEN total_charges BETWEEN 1000 AND 3000 THEN 'Medium ($1000-$3000)'
        ELSE 'High (>$3000)'
    END
ORDER BY avg_tenure DESC;











				
			
charge_categoryavg_tenurecustomer_count
Low (<$1000)172228
High (>$3000)172009
Medium ($1000-$3000)171770
				
					-- 16. Are there any patterns in total refunds for churned customers?
SELECT 
    customer_status,
    AVG(total_refunds) AS avg_refunds,
    MAX(total_refunds) AS max_refunds,
    SUM(total_refunds) AS total_refunds
FROM CustomerChurn_Cleaned
GROUP BY customer_status;
				
			
customer_statusavg_refundsmax_refundstotal_refunds
Churned1.4649.572532.03
Stayed2.2749.799713.91
				
					-- 17. How much revenue was lost due to churned customers' total long-distance charges?
SELECT 
    SUM(total_long_distance_charges) AS lost_revenue_from_long_distance
FROM CustomerChurn_Cleaned
WHERE customer_status = 'Churned';













				
			
lost_revenue_from_long_distance
744689.81
				
					-- 18. Can we predict churn based on tenure in months?
SELECT 
    CASE 
        WHEN tenure_in_months < 12 THEN 'Short-Term (<1 year)'
        WHEN tenure_in_months BETWEEN 12 AND 24 THEN 'Mid-Term (1-2 years)'
        ELSE 'Long-Term (>2 years)'
    END AS tenure_category,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST(
        SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
        COUNT(*) AS DECIMAL(10,5)
    ) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY 
    CASE 
        WHEN tenure_in_months < 12 THEN 'Short-Term (<1 year)'
        WHEN tenure_in_months BETWEEN 12 AND 24 THEN 'Mid-Term (1-2 years)'
        ELSE 'Long-Term (>2 years)'
    END
ORDER BY churn_rate DESC;














				
			
tenure_categorytotal_customerschurned_customerschurn_rate
Long-Term (>2 years)183453229.00763
Short-Term (<1 year)219063128.81279
Mid-Term (1-2 years)198356928.69390
				
					-- 19. Do customers with unlimited data plans churn less often?
SELECT 
    unlimited_data,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST(
        SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
        COUNT(*) AS DECIMAL(10,5)
    ) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY unlimited_data
ORDER BY churn_rate DESC;

				
			
unlimited_datatotal_customerschurned_customerschurn_rate
14134138733.55104
0187334518.41965
				
					-- 20. Are there specific combinations of services that lead to lower churn rates?
SELECT 
    phone_service, 
    internet_service, 
    multiple_lines,
    COUNT(*) AS total_customers,
    SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) AS churned_customers,
    CAST(
        SUM(CASE WHEN customer_status = 'Churned' THEN 1 ELSE 0 END) * 100.0 /
        COUNT(*) AS DECIMAL(10,5)
    ) AS churn_rate
FROM CustomerChurn_Cleaned
GROUP BY phone_service, internet_service, multiple_lines
ORDER BY churn_rate ASC;


				
			
phone_serviceinternet_servicemultiple_linestotal_customerschurned_customerschurn_rate
101305103.27869
1009189910.78431
01059016327.62712
111236777332.65737
110182768737.60263
Key Insights & Findings:

Our customer churn analysis revealed key demographic, service-related, financial, and behavioral factors influencing churn.

🔎  Key Findings:

    • High-risk groups include unmarried customers, Fiber Optic users, and those on month-to-month contracts.
    • Regional concerns exist, with Jammu & Kashmir (58.47% churn) and Assam (41.09%) showing the highest rates.
    • Competitor influence is strong, with better devices (16.69%) and better offers (15.82%) being the top churn reasons.
    • Premium support reduces churn significantly (15.77% vs. 34.48%), making it a key retention tool.
    • High monthly charges ($74 vs. $60) are linked to higher churn, suggesting affordability concerns.
    • Bundled services & security features improve retention, emphasizing cross-selling opportunities.

🔧 Actionable Strategies:

    • Retention Focus: Target high-risk groups with discounted long-term contracts, bundled services, and premium support.
    • Regional Intervention: Improve service quality and support in high-churn states.
    • Competitive Adjustments: Enhance pricing models, promotions, and device offerings.
    • Customer Engagement: Strengthen customer service, offer unlimited data, and promote streaming services.

By implementing these strategies, we can reduce churn, improve customer satisfaction, and maximize revenue retention.

Load Data
				
					from azure.storage.blob import BlobServiceClient
import pandas as pd
import io

# 🔹 Security + networking → Shared access signature (SAS) → Blob/Container/Object → Generate SAS and connection string
STORAGE_ACCOUNT_NAME = "churndataanalysisstorage"
CONTAINER_NAME = "churndata-analysis"
BLOB_NAME = "ChurnData.csv"
SAS_TOKEN = "sv=2022-11-02&ss=bfqt&srt=co&sp=rwdlacupyx&se=2026-02-33209:31:41Z&st=2025-03-07T01:31:41Z&spr=https&sig=nzGQvp%2FwvujSsEaHkzXUfcdT3ONGwJ3dNhGMy82kxbI%3D"  # Get this from Azure Portal

# 🔹 Create the Blob Service Client
blob_service_client = BlobServiceClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
    credential=SAS_TOKEN
)

# 🔹 Get the blob client
blob_client = blob_service_client.get_blob_client(container=CONTAINER_NAME, blob=BLOB_NAME)

# 🔹 Download & Read as CSV
blob_data = blob_client.download_blob().readall()
df = pd.read_csv(io.BytesIO(blob_data))

df.head()

				
			
				
					from azure.storage.blob import BlobServiceClient
import pandas as pd
import io

# 🔹 Security + networking → Shared access signature (SAS) → Blob/Container/Object → Generate SAS and connection string
STORAGE_ACCOUNT_NAME = "churndataanalysisstorage"
CONTAINER_NAME = "churndata-analysis"
BLOB_NAME = "ChurnData.csv"
SAS_TOKEN = "sv=2022-11-02&ss=bfqt&srt=co&sp=rwdlacupyx&se=2026-02-33209:31:41Z&st=2025-03-07T01:31:41Z&spr=https&sig=nzGQvp%2FwvujSsEaHkzXUfcdT3ONGwJ3dNhGMy82kxbI%3D"  # Get this from Azure Portal

# 🔹 Create the Blob Service Client
blob_service_client = BlobServiceClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
    credential=SAS_TOKEN
)

# 🔹 Get the blob client
blob_client = blob_service_client.get_blob_client(container=CONTAINER_NAME, blob=BLOB_NAME)

# 🔹 Download & Read as CSV
blob_data = blob_client.download_blob().readall()
df = pd.read_csv(io.BytesIO(blob_data))

df

				
			
  • Load the data from Azure Blob Storage into a Pandas DataFrame in Python.
customer_idgenderagemarriedstatenumber_of_referralstenure_in_monthsvalue_dealphone_servicemultiple_linesinternet_serviceinternet_typeonline_securityonline_backupdevice_protection_planpremium_supportstreaming_tvstreaming_moviesstreaming_musicunlimited_datacontractpaperless_billingpayment_methodmonthly_chargetotal_chargestotal_refundstotal_extra_data_chargestotal_long_distance_chargestotal_revenuecustomer_statuschurn_categorychurn_reason
19877-DELMale35NoDelhi727YesNoYesCableNoYesNoYesYesNoNoYesOne YearYesCredit Card65.6593.30.00381.51974.81StayedOthersOthers
58353-MAHFemale45YesMaharashtra1413YesYesYesCableNoNoNoNoNoYesYesNoMonth-to-MonthNoCredit Card-4.0542.438.331096.21610.28StayedOthersOthers
25063-WESMale51NoWest Bengal435Deal 5YesNoYesFiber OpticNoNoYesNoNoNoNoYesMonth-to-MonthYesBank Withdrawal73.9280.850.00134.6415.45ChurnedCompetitorCompetitor had better devices
59787-KARMale79NoKarnataka321Deal 4YesNoYesFiber OpticNoYesYesNoYesYesNoYesMonth-to-MonthYesBank Withdrawal98.01237.850.00361.661599.51ChurnedDissatisfactionProduct dissatisfaction
28544-TAMFemale80NoTamil Nadu38YesNoYesFiber OpticNoNoNoYesYesNoNoYesMonth-to-MonthYesCredit Card83.9267.40.0022.14289.54ChurnedDissatisfactionNetwork reliability

📈 Phase 3: Visualization & Dashboarding (Power BI)