From https://www.kaggle.com/datasets/loveall/clicks-conversion-tracking :
The data used in this project is from an anonymous organisation’s social media ad campaign. The data file can be downloaded from here. The file conversion_data.csv contains 1143 observations in 11 variables. Below are the descriptions of the variables.
1.) ad_id: an unique ID for each ad.
2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.
3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.
4.) age: age of the person to whom the ad is shown.
5.) gender: gender of the person to whim the add is shown
6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).
7.) Impressions: the number of times the ad was shown.
8.) Clicks: number of clicks on for that ad.
9.) Spent: Amount paid by company xyz to Facebook, to show that ad.
10.) Total conversion: Total number of people who enquired about the product after seeing the ad.
11.) Approved conversion: Total number of people who bought the product after seeing the ad.
Inspected dataset in Excel and found no cleaning necessary
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import os
It's a good practice to verify and set working directories and helps to make code more accessible to those trying to implement in other tools.
print(os.getcwd())
path = "F:\[Personal]\Data Analytics Portfolio\Marketing Conversion Analysis\sales_conversion"
os.chdir(path)
print(os.getcwd())
C:\Users\Tyler F:\[Personal]\Data Analytics Portfolio\Marketing Conversion Analysis\sales_conversion
df = pd.read_csv(r'F:\[Personal]\Data Analytics Portfolio\Marketing Conversion Analysis\sales_conversion\KAG_conversion_data.csv')
Inspecting data types and high level information about dataset.
df.dtypes
ad_id int64 xyz_campaign_id int64 fb_campaign_id int64 age object gender object interest int64 Impressions int64 Clicks int64 Spent float64 Total_Conversion int64 Approved_Conversion int64 dtype: object
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1143 entries, 0 to 1142 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ad_id 1143 non-null int64 1 xyz_campaign_id 1143 non-null int64 2 fb_campaign_id 1143 non-null int64 3 age 1143 non-null object 4 gender 1143 non-null object 5 interest 1143 non-null int64 6 Impressions 1143 non-null int64 7 Clicks 1143 non-null int64 8 Spent 1143 non-null float64 9 Total_Conversion 1143 non-null int64 10 Approved_Conversion 1143 non-null int64 dtypes: float64(1), int64(8), object(2) memory usage: 98.4+ KB
df.describe()
| ad_id | xyz_campaign_id | fb_campaign_id | interest | Impressions | Clicks | Spent | Total_Conversion | Approved_Conversion | |
|---|---|---|---|---|---|---|---|---|---|
| count | 1.143000e+03 | 1143.000000 | 1143.000000 | 1143.000000 | 1.143000e+03 | 1143.000000 | 1143.000000 | 1143.000000 | 1143.000000 |
| mean | 9.872611e+05 | 1067.382327 | 133783.989501 | 32.766404 | 1.867321e+05 | 33.390201 | 51.360656 | 2.855643 | 0.944007 |
| std | 1.939928e+05 | 121.629393 | 20500.308622 | 26.952131 | 3.127622e+05 | 56.892438 | 86.908418 | 4.483593 | 1.737708 |
| min | 7.087460e+05 | 916.000000 | 103916.000000 | 2.000000 | 8.700000e+01 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 7.776325e+05 | 936.000000 | 115716.000000 | 16.000000 | 6.503500e+03 | 1.000000 | 1.480000 | 1.000000 | 0.000000 |
| 50% | 1.121185e+06 | 1178.000000 | 144549.000000 | 25.000000 | 5.150900e+04 | 8.000000 | 12.370000 | 1.000000 | 1.000000 |
| 75% | 1.121804e+06 | 1178.000000 | 144657.500000 | 31.000000 | 2.217690e+05 | 37.500000 | 60.025000 | 3.000000 | 1.000000 |
| max | 1.314415e+06 | 1178.000000 | 179982.000000 | 114.000000 | 3.052003e+06 | 421.000000 | 639.949998 | 60.000000 | 21.000000 |
Making a dictionary to ensure that each distinct campaign is readily identifiable. We'll use this later.
camp_dict = {916: 1, 936: 2, 1178: 3}
df['camp_code'] = df['xyz_campaign_id'].map(camp_dict)
print(df['camp_code'].head())
0 1 1 1 2 1 3 1 4 1 Name: camp_code, dtype: int64
Calculating "subscribers", defined as those who ended up paying in some form.
subscribers = df[df['Approved_Conversion'] > 0]['ad_id'].nunique()
total = df['ad_id'].nunique()
conv_rate = subscribers / total
print(round(conv_rate*100, 2), '%')
51.09 %
Exploring our paid conversion rate.
# Calculate Paid Conversion Rate
paid_conv = df[df['Approved_Conversion'] > 0]['Approved_Conversion'].nunique()
unpaid_conv = df[df['Total_Conversion'] > 0]['Total_Conversion'].nunique()
paid_conv_rate = paid_conv / unpaid_conv
print(round(paid_conv_rate*100, 2), '%')
48.39 %
Observe if there are differences in gender across the demographic.
men = df[df['gender'] == 'M']['gender'].nunique()
women = df[df['gender'] == 'F']['gender'].nunique()
total_gender = df['gender'].nunique()
men_rate = men / total_gender
print(round(men_rate*100, 2), '%')
women_rate = women / total_gender
print(round(women_rate*100, 2), '%')
50.0 % 50.0 %
Exploring several interesting plots below to get a better feel for the data.
sns.countplot(data = df, x = 'age', hue = 'gender')
plt.title('Demographics by User Count - M/F')
plt.xlabel('Age')
plt.ylabel('# Ad Recipients')
plt.show()
sns.countplot(data = df, x = 'Clicks')
plt.title('Clicks by User Count')
plt.xlabel('Click Count')
plt.ylabel('# Ad Recipients')
# Setting arbitrary tick marks to unclutter x-axis. Rotate xticks because there are so many
plt.xticks([0, 50, 100, 150, 200],rotation = 60)
plt.show()
sns.countplot(data = df, x = 'Total_Conversion')
plt.title('Unpaid Conversions (Enquiries) by User Count')
plt.xlabel('Unpaid Conversions')
plt.ylabel('# Ad Recipients')
# Rotate xticks because there are so many
plt.xticks(rotation = 60)
plt.show()
sns.countplot(data = df, x = 'Approved_Conversion')
plt.title('Paid Conversions (Clicks) by User Count')
plt.xlabel('Paid Conversions')
plt.ylabel('# Ad Recipients')
plt.show()
# Recall the code from above:
# camp_dict = {916: 1, 936: 2, 1178: 3}
# df['camp_code'] = df['xyz_campaign_id'].map(camp_dict)
# print(df['camp_code'].head())
Use the Approved Conversion and Total Conversion columns to calculate a new column for paid conversions and then group by campaigns and aggregate by mean.
# Assess subscriber rate (ie. paid users from total users with ads shown)
df['sub_rate'] = df['Approved_Conversion']/total
# Recall that total is all users who received ads
camp_sub_rates = df.groupby('camp_code')['sub_rate'].mean()
print(camp_sub_rates.head())
camp_code 1 0.000389 2 0.000345 3 0.001221 Name: sub_rate, dtype: float64
# How many users end up making enquiries from the ads shown?
df['enq_rate'] = df['Total_Conversion']/total
camp_enq_rates = df.groupby('camp_code')['enq_rate'].mean()
print(camp_enq_rates.head())
camp_code 1 0.000940 2 0.001013 3 0.003736 Name: enq_rate, dtype: float64
# How many users end up making clicks on the ad?
df['click_rate'] = df['Clicks']/total
camp_click_rates = df.groupby('camp_code')['click_rate'].mean()
print(camp_click_rates.head())
camp_code 1 0.001831 2 0.003741 3 0.050489 Name: click_rate, dtype: float64
# From clicks, how many convert to enquiries?
# Clicks will have divide by zero errors unless we remove them in some way.
df['Clicks_NZ'] = df[df['Clicks'] > 0]['Clicks']
df['enq_conv_rate'] = df['Total_Conversion']/df['Clicks_NZ']
camp_click_enq_rates = df.groupby('camp_code')['enq_conv_rate'].mean()
print(camp_click_enq_rates.head())
camp_code 1 0.678163 2 0.552590 3 0.152334 Name: enq_conv_rate, dtype: float64
# From enquiries, how many convert to paid?
df['paid_conv_rate'] = df['Approved_Conversion']/df['Total_Conversion']
camp_paid_rates = df.groupby('camp_code')['paid_conv_rate'].mean()
print(camp_paid_rates.head())
camp_code 1 0.416667 2 0.347355 3 0.326204 Name: paid_conv_rate, dtype: float64
sns.barplot(data = df, x = 'camp_code', y = 'Spent')
plt.title('Customer Spend by Campaign')
plt.xlabel('Campaign Number')
plt.ylabel('Amount Spent')
plt.show()
The results show that campaign 3 is the clear winner in terms of effectiveness.
Campaign 3 is better at generating clicks, enquiries, and paid customers.
Interestingly, the first campaign was strongest at converting both clicks into enquiries and enquiries into paid customers but this is where we can see how important the sheer reach and initial click-through-rate of marketing campaigns is with campaign 3's success being driven by overwhelmingly better spend and baseline conversion metrics.
That said, in the end, cash is king, but it is important to analyze the individual components of campaigns because the strengths of the other two campaigns may indicate that there is yet room for improvement in campaign 3 despite its success.