Marketing Conversion Rates Using Python¶

About Dataset¶

From https://www.kaggle.com/datasets/loveall/clicks-conversion-tracking :

The data used in this project is from an anonymous organisation’s social media ad campaign. The data file can be downloaded from here. The file conversion_data.csv contains 1143 observations in 11 variables. Below are the descriptions of the variables.

1.) ad_id: an unique ID for each ad.

2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.

3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.

4.) age: age of the person to whom the ad is shown.

5.) gender: gender of the person to whim the add is shown

6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).

7.) Impressions: the number of times the ad was shown.

8.) Clicks: number of clicks on for that ad.

9.) Spent: Amount paid by company xyz to Facebook, to show that ad.

10.) Total conversion: Total number of people who enquired about the product after seeing the ad.

11.) Approved conversion: Total number of people who bought the product after seeing the ad.

Inspected dataset in Excel and found no cleaning necessary

Importing Modules Needed and Inspecting Data¶

In [2]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import os

It's a good practice to verify and set working directories and helps to make code more accessible to those trying to implement in other tools.

In [3]:
print(os.getcwd())
path = "F:\[Personal]\Data Analytics Portfolio\Marketing Conversion Analysis\sales_conversion"
os.chdir(path)
print(os.getcwd())
C:\Users\Tyler
F:\[Personal]\Data Analytics Portfolio\Marketing Conversion Analysis\sales_conversion
In [4]:
df = pd.read_csv(r'F:\[Personal]\Data Analytics Portfolio\Marketing Conversion Analysis\sales_conversion\KAG_conversion_data.csv')

Inspecting data types and high level information about dataset.

In [5]:
df.dtypes
Out[5]:
ad_id                    int64
xyz_campaign_id          int64
fb_campaign_id           int64
age                     object
gender                  object
interest                 int64
Impressions              int64
Clicks                   int64
Spent                  float64
Total_Conversion         int64
Approved_Conversion      int64
dtype: object
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1143 entries, 0 to 1142
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ad_id                1143 non-null   int64  
 1   xyz_campaign_id      1143 non-null   int64  
 2   fb_campaign_id       1143 non-null   int64  
 3   age                  1143 non-null   object 
 4   gender               1143 non-null   object 
 5   interest             1143 non-null   int64  
 6   Impressions          1143 non-null   int64  
 7   Clicks               1143 non-null   int64  
 8   Spent                1143 non-null   float64
 9   Total_Conversion     1143 non-null   int64  
 10  Approved_Conversion  1143 non-null   int64  
dtypes: float64(1), int64(8), object(2)
memory usage: 98.4+ KB
In [7]:
df.describe()
Out[7]:
ad_id xyz_campaign_id fb_campaign_id interest Impressions Clicks Spent Total_Conversion Approved_Conversion
count 1.143000e+03 1143.000000 1143.000000 1143.000000 1.143000e+03 1143.000000 1143.000000 1143.000000 1143.000000
mean 9.872611e+05 1067.382327 133783.989501 32.766404 1.867321e+05 33.390201 51.360656 2.855643 0.944007
std 1.939928e+05 121.629393 20500.308622 26.952131 3.127622e+05 56.892438 86.908418 4.483593 1.737708
min 7.087460e+05 916.000000 103916.000000 2.000000 8.700000e+01 0.000000 0.000000 0.000000 0.000000
25% 7.776325e+05 936.000000 115716.000000 16.000000 6.503500e+03 1.000000 1.480000 1.000000 0.000000
50% 1.121185e+06 1178.000000 144549.000000 25.000000 5.150900e+04 8.000000 12.370000 1.000000 1.000000
75% 1.121804e+06 1178.000000 144657.500000 31.000000 2.217690e+05 37.500000 60.025000 3.000000 1.000000
max 1.314415e+06 1178.000000 179982.000000 114.000000 3.052003e+06 421.000000 639.949998 60.000000 21.000000

Setup, Data Exploration, and Plotting¶

Making a dictionary to ensure that each distinct campaign is readily identifiable. We'll use this later.

In [8]:
camp_dict = {916: 1, 936: 2, 1178: 3}
df['camp_code'] = df['xyz_campaign_id'].map(camp_dict)
print(df['camp_code'].head())
0    1
1    1
2    1
3    1
4    1
Name: camp_code, dtype: int64

Calculating "subscribers", defined as those who ended up paying in some form.

In [9]:
subscribers = df[df['Approved_Conversion'] > 0]['ad_id'].nunique()
total = df['ad_id'].nunique()
conv_rate = subscribers / total
print(round(conv_rate*100, 2), '%')
51.09 %

Exploring our paid conversion rate.

In [10]:
# Calculate Paid Conversion Rate

paid_conv = df[df['Approved_Conversion'] > 0]['Approved_Conversion'].nunique()
unpaid_conv = df[df['Total_Conversion'] > 0]['Total_Conversion'].nunique()
paid_conv_rate = paid_conv / unpaid_conv
print(round(paid_conv_rate*100, 2), '%')
48.39 %

Observe if there are differences in gender across the demographic.

In [11]:
men = df[df['gender'] == 'M']['gender'].nunique()
women = df[df['gender'] == 'F']['gender'].nunique()
total_gender = df['gender'].nunique()
men_rate = men / total_gender
print(round(men_rate*100, 2), '%')
women_rate = women / total_gender
print(round(women_rate*100, 2), '%')
50.0 %
50.0 %

Exploring several interesting plots below to get a better feel for the data.

In [12]:
sns.countplot(data = df, x = 'age', hue = 'gender')
plt.title('Demographics by User Count - M/F')
plt.xlabel('Age')
plt.ylabel('# Ad Recipients')
plt.show()
In [13]:
sns.countplot(data = df, x = 'Clicks')
plt.title('Clicks by User Count')
plt.xlabel('Click Count')
plt.ylabel('# Ad Recipients')
# Setting arbitrary tick marks to unclutter x-axis. Rotate xticks because there are so many
plt.xticks([0, 50, 100, 150, 200],rotation = 60)
plt.show()
In [14]:
sns.countplot(data = df, x = 'Total_Conversion')
plt.title('Unpaid Conversions (Enquiries) by User Count')
plt.xlabel('Unpaid Conversions')
plt.ylabel('# Ad Recipients')
# Rotate xticks because there are so many
plt.xticks(rotation = 60)
plt.show()
In [15]:
sns.countplot(data = df, x = 'Approved_Conversion')
plt.title('Paid Conversions (Clicks) by User Count')
plt.xlabel('Paid Conversions')
plt.ylabel('# Ad Recipients')
plt.show()

Assessing Success Metrics by Campaign¶

In [16]:
# Recall the code from above:

# camp_dict = {916: 1, 936: 2, 1178: 3}
# df['camp_code'] = df['xyz_campaign_id'].map(camp_dict)
# print(df['camp_code'].head())

Use the Approved Conversion and Total Conversion columns to calculate a new column for paid conversions and then group by campaigns and aggregate by mean.

In [19]:
# Assess subscriber rate (ie. paid users from total users with ads shown) 

df['sub_rate'] = df['Approved_Conversion']/total

# Recall that total is all users who received ads

camp_sub_rates = df.groupby('camp_code')['sub_rate'].mean()
print(camp_sub_rates.head())
camp_code
1    0.000389
2    0.000345
3    0.001221
Name: sub_rate, dtype: float64
In [26]:
# How many users end up making enquiries from the ads shown?

df['enq_rate'] = df['Total_Conversion']/total

camp_enq_rates = df.groupby('camp_code')['enq_rate'].mean()
print(camp_enq_rates.head())
camp_code
1    0.000940
2    0.001013
3    0.003736
Name: enq_rate, dtype: float64
In [24]:
# How many users end up making clicks on the ad?

df['click_rate'] = df['Clicks']/total

camp_click_rates = df.groupby('camp_code')['click_rate'].mean()
print(camp_click_rates.head())
camp_code
1    0.001831
2    0.003741
3    0.050489
Name: click_rate, dtype: float64
In [29]:
# From clicks, how many convert to enquiries?

# Clicks will have divide by zero errors unless we remove them in some way.

df['Clicks_NZ'] = df[df['Clicks'] > 0]['Clicks']
df['enq_conv_rate'] = df['Total_Conversion']/df['Clicks_NZ']

camp_click_enq_rates = df.groupby('camp_code')['enq_conv_rate'].mean()
print(camp_click_enq_rates.head())
camp_code
1    0.678163
2    0.552590
3    0.152334
Name: enq_conv_rate, dtype: float64
In [25]:
# From enquiries, how many convert to paid?

df['paid_conv_rate'] = df['Approved_Conversion']/df['Total_Conversion']

camp_paid_rates = df.groupby('camp_code')['paid_conv_rate'].mean()
print(camp_paid_rates.head())
camp_code
1    0.416667
2    0.347355
3    0.326204
Name: paid_conv_rate, dtype: float64

Visualizing Customer Spend by Campaign¶

In [30]:
sns.barplot(data = df, x = 'camp_code', y = 'Spent')
plt.title('Customer Spend by Campaign')
plt.xlabel('Campaign Number')
plt.ylabel('Amount Spent')
plt.show()

Results and Discussion¶

The results show that campaign 3 is the clear winner in terms of effectiveness.

Campaign 3 is better at generating clicks, enquiries, and paid customers.

Interestingly, the first campaign was strongest at converting both clicks into enquiries and enquiries into paid customers but this is where we can see how important the sheer reach and initial click-through-rate of marketing campaigns is with campaign 3's success being driven by overwhelmingly better spend and baseline conversion metrics.

That said, in the end, cash is king, but it is important to analyze the individual components of campaigns because the strengths of the other two campaigns may indicate that there is yet room for improvement in campaign 3 despite its success.