You've received an analytical task from an international online store. Your predecessor failed to complete it: they launched an A/B test and then quit (to start a watermelon farm in Brazil). They left only the technical specifications and the test results.
Test name: recommender_system_test
Expected result: improved conversion into product page views (the product_page event), product cart views (product_cart) and purchases (purchase). At each stage of the funnel product_page → product_cart → purchase, there will be at least a 10% increase.
This is an A/B test whose purpose is to check changes related to the introduction of an improved recommendation system.
import pandas as pd
import numpy as np
from numpy import median
from scipy import stats as st
import math as mth
import datetime as dt
from datetime import datetime, timedelta
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.express as px
from plotly import graph_objects as go
import sys
import warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")
#import the files
location = '/some/local/path'
try:
    calendar = pd.read_csv(location + 'ab_project_marketing_events_us.csv')
    events = pd.read_csv(location + 'final_ab_events_us.csv')
    users = pd.read_csv(location + 'final_ab_new_users_us.csv')
    participants = pd.read_csv(location + 'final_ab_participants_us.csv')
except:
    calendar = pd.read_csv('/datasets/ab_project_marketing_events_us.csv')
    events = pd.read_csv('/datasets/final_ab_events_us.csv')
    users = pd.read_csv('/datasets/final_ab_new_users_us.csv')
    participants = pd.read_csv('/datasets/final_ab_participants_us.csv')
def data_study(data):
    """This function runs methods to study the data and prints out the results.
    The input argument is a dataset.
    """
    print(data.info(memory_usage='deep'))
    print("")
    print("Head:")
    display(data.head())
    print("")
    print("Tail:")
    display(data.tail())
    print("")
    print("Check for Nulls:")
    display(data.isnull().sum())
    print("")
    print("Check for 0:")
    for c in data.columns:
        print(c, len(data[data[c] == 0]))
    print("")
    print('Number of duplicated rows:', data.duplicated().sum())
data_study(calendar)
# as calendar has only 14 entries we can look at the entire dataset
calendar
We can see there are no missing values or duplicated rows in this dataset.
There are two columns that need to be converted to the datetime type.
calendar['start_dt'] = pd.to_datetime(calendar['start_dt'])
calendar['finish_dt'] = pd.to_datetime(calendar['finish_dt'])
calendar.info(memory_usage='deep')
We can see that the conversion lowered memory usage from 4.0KB to 2.4KB.
As our A/B test deals with EU users, we can create a subset of EU promotions.
calendar_eu = calendar[calendar['regions'].str.contains('EU', regex=False) == True]
calendar_eu
data_study(users)
No missing values or duplicated rows here.
#convert first_date to dt type
users['first_date'] = pd.to_datetime(users['first_date'])
#change date column name to make it easier to distinguish from other dates
users = users.rename(columns={'first_date':'join_date'})
Next I will check the values in the region and device columns. Both look like low-cardinality columns with many repeated values that can eventually be converted to the category type to optimize the data.
#check values in region column
users.region.value_counts()
#check values in device column
users.device.value_counts()
#convert region and device to category type
users['region'] = users['region'].astype('category')
users['device'] = users['device'].astype('category')
users.info(memory_usage='deep')
After the data type conversion, memory usage dropped from 15.4MB to 4.9MB.
#check there are no repeat user_ids
users.user_id.nunique()
The number of unique user IDs equals the number of entries, which means there are no repeated IDs.
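A one-line sanity check for this claim (if every user_id is unique, nunique() equals the row count):
# confirm that the number of unique user IDs matches the number of rows
print('No duplicate IDs:', users['user_id'].nunique() == len(users))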
#check the dates for anomalies
users.join_date.describe()
len(users[users['join_date'] > '2020-12-21'])
In this dataset we have users from all regions (we only need EU), and 8.5% of the data are users who joined after 21/12/2020.
For now I will leave those users in, since they will be dropped in the following steps.
In addition, this indicates that the test description written by the previous analyst is not accurate.
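To make the 8.5% figure reproducible, here is a minimal sketch of how this share can be computed (the cutoff date comes from the check above):
# share of users who joined after the expected end of recruitment (2020-12-21)
late_joiners = users[users['join_date'] > '2020-12-21']
print(f'Users joining after 2020-12-21: {len(late_joiners)} ({len(late_joiners) / len(users):.1%} of the dataset)')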
data_study(events)
events.event_name.value_counts()
The missing values in the details column are to be expected, since there are values only for purchase events.
Adding the count of purchase events (62,740) to the count of NaN values (377,577) gives us the total number of entries in the dataset.
This data is unnecessary for the analysis, so the missing values can be left untouched.
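A quick sketch to reproduce this check directly (assuming details is filled only for purchase events, the two counts should add up to the total number of rows):
# missing 'details' values plus purchase events should add up to the total row count
n_missing = events['details'].isna().sum()
n_purchases = (events['event_name'] == 'purchase').sum()
print('Missing details:', n_missing)
print('Purchase events:', n_purchases)
print('Sum equals total rows:', n_missing + n_purchases == len(events))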
# convert event_dt to dt type
events['event_dt'] = pd.to_datetime(events['event_dt'])
events.info(memory_usage='deep')
Conversion to datetime type reduced memory usage from 93.4MB to 64.8MB.
I decided not to convert event_name to the category type, as it makes it harder to plot graphs.
events.event_dt.describe()
events.user_id.nunique()
As with the users dataset, we have more data than necessary for the test. The unnecessary data will be dropped in future steps.
data_study(participants)
participants.ab_test.value_counts()
From the description of the test I can conclude that there was a mistake in the test name. The correct test is 'interface_eu_test' and not 'recommender_system_test' as stated in the description.
#making a list of all test participants
test_participants = participants[participants['ab_test'] == 'interface_eu_test']['user_id']
#checking that the participants are from the EU region
users[users.user_id.isin(list(test_participants))]['region'].unique()
#subset of test events
test_events = events[events.user_id.isin(list(test_participants))].reset_index(drop=True)
test_events.head()
#now I can add the group column and drop unnecessary columns
test_events = test_events.merge(participants, how='inner', on='user_id').drop(columns=['details','ab_test'])
test_events.head()
test_events.info(memory_usage='deep')
Go back to the Table of Contents
groups = test_events.groupby('group').agg({'user_id':'nunique'})
groups
users_by_group = test_events.groupby('user_id').agg({'group':'nunique'}).reset_index()
users_by_group.head()
users_by_group.query('group > 1').count()
There are 441 users that were recruited into both groups. These users should be dropped.
drop_list = users_by_group.query('group > 1')['user_id'].tolist()
test_events = test_events.drop(test_events[test_events.user_id.isin(drop_list)].index).reset_index(drop=True)
test_events.info()
This eliminated 5,562 events from the log. Normally, 6.33% would be too much data to discard, but for the purposes of A/B testing we have no other choice: we can't keep participants who were shown both versions of the system.
The fact that so many users ended up in both groups might indicate a bug in the recruitment process.
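For reference, a small sketch of the user-level share of this loss (the 6.33% quoted above is the event-level share). users_by_group was built before the cleaning, so it still holds the full participant list:
# share of test participants removed because they were recruited into both groups
n_users_before = len(users_by_group)   # one row per participant before cleaning
n_users_dropped = len(drop_list)
print(f'Dropped users: {n_users_dropped} of {n_users_before} ({n_users_dropped / n_users_before:.2%})')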
group_clean = test_events.groupby('group').agg({'user_id':'nunique'})
group_clean
After removing those users we are left with fewer participants than the description requires (6,000).
Since we know the description is inaccurate, there is no reason not to examine the results anyway.
We'll start with a simple funnel divided by test group:
Group A:
groupA = test_events.query('group == "A"')
groupB = test_events.query('group == "B"')
groupA_funnel = groupA.groupby('event_name').agg({'user_id':'nunique'}).sort_values(by='user_id', ascending=False).reset_index()
groupA_funnel
# reorder the rows so the counts line up with the fixed label order used in the funnel chart below
groupA_funnel = groupA_funnel.reindex([0, 1, 3, 2])
groupA_funnel
fig = go.Figure(go.Funnel(
y = ['Login','Product page','Product cart','Purchase'],
x = groupA_funnel['user_id'],
textinfo='label + text + percent initial + percent previous', hoverinfo='skip'
))
fig.update_layout(title="Simple event funnel for group A")
fig.show()
Group B:
groupB_funnel = groupB.groupby('event_name').agg({'user_id':'nunique'}).sort_values(by='user_id', ascending=False).reset_index()
groupB_funnel
fig = go.Figure(go.Funnel(
y = ['Login','Product page','Product cart','Purchase'],
x = groupB_funnel['user_id'],
textinfo='label + text + percent initial + percent previous', hoverinfo='skip'
))
fig.update_layout(title="Simple event funnel for group B")
fig.show()
These funnels are very interesting, as they show a difference in funnel stages between the two test groups.
While group B has the expected funnel of: login --> product_page --> product_cart --> purchase, for group A the last two steps are reversed. Group A users first make a purchase and then visit the cart. The site might have a "buy now" button that allows users to skip the cart.
To look into this further, I will plot a sequential funnel for each group separately.
Group A
usersA = groupA.pivot_table(
index='user_id',
columns='event_name',
values='event_dt',
aggfunc='min')
usersA.head(20)
step_1_A = ~usersA['login'].isna()
step_2_A = step_1_A & (usersA['product_page'] >= usersA['login'])
step_3_A = step_2_A & (usersA['product_cart'] >= usersA['product_page'])
step_4_A = step_3_A & (usersA['purchase'] >= usersA['product_cart'])
n_logins_A = usersA[step_1_A].shape[0]
n_product_A = usersA[step_2_A].shape[0]
n_checkout_A = usersA[step_3_A].shape[0]
n_payment_A = usersA[step_4_A].shape[0]
print('Logins:', n_logins_A)
print('Product page:', n_product_A)
print('Cart:', n_checkout_A)
print('Paid:', n_payment_A)
fig = go.Figure(go.Funnel(
y = ['Login','Product Page','Cart','Payment'],
x = [n_logins_A,n_product_A,n_checkout_A,n_payment_A],
textinfo='label + text + percent initial + percent previous', hoverinfo='skip'
))
fig.update_layout(title="Sequential funnel for test group A")
fig.show()
Group B
usersB = groupB.pivot_table(
index='user_id',
columns='event_name',
values='event_dt',
aggfunc='min')
usersB.head(20)
step_1_B = ~usersB['login'].isna()
step_2_B = step_1_B & (usersB['product_page'] >= usersB['login'])
step_3_B = step_2_B & (usersB['product_cart'] >= usersB['product_page'])
step_4_B = step_3_B & (usersB['purchase'] >= usersB['product_cart'])
n_logins_B = usersB[step_1_B].shape[0]
n_product_B = usersB[step_2_B].shape[0]
n_checkout_B = usersB[step_3_B].shape[0]
n_payment_B = usersB[step_4_B].shape[0]
print('Logins:', n_logins_B)
print('Product page:', n_product_B)
print('Cart:', n_checkout_B)
print('Paid:', n_payment_B)
fig = go.Figure(go.Funnel(
y = ['Login','Product Page','Cart','Payment'],
x = [n_logins_B,n_product_B,n_checkout_B,n_payment_B],
textinfo='label + text + percent initial + percent previous', hoverinfo='skip'
))
fig.update_layout(title="Sequential funnel for test group B")
fig.show()
It is also very interesting to see that the sequential funnel charts for the two groups are almost the same (there are only very minor differences).
So the sequential funnel is essentially identical for the two groups, while the simple funnel shows a difference in the event sequence.
Only 2% of users complete a purchase by going through all 4 stages in this particular order, while about 32% of the test users complete a purchase on the site.
This tells us that the app has shortcuts that allow users to skip certain events, for example skipping the cart (checkout) and going straight to payment.
Note: while looking at the pivot tables for the sequential funnel I noticed some irregularities in the event timestamps: events that should happen in a sequence have the same timestamp. I tried to handle this by using >= when calculating the funnel values. Nevertheless, these irregularities should be kept in mind when looking at the data.
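As a rough check on how widespread these identical timestamps are, here is a minimal sketch that counts, in the per-user pivot tables built above, users whose consecutive funnel steps share the exact same timestamp:
# count users whose consecutive funnel steps carry an identical timestamp
# (NaT values never compare equal, so only users with both timestamps are counted)
for name, user_pivot in [('A', usersA), ('B', usersB)]:
    same_page_cart = (user_pivot['product_page'] == user_pivot['product_cart']).sum()
    same_cart_purchase = (user_pivot['product_cart'] == user_pivot['purchase']).sum()
    print(f'Group {name}: product_page == product_cart for {same_page_cart} users,',
          f'product_cart == purchase for {same_cart_purchase} users')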
Go back to the Table of Contents
Is the number of events per user distributed equally among the samples?
events_per_userA = groupA.groupby('user_id').agg({'event_name':'count'}).reset_index()
print('Average number of events per user in group A:', round(events_per_userA['event_name'].mean(), 2))
events_per_userB = groupB.groupby('user_id').agg({'event_name':'count'}).reset_index()
print('Average number of events per user in group B:', round(events_per_userB['event_name'].mean(), 2))
There is a small difference between the samples; it seems that users in the control group visit the site slightly more often.
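To see whether this small difference is statistically meaningful, here is a sketch of a Mann-Whitney U test on the per-user event counts (this test is not part of the original task, just a sanity check on sample comparability):
# Mann-Whitney U test: compare the distributions of events per user in groups A and B
results = st.mannwhitneyu(
    events_per_userA['event_name'],
    events_per_userB['event_name'],
    alternative='two-sided')
print('Mann-Whitney U p-value:', results.pvalue)
if results.pvalue < 0.05:
    print('The number of events per user differs significantly between the groups.')
else:
    print('No significant difference in the number of events per user between the groups.')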
#histogram by date and time
plt.figure(figsize=(10,5))
test_events['event_dt'].hist(bins=150, xrot=60, color='royalblue');
plt.title('Distribution of events');
plt.xlabel('Dates');
plt.ylabel('Number of events');
There is a gradual rise in the number of events until December 13th. This makes sense: this is the recruitment period.
After that, the number of events peaks around December 21st.
According to the EU events calendar, the site launched the Christmas & New Year promo on December 25th, so we would expect a rise in events after that date. Instead, the number of events drops drastically (back to recruitment-period levels).
This can be explained by the holiday season: people don't actually shop on the holiday itself. The promo should have started before Christmas, not on the day itself.
calendar_eu
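For a closer look at the drop after the promo launch, here is a small sketch of daily event counts over the final days of the log (the window size is an assumption based on the histogram above):
# daily event counts; the last days cover the Christmas & New Year promo launch
daily_events = test_events.groupby(test_events['event_dt'].dt.date).size()
print(daily_events.tail(12))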
Go back to the Table of Contents
Are there any peculiarities in the data that you have to take into account before starting the A/B test?
There are several things to notice:
#creating a pivot table by groups and events
pivot = test_events.pivot_table(index='event_name', values='user_id', columns='group', aggfunc=lambda x: x.nunique()).reset_index()
pivot
I will define a function that runs the z-test and can be used to test all events.
#defining a function that runs the z-test
def check_hypothesis(event, alpha=0.05):
    """This function runs a z-test to check for a statistical difference between the proportions.
    The input arguments are the event name to test and the alpha level.
    """
    successes1 = pivot[pivot.event_name == event]['A'].iloc[0]
    successes2 = pivot[pivot.event_name == event]['B'].iloc[0]
    trials1 = test_events[test_events.group == 'A']['user_id'].nunique()
    trials2 = test_events[test_events.group == 'B']['user_id'].nunique()
    p1 = successes1 / trials1
    p2 = successes2 / trials2
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    difference = p1 - p2
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))
    distr = st.norm(0, 1)
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    print('p-value: ', p_value)
    if p_value < alpha:
        print("Reject H0 for", event)
    else:
        print("Fail to Reject H0 for", event)
    print('')
Now we can run the z-test on each event:
H0 - There is no significant difference between the proportions of the control group and the test group for the checked event.
H1 - There is a significant difference between the proportions of the control group and the test group for the checked event.
The significance level is set to 5%, as commonly used in business analytics.
for i in pivot.event_name.unique():
    check_hypothesis(i, alpha=0.05)
There is no statistically significant difference between the two groups for login, product page visits and product cart visits.
There is, however, a difference in purchases: group A participants made more purchases.
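Since four comparisons were run on the same samples, it may be worth re-checking the results with a Bonferroni-adjusted significance level. This correction is not part of the original task; it is just a robustness check:
# Bonferroni correction: divide the significance level by the number of comparisons
n_tests = pivot['event_name'].nunique()
adjusted_alpha = 0.05 / n_tests
print('Bonferroni-adjusted alpha:', adjusted_alpha)
for i in pivot['event_name'].unique():
    check_hypothesis(i, alpha=adjusted_alpha)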
The test was not successful for several reasons:
Despite the above, I note the following:
To conclude: the test should be run again in January (after the holiday season ends) and after the technical issues are fixed.