A/B Testing for an Online Store

Task:

You've received an analytical task from an international online store. Your predecessor failed to complete it: they launched an A/B test and then quit (to start a watermelon farm in Brazil). They left only the technical specifications and the test results.

Technical description

  • Test name: recommender_system_test
  • Groups: A (control), B (new payment funnel)
  • Launch date: 2020-12-07
  • Date when they stopped taking on new users: 2020-12-21
  • End date: 2021-01-01
  • Audience: 15% of the new users from the EU region
  • Purpose of the test: testing changes related to the introduction of an improved recommendation system
  • Expected result: within 14 days of signing up, users will show better conversion into product page views (the product_page event), product cart views (product_cart) and purchases (purchase). At each stage of the funnel product_page → product_cart → purchase, there will be at least a 10% increase.
  • Expected number of test participants: 6000 (a rough sample-size sanity check appears just below)

This is an A/B test, the purpose of which is to test changes related to the introduction of an improved recommendation system.
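
Before opening the data, it is worth a quick sanity check of whether 6,000 participants are enough to detect a 10% relative lift. The snippet below is only a sketch: the 30% baseline conversion is an assumption made up for illustration (the real baseline should come from historical data), and it relies on statsmodels being available in the environment.

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.30                  # hypothetical baseline conversion rate (assumption, not from the data)
lifted = baseline * 1.10         # the expected 10% relative increase

effect_size = proportion_effectsize(lifted, baseline)
n_per_group = NormalIndPower().solve_power(effect_size=effect_size, alpha=0.05, power=0.8, ratio=1)
print(f'Required participants per group: {n_per_group:.0f}')

With this made-up 30% baseline the requirement comes out at roughly 1,900 per group, so the planned 6,000 participants would be sufficient; a lower baseline conversion would push the requirement up.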

Opening data

In [1]:
import pandas as pd 
import numpy as np
from numpy import median
from scipy import stats as st
import math as mth

import datetime as dt
from datetime import datetime, timedelta

import matplotlib as mpl
import matplotlib.pyplot as plt

import seaborn as sns

import plotly
import plotly.express as px
from plotly import graph_objects as go

import sys
import warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")
In [2]:
#import the files: try the local path first, fall back to the /datasets/ directory

location = '/some/local/path'

try: 
    calendar = pd.read_csv(location + 'ab_project_marketing_events_us.csv')
    events = pd.read_csv(location + 'final_ab_events_us.csv')
    users = pd.read_csv(location + 'final_ab_new_users_us.csv') 
    participants = pd.read_csv(location + 'final_ab_participants_us.csv') 
    
except: 
    calendar = pd.read_csv('/datasets/ab_project_marketing_events_us.csv')
    events = pd.read_csv('/datasets/final_ab_events_us.csv')
    users = pd.read_csv('/datasets/final_ab_new_users_us.csv') 
    participants = pd.read_csv('/datasets/final_ab_participants_us.csv') 
In [3]:
def data_study(data): 
    """This function runs methods to study the data and prints out the results.

    The input argument is a dataset.  
    """
    print(data.info(memory_usage='deep'))
    print("")
    print("Head:")
    display(data.head())
    print("")
    print("Tail:")
    display(data.tail())
    print("")
    print("Check for Nulls:")
    display(data.isnull().sum()) 
    print("")
    print("Check for 0:")
    for c in data.columns: 
        print(c , len(data[data[c] == 0]))
    print("")
    print('Number of duplicated rows:', data.duplicated().sum())
In [4]:
data_study(calendar)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   name       14 non-null     object
 1   regions    14 non-null     object
 2   start_dt   14 non-null     object
 3   finish_dt  14 non-null     object
dtypes: object(4)
memory usage: 4.0 KB
None

Head:
name regions start_dt finish_dt
0 Christmas&New Year Promo EU, N.America 2020-12-25 2021-01-03
1 St. Valentine's Day Giveaway EU, CIS, APAC, N.America 2020-02-14 2020-02-16
2 St. Patric's Day Promo EU, N.America 2020-03-17 2020-03-19
3 Easter Promo EU, CIS, APAC, N.America 2020-04-12 2020-04-19
4 4th of July Promo N.America 2020-07-04 2020-07-11
Tail:
name regions start_dt finish_dt
9 Victory Day CIS (May 9th) Event CIS 2020-05-09 2020-05-11
10 CIS New Year Gift Lottery CIS 2020-12-30 2021-01-07
11 Dragon Boat Festival Giveaway APAC 2020-06-25 2020-07-01
12 Single's Day Gift Promo APAC 2020-11-11 2020-11-12
13 Chinese Moon Festival APAC 2020-10-01 2020-10-07
Check for Nulls:
name         0
regions      0
start_dt     0
finish_dt    0
dtype: int64
Check for 0:
name 0
regions 0
start_dt 0
finish_dt 0

Number of duplicated rows: 0
In [5]:
# as calendar has only 14 entries we can look at the entire dataset
calendar
Out[5]:
name regions start_dt finish_dt
0 Christmas&New Year Promo EU, N.America 2020-12-25 2021-01-03
1 St. Valentine's Day Giveaway EU, CIS, APAC, N.America 2020-02-14 2020-02-16
2 St. Patric's Day Promo EU, N.America 2020-03-17 2020-03-19
3 Easter Promo EU, CIS, APAC, N.America 2020-04-12 2020-04-19
4 4th of July Promo N.America 2020-07-04 2020-07-11
5 Black Friday Ads Campaign EU, CIS, APAC, N.America 2020-11-26 2020-12-01
6 Chinese New Year Promo APAC 2020-01-25 2020-02-07
7 Labor day (May 1st) Ads Campaign EU, CIS, APAC 2020-05-01 2020-05-03
8 International Women's Day Promo EU, CIS, APAC 2020-03-08 2020-03-10
9 Victory Day CIS (May 9th) Event CIS 2020-05-09 2020-05-11
10 CIS New Year Gift Lottery CIS 2020-12-30 2021-01-07
11 Dragon Boat Festival Giveaway APAC 2020-06-25 2020-07-01
12 Single's Day Gift Promo APAC 2020-11-11 2020-11-12
13 Chinese Moon Festival APAC 2020-10-01 2020-10-07

We can see there are no missing values or duplicated rows in this dataset.
There are 2 columns (start_dt and finish_dt) that need to be converted to the datetime type.

In [6]:
calendar['start_dt'] = pd.to_datetime(calendar['start_dt'])
calendar['finish_dt'] = pd.to_datetime(calendar['finish_dt'])
calendar.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   name       14 non-null     object        
 1   regions    14 non-null     object        
 2   start_dt   14 non-null     datetime64[ns]
 3   finish_dt  14 non-null     datetime64[ns]
dtypes: datetime64[ns](2), object(2)
memory usage: 2.4 KB

We can see that converting lowered memory usage from 4.0KB to 2.4KB.

As our AB test deals with EU users, we can create a subset of EU promotions.

In [7]:
calendar_eu = calendar[calendar['regions'].str.contains('EU', regex=False)]
calendar_eu
Out[7]:
name regions start_dt finish_dt
0 Christmas&New Year Promo EU, N.America 2020-12-25 2021-01-03
1 St. Valentine's Day Giveaway EU, CIS, APAC, N.America 2020-02-14 2020-02-16
2 St. Patric's Day Promo EU, N.America 2020-03-17 2020-03-19
3 Easter Promo EU, CIS, APAC, N.America 2020-04-12 2020-04-19
5 Black Friday Ads Campaign EU, CIS, APAC, N.America 2020-11-26 2020-12-01
7 Labor day (May 1st) Ads Campaign EU, CIS, APAC 2020-05-01 2020-05-03
8 International Women's Day Promo EU, CIS, APAC 2020-03-08 2020-03-10
In [8]:
data_study(users)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61733 entries, 0 to 61732
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user_id     61733 non-null  object
 1   first_date  61733 non-null  object
 2   region      61733 non-null  object
 3   device      61733 non-null  object
dtypes: object(4)
memory usage: 15.4 MB
None

Head:
user_id first_date region device
0 D72A72121175D8BE 2020-12-07 EU PC
1 F1C668619DFE6E65 2020-12-07 N.America Android
2 2E1BF1D4C37EA01F 2020-12-07 EU PC
3 50734A22C0C63768 2020-12-07 EU iPhone
4 E1BDDCE0DAFA2679 2020-12-07 N.America iPhone
Tail:
user_id first_date region device
61728 1DB53B933257165D 2020-12-20 EU Android
61729 538643EB4527ED03 2020-12-20 EU Mac
61730 7ADEE837D5D8CBBD 2020-12-20 EU PC
61731 1C7D23927835213F 2020-12-20 EU iPhone
61732 8F04273BB2860229 2020-12-20 EU Android
Check for Nulls:
user_id       0
first_date    0
region        0
device        0
dtype: int64
Check for 0:
user_id 0
first_date 0
region 0
device 0

Number of duplicated rows: 0

No missing values or duplicated rows here.

In [9]:
#convert first_date to dt type
users['first_date'] = pd.to_datetime(users['first_date'])
In [10]:
#change date column name to make it easier to distinguish from other dates
users = users.rename(columns={'first_date':'join_date'})

Next I will check the values in the region and device columns. Both look like low-cardinality columns that can eventually be converted to the category type to optimize memory usage.

In [11]:
#check values in region column
users.region.value_counts()
Out[11]:
EU           46270
N.America     9155
CIS           3155
APAC          3153
Name: region, dtype: int64
In [12]:
#check values in device column
users.device.value_counts()
Out[12]:
Android    27520
PC         15599
iPhone     12530
Mac         6084
Name: device, dtype: int64
In [13]:
#convert region and device to category type
users['region'] = users['region'].astype('category') 
users['device'] = users['device'].astype('category') 
In [14]:
users.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61733 entries, 0 to 61732
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   user_id    61733 non-null  object        
 1   join_date  61733 non-null  datetime64[ns]
 2   region     61733 non-null  category      
 3   device     61733 non-null  category      
dtypes: category(2), datetime64[ns](1), object(1)
memory usage: 4.9 MB

After data type conversion the memory usage dropped from 15.4MB to 4.9MB

In [15]:
#check there are no repeat user_ids
users.user_id.nunique()
Out[15]:
61733

The number of unique user ids equals the number of entries, which means there are no repeat ids.

In [16]:
#check the dates for anomalies
users.join_date.describe()
Out[16]:
count                   61733
unique                     17
top       2020-12-21 00:00:00
freq                     6290
first     2020-12-07 00:00:00
last      2020-12-23 00:00:00
Name: join_date, dtype: object
In [17]:
len(users[users['join_date'] > '2020-12-21'])
Out[17]:
5263

In this dataset we have users from all regions (we only need EU), and 8.5% of them joined after 2020-12-21.
For now I will leave those users in, since they will be dropped in the following steps.
In addition, this indicates that the test description written by the previous analyst is not accurate.
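
For reference, the share quoted above can be computed directly (a quick sketch using the users table loaded earlier):

# share of users whose join_date falls after the stated recruitment cut-off
late_joiners = users[users['join_date'] > '2020-12-21']
print(f'{len(late_joiners) / len(users):.1%} of users joined after 2020-12-21')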

In [18]:
data_study(events)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440317 entries, 0 to 440316
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   user_id     440317 non-null  object 
 1   event_dt    440317 non-null  object 
 2   event_name  440317 non-null  object 
 3   details     62740 non-null   float64
dtypes: float64(1), object(3)
memory usage: 93.4 MB
None

Head:
user_id event_dt event_name details
0 E1BDDCE0DAFA2679 2020-12-07 20:22:03 purchase 99.99
1 7B6452F081F49504 2020-12-07 09:22:53 purchase 9.99
2 9CD9F34546DF254C 2020-12-07 12:59:29 purchase 4.99
3 96F27A054B191457 2020-12-07 04:02:40 purchase 4.99
4 1FD7660FDF94CA1F 2020-12-07 10:15:09 purchase 4.99
Tail:
user_id event_dt event_name details
440312 245E85F65C358E08 2020-12-30 19:35:55 login NaN
440313 9385A108F5A0A7A7 2020-12-30 10:54:15 login NaN
440314 DB650B7559AC6EAC 2020-12-30 10:59:09 login NaN
440315 F80C9BDDEA02E53C 2020-12-30 09:53:39 login NaN
440316 7AEC61159B672CC5 2020-12-30 11:36:13 login NaN
Check for Nulls:
user_id            0
event_dt           0
event_name         0
details       377577
dtype: int64
Check for 0:
user_id 0
event_dt 0
event_name 0
details 0

Number of duplicated rows: 0
In [19]:
events.event_name.value_counts()
Out[19]:
login           189552
product_page    125563
purchase         62740
product_cart     62462
Name: event_name, dtype: int64

The missing values in the details column are to be expected, since it is filled only for purchase events.
Adding the count of purchase events (62,740) to the count of NaN values (377,577) gives us the total number of entries in the dataset.
This column is unnecessary for the analysis, so the missing values can be left untouched.
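
This relationship can be verified with a short check (a sketch):

# 'details' should be filled exactly for purchase events
n_purchases = (events['event_name'] == 'purchase').sum()
n_filled = events['details'].notna().sum()
print(n_purchases, n_filled, n_purchases == n_filled)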

In [20]:
# convert event_dt to dt type
events['event_dt'] = pd.to_datetime(events['event_dt'])
In [21]:
events.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440317 entries, 0 to 440316
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype         
---  ------      --------------   -----         
 0   user_id     440317 non-null  object        
 1   event_dt    440317 non-null  datetime64[ns]
 2   event_name  440317 non-null  object        
 3   details     62740 non-null   float64       
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 64.8 MB

Conversion to datetime type reduced memory usage from 93.4MB to 64.8MB.
I decided not to convert event_name to the category type, as that makes it harder to plot graphs.

In [22]:
events.event_dt.describe()
Out[22]:
count                  440317
unique                 267268
top       2020-12-23 02:37:24
freq                       10
first     2020-12-07 00:00:33
last      2020-12-30 23:36:33
Name: event_dt, dtype: object
In [23]:
events.user_id.nunique()
Out[23]:
58703

As with the users dataset, we have more data than necessary for the test. The unnecessary data will be dropped in future steps.

In [24]:
data_study(participants)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18268 entries, 0 to 18267
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   user_id  18268 non-null  object
 1   group    18268 non-null  object
 2   ab_test  18268 non-null  object
dtypes: object(3)
memory usage: 3.7 MB
None

Head:
user_id group ab_test
0 D1ABA3E2887B6A73 A recommender_system_test
1 A7A3664BD6242119 A recommender_system_test
2 DABC14FDDFADD29E A recommender_system_test
3 04988C5DF189632E A recommender_system_test
4 482F14783456D21B B recommender_system_test
Tail:
user_id group ab_test
18263 1D302F8688B91781 B interface_eu_test
18264 3DE51B726983B657 A interface_eu_test
18265 F501F79D332BE86C A interface_eu_test
18266 63FBE257B05F2245 A interface_eu_test
18267 79F9ABFB029CF724 B interface_eu_test
Check for Nulls:
user_id    0
group      0
ab_test    0
dtype: int64
Check for 0:
user_id 0
group 0
ab_test 0

Number of duplicated rows: 0
In [25]:
participants.ab_test.value_counts()
Out[25]:
interface_eu_test          11567
recommender_system_test     6701
Name: ab_test, dtype: int64

Comparing the data with the test description suggests that there was a mistake in the test name. The correct test appears to be 'interface_eu_test', not 'recommender_system_test' as stated in the description.

Creating a dataset for testing

Go back to the Table of Contents

In [26]:
#making a list of all test participants
test_participants = participants[participants['ab_test'] == 'interface_eu_test']['user_id']
In [27]:
#checking that the participants are from the EU region
users[users.user_id.isin(list(test_participants))]['region'].unique()
Out[27]:
[EU]
Categories (1, object): [EU]
In [28]:
#subset of test events
test_events = events[events.user_id.isin(list(test_participants))].reset_index(drop=True)
test_events.head()
Out[28]:
user_id event_dt event_name details
0 96F27A054B191457 2020-12-07 04:02:40 purchase 4.99
1 A92195E3CFB83DBD 2020-12-07 00:32:07 purchase 4.99
2 354D653172FF2A2D 2020-12-07 15:45:11 purchase 4.99
3 7FCD34F47C13A9AC 2020-12-07 22:06:13 purchase 9.99
4 0313C457F07C339E 2020-12-07 13:10:48 purchase 9.99
In [29]:
#now I can add the group column and drop unnecessary columns
test_events = test_events.merge(participants, how='inner', on='user_id').drop(columns=['details','ab_test'])
test_events.head()
Out[29]:
user_id event_dt event_name group
0 96F27A054B191457 2020-12-07 04:02:40 purchase B
1 96F27A054B191457 2020-12-08 09:43:14 purchase B
2 96F27A054B191457 2020-12-09 00:44:10 purchase B
3 96F27A054B191457 2020-12-26 00:33:57 purchase B
4 96F27A054B191457 2020-12-07 04:02:41 product_page B
In [30]:
test_events.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
Int64Index: 87821 entries, 0 to 87820
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   user_id     87821 non-null  object        
 1   event_dt    87821 non-null  datetime64[ns]
 2   event_name  87821 non-null  object        
 3   group       87821 non-null  object        
dtypes: datetime64[ns](1), object(3)
memory usage: 18.1 MB

Stage summary:

Go back to the Table of Contents

  1. Our data for the test consists of 4 datasets.
  2. In each dataset I converted the date columns to datetime type.
  3. Comparing the actual data with the test description shows that the description is not accurate and should be double checked.
  4. The datasets contain more data than necessary for the test. I created a new subset that contains only the data needed going forward.

Exploratory Data Analysis

Are there users who are present in both samples?

Go back to the Table of Contents

In [31]:
groups = test_events.groupby('group').agg({'user_id':'nunique'})
groups
Out[31]:
user_id
group
A 5792
B 5499
In [32]:
users_by_group = test_events.groupby('user_id').agg({'group':'nunique'}).reset_index()
users_by_group.head()
Out[32]:
user_id group
0 0002CE61FF2C4011 1
1 001064FEAAB631A1 1
2 001E72F50D1C48FA 1
3 002412F1EB3F6E38 1
4 002540BE89C930FB 1
In [33]:
users_by_group.query('group > 1').count()
Out[33]:
user_id    441
group      441
dtype: int64

There are 441 users that were recruited into both groups. These users should be dropped.

In [34]:
drop_list = users_by_group.query('group > 1')['user_id'].tolist()

test_events = test_events.drop(test_events[test_events.user_id.isin(drop_list)].index).reset_index(drop=True)
test_events.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82259 entries, 0 to 82258
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   user_id     82259 non-null  object        
 1   event_dt    82259 non-null  datetime64[ns]
 2   event_name  82259 non-null  object        
 3   group       82259 non-null  object        
dtypes: datetime64[ns](1), object(3)
memory usage: 2.5+ MB

This eliminated 5,562 events from the log. Normally, discarding 6.33% of the data would be too much, but for the purposes of A/B testing we have no other choice: we can't keep participants who saw both versions of the system.
The fact that so many users ended up in both groups might indicate a bug in the recruitment process.

In [35]:
group_clean = test_events.groupby('group').agg({'user_id':'nunique'})
group_clean
Out[35]:
user_id
group
A 5351
B 5058

After removing those users we are left with fewer participants than the description requires (6,000).
Since we already know the description is inaccurate, there is no reason not to examine the results anyway.

Study conversion at different stages of the funnel

Go back to the Table of Contents

We'll start with a simple funnel divided by test groups:

Group A:

In [36]:
groupA = test_events.query('group == "A"')
groupB = test_events.query('group == "B"')
In [37]:
groupA_funnel = groupA.groupby('event_name').agg({'user_id':'nunique'}).sort_values(by='user_id', ascending=False).reset_index()
groupA_funnel
Out[37]:
event_name user_id
0 login 5350
1 product_page 3577
2 purchase 1903
3 product_cart 1726
In [38]:
groupA_funnel = groupA_funnel.reindex([0, 1, 3, 2])
groupA_funnel
Out[38]:
event_name user_id
0 login 5350
1 product_page 3577
3 product_cart 1726
2 purchase 1903
In [39]:
fig = go.Figure(go.Funnel(
    y = ['Login','Product page','Product cart','Purchase'],
    x = groupA_funnel['user_id'],
    textinfo='label + text + percent initial + percent previous', hoverinfo='skip'
    ))
fig.update_layout(title="Simple event funnel for group A")
fig.show() 

Group B:

In [40]:
groupB_funnel = groupB.groupby('event_name').agg({'user_id':'nunique'}).sort_values(by='user_id', ascending=False).reset_index()
groupB_funnel
Out[40]:
event_name user_id
0 login 5058
1 product_page 3318
2 product_cart 1716
3 purchase 1683
In [41]:
fig = go.Figure(go.Funnel(
    y = ['Login','Product page','Product cart','Purchase'],
    x = groupB_funnel['user_id'],
    textinfo='label + text + percent initial + percent previous', hoverinfo='skip'
    ))
fig.update_layout(title="Simple event funnel for group B")
fig.show() 

These funnels are interesting, as they show a difference in stage ordering between the two test groups.
While group B has the expected funnel of login --> product_page --> product_cart --> purchase, for group A the last two steps are reversed: more group A users make a purchase than view the cart. The site might have a "buy now" button that allows users to skip the cart.

To look into this further, I will plot a sequential funnel for each group separately.

Group A

In [42]:
usersA = groupA.pivot_table(
    index='user_id', 
    columns='event_name', 
    values='event_dt',
    aggfunc='min') 
usersA.head(20)
Out[42]:
event_name login product_cart product_page purchase
user_id
0002CE61FF2C4011 2020-12-07 04:37:36 2020-12-07 04:37:37 2020-12-07 04:37:42 NaT
002412F1EB3F6E38 2020-12-09 09:35:20 NaT 2020-12-09 09:35:20 NaT
002540BE89C930FB 2020-12-08 18:05:24 2020-12-08 18:05:24 2020-12-08 18:05:24 NaT
003346BB64227D0C 2020-12-17 05:24:01 NaT NaT NaT
00341D8401F0F665 2020-12-21 11:14:50 NaT NaT NaT
004C58ADE7CA8C4A 2020-12-18 15:13:33 NaT 2020-12-18 15:13:35 2020-12-18 15:13:33
0053DD654C9513D6 2020-12-12 04:35:37 NaT 2020-12-12 04:35:38 2020-12-12 04:35:37
005881C49580D705 2020-12-23 13:54:49 2020-12-23 13:54:49 NaT NaT
00807760AE4F0504 2020-12-21 09:24:41 NaT 2020-12-21 09:24:41 NaT
0093130DFBF1DC6A 2020-12-20 01:07:27 2020-12-20 01:07:27 2020-12-20 01:07:27 2020-12-20 01:07:27
00A29C54C162D2CF 2020-12-20 22:31:17 NaT NaT NaT
00B45E6226F03553 2020-12-14 06:00:44 NaT NaT NaT
00C2BA76985336C1 2020-12-12 15:56:42 NaT NaT NaT
00C3BF17FD8CBFC9 2020-12-22 15:56:59 NaT 2020-12-22 15:57:02 2020-12-22 15:56:59
00D627B7002B66CB 2020-12-17 07:03:57 NaT 2020-12-17 07:03:59 NaT
00D937C10E86138E 2020-12-15 04:14:42 NaT NaT NaT
00DCC58F2FAD6F66 2020-12-18 10:10:40 NaT NaT NaT
00E8D06BF3E5A23B 2020-12-14 21:59:06 NaT 2020-12-14 21:59:08 2020-12-14 21:59:06
00EECC8699408DD1 2020-12-09 08:24:37 2020-12-09 08:24:37 2020-12-09 08:24:37 2020-12-09 08:24:37
00EFA157F7B6E1C4 2020-12-09 14:12:14 2020-12-09 14:12:14 2020-12-09 14:12:14 NaT
In [43]:
step_1_A = ~usersA['login'].isna()
step_2_A = step_1_A & (usersA['product_page'] >= usersA['login'])
step_3_A = step_2_A & (usersA['product_cart'] >= usersA['product_page'])
step_4_A = step_3_A & (usersA['purchase'] >= usersA['product_cart'])

n_logins_A = usersA[step_1_A].shape[0]
n_product_A = usersA[step_2_A].shape[0]
n_checkout_A = usersA[step_3_A].shape[0]
n_payment_A = usersA[step_4_A].shape[0]

print('Logins:', n_logins_A)
print('Product page:', n_product_A)
print('Cart:', n_checkout_A)
print('Paid:', n_payment_A)
Logins: 5350
Product page: 3577
Cart: 820
Paid: 113
In [44]:
fig = go.Figure(go.Funnel(
    y = ['Login','Product Page','Cart','Payment'],
    x = [n_logins_A,n_product_A,n_checkout_A,n_payment_A],
    textinfo='label + text + percent initial + percent previous', hoverinfo='skip'
    ))
fig.update_layout(title="Sequential funnel for test group A")
fig.show() 

Group B

In [45]:
usersB = groupB.pivot_table(
    index='user_id', 
    columns='event_name', 
    values='event_dt',
    aggfunc='min') 
usersB.head(20)
Out[45]:
event_name login product_cart product_page purchase
user_id
001064FEAAB631A1 2020-12-20 14:43:27 NaT 2020-12-20 14:43:28 NaT
001E72F50D1C48FA 2020-12-17 15:44:05 NaT 2020-12-17 15:44:06 NaT
0031F1B5E9FBF708 2020-12-14 00:47:10 NaT NaT 2020-12-14 00:47:10
0036BE15EE4D319D 2020-12-21 01:56:05 NaT NaT NaT
003F86A34B575D27 2020-12-22 16:14:09 2020-12-22 16:14:09 NaT NaT
0050F43F34C955F4 2020-12-13 19:41:56 2020-12-13 19:41:59 2020-12-13 19:41:57 2020-12-13 19:41:56
0069105CBE9CA8DC 2020-12-07 17:50:08 NaT 2020-12-07 17:50:09 NaT
00722905AE9E2441 2020-12-13 08:39:06 NaT 2020-12-13 08:39:07 NaT
0074BEB9856FBC87 2020-12-13 13:02:23 2020-12-13 13:02:23 2020-12-13 13:02:23 NaT
0082A350AF0647FA 2020-12-22 21:31:54 NaT 2020-12-22 21:31:56 2020-12-22 21:31:54
009F1C78D679CDEC 2020-12-11 14:20:56 2020-12-11 14:20:56 2020-12-11 14:20:57 2020-12-11 14:20:56
00C8947A03ECB797 2020-12-21 17:02:08 NaT 2020-12-21 17:02:10 NaT
00C8FDF85B779D88 2020-12-13 03:03:02 NaT 2020-12-13 03:03:02 NaT
0107365328F9C3DE 2020-12-14 02:25:31 NaT 2020-12-14 02:25:31 NaT
01260F7CCAF2357F 2020-12-20 09:11:58 2020-12-20 09:12:00 2020-12-20 09:12:00 NaT
01292EDFD9CCF3FC 2020-12-21 15:09:37 NaT 2020-12-21 15:09:39 NaT
012E1FBA4CBBDD8C 2020-12-22 07:22:22 2020-12-22 07:22:22 2020-12-22 07:22:23 NaT
012EE9FDC4BCCC06 2020-12-07 22:08:03 NaT 2020-12-07 22:08:04 NaT
01503216FE61EE4D 2020-12-14 04:37:57 NaT NaT NaT
0156BAE5D3A05F3B 2020-12-10 21:00:46 2020-12-10 21:00:49 NaT NaT
In [46]:
step_1_B = ~usersB['login'].isna()
step_2_B = step_1_B & (usersB['product_page'] >= usersB['login'])
step_3_B = step_2_B & (usersB['product_cart'] >= usersB['product_page'])
step_4_B = step_3_B & (usersB['purchase'] >= usersB['product_cart'])

n_logins_B = usersB[step_1_B].shape[0]
n_product_B = usersB[step_2_B].shape[0]
n_checkout_B = usersB[step_3_B].shape[0]
n_payment_B = usersB[step_4_B].shape[0]

print('Logins:', n_logins_B)
print('Product page:', n_product_B)
print('Cart:', n_checkout_B)
print('Paid:', n_payment_B)
Logins: 5058
Product page: 3318
Cart: 805
Paid: 98
In [47]:
fig = go.Figure(go.Funnel(
    y = ['Login','Product Page','Cart','Payment'],
    x = [n_logins_B,n_product_B,n_checkout_B,n_payment_B],
    textinfo='label + text + percent initial + percent previous', hoverinfo='skip'
    ))
fig.update_layout(title="Sequential funnel for test group B")
fig.show() 

It is also very interesting that the sequential funnel charts for the two groups are almost identical (there are only minor differences).
So the sequential funnels of the two groups match, while the simple funnels differ in stage order.
Only about 2% of users complete a purchase by going through all 4 stages in this particular order, while roughly 33-36% of the test users complete a purchase on the site.
This tells us that the app has shortcuts that allow users to skip certain events, for example skipping the cart (checkout) and going straight to payment.

Note: While looking at the pivot tables for the sequential funnel I noticed some irregularities in the event timestamps. Events that should happen in a sequence have the same timestamp. I tried to handle this by using >= when calculating the funnel values. Nevertheless, those irregularities should be kept in mind when looking at the data.
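
A rough way to quantify this (a sketch using the usersA pivot table built above; the same can be repeated for usersB):

# users whose first product_page view carries exactly the same timestamp as their first login
same_ts_A = (usersA['product_page'] == usersA['login']).sum()
print('Group A users with identical login/product_page timestamps:', same_ts_A)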

Average number of events per user

Go back to the Table of Contents

Is the number of events per user distributed equally among the samples?

In [48]:
events_per_userA = groupA.groupby('user_id').agg({'event_name':'count'}).reset_index()
print('Average number of events per user in group A:', round(events_per_userA['event_name'].mean(), 2))
Average number of events per user in group A: 8.1
In [49]:
events_per_userB = groupB.groupby('user_id').agg({'event_name':'count'}).reset_index()
print('Average number of events per user in group B:', round(events_per_userB['event_name'].mean(), 2))
Average number of events per user in group B: 7.69

There is a small difference between the samples: users in the control group perform slightly more events on average.
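
To look beyond the averages, the full distributions can be compared as well (a sketch):

# compare the distributions of events per user, not just the means
plt.figure(figsize=(10, 5))
plt.hist(events_per_userA['event_name'], bins=30, alpha=0.5, label='A')
plt.hist(events_per_userB['event_name'], bins=30, alpha=0.5, label='B')
plt.legend()
plt.title('Events per user by group')
plt.xlabel('Number of events')
plt.ylabel('Number of users');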

What period of time does the data cover?

Go back to the Table of Contents

In [50]:
#histogram by date and time
plt.figure(figsize=(10,5))
test_events['event_dt'].hist(bins=150, xrot=60, color='royalblue');
plt.title('Distribution of events');
plt.xlabel('Dates');
plt.ylabel('Number of events');

There is a gradual rise in the number of events until Dec 13th. This makes sense: this is the recruitment period.
After that, the number of events reaches a peak at around Dec 21.
According to the EU events calendar, the site launched the Christmas&New Year Promo on Dec 25th, so we would expect a rise in events after that date. Instead, the number of events drops drastically (back to recruitment-period levels).
This can be explained by the holiday season: people don't actually shop on the holiday itself. The promo should have started before Christmas, not on the day of.
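
One way to make this easier to see is to aggregate the events by day and mark the promo start date (a sketch):

# daily event counts with the Christmas&New Year Promo start marked
daily_events = test_events.groupby(test_events['event_dt'].dt.date).size()
plt.figure(figsize=(10, 5))
plt.plot(daily_events.index, daily_events.values, marker='o', color='royalblue')
plt.axvline(dt.date(2020, 12, 25), color='red', linestyle='--', label='Christmas&New Year Promo start')
plt.legend()
plt.title('Events per day')
plt.xlabel('Date')
plt.ylabel('Number of events')
plt.xticks(rotation=60);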

In [51]:
calendar_eu
Out[51]:
name regions start_dt finish_dt
0 Christmas&New Year Promo EU, N.America 2020-12-25 2021-01-03
1 St. Valentine's Day Giveaway EU, CIS, APAC, N.America 2020-02-14 2020-02-16
2 St. Patric's Day Promo EU, N.America 2020-03-17 2020-03-19
3 Easter Promo EU, CIS, APAC, N.America 2020-04-12 2020-04-19
5 Black Friday Ads Campaign EU, CIS, APAC, N.America 2020-11-26 2020-12-01
7 Labor day (May 1st) Ads Campaign EU, CIS, APAC 2020-05-01 2020-05-03
8 International Women's Day Promo EU, CIS, APAC 2020-03-08 2020-03-10

Comments on the data

Go back to the Table of Contents

Are there any peculiarities in the data that you have to take into account before starting the A/B test?

There are several things to notice:

  1. There seems to be a problem at the recruitment stage of the test, which resulted in 441 participants (6.33% of the events) landing in both test groups. We had to remove these users. As a result, we are running the test with fewer participants than expected.
  2. There are some irregularities with the timestamps. For example, some users performed all events at the exact same moment. I tried to address this issue in the analysis, but I can't be sure the results correctly reflect user behavior.
In [52]:
#creating a pivot table by groups and events
pivot = test_events.pivot_table(index='event_name', values='user_id', columns='group', aggfunc=lambda x: x.nunique()).reset_index()
pivot
Out[52]:
group event_name A B
0 login 5350 5058
1 product_cart 1726 1716
2 product_page 3577 3318
3 purchase 1903 1683

I will define a function that runs the z-test and can be used to test all events.

In [53]:
#defining a function that runs the z-test
def check_hypothesis(event, alpha=0.05):
    """This function runs a z-test to check for statistical difference between the proportions.

    The input arguments are the event name to test and the alpha level.
    """
    
    successes1 = pivot[pivot.event_name == event]['A'].iloc[0]
    successes2 = pivot[pivot.event_name == event]['B'].iloc[0]
    
    trials1 = test_events[test_events.group == 'A']['user_id'].nunique()
    trials2 = test_events[test_events.group == 'B']['user_id'].nunique()
    
    p1 = successes1/trials1

    p2 = successes2/trials2

    p_combined = (successes1 + successes2) / (trials1 + trials2)

    difference = p1 - p2
        
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))
  
    distr = st.norm(0, 1) 

    p_value = (1 - distr.cdf(abs(z_value))) * 2

    print('p-value: ', p_value)

    if (p_value < alpha):
        print("Reject H0 for", event)
    else:
        print("Fail to Reject H0 for", event)
    
    print('')

Now we can run the z-test on each event:

H0 - There is no significant difference between the proportions of the control group and the test group for the checked event.

H1 - There is a significant difference between the proportions of the control group and the test group for the checked event.

The significance level is set to 5%, as is commonly used in business analytics.
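
As a cross-check of the hand-written z-test, the same comparison can also be run with statsmodels (a sketch; it assumes statsmodels is installed and should give essentially the same p-values, since it also uses the pooled proportion):

from statsmodels.stats.proportion import proportions_ztest

def check_hypothesis_sm(event):
    """Cross-check of check_hypothesis using statsmodels' two-proportion z-test."""
    successes = [pivot[pivot.event_name == event]['A'].iloc[0],
                 pivot[pivot.event_name == event]['B'].iloc[0]]
    trials = [test_events[test_events.group == 'A']['user_id'].nunique(),
              test_events[test_events.group == 'B']['user_id'].nunique()]
    stat, p_value = proportions_ztest(successes, trials)
    print(event, 'p-value:', p_value)

for i in pivot.event_name.unique():
    check_hypothesis_sm(i)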

In [54]:
for i in pivot.event_name.unique():
    check_hypothesis(i, alpha=0.05)
p-value:  0.33090965942701267
Fail to Reject H0 for login

p-value:  0.07014790784441405
Fail to Reject H0 for product_cart

p-value:  0.17829662456570405
Fail to Reject H0 for product_page

p-value:  0.014023560498416465
Reject H0 for purchase

There is no statistically significant difference between the two groups for logins, product page visits, or product cart visits.
There is, however, a statistically significant difference in purchases: a larger share of group A participants made a purchase.
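
For context, the observed per-user purchase conversion in each group and the relative difference can be computed from the pivot table (a sketch):

# observed purchase conversion per group (share of unique users with at least one purchase)
n_A = test_events[test_events.group == 'A']['user_id'].nunique()
n_B = test_events[test_events.group == 'B']['user_id'].nunique()
conv_A = pivot[pivot.event_name == 'purchase']['A'].iloc[0] / n_A
conv_B = pivot[pivot.event_name == 'purchase']['B'].iloc[0] / n_B
print(f'Group A: {conv_A:.1%}, Group B: {conv_B:.1%}, relative change for B: {(conv_B - conv_A) / conv_A:.1%}')

Under the spec's success criterion (at least a 10% increase at each stage), group B falls short; if anything, its purchase conversion is lower than the control's.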

The test was not successful for several reasons:

  1. Holiday season is not a good time to run an A/B test. Shopping habits change over the month of December (a rise before the holidays and a drop during them, a pattern we can clearly see in the data).
  2. In addition, it is unwise to run a test while a promotion is running (unless the promotion is what's being tested), since we can't distinguish between behavior changes related to the feature and changes related to the promotion. In this particular case, the promotion started too late, so it might have had little effect on users (they weren't visiting the site anyway because of the holidays).
  3. There were fewer participants in the test than expected.
  4. The timestamps are inconsistent.

Despite the above, I note the following:

  1. There was a difference in the funnel between the two groups. A larger share of group B users visited the cart than in group A, but we can't pinpoint exactly at which stage of the funnel this happens.
  2. Group A participants made more purchases.

To conclude: The test should be run again in January (after the holiday season ends) and after technical issues are fixed.

Thank you for your attention!