CallMeMaybe

Project Description

The virtual telephony service CallMeMaybe (call center) is developing a new function that will give supervisors information on the least effective operators. An operator is considered ineffective if they have a large number of missed incoming calls (internal and external) and a long waiting time for incoming calls. Moreover, if an operator is supposed to make outgoing calls, a small number of them is also a sign of ineffectiveness.

The datasets contain data on the use of the virtual telephony service CallMeMaybe. Its clients are organizations that need to distribute large numbers of incoming calls among various operators or make outgoing calls through their operators. Operators can also make internal calls to communicate with one another. These calls go through CallMeMaybe's network.

Executive summary

Link to presentation: https://drive.google.com/file/d/1EXTnZYiKVr9UPSqS_IO8U_-A0boTII16/view?usp=sharing

Link to dashboard: https://public.tableau.com/profile/alina7324#!/vizhome/CallMeMaybe_16106288661390/CallMeMaybedashboard

Task: Determine the thresholds for effective operator performance according to KPIs

Recommendations:

  • Effective operators:
    • Answer incoming calls within 25 seconds
    • Make more than 34 outgoing calls each day
    • Get at least 25 answered calls each day
  • Ineffective operators:
    • Take longer than 25 seconds to answer an incoming call
    • Make fewer than 34 outgoing calls per day
    • Get fewer than 25 answered calls per day
  • Check missed calls (abandonment rate) at the company level, not at the operator level.

  • Each operator should only make one type of call (incoming or outgoing). This will decrease the call abandonment rate.
  • Put emphasis on outgoing calls.
  • Put emphasis on improving the operators working with clients with tariff plans B and C.
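The KPI thresholds above can be expressed as a simple classification rule. A minimal sketch, using a hypothetical per-operator summary frame (the column names here are illustrative, not taken from the project's datasets):

```python
import pandas as pd

# Hypothetical daily averages per operator; column names are illustrative.
ops = pd.DataFrame({
    'operator_id': [880020, 880022, 880026],
    'avg_answer_sec': [18, 40, 25],
    'outgoing_per_day': [50, 20, 34],
    'answered_per_day': [30, 10, 25],
})

# An operator is effective only if all three KPI thresholds are met.
ops['effective'] = (
    (ops['avg_answer_sec'] <= 25)
    & (ops['outgoing_per_day'] > 34)
    & (ops['answered_per_day'] >= 25)
)
print(ops[['operator_id', 'effective']])  # only 880020 passes all three
```

A rule combined with `&` flags an operator as ineffective if any single KPI is missed, which matches the recommendations above.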

Opening data

In [1]:
import pandas as pd 
import numpy as np
from numpy import median
from scipy import stats as st
import math as mth

import datetime as dt
from datetime import datetime

import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.gridspec as gs

import seaborn as sns

import plotly
import plotly.express as px
from plotly import graph_objects as go
import plotly.io as pio
pio.templates.default = "seaborn"

from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans  #explicit import instead of a wildcard

import sys
import warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")
In [2]:
#import the files

location = '/some/local/path/'

try:
    calls = pd.read_csv(location + 'telecom_dataset_us.csv')
    clients = pd.read_csv(location + 'telecom_clients_us.csv')
except FileNotFoundError:
    calls = pd.read_csv('/datasets/telecom_dataset_us.csv')
    clients = pd.read_csv('/datasets/telecom_clients_us.csv')
In [3]:
def data_study(df): 
    """This function runs methods to study the data and prints out the results.

    The input argument is a dataset.  
    """
    print(df.info(memory_usage='deep'))
    print("")
    print("Head:")
    display(df.head())
    print("")
    print("Tail:")
    display(df.tail())
    print("")
    print("Check for Nulls:")
    display(df.isnull().sum()) 
    print("")
    print("Check for 0:")
    for c in df.columns: 
        print(c , len(df[df[c] == 0]))
    print("")
    print('Number of duplicated rows:', df.duplicated().sum())
In [4]:
data_study(calls)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53902 entries, 0 to 53901
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   user_id              53902 non-null  int64  
 1   date                 53902 non-null  object 
 2   direction            53902 non-null  object 
 3   internal             53785 non-null  object 
 4   operator_id          45730 non-null  float64
 5   is_missed_call       53902 non-null  bool   
 6   calls_count          53902 non-null  int64  
 7   call_duration        53902 non-null  int64  
 8   total_call_duration  53902 non-null  int64  
dtypes: bool(1), float64(1), int64(4), object(3)
memory usage: 11.1 MB
None

Head:
user_id date direction internal operator_id is_missed_call calls_count call_duration total_call_duration
0 166377 2019-08-04 00:00:00+03:00 in False NaN True 2 0 4
1 166377 2019-08-05 00:00:00+03:00 out True 880022.0 True 3 0 5
2 166377 2019-08-05 00:00:00+03:00 out True 880020.0 True 1 0 1
3 166377 2019-08-05 00:00:00+03:00 out True 880020.0 False 1 10 18
4 166377 2019-08-05 00:00:00+03:00 out False 880022.0 True 3 0 25
Tail:
user_id date direction internal operator_id is_missed_call calls_count call_duration total_call_duration
53897 168606 2019-11-10 00:00:00+03:00 out True 957922.0 True 1 0 38
53898 168606 2019-11-11 00:00:00+03:00 out True 957922.0 False 2 479 501
53899 168606 2019-11-15 00:00:00+03:00 out True 957922.0 False 4 3130 3190
53900 168606 2019-11-15 00:00:00+03:00 out True 957922.0 False 4 3130 3190
53901 168606 2019-11-19 00:00:00+03:00 in False NaN True 2 0 64
Check for Nulls:
user_id                   0
date                      0
direction                 0
internal                117
operator_id            8172
is_missed_call            0
calls_count               0
call_duration             0
total_call_duration       0
dtype: int64
Check for 0:
user_id 0
date 0
direction 0
internal 47621
operator_id 0
is_missed_call 30334
calls_count 0
call_duration 23263
total_call_duration 1320

Number of duplicated rows: 4900

Missing values:
There are 8,172 rows with a missing operator_id, about 15% of the data. That is too much to simply remove, so I will investigate these rows to look for the best solution.

In [5]:
#subset of rows with missing operators
missing_op = calls[calls['operator_id'].isnull()]
missing_op.sample(10)
Out[5]:
user_id date direction internal operator_id is_missed_call calls_count call_duration total_call_duration
35215 167521 2019-11-24 00:00:00+03:00 in False NaN True 18 0 471
7782 166636 2019-10-15 00:00:00+03:00 in False NaN True 1 0 38
31004 167359 2019-11-23 00:00:00+03:00 in False NaN True 1 0 56
35673 167532 2019-11-28 00:00:00+03:00 in False NaN True 6 0 83
43908 168021 2019-10-20 00:00:00+03:00 in False NaN True 2 0 0
13874 166803 2019-11-21 00:00:00+03:00 in False NaN True 1 0 25
33015 167479 2019-10-23 00:00:00+03:00 in False NaN True 1 0 6
26920 167158 2019-11-22 00:00:00+03:00 in False NaN True 7 0 15
16028 166908 2019-08-27 00:00:00+03:00 in False NaN True 2 0 30
41477 167870 2019-11-16 00:00:00+03:00 in False NaN True 1 0 86
In [6]:
#looking for commonalities of missing values  
print(missing_op.user_id.nunique())
print(" ")
print(missing_op.date.nunique())
print(" ")
print(missing_op.direction.value_counts())
print(" ")
print(missing_op.internal.value_counts())
print(" ")
print(missing_op.is_missed_call.value_counts())
305
 
119
 
in     7972
out     200
Name: direction, dtype: int64
 
False    7760
True      355
Name: internal, dtype: int64
 
True     8050
False     122
Name: is_missed_call, dtype: int64

The missing values are distributed among 305 users and 119 dates, across all call types.
What stands out is that almost all of these rows are unanswered external incoming calls. They may have arrived outside working hours, or the caller may have hung up before an operator was assigned.
Since our task is to measure operator performance, these calls are irrelevant and should be dropped. However, it is essential to point out that they reflect negatively on the call center as a whole.
I kept the original dataset as is to demonstrate the difference in the EDA section.

In [7]:
#drop rows with a missing operator_id
data = calls.dropna(subset=['operator_id']).reset_index(drop=True)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45730 entries, 0 to 45729
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   user_id              45730 non-null  int64  
 1   date                 45730 non-null  object 
 2   direction            45730 non-null  object 
 3   internal             45670 non-null  object 
 4   operator_id          45730 non-null  float64
 5   is_missed_call       45730 non-null  bool   
 6   calls_count          45730 non-null  int64  
 7   call_duration        45730 non-null  int64  
 8   total_call_duration  45730 non-null  int64  
dtypes: bool(1), float64(1), int64(4), object(3)
memory usage: 2.8+ MB

Now let's check the missing values in the internal column:

In [8]:
print(data['internal'].isnull().sum())
print(" ")
data[data['internal'].isnull()].head(10)
60
 
Out[8]:
user_id date direction internal operator_id is_missed_call calls_count call_duration total_call_duration
1663 166406 2019-09-02 00:00:00+03:00 in NaN 879898.0 False 1 2 9
5302 166541 2019-09-26 00:00:00+03:00 in NaN 908960.0 False 1 393 423
5306 166541 2019-09-26 00:00:00+03:00 in NaN 908958.0 False 2 547 612
6359 166604 2019-11-01 00:00:00+03:00 in NaN 893402.0 False 1 94 117
7339 166658 2019-09-24 00:00:00+03:00 in NaN 890404.0 False 1 150 157
7748 166658 2019-10-15 00:00:00+03:00 in NaN 890404.0 False 1 51 57
13415 166916 2019-10-01 00:00:00+03:00 in NaN 906396.0 False 1 100 117
13531 166916 2019-10-07 00:00:00+03:00 in NaN 906406.0 False 3 378 461
13781 166916 2019-10-23 00:00:00+03:00 in NaN 906400.0 False 1 81 110
15558 166983 2019-09-02 00:00:00+03:00 in NaN 901880.0 False 1 119 127
In [9]:
data.internal.value_counts()
Out[9]:
False    39861
True      5809
Name: internal, dtype: int64

There are 60 missing values left in the internal column (0.1%). This column is not significant for our analysis, so I decided to leave these values as they were. I will convert the column to bool in the next step; note that astype('bool') turns NaN into True (NaN is truthy), which is a negligible distortion at this scale.
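As a quick sanity check on this conversion, here is how astype('bool') treats NaN in an object column (NaN is truthy, so it becomes True, not False):

```python
import numpy as np
import pandas as pd

# NaN is truthy, so astype('bool') maps missing values to True.
s = pd.Series([True, False, np.nan], dtype='object')
print(s.astype('bool').tolist())  # [True, False, True]
```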

Converting data types:

In [10]:
#convert date from object to datetime and drop the time component (it is always midnight here)
data['date'] = pd.to_datetime(data['date']).dt.date  #.dt.date strips the time but returns objects
data['date'] = pd.to_datetime(data['date'])          #convert back to datetime64
In [11]:
#convert operator_id from float to int
data['operator_id'] = data['operator_id'].astype('int') 
In [12]:
#convert internal from object to bool
data['internal'] = data['internal'].astype('bool') 
In [13]:
#convert direction from object to category
data['direction'] = data['direction'].astype('category') 
In [14]:
#test
print(data.info(memory_usage='deep'))
print("")
data.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45730 entries, 0 to 45729
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   user_id              45730 non-null  int64         
 1   date                 45730 non-null  datetime64[ns]
 2   direction            45730 non-null  category      
 3   internal             45730 non-null  bool          
 4   operator_id          45730 non-null  int32         
 5   is_missed_call       45730 non-null  bool          
 6   calls_count          45730 non-null  int64         
 7   call_duration        45730 non-null  int64         
 8   total_call_duration  45730 non-null  int64         
dtypes: bool(2), category(1), datetime64[ns](1), int32(1), int64(4)
memory usage: 2.1 MB
None

Out[14]:
user_id date direction internal operator_id is_missed_call calls_count call_duration total_call_duration
0 166377 2019-08-05 out True 880022 True 3 0 5
1 166377 2019-08-05 out True 880020 True 1 0 1
2 166377 2019-08-05 out True 880020 False 1 10 18
3 166377 2019-08-05 out False 880022 True 3 0 25
4 166377 2019-08-05 out False 880020 False 2 3 29

Conversion of data types reduced memory usage from 11.1 MB to 2.1 MB.
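A small illustration of why the conversion helps so much here. A low-cardinality string column stored as 'category' keeps one copy of each label plus compact integer codes, instead of a Python string object per row (the toy column below is illustrative):

```python
import pandas as pd

# 50,000 rows but only 2 distinct labels, like the direction column.
col = pd.Series(['in', 'out'] * 25000)
as_object = col.memory_usage(deep=True)
as_category = col.astype('category').memory_usage(deep=True)
print(as_object > 10 * as_category)  # True: an order-of-magnitude saving
```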

Duplicates
Before dropping rows, there were 4,900 duplicated rows in the data. First, let's see if that changed:

In [15]:
data.duplicated().sum()
Out[15]:
4184
In [16]:
data[data.duplicated()].sample(20)
Out[16]:
user_id date direction internal operator_id is_missed_call calls_count call_duration total_call_duration
17639 167016 2019-11-06 in False 916424 False 27 2306 2557
1002 166405 2019-10-14 out False 882684 False 1 46 61
18228 167035 2019-11-18 in False 923526 False 33 2152 2398
6177 166582 2019-10-29 in False 885890 False 17 613 778
32105 167650 2019-10-01 out False 921318 True 18 0 497
26620 167445 2019-11-10 out False 920726 False 2 156 170
15715 166983 2019-09-18 in False 901884 False 7 1477 1589
510 166377 2019-11-25 in False 880028 False 1 89 105
42040 168187 2019-11-26 out False 937966 True 8 0 100
21113 167125 2019-10-01 out True 902778 True 2 0 8
13372 166916 2019-09-27 in False 906404 False 26 2976 3986
43921 168336 2019-11-21 out False 948758 True 15 0 318
25221 167285 2019-10-11 out True 908640 False 1 284 287
37340 168047 2019-10-30 out False 937604 False 1 4 4
24495 167199 2019-11-17 in False 911310 False 1 18 24
8794 166678 2019-08-31 in False 888868 False 1 34 42
39857 168187 2019-10-18 out False 937808 True 9 0 288
43347 168253 2019-11-18 out True 952948 True 2 0 0
22697 167172 2019-10-29 out False 926490 False 1 756 768
32898 167654 2019-11-13 out False 918986 True 12 0 357

There is no apparent connection between the duplicates.
As each row is a record of an operator's work on a given day, fully identical rows are most likely caused by a bug in the logging system and should be dropped.

In [17]:
#dropping duplicates 
data = data.drop_duplicates()
calls = calls.drop_duplicates() 
In [18]:
#test
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41546 entries, 0 to 45728
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   user_id              41546 non-null  int64         
 1   date                 41546 non-null  datetime64[ns]
 2   direction            41546 non-null  category      
 3   internal             41546 non-null  bool          
 4   operator_id          41546 non-null  int32         
 5   is_missed_call       41546 non-null  bool          
 6   calls_count          41546 non-null  int64         
 7   call_duration        41546 non-null  int64         
 8   total_call_duration  41546 non-null  int64         
dtypes: bool(2), category(1), datetime64[ns](1), int32(1), int64(4)
memory usage: 2.2 MB

Check for errors in the data:

In [19]:
#check if there are any errors in is_missed_call
data.query('is_missed_call == True & call_duration > 0').count()
Out[19]:
user_id                296
date                   296
direction              296
internal               296
operator_id            296
is_missed_call         296
calls_count            296
call_duration          296
total_call_duration    296
dtype: int64
In [20]:
data.query('is_missed_call == True & call_duration > 0').sample(5)
Out[20]:
user_id date direction internal operator_id is_missed_call calls_count call_duration total_call_duration
5780 166582 2019-09-11 in False 885890 True 1 1 27
36344 167977 2019-11-19 in False 944226 True 2 66 170
42187 168187 2019-11-28 in False 937812 True 1 61 69
5231 166541 2019-09-13 in False 908958 True 1 1 21
27179 167466 2019-10-21 in False 921818 True 1 1 6

There are 296 rows where the call duration is greater than 0 but the call is flagged as missed. This is an error in the is_missed_call column. To fix it, I'll set is_missed_call to False for these rows.

In [21]:
data.loc[(data.is_missed_call == True) & (data.call_duration > 0), 'is_missed_call'] = False
In [22]:
#test
data.query('is_missed_call == True & call_duration > 0').count() 
Out[22]:
user_id                0
date                   0
direction              0
internal               0
operator_id            0
is_missed_call         0
calls_count            0
call_duration          0
total_call_duration    0
dtype: int64
In [23]:
#Check values in numeric columns
data.describe().T
Out[23]:
count mean std min 25% 50% 75% max
user_id 41546.0 167301.311992 600.418838 166377.0 166782.0 167175.0 167827.0 168606.0
operator_id 41546.0 916523.315409 21230.041008 879896.0 900790.5 913938.0 937708.0 973286.0
calls_count 41546.0 16.900424 59.749373 1.0 1.0 4.0 13.0 4817.0
call_duration 41546.0 1009.769172 4064.106117 0.0 0.0 106.0 770.0 144395.0
total_call_duration 41546.0 1321.592813 4785.978633 0.0 67.0 288.0 1104.0 166155.0
In [24]:
data['date'].describe()
Out[24]:
count                   41546
unique                    118
top       2019-11-25 00:00:00
freq                      988
first     2019-08-02 00:00:00
last      2019-11-28 00:00:00
Name: date, dtype: object
In [25]:
data.describe(include=['category','bool']).T  
Out[25]:
count unique top freq
direction 41546 2 out 28813
internal 41546 2 False 36161
is_missed_call 41546 2 False 27732

Everything else seems to be ok.

In [26]:
data_study(clients)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 732 entries, 0 to 731
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   user_id      732 non-null    int64 
 1   tariff_plan  732 non-null    object
 2   date_start   732 non-null    object
dtypes: int64(1), object(2)
memory usage: 98.1 KB
None

Head:
user_id tariff_plan date_start
0 166713 A 2019-08-15
1 166901 A 2019-08-23
2 168527 A 2019-10-29
3 167097 A 2019-09-01
4 168193 A 2019-10-16
Tail:
user_id tariff_plan date_start
727 166554 B 2019-08-08
728 166911 B 2019-08-23
729 167012 B 2019-08-28
730 166867 B 2019-08-22
731 166565 B 2019-08-08
Check for Nulls:
user_id        0
tariff_plan    0
date_start     0
dtype: int64
Check for 0:
user_id 0
tariff_plan 0
date_start 0

Number of duplicated rows: 0

No missing values or duplicates here.
The date_start column should be converted to datetime:

In [27]:
#convert date to dt type
clients['date_start'] = pd.to_datetime(clients['date_start'])
In [28]:
#rename the column 
clients = clients.rename(columns={'date_start':'join_date'})
In [29]:
clients['join_date'].describe()
Out[29]:
count                     732
unique                     73
top       2019-09-24 00:00:00
freq                       24
first     2019-08-01 00:00:00
last      2019-10-31 00:00:00
Name: join_date, dtype: object
In [30]:
clients.describe(include='object').T  
Out[30]:
count unique top freq
tariff_plan 732 3 C 395

I want to make sure that there are no logged calls for a user before the join date. For this purpose, and for further analysis that involves the clients' dataset, I will create a merged dataset that contains all the data.

In [31]:
#merge the two datasets
full_data = data.merge(clients, how='inner', on='user_id')
full_data.head()
Out[31]:
user_id date direction internal operator_id is_missed_call calls_count call_duration total_call_duration tariff_plan join_date
0 166377 2019-08-05 out True 880022 True 3 0 5 B 2019-08-01
1 166377 2019-08-05 out True 880020 True 1 0 1 B 2019-08-01
2 166377 2019-08-05 out True 880020 False 1 10 18 B 2019-08-01
3 166377 2019-08-05 out False 880022 True 3 0 25 B 2019-08-01
4 166377 2019-08-05 out False 880020 False 2 3 29 B 2019-08-01
In [32]:
#check for date anomalies
full_data.query('date < join_date').count() 
Out[32]:
user_id                0
date                   0
direction              0
internal               0
operator_id            0
is_missed_call         0
calls_count            0
call_duration          0
total_call_duration    0
tariff_plan            0
join_date              0
dtype: int64

Final touches:

In [33]:
#reorder columns 
full_data = full_data.reindex(columns=['user_id','join_date','tariff_plan','date','direction','internal','operator_id',
                                       'is_missed_call','calls_count','call_duration','total_call_duration'])

full_data.head()
Out[33]:
user_id join_date tariff_plan date direction internal operator_id is_missed_call calls_count call_duration total_call_duration
0 166377 2019-08-01 B 2019-08-05 out True 880022 True 3 0 5
1 166377 2019-08-01 B 2019-08-05 out True 880020 True 1 0 1
2 166377 2019-08-01 B 2019-08-05 out True 880020 False 1 10 18
3 166377 2019-08-01 B 2019-08-05 out False 880022 True 3 0 25
4 166377 2019-08-01 B 2019-08-05 out False 880020 False 2 3 29

Preprocessing stage summary:

  • The data consists of two datasets:
    • calls - a log of 53,902 rows and 9 columns
    • clients - new clients 732 rows and 3 columns
  • In the calls dataset, I found 8,172 rows with a missing operator_id and 4,900 duplicated rows. These rows were eliminated.
  • Four columns in the calls dataset required conversion to other data types. The conversion reduced memory usage from 11.1 MB to 2.1 MB.
  • I found 296 rows with an error in is_missed_call column and fixed it.
  • Eventually, calls data has 41,546 rows.
  • The clients dataset had no missing or duplicated data.
  • I created a dataset that includes all the data.

Exploratory data analysis

Find how many users and operators are in the log

Go back to the Table of Contents

In [34]:
print('The log features {} users and {} operators.'.format(full_data.user_id.nunique(),full_data.operator_id.nunique()))
The log features 290 users and 1092 operators.

We have significantly more operators than users.

In [35]:
grouped_operator = full_data.groupby('operator_id').agg({'user_id':'nunique'}).sort_values(by='user_id', ascending=False).reset_index()
grouped_operator.head()
Out[35]:
operator_id user_id
0 879896 1
1 937966 1
2 938072 1
3 938070 1
4 938022 1

Each operator is assigned to one user.
Now let's look from a user perspective:

In [36]:
grouped_user = full_data.groupby('user_id').agg({'operator_id':'nunique'}).sort_values(by='operator_id', ascending=False).reset_index()
grouped_user.head(20)
Out[36]:
user_id operator_id
0 168187 50
1 167626 48
2 167497 30
3 168252 28
4 168361 27
5 168062 27
6 166680 21
7 166520 18
8 166658 17
9 166916 16
10 168091 16
11 167176 15
12 167188 15
13 167580 15
14 168336 14
15 167521 12
16 168073 11
17 167445 11
18 166405 10
19 167828 9
In [37]:
fig = px.histogram(grouped_user, x='operator_id', nbins=100)
fig.update_layout(title_text='Distribution of operators assigned to a user',
                 xaxis_title_text='Number of operators', 
                 yaxis_title_text='Number of Users')
fig.show()

We can see that most users are assigned to just one operator, but we have some big clients that need more than one operator.
Let's determine what makes a client big:

In [38]:
def whiskers(df, column):
    """This function calculates the outlier thresholds.

    The input arguments are a dataset and a column name.
    """
    stat = df[column].describe()
    iqr = stat['75%'] - stat['25%']  #label-based access instead of deprecated positional indexing
    left_whisker = round(stat['25%'] - 1.5 * iqr, 2)
    right_whisker = round(stat['75%'] + 1.5 * iqr, 2)

    if left_whisker < stat['min']:
        left_whisker = stat['min']

    if right_whisker > stat['max']:
        right_whisker = stat['max']

    return [left_whisker, right_whisker]
In [39]:
whiskers(grouped_user,'operator_id')
Out[39]:
[1.0, 8.5]

We can now say that users assigned to 9 or more operators are outliers in the data (essentially the top 19 users in the printout above). In this case, we are checking the quality of each operator's work, so the identity of the user is irrelevant. We can keep the outliers and move on.
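The 1.5·IQR rule behind whiskers() can be checked on a toy series (chosen so the upper whisker also lands at 8.5):

```python
import pandas as pd

# Values beyond Q3 + 1.5*IQR are flagged as outliers.
s = pd.Series([1, 1, 1, 2, 2, 3, 4, 5, 30])
q1, q3 = s.quantile(0.25), s.quantile(0.75)   # 1.0 and 4.0
right = q3 + 1.5 * (q3 - q1)                  # 8.5
print(s[s > right].tolist())  # [30]
```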

Investigate the proportions of incoming and outgoing calls

Go back to the Table of Contents

In [40]:
#group calls by direction
direction = full_data.groupby('direction').agg({'calls_count':'sum'}).reset_index()
direction
Out[40]:
direction calls_count
0 in 93802
1 out 608343
In [41]:
fig = go.Figure(data=[go.Pie(labels=direction['direction'], values=direction['calls_count'], hole=.2, pull=[0.2, 0])])
fig.update_layout(title_text='Proportion of call direction after eliminating missing operators')
fig.show()

86.6% of calls in the log are outgoing. Only a small share of the work done for clients is answering incoming calls.
It is important to remember that we dropped a large number of incoming calls when handling missing values. To get a clearer picture of the workload, let's compare with the original data:

In [42]:
direction_all = calls.groupby('direction').agg({'calls_count':'sum'}).reset_index()

fig = go.Figure(data=[go.Pie(labels=direction_all['direction'], values=direction_all['calls_count'], hole=.2, pull=[0.2, 0])])
fig.update_layout(title_text='Proportion of call direction of the entire data (including missing operators)')
fig.show()

We can see that the actual workload is split roughly 75%/25%.

Investigate the proportions of internal and external calls

Go back to the Table of Contents

In [43]:
in_ex = full_data.groupby('internal').agg({'calls_count':'sum'}).reset_index()
in_ex
Out[43]:
internal calls_count
0 False 688420
1 True 13725
In [44]:
fig = go.Figure(data=[go.Pie(labels=in_ex['internal'], values=in_ex['calls_count'], hole=.2, pull=[0.2, 0])])
fig.update_layout(title_text='Distribution of internal and external calls')
fig.show()

Only a small portion of calls (2%) are internal calls (between operators).

Investigate the distribution of entries over time and determine whether all the data should be used for analysis

Go back to the Table of Contents

In [45]:
full_data.date.describe()
Out[45]:
count                   41546
unique                    118
top       2019-11-25 00:00:00
freq                      988
first     2019-08-02 00:00:00
last      2019-11-28 00:00:00
Name: date, dtype: object
In [46]:
fig = px.histogram(full_data, x='date', nbins=100)
fig.update_layout(title_text='Distribution of entries by date',
                 xaxis_title_text='Dates', 
                 yaxis_title_text='Number of occurrences')
fig.show()
In [47]:
full_data.join_date.describe()
Out[47]:
count                   41546
unique                     65
top       2019-10-16 00:00:00
freq                     2360
first     2019-08-01 00:00:00
last      2019-10-31 00:00:00
Name: join_date, dtype: object

We can see our data covers 4 months (August to November), with a visible weekly pattern.
There is a gradual rise in activity over these months. Users join between 2019-08-01 and 2019-10-31, in parallel with the data collection, which could explain the rising workload.
From now on, I will analyze the daily average for each operator. This will allow me to use all the data in the record.
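The per-operator daily averaging can be sketched on a toy frame that mirrors the relevant columns of full_data (the numbers here are made up):

```python
import pandas as pd

# Toy frame with the same column names as full_data.
toy = pd.DataFrame({
    'operator_id': [1, 1, 2, 2],
    'date': pd.to_datetime(['2019-08-01', '2019-08-02',
                            '2019-08-01', '2019-08-01']),
    'calls_count': [10, 20, 5, 7],
})

# Sum per operator per day, then average over each operator's active days.
daily = toy.groupby(['operator_id', 'date'])['calls_count'].sum()
avg_per_day = daily.groupby('operator_id').mean()
print(avg_per_day.tolist())  # [15.0, 12.0]
```

Averaging over active days keeps operators comparable even though the overall call volume grows over the four months.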

Find Outliers for call duration

Go back to the Table of Contents

In [48]:
def find_outliers(df, column):
    print('Mean {} is: {}'.format(column, df[column].mean()))
    print('Median {} is: {}'.format(column, df[column].median()))
    print('{} whiskers:{}'.format(column, whiskers(df,column)))
    print("")
    fig = px.box(df, y=column)
    fig.update_layout(title_text='Box plot for {}'.format(column))
    fig.show()
In [49]:
find_outliers(full_data, 'total_call_duration')
Mean total_call_duration is: 1321.5928127858278
Median total_call_duration is: 288.0
total_call_duration whiskers:[0.0, 2659.5]

At first glance, the data is clearly skewed: there is a big gap between the mean and the median.
There is a long tail, and the upper whisker of 2,659.5 sec (~44 min) seems like a low cutoff for this metric (a typical workday runs 8-12 hours). Treating everything above it as an outlier would likely remove the most effective operators from the data.
The max value in the data is 166,155 sec (about 46 hours). I want to take a closer look at those values.
I think the cutoff should be 12 working hours per day (43,200 sec). First, I want to check working hours per operator per day:

In [50]:
#group the total_call_duration by date and by operator
work_day=full_data.groupby(['date','operator_id']).agg({'total_call_duration':'sum'}).sort_values(by='total_call_duration', ascending=False).reset_index()
work_day.head()
Out[50]:
date operator_id total_call_duration
0 2019-09-25 885876 189989
1 2019-09-26 885876 172165
2 2019-09-09 885876 160826
3 2019-09-18 885876 159178
4 2019-10-02 885876 155402
In [51]:
work_day[work_day['total_call_duration'] > 43200]['operator_id'].nunique()
Out[51]:
7

So, 7 operators logged implausibly long hours, perhaps because several operators shared the same login to the system. These operator_ids should be dropped from the data.

In [52]:
drop_op = work_day[work_day['total_call_duration'] > 43200]['operator_id'].unique().tolist()
drop_op
Out[52]:
[885876, 885890, 925922, 929424, 945278, 929428, 908640]
In [53]:
indexNames = full_data[full_data['operator_id'].isin(drop_op)].index
full_data = full_data.drop(indexNames).reset_index()
full_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40658 entries, 0 to 40657
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   index                40658 non-null  int64         
 1   user_id              40658 non-null  int64         
 2   join_date            40658 non-null  datetime64[ns]
 3   tariff_plan          40658 non-null  object        
 4   date                 40658 non-null  datetime64[ns]
 5   direction            40658 non-null  category      
 6   internal             40658 non-null  bool          
 7   operator_id          40658 non-null  int32         
 8   is_missed_call       40658 non-null  bool          
 9   calls_count          40658 non-null  int64         
 10  call_duration        40658 non-null  int64         
 11  total_call_duration  40658 non-null  int64         
dtypes: bool(2), category(1), datetime64[ns](2), int32(1), int64(5), object(1)
memory usage: 2.8+ MB
In [54]:
find_outliers(full_data, 'total_call_duration')
Mean total_call_duration is: 1021.5887402233262
Median total_call_duration is: 279.0
total_call_duration whiskers:[0.0, 2520.38]

The data still has a long tail, but now the values make sense. This tail is precisely what we need to analyze.

Calculate waiting time and average waiting time for each entry

Go back to the Table of Contents

In [55]:
full_data['waiting_duration'] = full_data['total_call_duration'] - full_data['call_duration']
full_data['avg_wait_time'] = full_data['waiting_duration'] / full_data['calls_count']
full_data.head()
Out[55]:
index user_id join_date tariff_plan date direction internal operator_id is_missed_call calls_count call_duration total_call_duration waiting_duration avg_wait_time
0 0 166377 2019-08-01 B 2019-08-05 out True 880022 True 3 0 5 5 1.666667
1 1 166377 2019-08-01 B 2019-08-05 out True 880020 True 1 0 1 1 1.000000
2 2 166377 2019-08-01 B 2019-08-05 out True 880020 False 1 10 18 8 8.000000
3 3 166377 2019-08-01 B 2019-08-05 out False 880022 True 3 0 25 25 8.333333
4 4 166377 2019-08-01 B 2019-08-05 out False 880020 False 2 3 29 26 13.000000

Determine effectiveness using clustering

Go back to the Table of Contents

In this section, I will divide the data into incoming and outgoing calls, choose only the relevant columns for performance assessment, and run the KMeans clustering algorithm to determine effectiveness.

In [56]:
#subset of incoming calls (copy to avoid SettingWithCopyWarning on later assignments)
incoming = full_data.query('direction == "in"').copy()
incoming.head()
Out[56]:
index user_id join_date tariff_plan date direction internal operator_id is_missed_call calls_count call_duration total_call_duration waiting_duration avg_wait_time
21 21 166377 2019-08-01 B 2019-08-12 in False 880028 False 1 407 411 4 4.0
26 26 166377 2019-08-01 B 2019-08-13 in False 880028 False 1 88 102 14 14.0
29 29 166377 2019-08-01 B 2019-08-14 in False 880026 False 2 197 218 21 10.5
30 30 166377 2019-08-01 B 2019-08-14 in False 880028 False 1 33 37 4 4.0
38 38 166377 2019-08-01 B 2019-08-15 in False 880028 False 1 23 27 4 4.0

There are two parameters to measure the ineffectiveness of an operator for incoming calls:

  1. A large number of missed incoming calls (internal and external)
  2. Long waiting time for incoming calls

For clustering, I'll calculate the average waiting time and average missed calls for each operator. The clustering algorithm will determine the threshold for effectiveness.

In [57]:
#converting is_missed_call: 0 if the calls were answered, the call count if they were missed.
#np.where is the idiomatic element-wise choice here and avoids passing a Series into replace().
incoming['is_missed_call'] = np.where(incoming['is_missed_call'], incoming['calls_count'], 0)
incoming.head()
Out[57]:
index user_id join_date tariff_plan date direction internal operator_id is_missed_call calls_count call_duration total_call_duration waiting_duration avg_wait_time
21 21 166377 2019-08-01 B 2019-08-12 in False 880028 0 1 407 411 4 4.0
26 26 166377 2019-08-01 B 2019-08-13 in False 880028 0 1 88 102 14 14.0
29 29 166377 2019-08-01 B 2019-08-14 in False 880026 0 2 197 218 21 10.5
30 30 166377 2019-08-01 B 2019-08-14 in False 880028 0 1 33 37 4 4.0
38 38 166377 2019-08-01 B 2019-08-15 in False 880028 0 1 23 27 4 4.0
In [58]:
incoming_for_cluster = incoming.groupby(['operator_id']).agg({'is_missed_call':'mean','avg_wait_time':'mean'}).rename(columns={'is_missed_call':'avg_miss_call', 'avg_wait_time':'wait_time'})  
incoming_for_cluster.head()
Out[58]:
avg_miss_call wait_time
operator_id
879896 0.0 13.396199
879898 0.0 14.080117
880020 0.0 7.714286
880022 0.0 14.000000
880026 0.0 5.935185
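
One caveat before clustering: both Ward linkage and KMeans are distance-based and therefore scale-sensitive, and the two features here live on very different scales (a miss-call rate between 0 and 1 versus a wait time in seconds), so the wait time will dominate the distances. A minimal sketch of standardizing first with scikit-learn's StandardScaler, on toy values (the real pipeline would pass incoming_for_cluster instead of the toy frame):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# toy stand-in for incoming_for_cluster: a 0-1 rate next to a seconds-scale feature
features = pd.DataFrame({
    'avg_miss_call': [0.0, 0.1, 0.0, 0.3, 0.05],
    'wait_time': [13.4, 14.1, 7.7, 36.0, 5.9],
})

# z-score each column so wait_time does not dominate the distance metric
scaled = pd.DataFrame(
    StandardScaler().fit_transform(features),
    columns=features.columns,
    index=features.index,
)
print(scaled.round(3))
```

After scaling, each column has zero mean and unit variance, so both KPIs contribute comparably to the cluster assignment.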
In [59]:
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

linked = linkage(incoming_for_cluster, method='ward')

plt.figure(figsize=(15, 10))  
dendrogram(linked, orientation='top',no_labels=True)
plt.title('Hierarchical clustering for incoming calls data')
plt.show() 

The dendrogram shows that indeed there are 2 clusters in this data.

In [60]:
from sklearn.cluster import KMeans

km = KMeans(n_clusters=2, random_state=1)  # random_state=True is silently coerced to the seed 1; an explicit int is clearer
labels = km.fit_predict(incoming_for_cluster)
In [61]:
incoming_for_cluster['labels'] = labels
In [62]:
display(incoming_for_cluster.groupby('labels').mean())
print("")
display(incoming_for_cluster.groupby('labels').min())
print("")
display(incoming_for_cluster.groupby('labels').max())
avg_miss_call wait_time
labels
0 0.024017 12.825396
1 0.027530 36.648050

avg_miss_call wait_time
labels
0 0.0 0.678944
1 0.0 24.814815

avg_miss_call wait_time
labels
0 0.80 24.614815
1 0.56 115.500000

Several things to notice:

  1. The rate of missed calls (the call abandonment rate) is similar for both groups. It is essential to point out that we eliminated many missed calls in the preprocessing part. In the next section, I'll check whether or not the difference is statistically significant.
  2. The average waiting time for effective operators is 13 sec.
  3. The average waiting time for ineffective operators is 37 sec.
  4. The ineffectiveness threshold should be set at 25 sec. This is the clear line between the two groups: the maximum of cluster 0 is 24.6 sec and the minimum of cluster 1 is 24.8 sec.
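
The 25-second figure can also be derived mechanically from the cluster boundaries: the clusters do not overlap on wait time, so any value in the gap separates them, and the rounded midpoint is a natural choice. A small sketch using the boundary values reported above:

```python
# boundary values from the min/max tables above
effective_max_wait = 24.614815    # max wait_time in cluster 0 (effective)
ineffective_min_wait = 24.814815  # min wait_time in cluster 1 (ineffective)

# midpoint of the gap between the clusters, rounded to whole seconds
threshold = round((effective_max_wait + ineffective_min_wait) / 2)
print(threshold)  # 25
```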
In [63]:
#subset of outgoing calls (copied so later column assignments don't trigger SettingWithCopyWarning)
outgoing = full_data.query('direction == "out"').copy()
outgoing.head()
Out[63]:
index user_id join_date tariff_plan date direction internal operator_id is_missed_call calls_count call_duration total_call_duration waiting_duration avg_wait_time
0 0 166377 2019-08-01 B 2019-08-05 out True 880022 True 3 0 5 5 1.666667
1 1 166377 2019-08-01 B 2019-08-05 out True 880020 True 1 0 1 1 1.000000
2 2 166377 2019-08-01 B 2019-08-05 out True 880020 False 1 10 18 8 8.000000
3 3 166377 2019-08-01 B 2019-08-05 out False 880022 True 3 0 25 25 8.333333
4 4 166377 2019-08-01 B 2019-08-05 out False 880020 False 2 3 29 26 13.000000

Only one parameter was set to measure the ineffectiveness of an operator for outgoing calls:

  1. A small number of outgoing calls

After reading about call center KPIs, I think an additional parameter should be the answer success rate (ASR).

For clustering, I'll calculate the average number of calls and average answered calls for each operator. The clustering algorithm will determine the threshold for effectiveness.

In [64]:
#converting is_missed_call: the call count if the calls were answered, 0 if they were missed.
#np.where is the idiomatic element-wise choice here and avoids passing a Series into replace().
outgoing['is_missed_call'] = np.where(outgoing['is_missed_call'], 0, outgoing['calls_count'])
outgoing.head()
Out[64]:
index user_id join_date tariff_plan date direction internal operator_id is_missed_call calls_count call_duration total_call_duration waiting_duration avg_wait_time
0 0 166377 2019-08-01 B 2019-08-05 out True 880022 0 3 0 5 5 1.666667
1 1 166377 2019-08-01 B 2019-08-05 out True 880020 0 1 0 1 1 1.000000
2 2 166377 2019-08-01 B 2019-08-05 out True 880020 1 1 10 18 8 8.000000
3 3 166377 2019-08-01 B 2019-08-05 out False 880022 0 3 0 25 25 8.333333
4 4 166377 2019-08-01 B 2019-08-05 out False 880020 2 2 3 29 26 13.000000
In [65]:
outgoing_for_cluster = outgoing.groupby(['operator_id']).agg({'calls_count':'mean','is_missed_call':'mean'}).rename(columns={'calls_count':'avg_call_count','is_missed_call':'avg_answer_call'})       
outgoing_for_cluster.head()
Out[65]:
avg_call_count avg_answer_call
operator_id
879896 9.083333 6.468750
879898 42.405882 28.523529
880020 2.923077 1.230769
880022 3.048387 1.354839
880026 13.463415 9.512195
In [66]:
linked = linkage(outgoing_for_cluster, method = 'ward') 

plt.figure(figsize=(15, 10))  
dendrogram(linked, orientation='top', no_labels=True)
plt.title('Hierarchical clustering for outgoing calls data')
plt.show() 

We can see two clusters here.

In [67]:
km = KMeans(n_clusters=2, random_state=1)  # random_state=True is silently coerced to the seed 1; an explicit int is clearer
labels = km.fit_predict(outgoing_for_cluster)
In [68]:
outgoing_for_cluster['labels'] = labels
In [69]:
display(outgoing_for_cluster.groupby('labels').mean())
print("")
display(outgoing_for_cluster.groupby('labels').min())
print("")
display(outgoing_for_cluster.groupby('labels').max())
avg_call_count avg_answer_call
labels
0 6.335105 3.583750
1 60.747975 32.070915

avg_call_count avg_answer_call
labels
0 1.000000 0.000000
1 33.928571 0.777778

avg_call_count avg_answer_call
labels
0 36.500000 24.636364
1 180.467742 78.096774
In [70]:
outgoing_for_cluster.groupby('labels').median()
Out[70]:
avg_call_count avg_answer_call
labels
0 3.023256 1.875000
1 56.894737 30.645833

Several things to notice:

  1. Effective operators make an average of 61 calls a day.
  2. Ineffective operators make an average of 6 calls a day.
  3. The ineffectiveness threshold for daily calls should be set at 34 calls. The clusters overlap slightly here (the minimum for cluster 1 is 33.9 and the maximum for cluster 0 is 36.5), so 34 sits just above the lower edge of the effective cluster.
  4. There is a large difference in ASR between the two groups. The threshold for daily answered calls should be 25 (just above the maximum of 24.6 for cluster 0).
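
A side note on terminology: ASR is conventionally reported as a percentage (answered calls divided by attempted calls), while this analysis uses the daily count of answered calls. A sketch of computing the percentage form per operator, on toy values shaped like the outgoing aggregates above:

```python
import pandas as pd

# toy stand-in for the per-operator outgoing aggregates
ops = pd.DataFrame({
    'operator_id': [879896, 879898, 880020],
    'avg_call_count': [9.08, 42.41, 2.92],   # attempted outgoing calls per day
    'avg_answer_call': [6.47, 28.52, 1.23],  # answered outgoing calls per day
})

# ASR in the conventional sense: share of attempted calls that were answered
ops['asr_pct'] = (ops['avg_answer_call'] / ops['avg_call_count'] * 100).round(1)
print(ops[['operator_id', 'asr_pct']])
```

Either form works for ranking operators; the percentage form has the advantage of being comparable across operators with very different call volumes.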

Testing statistical hypotheses

Difference in the missed-call rate for incoming calls

Go back to the Table of Contents

In [71]:
effective = incoming_for_cluster.query('labels == 0')
ineffective = incoming_for_cluster.query('labels == 1')

To decide which test to use and how to formulate the null hypothesis, I first need to know whether the data is normally distributed. To check this, I will use the Shapiro-Wilk test of normality.
Null hypothesis H₀ - The data is normally distributed.
Alternative hypothesis H₁ - The data is not normally distributed.
The significance level is set to 5%, as is standard in business analytics.

In [72]:
def shapiro(df, column, alpha=0.05):
    """This function checks normal distribution using the Shapiro test.

    The input arguments are a dataset and a column name. The alpha value is an optional argument.
    """
    stat, p = st.shapiro(df[column])
    print('Statistics=%.3f, p=%.3f' % (stat, p))
    #interpret the test: 
    if p > alpha:
        print('Sample is normally distributed (fail to reject H0)')
    else:
        print('Sample is not normally distributed (reject H0)')
        
shapiro(effective, 'avg_miss_call')
effective['avg_miss_call'].hist();
Statistics=0.376, p=0.000
Sample is not normally distributed (reject H0)
In [73]:
shapiro(ineffective, 'avg_miss_call')
ineffective['avg_miss_call'].hist();
Statistics=0.401, p=0.000
Sample is not normally distributed (reject H0)

Since my samples are not normally distributed but the distributions have similar shapes, I will use a rank-based test: scipy's st.wilcoxon, the Wilcoxon signed-rank test. Because it is a paired test, it requires equal sample sizes, so I first downsample the larger group:

In [74]:
print("Effective length is: ", len(effective['avg_miss_call']))
print("Ineffective length is: ", len(ineffective['avg_miss_call']))
Effective length is:  594
Ineffective length is:  156
In [75]:
#sampling with a fixed seed for reproducibility
eff_sample = effective['avg_miss_call'].sample(len(ineffective['avg_miss_call']), random_state=1)

Null hypothesis H₀ - There is no significant difference between the means of the two clusters.

Alternative hypothesis H₁ - There is a significant difference between the means of the two clusters.

The significance level is set to 5%, as is standard in business analytics.

In [76]:
alpha = .05 # critical statistical significance level

results = st.wilcoxon(
    eff_sample,
    ineffective['avg_miss_call'])

print('p-value: ', results.pvalue)

if (results.pvalue < alpha):
        print("We reject the null hypothesis")
else:
        print("We can't reject the null hypothesis") 
p-value:  0.9078935889709072
We can't reject the null hypothesis

There is no significant difference between the performance of the two groups in this parameter.
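
For independent samples like these, scipy also offers the Mann-Whitney U test, which does not require equal sample sizes, so the comparison could be run without downsampling. A sketch on synthetic data (the real comparison would pass the two avg_miss_call series directly):

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(42)
# synthetic miss-rate samples of unequal size, drawn from the same distribution
group_a = rng.exponential(scale=0.03, size=594)
group_b = rng.exponential(scale=0.03, size=156)

# two-sided test; unequal lengths are fine for independent samples
res = st.mannwhitneyu(group_a, group_b, alternative='two-sided')
print(round(res.pvalue, 4))
```

Because this avoids the random subsample, the result does not depend on which 156 effective operators happen to be drawn.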

Do tariff plans affect quality of service for outgoing calls

Go back to the Table of Contents

For this section, I will use a z-test to test the proportions between the tariff plans.
First, let's create a subset that includes both tariff plans and clusters.

In [77]:
temp = outgoing_for_cluster.reset_index()
temp.head()
Out[77]:
operator_id avg_call_count avg_answer_call labels
0 879896 9.083333 6.468750 0
1 879898 42.405882 28.523529 1
2 880020 2.923077 1.230769 0
3 880022 3.048387 1.354839 0
4 880026 13.463415 9.512195 0
In [78]:
outgoing = outgoing.merge(temp,how='inner',on='operator_id')
outgoing.head()
Out[78]:
index user_id join_date tariff_plan date direction internal operator_id is_missed_call calls_count call_duration total_call_duration waiting_duration avg_wait_time avg_call_count avg_answer_call labels
0 0 166377 2019-08-01 B 2019-08-05 out True 880022 0 3 0 5 5 1.666667 3.048387 1.354839 0
1 3 166377 2019-08-01 B 2019-08-05 out False 880022 0 3 0 25 25 8.333333 3.048387 1.354839 0
2 14 166377 2019-08-01 B 2019-08-08 out False 880022 2 2 558 568 10 5.000000 3.048387 1.354839 0
3 15 166377 2019-08-01 B 2019-08-08 out False 880022 0 4 0 28 28 7.000000 3.048387 1.354839 0
4 18 166377 2019-08-01 B 2019-08-09 out False 880022 0 2 0 10 10 5.000000 3.048387 1.354839 0
In [79]:
grouped_by_tariff = outgoing.groupby('tariff_plan').agg({'labels':'sum','operator_id':'count'}).rename(columns={'labels':'labels_sum', 'operator_id':'count'}).reset_index()  
grouped_by_tariff
Out[79]:
tariff_plan labels_sum count
0 A 2400 8274
1 B 426 10303
2 C 446 9608

In the outgoing clustering, 0 is the ineffective cluster and 1 is the effective cluster, so the higher labels_sum is, the better the service the plan receives.

In [80]:
#defining a function that runs the z-test
def check_hypothesis(tariff1, tariff2, alpha=0.05):
    """This function checks whether there is a significant difference between the proportions of two populations 
    using the z-test.
    
    The input arguments are two tariff plan names (the counts are read from the grouped_by_tariff table). The alpha value is an optional argument.
    """
    successes1=grouped_by_tariff[grouped_by_tariff.tariff_plan==tariff1]['labels_sum'].iloc[0]
    successes2=grouped_by_tariff[grouped_by_tariff.tariff_plan==tariff2]['labels_sum'].iloc[0]
    
    trials1=grouped_by_tariff[grouped_by_tariff.tariff_plan==tariff1]['count'].iloc[0]
    trials2=grouped_by_tariff[grouped_by_tariff.tariff_plan==tariff2]['count'].iloc[0]
    
    
    p1 = successes1/trials1

    p2 = successes2/trials2

    p_combined = (successes1 + successes2) / (trials1 + trials2)

    difference = p1 - p2
        
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))
  
    distr = st.norm(0, 1) 

    p_value = (1 - distr.cdf(abs(z_value))) * 2

    print('p-value: ', p_value)

    if (p_value < alpha):
        print("Reject H0 for tariff plans {} and {}".format(tariff1,tariff2))
    else:
        print("Fail to Reject H0 tariff plans {} and {}".format(tariff1,tariff2))

Null hypothesis H₀ - There is no significant difference between the proportions of the two groups.

Alternative hypothesis H₁ - There is a significant difference between the proportions of the two groups.

The significance level is set to 5%, as is standard in business analytics.

In [81]:
check_hypothesis('A', 'B', alpha=0.05)
p-value:  0.0
Reject H0 for tariff plans A and B
In [82]:
check_hypothesis('B', 'C', alpha=0.05)
p-value:  0.08050402350097907
Fail to Reject H0 tariff plans B and C
In [83]:
check_hypothesis('A', 'C', alpha=0.05)
p-value:  0.0
Reject H0 for tariff plans A and C

Tariffs B and C get similarly effective service, while tariff A gets better quality of service for outgoing calls.

Note: I am aware of the multiple comparisons problem.
Applying the Bonferroni (or any other) correction increases the risk of committing a Type II error. As I'm running only 3 tests, an adjustment of the alpha value is not necessary.
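
As a sanity check, the B-vs-C result can be reproduced by hand from the counts in the grouped_by_tariff table; this repeats the arithmetic inside check_hypothesis:

```python
import math
from scipy import stats as st

# label-1 rows (successes) and total rows (trials) per plan, from grouped_by_tariff
successes_b, trials_b = 426, 10303
successes_c, trials_c = 446, 9608

p_b = successes_b / trials_b
p_c = successes_c / trials_c
p_combined = (successes_b + successes_c) / (trials_b + trials_c)

# standard two-proportion z-test with a pooled proportion
z = (p_b - p_c) / math.sqrt(p_combined * (1 - p_combined) * (1 / trials_b + 1 / trials_c))
p_value = 2 * (1 - st.norm.cdf(abs(z)))
print(round(p_value, 4))  # ≈ 0.0805, matching the test output above
```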

Summary:

  • There were 8,172 missing operator ids in the data. This is 15% of the data. Almost all the missing ids were for incoming external unanswered calls. I consider these calls to be abandoned (the caller gave up before the call was assigned to an operator and answered). This is an indicator of the performance of the call center as a whole and not a specific operator.
  • Additional 4,184 rows had duplicated data.
  • During the preprocessing stage, 23.0% of the data was dropped.
  • Each operator is assigned to a single user.
  • Most users are assigned one operator, but some more prominent users need more than one.
  • A big client would be considered a user with more than 9 operators. We have 19 clients of that caliber.
  • 86.6% of calls are outgoing. I consider this number to be so high because we dropped a large number of incoming calls. Before the incoming calls were dropped, 75.5% were outgoing.
  • Only 2% of the calls are internal.
  • There is a gradual rise in workload over the months logged. Users joined between 01/08 and 31/10/2019, in parallel with the data collection, which can account for this rise.
  • There were 7 operators that logged unrealistic working hours. There might have been several operators working on the same login to the company system.
  • Incoming calls KPIs set by the company:
    • A large number of missed incoming calls (internal and external) - Statistical test shows there is no significant difference between effective and ineffective operators.
    • Long waiting time for incoming calls - After 25 seconds, the operator will be considered ineffective.
  • Outgoing calls KPIs:
    • A small number of outgoing calls - An operator making fewer than 34 daily calls is ineffective.
    • Answer success rate - The threshold should be set at 25 answered calls per day.
  • There is a significant difference between the number of effective operators in tariff plan A and the other two plans.

Conclusions:

  • Missed incoming calls (call abandonment rate)- Most of the relevant calls were never assigned to an operator and excluded from the data. For the missed calls that were analyzed, there was no significant difference between effective and ineffective operators. This indicates that the metric is not suitable to evaluate the performance of a single operator but should be used to evaluate the call center as a whole.
  • Waiting time (FRT, first response time)- The average waiting time for effective operators is 13 sec and for ineffective operators 37 sec. Studies show that callers typically wait 20-30 seconds for a response and usually abandon the call after a minute. These numbers indicate that the operators' performance on this metric is good.
  • Number of outgoing calls (calls per agent)- Average calls a day made by an effective operator is 61 and by ineffective operators is 6. This is a large difference. Since most calls are outgoing, this is the metric to focus on when evaluating an operator's performance.
  • Answer success rate (ASR)- A suggested metric for outgoing calls. An effective operator averages 32 answered calls per day, while an ineffective operator averages only 3.5.

Recommendations:

  • Check missed calls (abandonment rate) on the company level and not on the operator level.
  • Each operator should only make one type of call (incoming or outgoing). This will decrease the call abandonment rate.
  • Emphasize outgoing calls.
  • Emphasize improving the operators working with clients with tariff plans B and C.
  • Effective operators:
    • Answer incoming calls within 25 seconds
    • Make more than 34 outgoing calls each day
    • Get at least 25 answered calls each day
  • Ineffective operators:
    • Take longer than 25 seconds to answer an incoming call
    • Make fewer than 34 outgoing calls per day
    • Get fewer than 25 answered calls per day
In [ ]: