Customer churn

Two years ago I published a series of articles on LinkedIn about customer churn (this is part I: https://www.linkedin.com/pulse/machine-learning-predict-customer-churn-accuracy-patrick-rotzetter/). I tested a few approaches and showed how to explain the model using LIME, how to measure feature importance, how to fight class imbalance, and a few other related topics.

I recently completed an Azure certification and wanted to test Azure AutoML on a simple example. So I thought I would reuse the previous example and see how Azure AutoML simplifies the whole process.

Azure Machine Learning Setup

To use AutoML from Azure, you of course need an Azure account. A few simple steps then let you create a machine learning workspace where your experiments can be registered. There is much more to a machine learning workspace, but for now let us focus on our experiment.

You can find the full documentation on how to set up and prepare the environment in the Microsoft documentation: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train

The experiment

I decided to run my experiment on my local machine while still enjoying the benefits of Azure Machine Learning AutoML features. Let us go through it step by step.

Install the Azure Machine Learning Python SDK as documented in the Microsoft documentation (https://docs.microsoft.com/en-us/python/api/overview/azure/ml/install?view=azure-ml-py). You are now ready to use the Python SDK.
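Before going further, a quick sanity check does not hurt. The snippet below simply prints the installed SDK version; the pip command in the comment is just one example of how the SDK with AutoML support can be installed, so adapt it to your environment:

# Example install command (adapt to your environment): pip install azureml-sdk[automl]
import azureml.core
print("Azure ML SDK version:", azureml.core.VERSION)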

Let us set up the environment and ML workspace. I have removed the specific environment details; replace them with your own subscription and workspace details:

import os

subscription_id = os.getenv("SUBSCRIPTION_ID", default="xxxxx")
resource_group = os.getenv("RESOURCE_GROUP", default="yyyyy")
workspace_name = os.getenv("WORKSPACE_NAME", default="zzzz")
workspace_region = os.getenv("WORKSPACE_REGION", default="South Central US")

Let us access the workspace with the previously defined parameters and test that everything is fine:

from azureml.core import Workspace

try:
    ws = Workspace(subscription_id=subscription_id,
                   resource_group=resource_group,
                   workspace_name=workspace_name)
    # write the details of the workspace to a configuration file in the notebook library
    ws.write_config()
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except Exception:
    print("Workspace not accessible. Change your parameters or create a new workspace below")

Then define the experiment

from azureml.core import Experiment

ws = Workspace.from_config()
# choose a name for the experiment
experiment_name = 'automl-classification-customer_churn'
experiment = Experiment(ws, experiment_name)

Now that we are all set, let us load the data and take a look:

import pandas as pd

data = pd.read_csv("churn_modelling.csv")
data.head()
Data set extract

The data set consists of a mix of categorical features like 'Geography' and 'Gender'. It also includes numerical and boolean values. Normally one would have to one-hot encode the categorical values, but in our case Azure AutoML takes care of this, which is a great time saver.
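Just to illustrate what AutoML saves us here, this is a minimal sketch of the manual encoding we would otherwise do with pandas, assuming 'Geography' and 'Gender' are the categorical columns in this data set:

# Manual one-hot encoding we do NOT need, since AutoML featurization handles it
encoded = pd.get_dummies(data, columns=['Geography', 'Gender'], drop_first=True)
print(encoded.columns.tolist())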

Now we just need to define the experiment parameters and start training; that's it:

import logging
from azureml.train.automl import AutoMLConfig

automl_settings = {"n_cross_validations": 3,
                   "primary_metric": 'average_precision_score_weighted',
                   "experiment_timeout_hours": 0.25,  # time limit for testing only; remove it for real use cases, as it drastically limits the search for the best model
                   "verbosity": logging.INFO,
                   "enable_stack_ensemble": False}
automl_config = AutoMLConfig(task='classification', debug_log='automl_errors.log',
                             training_data=data, label_column_name='Exited', **automl_settings)
local_run = experiment.submit(automl_config, show_output=True)

The experiment is now running on my local machine while logging the results in the Azure Machine Learning workspace. Pretty nice!

The job starts with several data preparation steps before evaluating different models.
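If you prefer to follow the run from the notebook itself, here is a small sketch, assuming the optional azureml-widgets package is installed:

# Show a live progress widget for the submitted run (optional)
from azureml.widgets import RunDetails
RunDetails(local_run).show()

# Block until featurization and model training are complete
local_run.wait_for_completion(show_output=True)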

Experiment running on local machine

Experiment Results

All the results can then be accessed in the Azure Machine Learning workspace. A number of metrics are available, and numerous charts can be used to understand the various models' metrics and pick the best one.
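The same results can also be pulled programmatically from the notebook. Here is a minimal sketch, assuming the run above has completed:

# Retrieve the best child run and its fitted model from the AutoML parent run
best_run, fitted_model = local_run.get_output()
print(best_run.id)

# Metrics logged for the best run (AUC, accuracy, average precision, ...)
print(best_run.get_metrics())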

Model evaluations

The best model seems to perform quite well despite the class imbalance, with a weighted average precision score of more than 90%. The average precision score summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html).
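For readers who want to see the metric in isolation, here is a tiny scikit-learn example with made-up labels and scores, independent of our churn model:

# Average precision on a toy example (values are made up for illustration)
from sklearn.metrics import average_precision_score
y_true = [0, 0, 1, 1, 1]
y_scores = [0.15, 0.4, 0.35, 0.8, 0.9]
print(average_precision_score(y_true, y_scores))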

ROC curve for the ensemble voting method

The model is clearly better than random, with an area under the ROC curve of around 86%. AUC is a better metric than accuracy for imbalanced classes.
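A tiny example with made-up labels shows why: a naive classifier that always predicts the majority class gets high accuracy but only 0.5 AUC:

# Accuracy vs. AUC on an imbalanced toy example (values are made up)
from sklearn.metrics import accuracy_score, roc_auc_score
y_true = [0] * 90 + [1] * 10      # 90/10 class imbalance
y_pred = [0] * 100                # always predict "no churn"
y_scores = [0.0] * 100            # no ranking ability at all
print(accuracy_score(y_true, y_pred))   # 0.9
print(roc_auc_score(y_true, y_scores))  # 0.5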

Note that you can mouse over the chart and display individual metrics at each point of the curve. This is quite a nice feature.

Feature importance for the ensemble voting method

The two top determining factors are age and account balance, which seem to influence churn the most compared to the other predictors. Note that AutoML has renamed the columns after pre-processing: it seems we had some missing values, and the algorithm has imputed them with mean values (Age_MeanImputer).
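If you want the importances programmatically rather than in the studio, here is a hedged sketch, assuming the azureml-interpret package is installed and model explainability was enabled for the run:

# Download the model explanation computed for the best run and print feature importances
from azureml.interpret import ExplanationClient

client = ExplanationClient.from_run(best_run)
explanation = client.download_model_explanation()
print(explanation.get_feature_importance_dict())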

Conclusion

AutoML provides a very fast way to evaluate different models for supervised learning and saves a lot of time. It has nice reporting features, and all of this in very few lines of code. That does not mean no machine learning knowledge is required, but I find it extremely useful. It also produces nice interactive graphs that can be quite impactful when documenting your experiments.
