Moving Beyond Correlation: Using Causal Inference to Drive Real Retention Improvements

Explore uplift modeling for customer retention: how it compares with churn modeling, its evaluation metrics, data requirements, and modeling approaches.

Introduction to Causal Inference

With the rise of big data, businesses can now easily track every customer interaction and transaction to look for patterns in customer behavior. While correlating factors like past purchases or demographics with outcomes like likelihood to churn can provide some insights, correlation does not imply causation. Just because two events are correlated does not mean that one causes the other.

To truly understand customer behavior and make reliable predictions, businesses need to move beyond correlation and adopt causal reasoning. Causal inference is the process of determining cause-and-effect relationships from data. Rather than relying on observed correlations, causal inference techniques allow you to model the counterfactual - estimating what would happen if you intervened on the system.

One important application of causal inference is uplift modeling. Uplift modeling aims to directly quantify the incremental impact of a business action like a marketing campaign or special offer. For example, a retailer might want to know how sending an email coupon will causally affect a customer's likelihood to purchase. Uplift modeling helps predict the uplift - the increase or decrease in an outcome metric like purchases - attributable to the action.

Unlike traditional predictive models that focus on correlation, uplift modeling is designed to uncover the causal effects of different interventions. This enables businesses to optimize their marketing spend, retention efforts, and other actions to have the greatest impact on desired outcomes like customer retention and lifetime value. By moving from correlation to causation, companies can truly understand the causal drivers of customer behavior.

The Basics of Uplift Modeling

Traditional churn models are good at predicting which customers are likely to churn, but they don't tell you the incremental impact of taking actions to retain those customers. This is where uplift modeling comes in.

Uplift modeling is a predictive modeling technique that directly estimates the causal effect of a treatment or action on an individual's behavior. The key is that it models the incremental lift in response between a treatment group and a control group.

For example, an uplift model could predict how much more likely a customer is to renew their subscription if they receive a retention discount offer versus if they don't. This helps you determine the true impact of the retention campaign.

Uplift modeling segments customers into four categories based on their incremental response:

  • Persuadables: Customers who are more likely to renew with the retention offer. Targeting them will give a positive uplift.

  • Sure Things: Customers likely to renew regardless of whether they get the offer or not. No incremental gain from targeting them.

  • Lost Causes: Customers unlikely to renew even if given the retention offer. No positive uplift.

  • Sleeping Dogs: Customers who are less likely to renew if given the offer. Targeting them could have a negative effect.

By identifying the persuadable customers, uplift modeling allows you to target retention campaigns to only those customers likely to have a positive response. This is far more efficient than mass blasting campaigns to all customers.

The key difference from churn models is that uplift focuses on the incremental lift versus no action, rather than just predicting an absolute outcome. This leads to smarter campaign targeting, reduced costs, and improved retention results.
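
As a rough illustration, here is one simple heuristic (an assumption for this article, not a formal definition from the uplift literature) for mapping a customer to one of the four segments once you have predicted renewal probabilities with and without the offer. The 0.05 uplift threshold and 0.5 renewal cutoff are arbitrary choices:

```python
def classify_segment(p_with_offer: float, p_without_offer: float, eps: float = 0.05) -> str:
    """Heuristic segment assignment from predicted renewal probabilities."""
    uplift = p_with_offer - p_without_offer
    if uplift > eps:
        return "persuadable"    # the offer meaningfully increases renewal
    if uplift < -eps:
        return "sleeping dog"   # the offer decreases renewal; do not target
    if p_without_offer >= 0.5:
        return "sure thing"     # likely to renew either way
    return "lost cause"         # unlikely to renew either way
```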

The Two-Model Approach

One technique for uplift modeling involves building two separate models - a control model and a treatment model. The control model aims to predict customer behavior without any intervention, while the treatment model predicts behavior when exposed to a marketing action or treatment.

To build these models, customers are divided into a control group that is not exposed to the treatment, and a treatment group that receives the marketing action. Each group should be as close to identical as possible - randomly selected from the same population and receiving the same experience except for the treatment.

The control model is trained on data from the control group, so it learns to predict the outcome under normal, untreated behavior. The treatment model is trained on data from the treatment group, so its predictions reflect behavior after exposure to the marketing action.

After training both models, their predictions are compared on a holdout sample to estimate uplift. For each individual customer, the treatment model predicts their expected outcome with the marketing action, while the control model predicts their expected outcome without it.

The difference between these two predictions is the estimated uplift - the incremental impact of the marketing action on that customer's behavior. Customers with high uplift are good treatment targets, while those with low or negative uplift are better left untreated.

The two-model approach provides a simple and intuitive way to estimate causal uplift directly. However, it requires a lot of data and careful experimental design to train robust control and treatment models. Variations in the samples can bias results. Still, this technique is widely used for uplift modeling when randomized trials are feasible.
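
As a concrete sketch, the two-model approach might look like the following in Python with scikit-learn. The DataFrame, column names (`treatment`, `renewed`), and feature list are hypothetical, and a randomized treatment assignment is assumed:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def fit_two_model_uplift(df: pd.DataFrame, feature_cols: list[str]):
    treated = df[df["treatment"] == 1]
    control = df[df["treatment"] == 0]

    # Treatment model: learns renewal behavior when the offer was sent.
    treatment_model = RandomForestClassifier(n_estimators=200, random_state=0)
    treatment_model.fit(treated[feature_cols], treated["renewed"])

    # Control model: learns renewal behavior with no offer.
    control_model = RandomForestClassifier(n_estimators=200, random_state=0)
    control_model.fit(control[feature_cols], control["renewed"])

    return treatment_model, control_model

def predict_uplift(treatment_model, control_model, df: pd.DataFrame, feature_cols: list[str]):
    # Estimated uplift = P(renew | offer) - P(renew | no offer), per customer.
    p_treat = treatment_model.predict_proba(df[feature_cols])[:, 1]
    p_control = control_model.predict_proba(df[feature_cols])[:, 1]
    return p_treat - p_control
```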

The Class Transformation Method

The class transformation method is another technique for building uplift models. Unlike the two-model approach, which trains separate models for the treatment and control groups, this method transforms the target variable so that a single model trained on the combined data can estimate uplift.

Here's how it works:

  • A new target variable is defined from the original outcome and the treatment indicator

  • The new target is set to 1 for treated customers who responded and for control customers who did not respond

  • The new target is set to 0 for treated customers who did not respond and for control customers who did respond

  • A single classifier is trained on the combined dataset to predict this new target

When the treatment and control groups are roughly the same size, predicting this transformed target is equivalent to predicting uplift: a customer's uplift can be estimated as twice the predicted probability that the new target equals 1, minus one.

The main advantage of this approach is simplicity. By converting the problem into a single modeling task, we avoid the complexity of training and maintaining multiple models. The method does, however, assume the treatment was randomized and that the group sizes are roughly balanced; heavily imbalanced groups require reweighting the transformed target.

However, the class transformation method has some downsides:

  • It folds the group assignment into the target, so it relies on the treatment having been assigned randomly (or on careful reweighting)

  • The transformed target is noisier than the original outcomes, which can make individual-level estimates less stable

  • It provides less flexibility compared to explicitly modeling each group separately

Overall, the class transformation is a straightforward way to build an uplift model with a single dataset and a single model. The two-model approach can produce more accurate results because it models each group explicitly, at the cost of added complexity. The choice between these methods depends on the nature of the data and the desired model complexity. In some cases the simplicity of the class transformation may outweigh its limitations.
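
Here is a minimal sketch of the transformation in Python with scikit-learn, assuming a randomized campaign with roughly balanced groups; the DataFrame and column names (`treatment`, `converted`) are hypothetical:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def fit_class_transformation(df: pd.DataFrame, feature_cols: list[str]):
    # Transformed target: 1 for treated responders and control non-responders,
    # 0 otherwise. With binary columns this is simply treatment == converted.
    z = (df["treatment"] == df["converted"]).astype(int)
    model = GradientBoostingClassifier()
    model.fit(df[feature_cols], z)
    return model

def predict_uplift(model, df: pd.DataFrame, feature_cols: list[str]):
    # With roughly balanced groups, uplift is approximately 2 * P(z = 1) - 1.
    p_z = model.predict_proba(df[feature_cols])[:, 1]
    return 2 * p_z - 1
```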

Modeling Uplift Directly

A more advanced approach is to model the uplift directly as the target variable. This involves developing machine learning models that are optimized to predict the incremental impact of a treatment, rather than just predicting an outcome.

The key steps are:

  1. Engineer features that are likely related to uplift. These could include customer attributes, behavioral data, derived metrics, and campaign metadata.

  2. Model uplift by training directly on the uplift values in your historical training data. For example, uplift could be calculated as the difference in response rate between treatment and control groups.

  3. Use a model capable of predicting a continuous target like uplift. Options include linear regression, neural networks, gradient boosted machines, and model ensembles.

  4. Tune the model to optimize uplift-based metrics rather than standard accuracy.
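
Under simplifying assumptions, steps 2 and 3 might be sketched as follows. The column names (`segment`, `treatment`, `converted`) are hypothetical, and computing uplift labels as segment-level differences in response rate is just one possible choice:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def build_direct_uplift_model(df: pd.DataFrame, feature_cols: list[str]):
    # Step 2: compute historical uplift per segment as the difference in
    # response rates between treated and control customers.
    rates = (
        df.groupby(["segment", "treatment"])["converted"]
        .mean()
        .unstack("treatment")
    )
    segment_uplift = rates[1] - rates[0]
    df = df.assign(uplift_label=df["segment"].map(segment_uplift))

    # Step 3: fit a regression model with uplift as the continuous target.
    model = GradientBoostingRegressor()
    model.fit(df[feature_cols], df["uplift_label"])
    return model
```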

Modeling uplift directly has some benefits compared to the two-model approach:

  • The model is optimized end-to-end for the actual goal of maximizing uplift. This can improve performance.

  • There is no discrepancy between the training objective and evaluation metric.

  • Feature engineering can focus exclusively on predictive uplift factors.

However, directly modeling uplift has greater complexity. It requires:

  • Sufficient historical data with both control and treatment groups.

  • Careful data checks to ensure accurate uplift values.

  • More advanced machine learning algorithms capable of regression.

  • Custom evaluation metrics based on uplift.

So while direct uplift modeling has advantages, it may be more difficult to implement in practice than other approaches. The added complexity must be weighed against potential performance gains for each use case. But in situations with abundant data, directly modeling uplift is an appealing option.

Evaluating Model Performance

Performance evaluation is crucial for uplift modeling to ensure the model is accurately predicting incremental lift. There are several key metrics that can be used:

  • Accuracy measures the overall proportion of correct outcome predictions. However, it says little about uplift on its own, because the true incremental effect for an individual customer is never directly observed; what matters is the difference in outcomes between treatment and control groups.

  • Uplift curves show the cumulative uplift captured as a function of the percentage of population targeted. A larger area under the curve indicates better performance.

  • Qini coefficient summarizes the uplift curve into a single number, calculated as the area between the uplift curve and the neutral line. Higher Qini is better.

  • Relative uplift compares the difference in outcomes between treatment and control groups to the baseline expected outcome.
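
As an illustration, a simple (unnormalized) Qini-style calculation on a randomized holdout set might look like this; exact formulations vary across tools, so treat it as a sketch rather than a canonical definition:

```python
import numpy as np

def qini_curve(uplift_pred, treatment, outcome):
    # Sort customers so the highest predicted uplift is targeted first.
    order = np.argsort(-uplift_pred)
    treatment, outcome = treatment[order], outcome[order]

    n_t = np.cumsum(treatment)                  # treated customers targeted so far
    n_c = np.cumsum(1 - treatment)              # control customers "targeted" so far
    r_t = np.cumsum(outcome * treatment)        # responders among treated
    r_c = np.cumsum(outcome * (1 - treatment))  # responders among control

    # Incremental responders at each targeting depth, scaling control responders
    # to the size of the treated group (guarding against division by zero early on).
    scale = np.divide(n_t, n_c, out=np.zeros(len(n_c)), where=n_c > 0)
    return r_t - r_c * scale

def qini_coefficient(uplift_pred, treatment, outcome):
    curve = qini_curve(uplift_pred, treatment, outcome)
    # Random targeting corresponds to a straight line from zero up to the
    # total incremental responders; the coefficient here is the average gap.
    n = len(curve)
    random_line = np.linspace(curve[-1] / n, curve[-1], n)
    return float(np.mean(curve - random_line))
```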

To properly evaluate an uplift model, a holdout set is crucial to avoid overfitting. The training set is used to build the model, while the holdout set provides an unbiased estimate of real-world performance. Without a holdout set, it's easy to overestimate uplift.

Some common evaluation pitfalls to avoid:

  • Leakage between training and holdout data

  • Holdout set not representative of real population

  • Failure to compare performance to baseline expected outcome

  • Assessing overall accuracy rather than difference between treatment/control groups

Proper uplift model evaluation requires care and discipline to generate realistic estimates of incremental lift for business decision making. Using an appropriate holdout set and metrics focused on uplift are key.

Traditional Uplift Metrics

Uplift modeling aims to quantify the incremental impact of a treatment (e.g. marketing campaign) on an outcome of interest (e.g. customer retention). To evaluate model performance, we need metrics that can estimate this uplift effect. The most common traditional uplift metrics are:

Absolute Uplift

Absolute uplift is the simple difference between the outcome with the treatment and the outcome without:

Absolute Uplift = Outcome with Treatment - Outcome without Treatment

For example, if the customer retention rate with a loyalty program is 60% and only 40% without, the absolute uplift is:

Absolute Uplift = 60% - 40% = 20 percentage points

Relative Uplift

Relative uplift expresses the uplift as a percentage of the baseline outcome:

Relative Uplift = (Outcome with Treatment - Outcome without) / Outcome without

Using the same example with 60% and 40% retention rates, the relative uplift is:

Relative Uplift = (60% - 40%) / 40% = 50%

The loyalty program improved retention by 50% over baseline.

Net Uplift

Net uplift accounts for the difference between the actual and natural retention rates:

Net Uplift = (Outcome with Treatment - Outcome without) - (Baseline - Outcome without)

If the baseline retention rate was 50%, the net uplift would be:

Net Uplift = (60% - 40%) - (50% - 40%) = 10 percentage points

This helps isolate the true incremental impact of the treatment.
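
For completeness, here is a tiny helper that reproduces the three worked examples above, with rates expressed as fractions rather than percentages:

```python
def uplift_metrics(rate_with: float, rate_without: float, baseline: float):
    absolute = rate_with - rate_without                            # 0.60 - 0.40 = 0.20
    relative = (rate_with - rate_without) / rate_without           # 0.20 / 0.40 = 0.50
    net = (rate_with - rate_without) - (baseline - rate_without)   # 0.20 - 0.10 = 0.10
    return absolute, relative, net

print(tuple(round(m, 2) for m in uplift_metrics(0.60, 0.40, 0.50)))  # (0.2, 0.5, 0.1)
```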

These metrics provide a simple way to quantify and reason about the uplift generated by campaigns. But more advanced metrics, such as those based on potential outcomes, aim to improve on these approaches.

New Metrics Based on Potential Outcomes

Causal inference researchers have proposed new metrics for evaluating uplift models that aim to account for uncertainty in outcomes. These metrics are based on the concept of potential outcomes: each customer has one potential outcome if treated and another if not treated, and the individual uplift is the difference between the two. Only one of those outcomes can ever be observed for a given customer, which is what makes uplift evaluation hard.

The optimal outcome is represented by Y*. This is the outcome that would occur if we could perfectly predict uplift and target the optimal customers. New metrics compare the model's predicted uplift to Y* to evaluate how close the predictions come to the optimal targeting.

For example, one proposed metric is the mean squared error (MSE) between the predicted uplift and the uplift implied by the potential outcomes. Models with lower MSE come closer to the optimal targeting defined by Y*.

Another is a normalized form of the Qini measure, defined as the uplift captured under the model's targeting divided by the maximum possible uplift. This measures how much of the total achievable uplift is captured by the model. Higher values indicate better performance.

The main advantage of metrics based on potential outcomes is that they account for uncertainty and bias in the predictions. Traditional metrics like net uplift and relative uplift can overestimate performance. Potential outcome metrics give a more realistic evaluation.

The limitation is that calculating Y* requires strong assumptions about the causal relationships. There is rarely enough data to know the true optimal outcome. So these metrics still rely on modeling assumptions. But they represent an improvement over metrics that ignore uncertainty.

Data and Modeling Considerations

Accurate uplift modeling requires substantial amounts of historical customer data to estimate the incremental impact of the marketing action or treatment. You need data on both control groups and treatment groups over long periods of time to confidently determine the causal effect.

The key data inputs for uplift modeling typically include:

  • Demographic data such as age, gender, location, income level

  • Customer attributes like lifetime value, churn risk score

  • Engagement and behavioral data like purchases, logins, clicks

  • Transaction history and order values

  • Marketing touchpoints and responses

This data needs to be processed into features that are relevant for predicting uplift. Common preprocessing steps include:

  • Handling missing values

  • Encoding categorical variables

  • Normalizing numeric columns

  • Creating aggregate features like average order value

  • Performing feature selection to reduce dimensionality
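
A minimal sketch of these steps with scikit-learn might look like the following; the column names are hypothetical placeholders, and feature selection is omitted for brevity:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "lifetime_value", "avg_order_value"]        # assumed names
categorical_cols = ["gender", "location", "acquisition_channel"]   # assumed names

preprocessor = ColumnTransformer([
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),        # handle missing values
        ("scale", StandardScaler()),                          # normalize numeric columns
    ]), numeric_cols),
    ("categorical", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),   # encode categoricals
    ]), categorical_cols),
])

# The fitted preprocessor can feed any of the uplift approaches above,
# for example: features = preprocessor.fit_transform(df)
```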

Many machine learning algorithms can be used for uplift modeling with appropriate optimization. Common choices include:

  • Logistic regression

  • Decision trees and random forests

  • Neural networks and deep learning models

  • Support vector machines

These algorithms need to be optimized for directly predicting incremental uplift. Off-the-shelf implementations tuned for overall accuracy are not enough on their own: the model objective needs to focus on maximizing uplift rather than raw predictive accuracy.

With the right data, algorithms, and optimization objective, you can build uplift models that effectively guide customer targeting and retention decisions.

Driving Business Impact with Uplift Modeling

The ultimate goal of uplift modeling is to drive real business impact by retaining more customers and generating more revenue. Once you have built and evaluated your uplift model, the next step is to leverage the model predictions to optimize your marketing strategy.

There are two key ways to use uplift modeling results to improve business outcomes:

Simulating Revenue Impact

You can take the predicted uplifts for each customer and plug them into your revenue models to simulate the expected impact of a marketing campaign. For example, if you know the lifetime value of a retained customer, you can calculate the total added revenue from targeting the high uplift customers. This revenue simulation allows you to quantify the monetary value of your uplift modeling efforts.

To conduct the simulation, you will need:

  • Predicted uplifts for each customer

  • Revenue models for customer lifetime value

  • Costs associated with marketing campaign

By comparing the expected revenue to costs, you can determine the ROI of the marketing campaign under different targeting strategies.
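
A simple version of this simulation, assuming you have per-customer predicted uplift (as a change in retention probability), lifetime values, and a per-contact campaign cost, might look like this:

```python
import numpy as np

def simulate_campaign_roi(uplift_pred, lifetime_value, cost_per_contact, target_fraction=0.2):
    # Rank customers by predicted uplift and target only the top fraction.
    order = np.argsort(-uplift_pred)
    n_target = int(len(order) * target_fraction)
    targeted = order[:n_target]

    # Expected incremental revenue: change in retention probability times
    # the value of retaining each targeted customer.
    incremental_revenue = float(np.sum(uplift_pred[targeted] * lifetime_value[targeted]))
    campaign_cost = n_target * cost_per_contact   # assumes a nonzero per-contact cost
    roi = (incremental_revenue - campaign_cost) / campaign_cost
    return incremental_revenue, campaign_cost, roi
```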

Optimizing Marketing Budget Allocation

In addition to revenue simulation, uplift models allow you to optimize how you allocate budget between different customer segments. You can rank customers by predicted uplift and determine the optimal budget split between high vs low uplift groups to maximize revenue.

For example, you may determine that targeting the top 20% highest uplift customers captures 80% of the revenue. This allows you to focus budget on the segments that will generate the most return.

In summary, driving business impact requires leveraging uplift model predictions to conduct revenue simulations and optimize your marketing budget allocations. This allows you to retain more of your high-value customers and maximize the return from your marketing campaigns.
