Correlation vs Causation: How to measure them for your product?

Analytics and data play a crucial role in business decisions for mobile marketers. For more than a decade, marketers have trusted data-driven decisions and used statistics to determine the relationship between various factors. Because these decisions influence whether users keep using the app or uninstall it, marketers should be cautious when analyzing data and should watch for cases where correlation is mistaken for causation.

In this article, we briefly explain what correlation and causation are and how they differ, with examples.

Correlation vs Causation

The basic definition

Correlation

Correlation is a statistical term that denotes the degree of relationship between two entities or variables. It refers to the association between two data sets and measures how closely they move together. However, correlation does not always imply causation; sometimes it is just a coincidence.

Below are the three types of correlation:

Positive correlation: Entities A and B move in the same direction, i.e. when A increases, B increases correspondingly, and when A decreases, B decreases.
Negative correlation: When entity A increases, entity B decreases, or vice versa.
No correlation: There is no relationship between the two entities, so a change in entity A is not accompanied by any consistent change in entity B.
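
The three types above can be illustrated with the Pearson correlation coefficient, one common way to quantify linear correlation (the article itself does not prescribe a specific measure). The following is a minimal sketch; the function name pearson_r and all of the figures are invented for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical figures, invented for illustration only.
entity_a = [1, 2, 3, 4, 5]
rises_with_a = [2, 4, 6, 8, 10]    # positive correlation
falls_with_a = [10, 8, 6, 4, 2]    # negative correlation
unrelated_to_a = [1, 3, 2, 3, 1]   # no linear correlation

positive = pearson_r(entity_a, rises_with_a)
negative = pearson_r(entity_a, falls_with_a)
none = pearson_r(entity_a, unrelated_to_a)
print(positive, negative, none)
```

A coefficient near +1 indicates a positive correlation, near -1 a negative correlation, and near 0 no linear correlation. None of these values says anything about which entity, if either, causes the other.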

Causation

Causation implies a cause-and-effect relationship between two entities (A and B). In short, it means that A causes B. In other words, events A and B do not merely appear together; one entity (A) brings about the other (B).

Why is this important?

The objective of scientific analysis and research is to determine the extent to which one entity relates to another entity. For instance:

– Is there any association between an individual’s education level and their health?

– Did a brand’s marketing campaign result in increased product sales?

– Does a change in the UI result in more active users?

Understanding Correlation vs Causation

A basic understanding of the difference between correlation and causation is essential, because confusing the two can lead you to base a decision on an erroneous conclusion.

For instance:

Suppose you notice an increase in active users over the previous month. You cannot conclude that it is the outcome of your recent app store optimization efforts. Instead, test and verify that conclusion. Only after testing your data can you make a claim about causation.

We cannot assume causation without testing for it, even when two related events appear together. There are several other possible explanations for the association, including:

– The reverse is true: event B actually causes event A.

– The two events (A and B) are correlated, but they’re actually caused by event C.

– There’s another event involved: event A does cause event B, but only as long as another event also happens.

– There is a chain reaction: A causes D, which in turn causes B (but you only observed that A appears to cause B).
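
The second possibility, where a third event C drives both A and B, can be reproduced with a small simulation. The following is a minimal sketch under stated assumptions: the data is randomly generated, the noise level of 0.5 and the sample size are arbitrary choices, and the helper pearson_r is a hypothetical name. A and B never influence each other here, yet they come out strongly correlated; once C's contribution is subtracted, the correlation largely disappears.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(7)  # fixed seed so the run is repeatable
n = 2000

# C is the hidden common cause; A and B each depend on C plus their own noise.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + 0.5 * rng.gauss(0, 1) for ci in c]
b = [ci + 0.5 * rng.gauss(0, 1) for ci in c]

# A and B look related even though neither causes the other.
r_ab = pearson_r(a, b)

# Controlling for C: remove its contribution and correlate what is left.
a_without_c = [ai - ci for ai, ci in zip(a, c)]
b_without_c = [bi - ci for bi, ci in zip(b, c)]
r_ab_controlled = pearson_r(a_without_c, b_without_c)

print(f"correlation of A and B: {r_ab:.2f}")
print(f"correlation after controlling for C: {r_ab_controlled:.2f}")
```

In real product data the common cause is usually not known in advance, which is why the experiments described later in this article are needed.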

An example of Correlation vs Causation in product analytics

You may be looking for causality in your app or product, i.e. cases where certain user behaviors or actions result in a certain outcome.

For instance, suppose you launched a new app for your brand. You firmly believe that customer retention is associated with in-app social activity, so you develop a new feature that lets customers join communities.

A month after you introduce the new feature, you create two cohorts of randomly selected customers to determine its impact: users who joined a community and users who did not. Below is the observed scenario:

Customers who joined the social community are retained at a greater rate than users who did not.
On Day 1, around 90% of the users who joined communities are still active, compared with 50% of the users who didn’t join.
By Day 7, you observe 60% retention among the community users and just over 18% among the users who did not join.

Based on these numbers, you may be tempted to draw a conclusion, but there is not enough data to conclude that the social community feature is the cause of higher customer retention. All you can see is that joining a community and retention are correlated.

How to determine causation?

Once you have found a correlation between two events, you can test for causation by running experiments that control for the other variables affecting those events and then measuring the difference.

Below are two such analyses or experiments for identifying causation:

Hypothesis testing
A/B/n experiments

Hypothesis testing

A basic hypothesis test includes a null hypothesis (H0) and your primary hypothesis (H1). If required, you can also test secondary or tertiary hypotheses.

H0: There is no association between a user joining an in-app community and customer retention.

H1: If a customer joins the social community in the first month, then they remain for more than one year.

The main intention is to see whether there is any real difference between your hypotheses. If you can reject the null hypothesis with statistical significance (ideally at a confidence level of at least 95%), you are closer to establishing the association between your dependent and independent variables. In the above scenario, if you can reject the null hypothesis by showing that users who joined the community had higher retention rates (while accounting for confounding variables that influence your outcomes), then you can reasonably conclude that there is some association between user retention and the community feature.
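
One way to check for statistical significance when comparing two retention rates is a two-proportion z-test. The following is a minimal sketch, not a prescribed method: the function name two_proportion_z_test is hypothetical, the cohort size of 200 users each is an assumption (the scenario above gives only percentages), and the retained counts are chosen to roughly match the Day 7 figures of 60% and 18%. Note that it tests short-term retention rather than the one-year horizon stated in H1.

```python
import math


def two_proportion_z_test(retained_a, n_a, retained_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). H0 is that both groups share one retention rate.
    """
    rate_a = retained_a / n_a
    rate_b = retained_b / n_b
    # Pooled rate under H0, used for the standard error.
    pooled = (retained_a + retained_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("test is undefined when the pooled rate is 0 or 1")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed cohort sizes (200 each); counts approximate 60% vs 18% retention.
z, p_value = two_proportion_z_test(retained_a=120, n_a=200,
                                   retained_b=36, n_b=200)

alpha = 0.05  # corresponds to the 95% confidence level mentioned above
reject_null = p_value < alpha
print(f"z = {z:.2f}, reject H0: {reject_null}")
```

Rejecting H0 here only establishes an association. Because users chose for themselves whether to join a community, confounding variables are not ruled out, which is what the controlled experiment in the next section addresses.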

To test the above hypothesis, build a model (an equation) that reflects the association between your anticipated cause (the independent, or exposure, variable) and the effect (the dependent, or result, variable). If your model lets you plug in values for the exposure variable and consistently returns results that match the real observed data, you are probably onto something.

A/B/n Experimentation

The second approach is A/B/n testing, which helps you move from correlation to causation. Analyze your variables, modify one, and monitor what happens. If the result consistently changes in the same direction, you have found the variable that makes the actual difference.

For instance:

In the above example, users who join social communities show higher retention rates, so you have to rule out all other factors that could possibly influence the result. For example, customers could have taken some other path that ultimately affected retention.

To determine whether there is real causation, you should find a direct association between customers joining social communities and using the app over the long term.

Start with the onboarding flow: take the next 500 customers who sign up and split them at random into two equal groups. One half (A) will be asked to join social communities when they sign up, and the other half (B) will not.

Run this experiment for the next one or two months and then compare customer retention rates between these two groups.

If you find that group A has a noticeably higher customer retention rate, then you have solid evidence that there is a real relationship between joining social communities and user retention rates. This relationship is probably worth researching further to understand why social communities drive higher retention rates.
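
The experiment described above can be sketched in a few lines. This is a minimal illustration, assuming the split is done at random: the function names assign_groups and retention_rate and the user IDs are hypothetical, and no retention results are simulated because those would come from your own analytics data once the experiment has run.

```python
import random


def assign_groups(user_ids, seed=None):
    """Randomly split users into two equal-sized groups, A and B."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)  # random order removes any sign-up ordering bias
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def retention_rate(group, retained_ids):
    """Share of users in the group who are still active."""
    if not group:
        raise ValueError("group must not be empty")
    retained = sum(1 for user in group if user in retained_ids)
    return retained / len(group)


# Hypothetical IDs standing in for the next 500 sign-ups.
signups = [f"user_{i}" for i in range(500)]
group_a, group_b = assign_groups(signups, seed=42)

# Group A is prompted to join communities during onboarding; group B is not.
print(len(group_a), len(group_b))

# After the experiment, pass the set of still-active user IDs, for example:
# retention_rate(group_a, active_ids) and retention_rate(group_b, active_ids)
```

Random assignment is the design choice that matters here: it spreads any other influences on retention evenly across both groups, so a difference in retention can be attributed to the community prompt rather than to how the groups were formed.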

Takeaway

The more ways you have to distinguish true causation from mere correlation within your app or product, the better you will be at prioritizing your efforts for retention and user engagement.
