A/B Test Presentation

Presentation Slide_A3_Rebecca Patrick

Introduction

Why Not Watch? (WNW) is a video streaming service focused on continually improving its service. WNW would now like to refine its recommendation engine algorithm to provide better recommendations for its customers. Better recommendations are important because they increase user engagement and the average hours watched per user per day (a key metric that opens opportunities for advertising revenue). To assess the success of the new algorithm, WNW conducted an A/B test in which participants were divided into two groups: Group A (the control group) and Group B (the treatment group). The new algorithm went live at 1 minute past midnight on the 18th of July.

Problem Statement

The executives at WNW have requested an analysis of the results of the new recommendation engine algorithm to determine whether implementing it is worthwhile. The effectiveness of the new algorithm in improving user engagement, and in particular in increasing the average hours watched per user per day, should be assessed. Additional information about the A/B testing process, including how the sample data were selected, is also of interest. To assess whether the new algorithm is worth implementing, we first examine the data, analyse any bias, evaluate the A/B test that was conducted, and present a final conclusion and recommendation for this experiment.

Conclusion

This report examines WNW’s new recommendation engine algorithm, which aims to increase the average hours watched per user per day (represented by the ‘hours_watched’ variable), an important metric used to price ads for third-party marketing companies. The report began with data explanation and exploration, which revealed uneven sample sizes and group composition between Group A and Group B.
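The imbalance between the two groups can be quantified with a simple share-of-sample check. This is a minimal sketch, not WNW’s actual code; the group labels "A" and "B" and the toy counts below are illustrative assumptions.

```python
# Sketch: measure how evenly users are split between the A/B groups.
from collections import Counter

def group_balance(assignments):
    """Return each group's share of the total sample."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Toy data illustrating an uneven split like the one found in the report.
assignments = ["A"] * 300 + ["B"] * 700
print(group_balance(assignments))  # {'A': 0.3, 'B': 0.7}
```

A share far from 50/50, as in this toy example, is an early warning that the sampling procedure should be reviewed before any between-group comparison is trusted.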

Additionally, we found that the social_metric and demographic variables are statistically significant in affecting hours_watched. Group membership (A or B) also has a statistically significant relationship with hours_watched; however, its effect is relatively small. In other words, despite the statistically significant relationship, belonging to either group has only a small impact on the number of hours watched.
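The distinction between statistical significance and practical importance can be made concrete with a standardised effect size such as Cohen’s d, where values near 0.2 are conventionally considered “small”. This is a hedged sketch on toy data, not the report’s actual computation; the sample values are assumptions.

```python
# Sketch: Cohen's d separates "is the difference real?" (significance)
# from "is the difference large?" (effect size).
import math
from statistics import mean, stdev

def cohens_d(sample_a, sample_b):
    """Cohen's d using a pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(sample_b) - mean(sample_a)) / pooled_sd

# Toy hours_watched samples with a real but modest group difference.
hours_a = [2.0, 2.5, 3.0, 3.5]
hours_b = [2.2, 2.7, 3.2, 3.7]
print(cohens_d(hours_a, hours_b))
```

With a large enough sample, even a d this small can be statistically significant, which mirrors the report’s finding for group membership.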

Next, we calculated the minimum sample size for this test and found that the actual sample size exceeds it. We then used a two-sample t-test to determine whether the A/B test was successful. The test reveals that the new engine algorithm (implemented on the 18th of July) improved the average hours watched for customers in Group B. Therefore, based on the parameters used in the t-test, adopting the new algorithm is beneficial for WNW.
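The minimum-sample-size calculation for a two-sample comparison of means can be sketched with the standard normal-approximation formula. The effect size, standard deviation, significance level, and power below are illustrative assumptions, not WNW’s actual parameters.

```python
# Sketch: minimum n per group for a two-sample test of means,
# n = 2 * ((z_alpha + z_beta) * sigma / delta)^2.
import math
from statistics import NormalDist

def min_sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Minimum n per group to detect a mean difference `delta` in a
    two-sided test at significance `alpha` with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

# Example: detect a 0.5-hour lift when hours_watched has sd = 2.0.
print(min_sample_size_per_group(delta=0.5, sigma=2.0))
```

If the observed group sizes exceed this number per group, the test is adequately powered under these assumptions; smaller detectable effects or lower noise change the requirement sharply, since n scales with (sigma/delta)².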

Lastly, three recommendations for best practice in future A/B testing were provided: randomising the sample, adhering to the minimum sample size, and considering the test period. In conclusion, the hypothesis test on the A/B test shows an increase in the mean of Group B. However, considering the other aspects of this A/B test (randomisation, sample size, and test duration), there is a high likelihood of bias in the sample groups, and such bias could invalidate the A/B test result altogether. It is therefore advised that a revised A/B test be conducted with better sampling procedures, and possibly over a longer timeframe, so that the actual impact of the new algorithm can be established with more confidence.
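The randomisation recommendation can be sketched as a seeded shuffle followed by an even split, which yields balanced groups and a reproducible assignment. The user IDs and seed below are illustrative assumptions, not part of WNW’s data.

```python
# Sketch: reproducible random assignment of users to equal-sized groups.
import random

def randomise(user_ids, seed=42):
    """Shuffle users with a fixed seed, then split into two equal groups."""
    users = list(user_ids)
    random.Random(seed).shuffle(users)
    half = len(users) // 2
    return {"A": users[:half], "B": users[half:]}

groups = randomise(range(1000))
print(len(groups["A"]), len(groups["B"]))  # 500 500
```

Seeding the shuffle makes the assignment auditable after the fact, which directly addresses the sampling concerns raised about the current test.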

Resources

R_Markdown_A3_Rebecca_Patrick.pdf