Thursday, March 24, 2016

How to predict user purchases on a website

As a website-based business, how do you predict whether a user will make a transaction, and based on what?

This article experiments with a data pipeline and one of the popular ML classification algorithms (Random Forest) to solve the problem.

1. Suppose that on a shopping site, users sign up with their date of birth, gender and city, and then perform a series of actions (viewing merchandise, adding favourites, searching, etc.) before their first purchase. This gives two groups of features:
* user's demographics: age, gender, city
* user's behaviour: searches, views, favourites

After ETL, a user's data looks like:
id age gender city searches views favourites
12345 36 1 16 25 198 3
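
As a rough illustration, a row like the one above could be parsed into a typed record and a numeric feature vector as sketched below. The names `UserRow`, `parse_row` and `to_features` are made up for this example, and it assumes whitespace-separated integer columns in the order of the header.

```python
from typing import List, NamedTuple


class UserRow(NamedTuple):
    id: int
    age: int
    gender: int      # integer code, as in the sample row
    city: int        # integer code, as in the sample row
    searches: int
    views: int
    favourites: int


def parse_row(line: str) -> UserRow:
    """Parse one whitespace-separated row in the column order of the header."""
    fields = line.split()
    if len(fields) != len(UserRow._fields):
        raise ValueError(
            f"expected {len(UserRow._fields)} fields, got {len(fields)}"
        )
    return UserRow(*(int(f) for f in fields))


def to_features(row: UserRow) -> List[float]:
    """Drop the id and keep the demographic and behaviour columns."""
    return [float(v) for v in row[1:]]


row = parse_row("12345 36 1 16 25 198 3")
print(row.age, to_features(row))
```

The id is left out of the feature vector because it identifies the user but says nothing about their behaviour.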


2. Sample users and label each one as paid or unpaid (binary classification). Sampling:
Total Paid Unpaid
20k 8k 12k
* Note: the sample keeps the same paid/unpaid proportion as the full dataset.
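
One way to keep the class proportion when sampling is stratified sampling: draw from paid and unpaid users separately, each in proportion to its share of the population. The sketch below is only an assumption about how this could be done, not necessarily how the sample above was drawn. The function name `stratified_sample` and the toy population are made up.

```python
import random
from typing import Dict, List, Sequence


def stratified_sample(labels: Sequence[int], n: int, seed: int = 42) -> List[int]:
    """Return indices of a sample of about n items that keeps each class's share."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    total = len(labels)
    picked: List[int] = []
    for indices in by_class.values():
        quota = round(n * len(indices) / total)  # this class's share of the sample
        picked.extend(rng.sample(indices, quota))
    return picked


# Toy population with a 40% paid / 60% unpaid split, matching 8k : 12k.
labels = [1] * 400 + [0] * 600
sample = stratified_sample(labels, n=100)
paid = sum(labels[i] for i in sample)
print(len(sample), paid, len(sample) - paid)
```

Because each quota is rounded, the sample size can differ from `n` by a few items when the class shares do not divide evenly.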

3. Load the dataset as a DataFrame in Spark and split it into training and test sets. Train on the training set, then evaluate on the test set. The data pipeline is as follows:

[Figure: data pipeline diagram]

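A minimal PySpark sketch of such a pipeline is shown here. It is not the original code of this experiment. It assumes the Spark 2.0+ `pyspark.ml` API, a CSV file with a header row, and a hypothetical 0/1 label column named `paid`. The number of trees and the 80/20 split are also placeholders. Note that `gender` and `city` are passed in as plain numbers, so the forest treats them as continuous values unless they are indexed as categorical features first.

```python
FEATURE_COLS = ["age", "gender", "city", "searches", "views", "favourites"]
LABEL_COL = "paid"  # hypothetical name for the 0/1 paid/unpaid label


def train_and_evaluate(csv_path: str, train_fraction: float = 0.8,
                       seed: int = 42) -> float:
    """Fit a random forest on the training split and return test-set accuracy."""
    # Imported here so the module can be loaded on a machine without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("purchase-prediction").getOrCreate()
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Split once; the test set is never seen during training.
    train, test = df.randomSplit([train_fraction, 1.0 - train_fraction], seed=seed)

    # Stage 1: pack the feature columns into a single vector column.
    assembler = VectorAssembler(inputCols=FEATURE_COLS, outputCol="features")
    # Stage 2: the random forest classifier itself.
    forest = RandomForestClassifier(
        labelCol=LABEL_COL, featuresCol="features", numTrees=50, seed=seed
    )
    model = Pipeline(stages=[assembler, forest]).fit(train)

    evaluator = MulticlassClassificationEvaluator(
        labelCol=LABEL_COL, predictionCol="prediction", metricName="accuracy"
    )
    return evaluator.evaluate(model.transform(test))
```

Calling `train_and_evaluate("users.csv")` (the file name is a placeholder) would return the fraction of test users whose paid/unpaid label was predicted correctly.
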
4. On the test dataset, about 90% of the predictions were correct.
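
For reference, "correctness" here is read as accuracy: the share of test users whose paid/unpaid label was predicted correctly. The sketch below shows how it can be computed from a confusion matrix. The labels in it are made-up toy values, not the results of this experiment.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = paid, 0 = unpaid."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


# Toy labels for illustration only: one paid user is missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred), accuracy(y_true, y_pred))
```

With a 40/60 class split, always predicting "unpaid" would already be 60% accurate, so it is worth looking at the four counts as well as the single accuracy figure.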


References: interpret random forest, Spark random forest
