Have fun, North America!: How to predict user purchase on a website

For a website-based business, how can you predict whether a user will make a transaction, and based on what?

This article experiments with a data pipeline and one of the popular ML classification algorithms (Random Forest) to solve the problem.

1. Suppose that on a shopping site a user signs up with their date of birth, gender, and city. Before the first purchase, a series of actions follows, such as viewing merchandise, adding favourites, and searching. This gives two groups of features:
* user demographics: age, gender, city
* user behaviour: searches, views, favourites

After ETL, a user's data looks like:

id	age	gender	city	searches	views	favourites
12345	36	1	16	25	198	3
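
To make the ETL step concrete, here is a minimal sketch in plain Python. It assumes a hypothetical signup table and a hypothetical pre-purchase event log (neither format comes from the real site) and counts each user's events into the columns shown above.

```python
from collections import Counter, defaultdict

# Hypothetical raw inputs: signup records and a pre-purchase event log.
signups = {12345: {"age": 36, "gender": 1, "city": 16}}
events = [
    (12345, "search"),
    (12345, "view"),
    (12345, "view"),
    (12345, "favourite"),
]

# Map each event type to the feature column it feeds.
EVENT_TO_COLUMN = {"search": "searches", "view": "views", "favourite": "favourites"}


def build_features(signups, events):
    """Join demographics with per-user counts of each behaviour."""
    counts = defaultdict(Counter)
    for user_id, event_type in events:
        column = EVENT_TO_COLUMN.get(event_type)
        if column is not None:  # ignore event types we do not model
            counts[user_id][column] += 1

    rows = []
    for user_id, demo in signups.items():
        row = {"id": user_id, **demo}
        for column in EVENT_TO_COLUMN.values():
            row[column] = counts[user_id][column]  # a Counter returns 0 if absent
        rows.append(row)
    return rows


rows = build_features(signups, events)
```

Each user ends up as one flat row, which is the shape a classifier expects. Gender and city stay as integer codes here, as in the example row.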

2. Label each user as paid or unpaid (a binary classification target), then draw a sample:

Total	Paid	Unpaid
20k	8k	12k

* Note: the sample keeps the same paid/unpaid proportion as the full dataset.
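
One way to keep the class proportions is stratified sampling: draw the same fraction from each class. The sketch below is only an illustration in plain Python. The toy population, the `paid` key, and the 20% fraction are assumptions, chosen so that the 40% paid / 60% unpaid split matches the table above.

```python
import random


def stratified_sample(users, label_key, fraction, seed=0):
    """Sample the same fraction from every class so proportions are preserved."""
    rng = random.Random(seed)

    # Group users by their label value.
    by_label = {}
    for user in users:
        by_label.setdefault(user[label_key], []).append(user)

    # Draw the same fraction from each group, without replacement.
    sample = []
    for group in by_label.values():
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))
    return sample


# Toy population: 400 paid users and 600 unpaid users.
population = [{"id": i, "paid": 1 if i < 400 else 0} for i in range(1000)]
sample = stratified_sample(population, "paid", fraction=0.2)
paid_in_sample = sum(user["paid"] for user in sample)
```

With a plain random sample the ratio would only be approximately preserved. Sampling per class keeps it exact up to rounding.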

3. Load the dataset as a DataFrame in Spark and split it into training and test sets. Train on the training set, then evaluate on the test set. The data pipeline is as follows:
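
A pipeline of this shape can be sketched with PySpark's DataFrame-based ML API. This is not the code used for the experiment. The CSV input, the `paid` label column, the 80/20 split, and the number of trees are all assumptions, and the `accuracy` metric name assumes a recent Spark release.

```python
# Feature columns from the example row above; "paid" is an assumed label column.
FEATURE_COLS = ["age", "gender", "city", "searches", "views", "favourites"]
LABEL_COL = "paid"


def train_and_evaluate(spark, csv_path, train_fraction=0.8, seed=42):
    """Train a random forest on the sampled users and return test-set accuracy."""
    # Imported here so the module can be loaded on a machine without PySpark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler

    # Load the sampled users; inferSchema turns the numeric columns into numbers.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Hold out part of the data for evaluation.
    train, test = df.randomSplit([train_fraction, 1.0 - train_fraction], seed=seed)

    # Pack the feature columns into the single vector column Spark ML expects.
    assembler = VectorAssembler(inputCols=FEATURE_COLS, outputCol="features")
    forest = RandomForestClassifier(
        labelCol=LABEL_COL, featuresCol="features", numTrees=50, seed=seed
    )

    # Fit on the training set only, then predict on the held-out test set.
    model = Pipeline(stages=[assembler, forest]).fit(train)
    predictions = model.transform(test)

    evaluator = MulticlassClassificationEvaluator(
        labelCol=LABEL_COL, predictionCol="prediction", metricName="accuracy"
    )
    return evaluator.evaluate(predictions)


# Example call (the file name is hypothetical):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("purchase-prediction").getOrCreate()
#   print(train_and_evaluate(spark, "users_sample.csv"))
```

Gender and city are integer codes, so this sketch feeds them to the trees as plain numbers. Marking them as categorical features is a possible refinement.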

4. On the test set, about 90% of the predictions were correct.
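
For reference, "correctness" here means accuracy: the share of test users whose predicted label equals the true label. A minimal sketch on made-up labels (toy values, not the experiment's output):

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def confusion_counts(labels, predictions):
    """Count true/false positives and negatives, with 1 = paid and 0 = unpaid."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(labels, predictions):
        if p == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return counts


# Toy test set: 4 paid and 6 unpaid users, with one paid user missed.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

score = accuracy(labels, predictions)
counts = confusion_counts(labels, predictions)
```

Since 12k of the 20k sampled users are unpaid, a model that always predicts "unpaid" would already be 60% correct, so that is the baseline to compare ~90% against.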

reference: interpret random forest, spark random forest

Thursday, March 24, 2016
