Create Machine Learning model to predict online purchase conversion
This tutorial helps you to learn how to create machine learning model to predict online purchase conversion using Google Analytics and BigQuery.
If you had setup Google Analytics for your website to track visitors and you wanted to predict whether visitors will make a transaction/purchase, then you are at the right place.
1. Setup BigQuery export
You can export ga session and hit data from Google Analytics 360 account (Note, currently the integration is available only for Google Analytics 360) in to Google BigQuery.
Here are the steps to link Google Analytics to BigQuery and setup BigQuery export.
Note, the standard version of Google Analytics is completely free to use, but Google Analytics 360 is the premium version and it is not free. So for now let us use ga_sessions_ sample analytics data available in the google_analytics_sample public dataset for our learning purpose.
Navigate to Google Cloud Console -> BigQuery -> Resources -> bigquery-public-data (GCP Project) -> google_analytics_sample (public dataset) -> ga_sessions_
2. Create a BigQuery dataset
Create a dataset using the following command
sneppets@cloudshell:~ (sneppets-gcp)$ bq mk bqml_example Dataset 'sneppets-gcp:bqml_example' successfully created. sneppets@cloudshell:~ (sneppets-gcp)$ bq ls datasetId -------------- bqml_example
3. Create a model
Run the following query in the Query Editor to create machine learning model to predict whether a visitor on the website will make online purchase.
CREATE OR REPLACE MODEL `bqml_example.sample_model` OPTIONS(model_type='logistic_reg') AS SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170631' LIMIT 100000;
In the above query, the criteria considered to check whether online purchase has been made are visitor’s mobile device OS, visitor’s country and number of page views.
For instance, bqml_example is the name of the dataset and sample_model is the name of the model. The model type specified in the above query is binary logistic regression.
Note, when you run CREATE MODEL query, it actually creates a Query Job which runs asynchronously.
4. Evaluate the model
Run the following query
SELECT * FROM ml.EVALUATE(MODEL `bqml_example.sample_model`, ( SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));
As a result, you would see the following table metrics for logistic regression model.
Row | precision | recall | accuracy | f1_score | log_loss | roc_auc |
1 |
0.432 |
0.05027933 |
0.98532971 |
0.090075 |
0.048936587 |
0.981565 |
5. Use the model
For instance, to predict number of online purchases each visitor makes, you can run the following query. As a result this query will return top 10 visitors by purchases made.
SELECT fullVisitorId, SUM(predicted_label) as total_predicted_purchases FROM ml.PREDICT(MODEL `bqml_example.sample_model`, ( SELECT IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(totals.pageviews, 0) AS pageviews, IFNULL(geoNetwork.country, "") AS country, fullVisitorId FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801')) GROUP BY fullVisitorId ORDER BY total_predicted_purchases DESC LIMIT 10;
Row | fullVisitorId | total_predicted_purchases |
---|---|---|
1 |
9417857471295131045
|
3
|
2 |
8388931032955052746
|
2
|
3 |
112288330928895942
|
2
|
4 |
0376394056092189113
|
2
|
5 |
1280993661204347450
|
2
|
6 |
806992249032686650
|
2
|
7 |
057693500927581077
|
2
|
8 |
2969418676126258798
|
2
|
9 |
7420300501523012460
|
2
|
10 |
1589021726696497303
|
1
|
Further Learning
- What is Data Loading and loading data into Google BigQuery ?
- Cloud Interconnect is a direct, private connection for extending corporate network to Google Cloud
- Deploy an Application to Kubernetes running on Google Cloud Kubernetes Engine (GKE)
- How to create GCP project on Google Cloud Platform