Machine Learning with SQL
Insights
Technical / Data / July 2024 · 11 min read

Machine Learning with SQL

How to leverage BigQuery ML to build and deploy machine learning models using familiar SQL syntax.

BigQuery ML SQL Vertex AI Cloud Functions

The Promise of BigQuery ML

Most organizations have far more SQL expertise than ML expertise. Data analysts who’ve spent years mastering SQL often hit a wall when asked to build predictive models-suddenly they need Python, Jupyter notebooks, and an entirely new toolkit.

BigQuery ML changes this equation. It lets you create, train, evaluate, and deploy machine learning models using standard SQL syntax, directly where your data lives. No data movement, no new languages, no separate infrastructure.


What You Can Build with BQML

BigQuery ML supports a wide range of model types:

  • Linear and logistic regression
  • K-means clustering
  • Matrix factorization (recommendations)
  • Time series forecasting (ARIMA)
  • Boosted trees (XGBoost)
  • Deep neural networks
  • Imported TensorFlow models

Your First BQML Model in 5 Minutes

Let’s build a customer churn prediction model. Assume you have a table with customer features and a binary churned label.

Step 1: Create the Model

CREATE OR REPLACE MODEL `project.dataset.churn_model`
OPTIONS(
  model_type='LOGISTIC_REG',
  input_label_cols=['churned']
) AS
SELECT
  tenure_months,
  monthly_charges,
  total_charges,
  contract_type,
  payment_method,
  churned
FROM `project.dataset.customers`
WHERE partition_date = '2024-01-01';

That’s it. BigQuery automatically handles feature encoding, train/test splitting, and hyperparameter tuning. Training typically completes in minutes for datasets under 100GB.

Step 2: Evaluate the Model

SELECT *
FROM ML.EVALUATE(MODEL `project.dataset.churn_model`);

This returns standard classification metrics: precision, recall, accuracy, F1 score, ROC AUC, and log loss.

Step 3: Make Predictions

SELECT
  customer_id,
  predicted_churned,
  predicted_churned_probs[OFFSET(1)].prob AS churn_probability
FROM ML.PREDICT(
  MODEL `project.dataset.churn_model`,
  (SELECT * FROM `project.dataset.customers` WHERE partition_date = CURRENT_DATE())
)
ORDER BY churn_probability DESC
LIMIT 1000;

You now have a ranked list of customers most likely to churn, ready to feed into your retention campaigns.


Real-World Use Cases

Customer Lifetime Value Prediction

Use regression to predict how much revenue each customer will generate over their lifetime. This powers acquisition cost decisions and personalization strategies.

CREATE MODEL `project.dataset.ltv_model`
OPTIONS(model_type='BOOSTED_TREE_REGRESSOR', input_label_cols=['lifetime_value'])
AS SELECT * FROM `project.dataset.customer_features`;

Demand Forecasting

ARIMA_PLUS models handle seasonality, holidays, and trends automatically. Perfect for inventory planning and resource allocation.

CREATE MODEL `project.dataset.demand_forecast`
OPTIONS(
  model_type='ARIMA_PLUS',
  time_series_timestamp_col='date',
  time_series_data_col='units_sold',
  time_series_id_col='product_id',
  holiday_region='US'
) AS
SELECT date, product_id, units_sold
FROM `project.dataset.sales_history`;

Customer Segmentation

K-means clustering finds natural groupings in your customer base without labeled data.

CREATE MODEL `project.dataset.customer_segments`
OPTIONS(model_type='KMEANS', num_clusters=5)
AS SELECT
  recency_days,
  frequency,
  monetary_value,
  avg_order_value
FROM `project.dataset.rfm_features`;

When to Use BQML vs. Custom Models

Technology Stack
Use BQML - Structured/tabular data, standard ML tasks, rapid iteration, SQL expertise available
Use Custom Models - Unstructured data (images, text), novel architectures, extreme customization, real-time edge inference

Advanced Patterns

Feature Engineering in SQL

BQML’s TRANSFORM clause lets you define feature preprocessing that’s automatically applied during both training and inference:

CREATE MODEL `project.dataset.model`
TRANSFORM(
  ML.BUCKETIZE(age, [18, 25, 35, 50, 65]) AS age_bucket,
  ML.FEATURE_CROSS(STRUCT(region, product_category)) AS region_product,
  ML.QUANTILE_BUCKETIZE(income, 10) AS income_decile,
  * EXCEPT(age, income)
)
OPTIONS(model_type='BOOSTED_TREE_CLASSIFIER', input_label_cols=['converted'])
AS SELECT * FROM `project.dataset.training_data`;

Hyperparameter Tuning

Enable automatic hyperparameter tuning with a single option:

CREATE MODEL `project.dataset.tuned_model`
OPTIONS(
  model_type='BOOSTED_TREE_CLASSIFIER',
  num_trials=20,
  max_parallel_trials=5,
  hparam_tuning_objectives=['ROC_AUC']
) AS SELECT * FROM training_data;

Model Export and Serving

Export trained models to Cloud Storage for deployment in Vertex AI or other serving infrastructure:

EXPORT MODEL `project.dataset.churn_model`
OPTIONS(URI='gs://bucket/models/churn_v1/');

Performance Considerations

100GB+
Training Data
Supported per model
<1s
Batch Prediction
Latency per row

BQML is optimized for batch predictions on large datasets. For real-time inference with sub-100ms latency, export models to Vertex AI endpoints.


Getting Started

If your data is already in BigQuery (or can be), BQML offers the fastest path from data to predictions. We help enterprises design BQML pipelines that integrate with their existing analytics workflows and scale to production workloads.

On this page

Share this article

Ready to get started?

Let's discuss how we can help with your project.

Contact Us

Work with us

Let’s build something together

Our team can help you turn these ideas into production systems.