The Promise of BigQuery ML
Most organizations have far more SQL expertise than ML expertise. Data analysts who’ve spent years mastering SQL often hit a wall when asked to build predictive models: suddenly they need Python, Jupyter notebooks, and an entirely new toolkit.
BigQuery ML changes this equation. It lets you create, train, evaluate, and deploy machine learning models using standard SQL syntax, directly where your data lives. No data movement, no new languages, no separate infrastructure.
What You Can Build with BQML
BigQuery ML supports a wide range of model types:
- Linear and logistic regression
- K-means clustering
- Matrix factorization (recommendations)
- Time series forecasting (ARIMA_PLUS)
- Boosted trees (XGBoost)
- Deep neural networks
- Imported TensorFlow models
Your First BQML Model in 5 Minutes
Let’s build a customer churn prediction model. Assume you have a table with customer features and a binary churned label.
Step 1: Create the Model
CREATE OR REPLACE MODEL `project.dataset.churn_model`
OPTIONS(
model_type='LOGISTIC_REG',
input_label_cols=['churned']
) AS
SELECT
tenure_months,
monthly_charges,
total_charges,
contract_type,
payment_method,
churned
FROM `project.dataset.customers`
WHERE partition_date = '2024-01-01';
That’s it. BigQuery automatically handles feature preprocessing (one-hot encoding categorical columns, standardizing numeric ones) and train/test splitting. Hyperparameter tuning is opt-in rather than automatic: you enable it with the num_trials option, covered under Advanced Patterns. Training typically completes in minutes for datasets under 100 GB.
Step 2: Evaluate the Model
SELECT *
FROM ML.EVALUATE(MODEL `project.dataset.churn_model`);
This returns standard classification metrics: precision, recall, accuracy, F1 score, ROC AUC, and log loss.
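ML.EVALUATE can also score the model against data you supply, with a custom classification threshold. A sketch, assuming a later partition of the same customers table serves as a holdout set (the date and the 0.3 threshold are illustrative; a lower threshold trades precision for recall):

```sql
-- Evaluate on a held-out partition with a custom decision threshold
SELECT *
FROM ML.EVALUATE(
  MODEL `project.dataset.churn_model`,
  (SELECT * FROM `project.dataset.customers`
   WHERE partition_date = '2024-02-01'),  -- illustrative holdout date
  STRUCT(0.3 AS threshold)
);
```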
Step 3: Make Predictions
SELECT
customer_id,
predicted_churned,
-- Look up the positive-class probability by label rather than by array
-- position; OFFSET indexing assumes an ordering the docs don't guarantee.
-- (Assumes churned is BOOL; use p.label = 1 for an integer 0/1 label.)
(SELECT p.prob
 FROM UNNEST(predicted_churned_probs) AS p
 WHERE p.label = TRUE) AS churn_probability
FROM ML.PREDICT(
MODEL `project.dataset.churn_model`,
(SELECT * FROM `project.dataset.customers` WHERE partition_date = CURRENT_DATE())
)
ORDER BY churn_probability DESC
LIMIT 1000;
You now have a ranked list of customers most likely to churn, ready to feed into your retention campaigns.
Real-World Use Cases
Customer Lifetime Value Prediction
Use regression to predict how much revenue each customer will generate over their lifetime. This powers acquisition cost decisions and personalization strategies.
CREATE MODEL `project.dataset.ltv_model`
OPTIONS(model_type='BOOSTED_TREE_REGRESSOR', input_label_cols=['lifetime_value'])
AS SELECT * FROM `project.dataset.customer_features`;
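Scoring works the same way as in the churn example. A minimal sketch, assuming customer_features carries a customer_id column; BQML names the output column predicted_lifetime_value, following its predicted_&lt;label&gt; convention:

```sql
-- Rank customers by predicted lifetime value
SELECT
  customer_id,
  predicted_lifetime_value
FROM ML.PREDICT(
  MODEL `project.dataset.ltv_model`,
  TABLE `project.dataset.customer_features`
)
ORDER BY predicted_lifetime_value DESC;
```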
Demand Forecasting
ARIMA_PLUS models handle seasonality, holidays, and trends automatically. Perfect for inventory planning and resource allocation.
CREATE MODEL `project.dataset.demand_forecast`
OPTIONS(
model_type='ARIMA_PLUS',
time_series_timestamp_col='date',
time_series_data_col='units_sold',
time_series_id_col='product_id',
holiday_region='US'
) AS
SELECT date, product_id, units_sold
FROM `project.dataset.sales_history`;
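Forecasts are generated with ML.FORECAST, which returns one row per time series per future timestamp, including prediction intervals. A sketch with an illustrative 30-day horizon and 90% confidence level:

```sql
-- Forecast 30 days ahead for every product_id
SELECT
  product_id,
  forecast_timestamp,
  forecast_value,
  prediction_interval_lower_bound,
  prediction_interval_upper_bound
FROM ML.FORECAST(
  MODEL `project.dataset.demand_forecast`,
  STRUCT(30 AS horizon, 0.9 AS confidence_level)
);
```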
Customer Segmentation
K-means clustering finds natural groupings in your customer base without labeled data.
CREATE MODEL `project.dataset.customer_segments`
OPTIONS(model_type='KMEANS', num_clusters=5)
AS SELECT
recency_days,
frequency,
monetary_value,
avg_order_value
FROM `project.dataset.rfm_features`;
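ML.PREDICT assigns each row to its nearest centroid via a centroid_id column, which you can then aggregate to profile the segments. A sketch, reusing the same rfm_features table:

```sql
-- Profile each segment by size and average spend
SELECT
  centroid_id,
  COUNT(*) AS customers,
  AVG(monetary_value) AS avg_monetary_value
FROM ML.PREDICT(
  MODEL `project.dataset.customer_segments`,
  TABLE `project.dataset.rfm_features`
)
GROUP BY centroid_id
ORDER BY centroid_id;
```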
When to Use BQML vs. Custom Models
BQML is the right choice when your data already lives in BigQuery, your team’s strength is SQL, and your problem fits one of the supported model types; batch scoring over large tables is where it is most cost-effective. Reach for custom models (for example, TensorFlow or PyTorch trained on Vertex AI) when you need custom architectures, fine-grained control over the training loop, or specialized low-latency online serving.
Advanced Patterns
Feature Engineering in SQL
BQML’s TRANSFORM clause lets you define feature preprocessing that’s automatically applied during both training and inference:
CREATE MODEL `project.dataset.model`
TRANSFORM(
ML.BUCKETIZE(age, [18, 25, 35, 50, 65]) AS age_bucket,
ML.FEATURE_CROSS(STRUCT(region, product_category)) AS region_product,
ML.QUANTILE_BUCKETIZE(income, 10) OVER() AS income_decile,
* EXCEPT(age, income)
)
OPTIONS(model_type='BOOSTED_TREE_CLASSIFIER', input_label_cols=['converted'])
AS SELECT * FROM `project.dataset.training_data`;
Hyperparameter Tuning
Enable automatic hyperparameter tuning with a single option:
CREATE MODEL `project.dataset.tuned_model`
OPTIONS(
model_type='BOOSTED_TREE_CLASSIFIER',
num_trials=20,
max_parallel_trials=5,
hparam_tuning_objectives=['ROC_AUC']
) AS SELECT * FROM `project.dataset.training_data`;
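Once tuning finishes, ML.TRIAL_INFO reports each trial’s hyperparameters and evaluation metrics, and flags the winning trial:

```sql
-- Inspect tuning trials; is_optimal = TRUE marks the best trial
SELECT
  trial_id,
  hyperparameters,
  hparam_tuning_evaluation_metrics,
  status,
  is_optimal
FROM ML.TRIAL_INFO(MODEL `project.dataset.tuned_model`)
ORDER BY trial_id;
```

By default, ML.PREDICT and ML.EVALUATE on a tuned model use the optimal trial, so downstream queries need no changes.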
Model Export and Serving
Export trained models to Cloud Storage for deployment in Vertex AI or other serving infrastructure:
EXPORT MODEL `project.dataset.churn_model`
OPTIONS(URI='gs://bucket/models/churn_v1/');
Performance Considerations
BQML is optimized for batch predictions on large datasets. For real-time inference with sub-100ms latency, export models to Vertex AI endpoints.
Getting Started
If your data is already in BigQuery (or can be), BQML offers the fastest path from data to predictions. We help enterprises design BQML pipelines that integrate with their existing analytics workflows and scale to production workloads.