
Deploying Machine Learning Models to Production

16 min read

Dr. Lisa Wang

ML Engineer


Taking ML models from development to production requires careful planning and robust infrastructure.

Model Serving Architecture

REST API Approach

Simple and widely supported:

```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.pkl")

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```

gRPC for High Performance

```python
import grpc
from concurrent import futures
import joblib

import prediction_pb2
import prediction_pb2_grpc

# Load the model once at startup, not per request
model = joblib.load("model.pkl")

class PredictionService(prediction_pb2_grpc.PredictionServicer):
    def Predict(self, request, context):
        # request.features is a repeated field; wrap it as one sample
        result = model.predict([list(request.features)])
        return prediction_pb2.PredictionResponse(prediction=float(result[0]))

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    prediction_pb2_grpc.add_PredictionServicer_to_server(PredictionService(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```

Model Versioning

MLflow for Model Registry

```python
import mlflow
import mlflow.sklearn

# Register the model in the MLflow Model Registry
mlflow.sklearn.log_model(
    model,
    "model",
    registered_model_name="sales_predictor",
)

# Load the version currently in the Production stage
model = mlflow.pyfunc.load_model(
    model_uri="models:/sales_predictor/Production"
)
```

Containerization

Docker for ML Services

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model and code
COPY model.pkl .
COPY app.py .

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Scalability

Horizontal Scaling

Deploy multiple instances behind a load balancer:

  • Use Kubernetes for orchestration
  • Implement health checks
  • Configure auto-scaling policies
  • Monitor resource usage

Batch Prediction

For non-real-time scenarios:

```python
import joblib
import pandas as pd
from prefect import flow, task

@task
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

@task
def make_predictions(data: pd.DataFrame, model):
    return model.predict(data)

@task
def save_predictions(predictions, output_path: str):
    pd.DataFrame(predictions).to_csv(output_path, index=False)

@flow
def batch_prediction_pipeline(input_path: str, output_path: str):
    model = joblib.load("model.pkl")
    data = load_data(input_path)
    predictions = make_predictions(data, model)
    save_predictions(predictions, output_path)
```

Model Monitoring

Performance Metrics

Track key metrics:

  • Prediction latency
  • Throughput (requests/second)
  • Error rate
  • Model accuracy/precision/recall
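
As a sketch of the latency and error-rate side, a minimal in-process tracker might look like the following. This is illustrative only: production services typically export these metrics to Prometheus or a similar backend rather than keeping state inside the process.

```python
from collections import deque

class MetricsTracker:
    """Rolling window of request latencies (seconds) and error flags."""

    def __init__(self, window: int = 1000):
        self.latencies = deque(maxlen=window)
        self.errors = deque(maxlen=window)

    def record(self, latency_s: float, error: bool = False):
        self.latencies.append(latency_s)
        self.errors.append(1 if error else 0)

    def p95_latency(self) -> float:
        # Nearest-rank p95 over the current window
        ordered = sorted(self.latencies)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def error_rate(self) -> float:
        return sum(self.errors) / len(self.errors)
```

A request middleware would call `record()` once per request; dashboards and alerts read `p95_latency()` and `error_rate()`.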

Data Drift Detection

Monitor input distribution changes:

```python
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[
    DataDriftPreset(),
])
report.run(
    reference_data=train_data,
    current_data=production_data,
    column_mapping=column_mapping,
)
```

Concept Drift

Monitor model performance degradation:

  • Track prediction accuracy over time
  • Set up alerts for significant drops
  • Implement A/B testing for new models
  • Automate retraining pipelines
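
The first two points can be sketched as a rolling accuracy monitor with a simple alert condition. The baseline, tolerance, window, and minimum-sample values here are illustrative, not recommendations.

```python
from collections import deque

class AccuracyMonitor:
    """Alert when rolling accuracy drops below baseline minus tolerance."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 500):
        self.threshold = baseline - tolerance
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual) -> bool:
        """Record one labelled outcome; return True if an alert should fire."""
        self.outcomes.append(prediction == actual)
        accuracy = sum(self.outcomes) / len(self.outcomes)
        # Require a minimum sample size so a few early misses don't page anyone
        return len(self.outcomes) >= 50 and accuracy < self.threshold
```

In practice the labels arrive with a delay, so the monitor runs over delayed ground truth rather than live traffic.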

Feature Store

Centralized Feature Management

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Get online features for prediction
features = store.get_online_features(
    features=[
        "user_features:age",
        "user_features:location",
        "product_features:category",
    ],
    entity_rows=[{"user_id": 123, "product_id": 456}],
).to_dict()
```

CI/CD for ML

Automated Model Pipeline

```yaml
# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    branches: [main]

jobs:
  train-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Train Model
        run: python train.py
      - name: Evaluate Model
        run: python evaluate.py
      - name: Deploy if Better
        run: python deploy.py
```

Security Considerations

Input Validation

  • Validate all input data
  • Sanitize features
  • Set rate limits
  • Implement authentication
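
Pydantic, already used in the REST example above, can enforce schema and range constraints before features ever reach the model. This sketch assumes Pydantic v2; the feature count and bounds are placeholders.

```python
import pydantic
from pydantic import BaseModel, Field, field_validator

class PredictionRequest(BaseModel):
    # Reject payloads with the wrong number of features outright
    features: list[float] = Field(min_length=4, max_length=4)

    @field_validator("features")
    @classmethod
    def check_ranges(cls, v):
        # Placeholder bound; derive real bounds from the training data
        if any(abs(x) > 1e6 for x in v):
            raise ValueError("feature value out of expected range")
        return v
```

Validation errors surface as 422 responses in FastAPI, so malformed inputs never touch the model.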

Model Protection

  • Encrypt model files
  • Use model serving frameworks
  • Implement access controls
  • Monitor for adversarial attacks

Best Practices

  1. Separate Training and Serving Code: Keep concerns isolated
  2. Version Everything: Models, data, and code
  3. Monitor Continuously: Track performance and data quality
  4. Automate Testing: Unit tests, integration tests, model validation
  5. Implement Rollback: Quick recovery from bad deployments
  6. Document Thoroughly: Model cards, API docs, runbooks
  7. Plan for Failure: Graceful degradation, fallback models
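
Graceful degradation (practice 7) can be as simple as a wrapper that falls back to a cheaper model, or a constant baseline, when the primary model fails. A sketch, with the model objects as stand-ins:

```python
class FallbackPredictor:
    """Try the primary model first; fall back to a baseline if it raises."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def predict(self, features):
        try:
            return self.primary.predict(features)
        except Exception:
            # Degrade gracefully rather than failing the request;
            # log the failure here so the fallback rate is visible
            return self.fallback.predict(features)
```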

Deployment Strategies

Shadow Deployment

Run the new model alongside the current one and compare results:

  • Zero risk to production
  • Real-world performance data
  • Confidence in new model
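
One way to wire this into the serving layer is a wrapper that always returns the current model's answer while recording how often the candidate disagrees. A sketch with stand-in model objects:

```python
class ShadowRunner:
    """Serve the current model; run the candidate in shadow and count disagreements."""

    def __init__(self, current, candidate):
        self.current = current
        self.candidate = candidate
        self.total = 0
        self.disagreements = 0

    def predict(self, features):
        served = self.current.predict(features)
        try:
            shadow = self.candidate.predict(features)
            self.total += 1
            if shadow != served:
                self.disagreements += 1
        except Exception:
            pass  # shadow failures must never affect the served response
        return served
```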

Canary Deployment

Gradually route traffic to the new model:

  • 5% → 25% → 50% → 100%
  • Monitor metrics at each stage
  • Quick rollback if issues
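
The core of the pattern is a weighted router. This is a sketch in application code; real deployments usually shift the weights at the load balancer or service mesh instead.

```python
import random

class CanaryRouter:
    """Route a configurable fraction of traffic to the canary model."""

    def __init__(self, stable, canary, canary_fraction: float = 0.05):
        self.stable = stable
        self.canary = canary
        self.canary_fraction = canary_fraction

    def predict(self, features):
        if random.random() < self.canary_fraction:
            return self.canary.predict(features)
        return self.stable.predict(features)

    def promote(self, fraction: float):
        # Step through 0.05 -> 0.25 -> 0.5 -> 1.0 as metrics hold up
        self.canary_fraction = fraction
```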

Blue-Green Deployment

Maintain two identical environments:

  • Instant switchover
  • Easy rollback
  • Zero downtime

Conclusion

Successful ML deployment requires treating models as first-class software artifacts with proper versioning, monitoring, and operational practices.
