
System design patterns for machine learning

Also available in Japanese and Korean.

Machine learning system design pattern

This repository contains system design patterns for training, serving and operation of machine learning systems in production.

Objectives

The main objective of this document is to explain system patterns for designing machine learning systems in production.
This document does not cover design patterns for developing machine learning models that reach a certain accuracy, although some sections may refer to those use cases.

Prerequisites

All of the ML system patterns are designed to be deployed on a public cloud or a Kubernetes cluster. The document tries to stay as independent of any particular programming language or platform as possible, though since Python is the most widely used language for machine learning, most of the patterns can be implemented in Python.

For reading

The patterns can also be read on GitHub Pages.

Sample implementations

Sample implementations are available at https://github.com/shibuiwilliam/ml-system-in-actions

Patterns

Serving patterns

The serving patterns are a series of system designs for using machine learning models in production workflows.

  • Web single pattern

  • Synchronous pattern

  • Asynchronous pattern

  • Batch pattern

  • Prep-pred pattern

  • Microservice vertical pattern

  • Microservice horizontal pattern

  • Prediction cache pattern

  • Data cache pattern

  • Prediction circuit break pattern

  • Multiple stage prediction pattern

  • Serving template pattern

  • Edge prediction pattern: To do

  • Antipatterns

    • Online bigsize pattern

    • All-in-one pattern
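As an illustration of one serving pattern, the prediction cache pattern can be sketched as a wrapper that returns a stored result when the same input arrives again. This is a minimal sketch under stated assumptions, not the repository's implementation: the class `PredictionCache` and the function `toy_model` are hypothetical, and an in-process dict stands in for the external cache store (such as Redis) that a production server would more likely use.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class PredictionCache:
    """Hypothetical wrapper that reuses predictions for inputs seen before.

    Assumption: a plain dict stands in for an external cache store.
    """

    def __init__(self, predict_fn: Callable[[List[float]], Any]) -> None:
        self.predict_fn = predict_fn
        self.store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(features: List[float]) -> str:
        # Hash a JSON encoding of the input so equal inputs share one key.
        payload = json.dumps(features).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def predict(self, features: List[float]) -> Any:
        key = self._key(features)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        # Cache miss: run the model once and remember the result.
        self.misses += 1
        result = self.predict_fn(features)
        self.store[key] = result
        return result


def toy_model(features: List[float]) -> float:
    # Hypothetical stand-in for a real model: returns the sum of the features.
    return sum(features)


cache = PredictionCache(toy_model)
first = cache.predict([1.0, 2.0])   # computed by toy_model
second = cache.predict([1.0, 2.0])  # served from the cache
```

A real cache would also need an eviction policy or expiry so stale predictions are dropped after the model is updated.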

QA patterns

Patterns to evaluate models as well as prediction servers.

  • Shadow AB-testing pattern

  • Online AB-testing pattern

  • Loading test pattern

  • Antipatterns

    • Offline-only pattern
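The idea behind shadow AB-testing, where a candidate model sees production traffic without affecting responses, can be sketched roughly as below. This is a hedged sketch: the class `ShadowABRouter` and its log format are hypothetical, and for simplicity the candidate runs synchronously in-process, whereas a deployed setup would typically mirror requests to a separate candidate server asynchronously.

```python
from typing import Callable, Dict, List

Model = Callable[[List[float]], float]


class ShadowABRouter:
    """Hypothetical router: serve the current model, run the candidate in shadow."""

    def __init__(self, current: Model, candidate: Model) -> None:
        self.current = current
        self.candidate = candidate
        self.shadow_log: List[Dict[str, float]] = []

    def predict(self, features: List[float]) -> float:
        served = self.current(features)
        try:
            shadow = self.candidate(features)
            # Record both outputs so they can be compared offline later.
            self.shadow_log.append({"served": served, "shadow": shadow})
        except Exception:
            # A failing candidate must never change the response the client gets.
            pass
        return served


router = ShadowABRouter(current=lambda x: sum(x), candidate=lambda x: max(x))
response = router.predict([1.0, 2.0])
```

Only the current model's output reaches the client; the shadow log is what gets analyzed to decide whether the candidate is ready for an online AB test.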

Training patterns

Patterns to construct training pipelines.

  • Batch training pattern

  • Pipeline training pattern

  • Parameter and architecture search pattern

  • Antipatterns

    • Only-me pattern

    • Training code in serving pattern

    • Too many pipes pattern
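The pipeline training pattern splits training into separate steps that run in order. A minimal sketch, assuming each step is a plain function whose output feeds the next one, might look like this; `run_pipeline` and the toy steps are hypothetical, and a real pipeline would usually run each step as its own job or container and persist its artifacts to storage.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: List[Step], initial: Any) -> Dict[str, Any]:
    """Run steps in order, feeding each step's output to the next.

    Each step's output is recorded under the step's name.
    """
    artifacts: Dict[str, Any] = {}
    data = initial
    for name, step in steps:
        data = step(data)
        artifacts[name] = data
    return artifacts


# Hypothetical steps for a toy "model" that predicts the mean of its training data.
def load(_: Any) -> List[float]:
    return [1.0, 2.0, 3.0, 4.0]


def preprocess(values: List[float]) -> List[float]:
    # Drop values treated as outliers in this toy example.
    return [v for v in values if v > 1.0]


def train(values: List[float]) -> float:
    return sum(values) / len(values)


def evaluate(mean: float) -> Dict[str, float]:
    return {"mean": mean}


artifacts = run_pipeline(
    [("load", load), ("preprocess", preprocess), ("train", train), ("evaluate", evaluate)],
    initial=None,
)
```

Keeping steps independent makes it possible to rerun or replace one step without touching the others, which is the main motivation for splitting training into a pipeline.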

Operation patterns

The operation patterns contain configuration, logging, monitoring and alerting system designs for machine learning systems.

  • Model-in-image pattern

  • Model-load pattern

  • Data model versioning pattern

  • Prediction log pattern

  • Prediction monitoring pattern

  • Parameter-based serving pattern

  • Condition-based-serving pattern

  • Antipatterns

    • No logging pattern

    • Nobody knows pattern
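Condition-based serving chooses which model answers a request based on some property of the request. The sketch below assumes that property arrives as a plain string such as the user's region; `make_router`, the region codes and the stub models are all hypothetical and only illustrate the routing idea.

```python
from typing import Callable, Dict, List

Model = Callable[[List[float]], str]


def make_router(models: Dict[str, Model], default: str) -> Callable[[str, List[float]], str]:
    """Return a predict function that picks a model by request condition."""

    def predict(condition: str, features: List[float]) -> str:
        # Fall back to the default model for conditions without a dedicated one.
        model = models.get(condition, models[default])
        return model(features)

    return predict


# Hypothetical region-specific models that just report which one was used.
models: Dict[str, Model] = {
    "jp": lambda features: "jp-model",
    "us": lambda features: "us-model",
}
predict = make_router(models, default="us")
```

For example, `predict("jp", [0.0])` is answered by the `jp` model, while a request for a region with no dedicated model falls back to the default.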

Lifecycle patterns

The lifecycle patterns combine several of the patterns above to build and operate an actual ML system.

  • Train-then-serve pattern

  • Training-to-serving pattern

  • Antipatterns

    • To do


Committers

Contribution

To add a new pattern, please use template_design.md as a template, open an issue, and then open a pull request.
To add a new antipattern, please use template_antipattern.md as a template, open an issue, and then open a pull request.
To request an improvement or a change, or to ask a question, please open an issue.

Please read the CLA carefully before submitting your contribution to Mercari. Under any circumstances, by submitting your contribution, you are deemed to accept and agree to be bound by the terms and conditions of the CLA.

https://www.mercari.com/cla/

License

Copyright 2020 Mercari, Inc.

Licensed under the MIT License.