In machine learning, model deployment is the process of integrating a machine learning model into an existing production environment where it can take in an input and return an output.

Imagine that you’ve spent several months creating a machine learning model that can determine whether a transaction is fraudulent with a near-perfect F1 score. That’s great, but you’re not done yet. Ideally, you want your model to flag a fraudulent transaction in real time so that you can block it before it goes through. This is where model deployment comes in.

Most online resources focus on the earlier steps of the machine learning life cycle, like exploratory data analysis (EDA), model selection and model evaluation. Model deployment, however, is rarely discussed, in part because it can be complicated and isn’t well understood by those without a background in software engineering or DevOps.

In this article, you’ll learn what model deployment is, the high-level architecture of a machine learning system, the different methods of deploying a model and the factors to consider when choosing a deployment method.

 

What Is Model Deployment?

Deploying a machine learning model, also known as model deployment, simply means integrating a machine learning model into an existing production environment where it can take in an input and return an output. The purpose of deploying your model is to make the predictions from a trained machine learning model available to others, whether that’s users, management or other systems.
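To make this concrete, here’s a minimal sketch of what “take in an input and return an output” can look like in practice: a small Flask service that loads a previously trained model and exposes a prediction endpoint. The file name fraud_model.joblib and the request format are placeholders for illustration, not part of any particular system.

```python
# A minimal sketch of serving a trained model over HTTP with Flask.
# Assumes a model was already trained and saved as "fraud_model.joblib"
# (a hypothetical file name) using scikit-learn and joblib.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("fraud_model.joblib")  # load the trained model once at startup

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload like {"features": [0.1, 42.0, 3.0, ...]}
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"fraudulent": bool(prediction)})

if __name__ == "__main__":
    app.run(port=5000)
```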

Model deployment is closely related to machine learning systems architecture, which refers to the arrangement and interactions of software components within a system to achieve a predefined goal.

 

Model Deployment Criteria

There are a couple of criteria that your machine learning model needs to meet before it’s ready for deployment:

  1. Portability: This refers to the ability of your software to be transferred from one machine or system to another. A portable model is one with a relatively low response time and one that can be rewritten with minimal effort; the serialization sketch below shows one common way to move a trained model between machines.
  2. Scalability: This refers to your model’s ability to handle growth in data volume and prediction requests. A scalable model is one that doesn’t need to be redesigned to maintain its performance as the workload grows.

This will all take place in a production environment, which is a term used to describe the setting where software and other products are actually put into operation for their intended uses by end users.
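As a rough illustration of portability, a trained model can be serialized into a single file on one machine and loaded on another without retraining. The sketch below uses scikit-learn and joblib on synthetic data; the file name is hypothetical.

```python
# A sketch of portability: train a model on one machine, serialize it to a
# file, and load it elsewhere without retraining. Data here is synthetic.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)

# On the training machine: fit and serialize the model to a single file.
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "fraud_model.joblib")

# On the production machine: load the file and start predicting.
restored = joblib.load("fraud_model.joblib")
print(restored.predict(X[:5]))
```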

 

Machine Learning System Architecture for Model Deployment

At a high level, there are four main parts to a machine learning system:

  1. Data layer: The data layer provides access to all of the data sources that the model will require.
  2. Feature layer: The feature layer is responsible for generating feature data in a transparent, scalable and usable manner.
  3. Scoring layer: The scoring layer transforms features into predictions. Scikit-learn is commonly used here and is a de facto standard for scoring; a small pipeline sketch follows this list.
  4. Evaluation layer: The evaluation layer checks the equivalence of two models and can be used to monitor production models, comparing how closely predictions on live traffic match the predictions made during training.
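One hedged way to picture how the feature and scoring layers fit together is a scikit-learn Pipeline, with the evaluation layer reduced to a simple held-out check. The data and choice of estimator below are purely illustrative.

```python
# A sketch of the feature and scoring layers as a single scikit-learn Pipeline.
# The data is synthetic and the transformer/estimator choices are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("features", StandardScaler()),                      # feature layer: transform raw data into features
    ("scoring", LogisticRegression(max_iter=1000)),      # scoring layer: turn features into predictions
])
pipeline.fit(X_train, y_train)

# Evaluation layer (simplified): check performance on held-out data.
print("held-out accuracy:", pipeline.score(X_test, y_test))
```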

 

3 Model Deployment Methods to Know

There are three general ways to deploy your ML model: one-off, batch, and real-time.

 

1. One-off

You don’t always need to continuously train a machine learning model to deploy it. Sometimes a model is only needed once or periodically. In this case, the model can simply be trained ad hoc when it’s needed and pushed to production until it deteriorates enough to require fixing.
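In practice, a one-off deployment can be as simple as a script run on demand: train on the latest data snapshot, score the records that need decisions and write the results out. The file and column names below are hypothetical.

```python
# A sketch of a one-off deployment: a script run ad hoc that trains on the
# latest data snapshot and scores pending records in one pass.
# "transactions.csv", "pending.csv" and "is_fraud" are hypothetical names.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

snapshot = pd.read_csv("transactions.csv")   # historical, labeled transactions
pending = pd.read_csv("pending.csv")         # new transactions to score

features = [c for c in snapshot.columns if c != "is_fraud"]
model = RandomForestClassifier(random_state=0)
model.fit(snapshot[features], snapshot["is_fraud"])

pending["fraud_score"] = model.predict_proba(pending[features])[:, 1]
pending.to_csv("scored_transactions.csv", index=False)
```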

 

2. Batch

Batch training allows you to constantly have an up-to-date version of your model. It’s a scalable method that takes one sub-sample of data at a time, which eliminates the need to use the full data set for each update. This is a good fit if you use the model on a consistent basis but don’t necessarily need predictions in real time.
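A rough sketch of this idea, assuming scikit-learn: an estimator that supports partial_fit, such as SGDClassifier, can be updated with one sub-sample of data per scheduled batch run instead of being refit on the full data set. The data here is synthetic.

```python
# A sketch of batch training: update the model with one chunk of data at a
# time rather than refitting on the full data set. Data here is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
model = SGDClassifier(random_state=0)
classes = np.unique(y)

# Each scheduled batch run updates the existing model with a new sub-sample.
for X_chunk, y_chunk in zip(np.array_split(X, 10), np.array_split(y, 10)):
    model.partial_fit(X_chunk, y_chunk, classes=classes)
```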

 

3. Real-Time

In some cases, you’ll want a prediction in real time, such as determining whether a transaction is fraudulent. This is possible by using online machine learning models, such as linear regression trained with stochastic gradient descent.
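As a sketch of the online approach, scikit-learn’s SGDClassifier (the classification analogue of a linear model trained with stochastic gradient descent) can score each incoming transaction and be updated with partial_fit as soon as the true outcome is known. The data and helper function below are illustrative assumptions.

```python
# A sketch of a real-time, online model: predict on each incoming transaction,
# then update the model once the true label is known. Data is synthetic and
# the handle_transaction helper is hypothetical.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)
# Warm-start the model on a small labeled sample so it can serve predictions.
model.partial_fit(rng.normal(size=(100, 5)),
                  rng.integers(0, 2, size=100),
                  classes=np.array([0, 1]))

def handle_transaction(features, true_label=None):
    """Score a single transaction; update the model when feedback arrives."""
    prediction = model.predict(features.reshape(1, -1))[0]
    if true_label is not None:
        model.partial_fit(features.reshape(1, -1), [true_label])
    return prediction

# Simulated stream of incoming transactions.
for _ in range(5):
    x = rng.normal(size=5)
    print(handle_transaction(x, true_label=int(rng.integers(0, 2))))
```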

A tutorial on how to deploy a machine learning model. | Video: Thu Vu data analytics

 

4 Model Deployment Factors to Consider

When deciding how to deploy your machine learning model, there are a number of factors and implications you should consider. These include the following:

  1. How frequently predictions will be generated and how urgently the results are needed.
  2. Whether predictions should be generated individually or in batches.
  3. The latency requirements of the model, the computing power available to the user and the desired service-level agreement (SLA).
  4. The operational implications and costs of deploying and maintaining the model.

Understanding these factors will help you decide among the one-off, batch and real-time model deployment methods.