MLOps: Model management, deployment, and monitoring with Azure Machine Learning

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

In this article, learn how to apply Machine Learning Operations (MLOps) practices in Azure Machine Learning to manage the lifecycle of your models. Applying MLOps practices can improve the quality and consistency of your machine learning solutions.

What is MLOps?

MLOps is based on DevOps principles and practices that increase the efficiency of workflows. Examples include continuous integration, delivery, and deployment. MLOps applies these principles to the machine learning process, with the goal of:

  • Faster experimentation and development of models.
  • Faster deployment of models into production.
  • Quality assurance and end-to-end lineage tracking.

MLOps in Machine Learning

Machine Learning provides the following MLOps capabilities:

  • Create reproducible machine learning pipelines. Use machine learning pipelines to define repeatable and reusable steps for your data preparation, training, and scoring processes.
  • Create reusable software environments. Use these environments for training and deploying models.
  • Register, package, and deploy models from anywhere. You can also track associated metadata required to use the model.
  • Capture the governance data for the end-to-end machine learning lifecycle. The logged lineage information can include who is publishing models and why changes were made. It can also include when models were deployed or used in production.
  • Notify and alert on events in the machine learning lifecycle. Event examples include experiment completion, model registration, model deployment, and data drift detection.
  • Monitor machine learning applications for operational and machine learning-related issues. Compare model inputs between training and inference. Explore model-specific metrics. Provide monitoring and alerts on your machine learning infrastructure.
  • Automate the end-to-end machine learning lifecycle with Machine Learning and Azure Pipelines. By using pipelines, you can frequently update models. You can also test new models. You can continually roll out new machine learning models alongside your other applications and services.
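The data drift detection mentioned above can be approximated by comparing feature statistics between training and inference data. A minimal illustrative sketch in plain Python (the statistic and threshold here are arbitrary assumptions, not Machine Learning's built-in drift algorithm):

```python
def mean_shift_drift(train_values, live_values, threshold=0.25):
    """Flag drift when the relative shift in the feature mean exceeds a threshold."""
    train_mean = sum(train_values) / len(train_values)
    live_mean = sum(live_values) / len(live_values)
    # Relative shift, guarded against a zero training mean.
    shift = abs(live_mean - train_mean) / (abs(train_mean) or 1.0)
    return shift > threshold

# Training data centered near 10; live traffic has shifted upward.
print(mean_shift_drift([9, 10, 11, 10], [14, 15, 13, 16]))  # True: mean moved ~45%
print(mean_shift_drift([9, 10, 11, 10], [10, 9, 11, 10]))   # False: distributions match
```

In practice you would compute richer statistics per feature and raise an alert, but the comparison of training-time versus inference-time inputs is the core idea.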

For more information on MLOps, see Machine learning DevOps.

Create reproducible machine learning pipelines

Use machine learning pipelines from Machine Learning to stitch together all the steps in your model training process.

A machine learning pipeline can contain steps from data preparation to feature extraction to hyperparameter tuning to model evaluation. For more information, see Machine learning pipelines.
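Conceptually, a pipeline is an ordered chain of named steps in which each step's output feeds the next. A minimal sketch in plain Python (purely illustrative; not the Machine Learning pipelines API itself):

```python
def run_pipeline(steps, data):
    """Run each (name, function) step in order, passing results along the chain."""
    for name, step in steps:
        data = step(data)
        print(f"step '{name}' complete")
    return data

# Hypothetical steps: clean the raw data, derive features, fit a toy "model".
pipeline = [
    ("prepare", lambda d: [x for x in d if x is not None]),
    ("featurize", lambda d: [(x, x * x) for x in d]),
    ("train", lambda d: {"weights": len(d)}),
]
model = run_pipeline(pipeline, [1, None, 2, 3])
print(model)  # {'weights': 3}
```

Because each step is a named, reusable unit, the same chain can be rerun on new data to reproduce the training process end to end.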

If you use the designer to create your machine learning pipelines, you can select the ... icon in the upper-right corner of the designer page at any time, and then select Clone. Cloning your pipeline lets you iterate on its design without losing your old versions.

Create reusable software environments

By using Machine Learning environments, you can track and reproduce your projects' software dependencies as they evolve. You can use environments to ensure that builds are reproducible without manual software configurations.

Environments describe the pip and conda dependencies for your projects. You can use them for training and deployment of models. For more information, see What are Machine Learning environments?.
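An environment's pip and conda dependencies are typically captured in a conda specification file. A hypothetical example (the package names and versions are illustrative assumptions):

```yaml
name: training-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - azure-ai-ml
      - scikit-learn==1.3.0
```

Checking a file like this into source control is what makes the training and deployment builds reproducible without manual software configuration.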

Register, package, and deploy models from anywhere

The following sections discuss how to register, package, and deploy models.

Register and track machine learning models

With model registration, you can store and version your models in the Azure cloud, in your workspace. The model registry makes it easy to organize and keep track of your trained models.

Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. You can provide additional metadata tags during registration. These tags are then used when you search for a model. Machine Learning supports any model that can be loaded by using Python 3.5.2 or higher.
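The name-plus-version behavior described above can be sketched as a tiny in-memory registry (purely illustrative; not the actual registry implementation):

```python
class ModelRegistry:
    """Toy registry: registering under an existing name increments the version."""

    def __init__(self):
        self._models = {}  # name -> list of registered entries

    def register(self, name, tags=None):
        entries = self._models.setdefault(name, [])
        version = len(entries) + 1  # versions start at 1 and auto-increment
        entries.append({"version": version, "tags": tags or {}})
        return version

registry = ModelRegistry()
print(registry.register("churn-model"))                     # 1
print(registry.register("churn-model", {"stage": "test"}))  # 2
```

The tags attached at registration time are what make the model searchable later.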

Package and debug models

Before you deploy a model into production, it's packaged into a Docker image. In most cases, image creation happens automatically in the background during deployment. You can also specify the image manually.

If you run into problems with your deployment, you can deploy in your local development environment for troubleshooting and debugging.

Convert and optimize models

Converting your model to Open Neural Network Exchange (ONNX) might improve performance. On average, converting to ONNX can double performance.

For more information on ONNX with Machine Learning, see Create and accelerate machine learning models.

Use models

Trained machine learning models are deployed as endpoints in the cloud or locally. Deployments can use CPU or GPU for inferencing.

When deploying a model as an endpoint, you provide the following items:

  • The model that's used to score data submitted to the service or device.
  • An entry script. This script accepts requests, uses the model to score the data, and returns a response.
  • A Machine Learning environment that describes the pip and conda dependencies required by the model and entry script.
  • Any other assets, such as text and data, that the model and entry script require.

You also provide the configuration of the target deployment platform. For example, the VM family type, available memory, and number of cores. When the image is created, components required by Azure Machine Learning are also added. For example, the assets needed to run the web service.
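An entry script conventionally exposes an init function that loads the model once at startup and a run function that scores each incoming request. A self-contained sketch with a stand-in model (the JSON request shape is an assumption for illustration):

```python
import json

model = None  # loaded once per container, reused across requests

def init():
    """Load the model at startup; here, a stand-in that doubles its inputs."""
    global model
    model = lambda values: [2 * v for v in values]

def run(raw_request):
    """Score a JSON request of the form {"data": [...]} and return a response."""
    payload = json.loads(raw_request)
    predictions = model(payload["data"])
    return json.dumps({"predictions": predictions})

init()
print(run('{"data": [1, 2, 3]}'))  # {"predictions": [2, 4, 6]}
```

In a real deployment, init would deserialize the registered model from disk, and the hosting infrastructure, not your code, would call init and run.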

Batch scoring

Batch scoring is supported through batch endpoints. For more information, see Endpoints.

Online endpoints

When you deploy to an online endpoint, you can use controlled rollout to enable the following scenarios:

  • Create multiple versions of an endpoint for a deployment.
  • Perform A/B testing by routing traffic to different deployments within the endpoint.
  • Switch between endpoint deployments by updating the traffic percentage in endpoint configuration.
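The traffic percentages in the endpoint configuration determine how requests are divided between deployments. A deterministic sketch of weighted routing (illustrative only; the real endpoint handles this for you):

```python
from itertools import accumulate

def route(traffic, ticket):
    """Pick a deployment for a request using cumulative traffic percentages.

    traffic: mapping of deployment name -> percentage (must sum to 100).
    ticket: a number in [0, 100) identifying where this request falls.
    """
    names = list(traffic)
    bounds = list(accumulate(traffic.values()))
    for name, bound in zip(names, bounds):
        if ticket < bound:
            return name
    raise ValueError("traffic percentages must sum to 100")

# A 90/10 split between hypothetical "blue" and "green" deployments.
traffic = {"blue": 90, "green": 10}
print(route(traffic, 42))  # blue
print(route(traffic, 95))  # green
```

Shifting the percentages (for example, to {"blue": 0, "green": 100}) is what switches traffic from one deployment to the other without changing the endpoint itself.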

Analytics

Microsoft Power BI supports using machine learning models for data analytics. For more information, see Machine Learning integration in Power BI (preview).

Capture the governance data required for MLOps

Machine Learning gives you the capability to track the end-to-end audit trail of all your machine learning assets by using metadata. For example:

  • Machine Learning datasets help you track, profile, and version data.
  • Interpretability allows you to explain your models, comply with regulatory requirements, and understand how models arrive at a result for specific input.
  • Machine Learning Job history stores a snapshot of the code, data, and compute used to train a model.
  • The Machine Learning Model Registry captures all the metadata associated with your model. For example, metadata includes which experiment trained it, where it's being deployed, and if its deployments are healthy.
  • Integration with Azure allows you to act on events in the machine learning lifecycle. Examples are model registration, deployment, data drift, and training (job) events.

Notify, automate, and alert on events in the machine learning lifecycle

Machine Learning publishes key events to Azure Event Grid, which you can use to receive notifications of, and automate responses to, events in the machine learning lifecycle. For more information, see Use Event Grid.

Automate the machine learning lifecycle

You can use GitHub and Azure Pipelines to create a continuous integration process that trains a model. In a typical scenario, when a data scientist checks a change into the Git repo for a project, Azure Pipelines starts a training job. The results of the job can then be inspected to see the performance characteristics of the trained model. You can also create a pipeline that deploys the model as a web service.
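A continuous integration process like the one described is usually defined in an azure-pipelines.yml file in the repository. A hypothetical minimal pipeline (the trigger branch, job name, and scripts are illustrative assumptions):

```yaml
trigger:
  branches:
    include:
      - main   # retrain whenever a change lands on main

jobs:
  - job: train_model
    pool:
      vmImage: ubuntu-latest
    steps:
      - script: pip install -r requirements.txt
        displayName: Install dependencies
      - script: python train.py
        displayName: Train the model
```

After the training job completes, you can inspect its results and add a release stage that deploys the model as a web service.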

The Machine Learning extension makes it easier to work with Azure Pipelines. It provides the following enhancements to Azure Pipelines:

  • Enables workspace selection when you define a service connection.
  • Enables release pipelines to be triggered by trained models created in a training pipeline.

For more information on using Azure Pipelines with Machine Learning, see:

Next steps

Learn more by reading and exploring the following resources: