Deploy a model as an online endpoint

Learn how to deploy a model to an online endpoint by using the Azure Machine Learning Python SDK v2.

In this tutorial, we use a model trained to predict the likelihood of defaulting on a credit card payment. The goal is to deploy this model and show its use.

The steps you'll take are:

  • Register your model
  • Create an endpoint and a first deployment
  • Test the deployment with sample data
  • Get logs of the deployment
  • Create a second deployment
  • Manually scale the second deployment to handle more traffic
  • Update the allocation of production traffic between the two deployments
  • Send all traffic to the new deployment and delete the old one
  • Clean up resources

This video shows how to get started in Azure Machine Learning studio so that you can follow the steps in the tutorial. The video shows how to create a notebook, create a compute instance, and clone the notebook. The steps are also described in the following sections.

Prerequisites

To create the online deployments in this tutorial, you need enough virtual machine (VM) quota for the STANDARD_DS3_v2 and STANDARD_F4s_v2 VM families, which are the instance types used for the two deployments.

Set your kernel

On the top bar above your opened notebook, select the Python 3.10 - SDK v2 kernel.

Create handle to workspace

You'll create ml_client, a handle to the workspace, and then use ml_client to manage resources and jobs.

In the next cell, enter your Subscription ID, Resource Group name and Workspace name. To find these values:

  1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
  2. Copy the value for workspace, resource group and subscription ID into the code.
  3. You'll need to copy one value, close the area and paste, then come back for the next one.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# authenticate
credential = DefaultAzureCredential()

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)
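
As an optional sanity check (not part of the original steps), you can confirm that the handle works by retrieving the workspace metadata; the fields printed here are just illustrative:

# optional: verify the handle by fetching workspace details
ws = ml_client.workspaces.get(name="<AML_WORKSPACE_NAME>")
print(ws.name, ws.location, ws.resource_group)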

Register the model

If you already completed the earlier training tutorial, Train a model, you've registered an MLflow model as part of the training script and can skip to the next section.

If you didn't complete the training tutorial, you'll need to register the model. Registering your model before deployment is a recommended best practice.

When you create the Model object, you specify path, the local folder where the model files are stored. The SDK then automatically uploads the files and registers the model.

# import the necessary libraries
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# provide the model details, including the
# path to the model files, if you've stored them locally
mlflow_model = Model(
    path="./deploy/credit_defaults_model/",
    type=AssetTypes.MLFLOW_MODEL,
    name="credit_defaults_model",
    description="MLflow Model created from local files.",
)

# register the model
ml_client.models.create_or_update(mlflow_model)

Confirm the model is registered

You can check the Models page in Azure Machine Learning studio to identify the latest version of your registered model.

Screenshot shows the registered model in studio.

Alternatively, the following code retrieves the latest version number for you to use.

registered_model_name = "credit_defaults_model"

# Let's pick the latest version of the model
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)
print(latest_model_version)

Now that you have a registered model, you can create an endpoint and deployment. The next section briefly covers some key details about these topics.

Endpoints and deployments

After you train a machine learning model, you need to deploy it so that others can use it for inferencing. For this purpose, Azure Machine Learning lets you create endpoints and add deployments to them.

An endpoint, in this context, is an HTTPS path that provides an interface for clients to send requests (input data) to a trained model and receive the inferencing (scoring) results back from the model. An endpoint provides:

  • Authentication using key or token-based authorization
  • TLS (SSL) termination
  • A stable scoring URI (endpoint-name.region.inference.ml.azure.com)

A deployment is a set of resources required for hosting the model that does the actual inferencing.

Azure Machine Learning supports no-code deployment of a model created and logged with MLflow. This means that you don't have to provide a scoring script or an environment during model deployment, as the scoring script and environment are automatically generated when training an MLflow model. If you were using a custom model, though, you'd have to specify the environment and scoring script during deployment.
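
The deployment steps that follow assume an online endpoint already exists and is referenced through the online_endpoint_name variable. If you're starting from scratch, here's a minimal sketch for creating one with ManagedOnlineEndpoint; the naming scheme is illustrative, but endpoint names must be unique in their Azure region:

import uuid
from azure.ai.ml.entities import ManagedOnlineEndpoint

# create a unique name for the endpoint (illustrative naming scheme)
online_endpoint_name = "credit-endpoint-" + str(uuid.uuid4())[:8]

# define an online endpoint that uses key-based authentication
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is an online endpoint",
    auth_mode="key",
)

# create the endpoint; this typically takes a few minutes
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()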

Deploy the model to the endpoint

Now, deploy your model to the endpoint. You create a deployment by using the ManagedOnlineDeployment class:

from azure.ai.ml.entities import ManagedOnlineDeployment

# Choose the latest version of our registered model for deployment
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)

# define an online deployment
# if you run into an out of quota error, change the instance_type to a comparable VM that is available.
# Learn more on https://azure.microsoft.com/en-us/pricing/details/machine-learning/.
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)

Using the MLClient created earlier, create the deployment in the workspace, then route 100% of the endpoint's traffic to it:

# create the online deployment
blue_deployment = ml_client.online_deployments.begin_create_or_update(
    blue_deployment
).result()

# blue deployment takes 100% traffic
# expect the deployment to take approximately 8 to 10 minutes.
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

Check the status of the endpoint
You can check the status of the endpoint to see whether the model was deployed without error:
# return an object that contains metadata for the endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

# print a selection of the endpoint's metadata
print(
    f"Name: {endpoint.name}\nStatus: {endpoint.provisioning_state}\nDescription: {endpoint.description}"
)

You can also view the endpoint's current traffic allocation and its scoring URI:

# existing traffic details
print(endpoint.traffic)

# Get the scoring URI
print(endpoint.scoring_uri)
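
The tutorial itself only checks the endpoint. As a hedged addition, you can fetch the deployment object the same way and inspect its provisioning state:

# optional: check the deployment's own provisioning state
deployment = ml_client.online_deployments.get(
    name="blue", endpoint_name=online_endpoint_name
)
print(f"Deployment: {deployment.name}\nStatus: {deployment.provisioning_state}")
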
Test the endpoint with sample data

Now that the model is deployed to the endpoint, you can run inference with it. Let's create a sample request file following the design expected in the run method in the scoring script.

import os

# Create a directory to store the sample request file.
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)

Now, create the file in the deploy directory. The cell below uses IPython magic to write the file into the directory you just created.

%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
      [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
      [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
    ]
  }
}

Using the MLClient created earlier, invoke the endpoint with the invoke command and the following parameters:

  • endpoint_name - the name of the endpoint
  • request_file - the file with the request data
  • deployment_name - the name of the specific deployment to test in the endpoint

We'll test the blue deployment with the sample data.

# test the blue deployment with the sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file="./deploy/sample-request.json",
)
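
The invoke call returns the model's predictions. As a hedged variation on the call above, you can capture and print the response, which comes back as a JSON string:

# capture the scoring response (returned as a JSON string of predictions)
response = ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file="./deploy/sample-request.json",
)
print(response)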

Get logs of the deployment

Check the logs to see whether the endpoint/deployment was invoked successfully. If you face errors, see Troubleshooting online endpoints deployment.

logs = ml_client.online_deployments.get_logs(
    name="blue", endpoint_name=online_endpoint_name, lines=50
)
print(logs)

Create a second deployment

You can deploy the model as a second deployment, called green. In practice, you can create several deployments and compare their performance. These deployments could use a different version of the same model, a completely different model, or a more powerful compute instance.

# picking the model to deploy. Here we use the latest version of our registered model
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)
# define an online deployment using a more powerful instance type
# if you run into an out of quota error, change the instance_type to a comparable VM that is available.
# Learn more on https://azure.microsoft.com/en-us/pricing/details/machine-learning/.
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1,
)
# create the online deployment
# expect the deployment to take approximately 8 to 10 minutes
green_deployment = ml_client.online_deployments.begin_create_or_update(
    green_deployment
).result()

Scale deployment to handle more traffic

Using the MLClient created earlier, get a handle to the green deployment. You can then scale it by increasing or decreasing its instance_count.

In the following code, you increase the number of VM instances manually. However, it's also possible to autoscale online endpoints. Autoscale automatically runs the right amount of resources to handle the load on your application. Managed online endpoints support autoscaling through integration with the Azure Monitor autoscale feature. To configure autoscaling, see Autoscale online endpoints.

# update definition of the deployment
green_deployment.instance_count = 2
# update the deployment
# expect the deployment to take approximately 8 to 10 minutes
ml_client.online_deployments.begin_create_or_update(green_deployment).result()

Update traffic allocation for deployments

Once your green deployment is ready, you can split production traffic between the blue and green deployments. Here, the blue deployment continues to receive 80% of the traffic while the green deployment receives the remaining 20%:

endpoint.traffic = {"blue": 80, "green": 20}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

You can test traffic allocation by invoking the endpoint several times:

# You can invoke the endpoint several times
for i in range(30):
    ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        request_file="./deploy/sample-request.json",
    )

Show the logs from the green deployment to check that it received incoming requests and that the model was scored successfully:

logs = ml_client.online_deployments.get_logs(
    name="green", endpoint_name=online_endpoint_name, lines=50
)
print(logs)

View metrics using Azure Monitor

You can view various metrics (request numbers, request latency, network bytes, CPU/GPU/Disk/Memory utilization, and more) for an online endpoint and its deployments by following links from the endpoint's Details page in the studio. Following these links will take you to the exact metrics page in the Azure portal for the endpoint or deployment.

Screenshot showing links on the endpoint details page to view online endpoint and deployment metrics.

If you open the metrics for the online endpoint, you can set up the page to see metrics such as the average request latency as shown in the following figure.

Screenshot showing online endpoint metrics in the Azure portal.

For more information on how to view online endpoint metrics, see Monitor online endpoints.
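
If you prefer to pull these numbers programmatically, here's a hedged sketch using the azure-monitor-query package; the metric name and aggregation are assumptions that you should verify against the metrics listed for your endpoint in the portal:

from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

metrics_client = MetricsQueryClient(DefaultAzureCredential())

# the ARM resource ID of the online endpoint (placeholder values)
endpoint_resource_id = (
    "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"
    "/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>"
    "/onlineEndpoints/<ENDPOINT_NAME>"
)

# "RequestLatency" is an assumed metric name; check the portal for exact names
result = metrics_client.query_resource(
    endpoint_resource_id,
    metric_names=["RequestLatency"],
    timespan=timedelta(hours=1),
    aggregations=["Average"],
)
for metric in result.metrics:
    for ts in metric.timeseries:
        for point in ts.data:
            print(point.timestamp, point.average)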

Send all traffic to the new deployment

Once you're fully satisfied with your green deployment, switch all the traffic to it:
endpoint.traffic = {"blue": 0, "green": 100}
ml_client.begin_create_or_update(endpoint).result()

Delete the old deployment

Remove the old (blue) deployment:

ml_client.online_deployments.begin_delete(
    name="blue", endpoint_name=online_endpoint_name
).result()

Clean up resources

If you aren't going to use the endpoint and deployment after completing this tutorial, delete them:

ml_client.online_endpoints.begin_delete(name=online_endpoint_name).result()

Delete everything

Use these steps to delete your Azure Machine Learning workspace and all compute resources.

If you don't plan to use any of the resources that you created, delete them so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.

  2. From the list, select the resource group that you created.

  3. Select Delete resource group.

    Screenshot of the selections to delete a resource group in the Azure portal.

  4. Enter the resource group name. Then select Delete.

Next Steps