Deploying Vision-Language Models (VLMs) and Large Language Models (LLMs) on Azure Machine Learning Studio Endpoints
A brief step-by-step discussion on deploying VLM/LLM on Azure ML Studio
Deploying Vision-Language Models (VLMs) and Large Language Models (LLMs) on Azure Machine Learning Studio endpoints enables seamless integration of advanced AI capabilities into your applications. This article provides a comprehensive guide to setting up your Azure ML environment, preparing your models, and deploying them as scalable endpoints.
👋 Introduction to Donut
We will deploy the Donut model on Azure Machine Learning Studio as an endpoint. Before we dive into the deployment process, let’s take a closer look at what the Donut model is:
Donut 🍩 or Document Understanding Transformer, is a new method of document understanding that utilizes an OCR-free end-to-end Transformer model. Donut does not require off-the-shelf OCR engines/APIs, yet it shows state-of-the-art performances on various visual document understanding tasks, such as visual document classification or information extraction (a.k.a. document parsing).
The Donut model is excellent for tackling document parsing problems. You can check out the model code in this GitHub repository. Make sure to download these files to your local machine.
👨🍳 Preparing Model for deployment
Head over to the GitHub link mentioned above and download the donut directory or copy its contents locally. The donut_deployment directory is our project directory, and the model code resides in the donut directory inside it.
Now that we have the necessary code, it’s time to download the model. We are going to download and use the naver-clova-ix/donut-base-finetuned-cord-v2 checkpoint from Hugging Face.
Use the DonutModel class from the model.py module to download and save the model locally.
# download_model.ipynb
from donut.model import DonutModel
donut_model = DonutModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
donut_model.save_pretrained("model")
Your project structure should look something like this:
donut_deployment/
├── donut/
│   ├── __init__.py
│   ├── _version.py
│   ├── model.py
│   └── util.py
└── model/
    ├── pytorch_model.bin
    └── config.json
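Before uploading, it can help to confirm that the folder really matches the layout above. The following is a minimal sketch, not part of the Donut repository: the helper name missing_files and the EXPECTED_FILES list are my own assumptions, based only on the tree shown here. If your version of the libraries saves the weights under a different file name, adjust the list accordingly.

```python
from pathlib import Path

# Files the layout above expects, relative to the project directory.
# This list is an assumption taken from the tree in this article.
EXPECTED_FILES = (
    "donut/__init__.py",
    "donut/model.py",
    "donut/util.py",
    "model/config.json",
    "model/pytorch_model.bin",
)


def missing_files(project_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that are absent under project_dir."""
    root = Path(project_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


# Example usage, run from the directory that contains donut_deployment/:
# print(missing_files("donut_deployment") or "Project layout looks complete.")
```

An empty list means every expected file is in place and the folder is ready to upload.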
👏 Fantastic! With all the necessary items in place, it’s time to push the model to Azure.
Note: Depending on your specific requirements, the model may vary. However, the method remains the same; you’ll have a parent directory containing your inference code and the model you wish to upload.
🏈 Deploy on Azure ML Studio
Machine Learning Studio is an AI platform developed by Microsoft. It simplifies the development of AI/ML applications and features repositories of models that can be easily deployed with just a few clicks.
🏗 Create Workspace
Head over to ml.azure.com and create a workspace to get started. When you first visit the site, you will see a screen similar to the one below.
You can create a new workspace, or if you already have one, feel free to skip this step and proceed to your workspace.
Give it a name, then choose your subscription, resource group, and region.
Once you’re ready with the workspace, click on it, and you’ll see a page similar to the one below. It will showcase different models and notebook samples to get you started. However, since we have our own model, we won’t be using any of those.
Navigate to the “Models” section on the sidebar. Here, we’ll upload our code and register our model.
You’ll find a list of any registered models and an option to register a new one. Click on “Register” and select “From local files.” This action will open a form similar to the one shown below:
Click on “Browse” and select “Browse folder.” Upload the donut_deployment directory, which contains both the model and the code.
Next, we need to fill out the model configuration page with the model name, description, version, and tags (if needed).
After completing the model configuration, simply hit “Register.”
Once the model registration is complete, you should see your model listed in the model list.
🌴 Prepare model environment
On the sidebar, navigate to “Environments” and select “Custom environments”. Then, create a new environment.
Give it a name and description, and select “Use existing Docker image with Conda file”.
Next, you will be prompted with a conda.yml file. Here, we need to specify the project dependencies.
The image below illustrates the dependencies of this project, but feel free to add or remove based on your requirements under “dependencies”.
I’m sharing the content here so that you can quickly copy and get going.
# conda.yml
channels:
  - anaconda
  - conda-forge
dependencies:
  - python=3.10.13
  - pip=22.1.2
  - pip:
      - transformers==4.25.1
      - pytorch-lightning==1.6.4
      - timm==0.5.4
      - zss
      - nltk==3.8.1
      - safetensors==0.4.2
      - sentencepiece==0.2.0
      - datasets==2.18.0
      - azureml-inference-server-http
name: donut
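Since the model was saved from your local machine, it may be worth checking that your local packages match the versions pinned for the endpoint. Here is a small sketch using only the standard library. The helper name pin_mismatches and the choice of which pins to check are my own assumptions; the version numbers are copied from the conda.yml above.

```python
from importlib import metadata

# A few pins copied from the conda.yml above; extend the dict as needed.
PINS = {
    "transformers": "4.25.1",
    "pytorch-lightning": "1.6.4",
    "timm": "0.5.4",
}


def pin_mismatches(pins):
    """Map each package whose installed version differs from its pin
    to the version actually found (None if it is not installed)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = found
    return mismatches


# Example usage:
# print(pin_mismatches(PINS) or "Local versions match the pins.")
```

A mismatch is not necessarily a problem, but it is a useful first place to look if the model loads locally and fails inside the endpoint.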
These are the important steps required. Once we’ve completed these, click “Next,” review the content, and then click “Create.” Your environment should be ready in a few minutes.
Once the environment is created, head over to the “Environments” tab to see the environment listed there.
🤗 We’re almost done. The next step is to create the endpoint.
📶 Create Endpoint
From the sidebar, select “Endpoints”. You should see a page similar to the one below —
Click on “Create” and select the model from the list. Give a name to your endpoint and provide a description. Select “Managed compute” type and choose “Key-based authentication.” Then, click “Next.”
Leave the model and deployment config as default.
Next, you should see the “Code” and “Environment” tabs for inference.
❗️This is very important. Here, we need to add a score.py script, which is responsible for inference.
Create a score.py script and add the following code to it:
# score.py
import base64
import io
import json
import logging
import os
import sys

import torch
from PIL import Image

# Make the uploaded donut package importable inside the container.
sys.path.append(os.path.join(os.getenv("AZUREML_MODEL_DIR"), "donut_deployment"))

from donut import DonutModel


# model inference function
def extract_information(input_img, pretrained_model):
    try:
        output = pretrained_model.inference(image=input_img, prompt="<s_cord-v2>")[
            "predictions"
        ][0]
        return output
    except Exception:
        # Log the full traceback so failures show up in the deployment logs.
        logging.exception("Donut inference failed")
        return None


# handle base64 image
def handle_encoded_image(image_base64: str):
    img_bytes = base64.b64decode(image_base64)
    return Image.open(io.BytesIO(img_bytes))


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "donut_deployment/model")
    model = DonutModel.from_pretrained(model_path)
    logging.info("Init complete")


def run(raw_data):
    logging.info("Donut model: request received")
    # The request body is JSON with the base64-encoded image under "image".
    image_base64 = json.loads(raw_data)["image"]
    image = handle_encoded_image(image_base64)
    logging.info("Image object is ready")
    model.encoder.to(torch.float32)
    model.eval()
    output = extract_information(image, model)
    logging.info("Image processed")
    return {"output": output}
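The code above is fairly self-explanatory. handle_encoded_image decodes the base64 image, and extract_information uses the model to extract information. The remaining two functions, init and run, are the entry points Azure calls: init loads the model when the container starts, and run handles each inference request.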
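To make the contract between a client and score.py concrete, here is a minimal sketch of both directions using only the standard library. The function names build_request, decode_request, and parse_response are hypothetical helpers of mine, not part of Azure or Donut. The sketch assumes the response body is the JSON form of the dictionary returned by run, so a failed inference (where extract_information returns None) would arrive as a null output.

```python
import base64
import json


def build_request(image_bytes):
    """Serialize raw image bytes into the JSON body that run() expects."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"image": encoded})


def decode_request(raw_data):
    """Mirror the first steps of run(): parse the JSON, decode the base64 field."""
    return base64.b64decode(json.loads(raw_data)["image"])


def parse_response(body):
    """Return the parsed document, or raise if the model produced nothing."""
    output = json.loads(body).get("output")
    if output is None:
        raise ValueError("endpoint returned no output; check the deployment logs")
    return output


# Example usage: the bytes survive the round trip unchanged.
# payload = build_request(open("image.png", "rb").read())
# assert decode_request(payload) == open("image.png", "rb").read()
```

Checking for a missing output on the client side matters here because score.py swallows inference errors and still returns a normal response.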
Great! Now, upload this script in the “Code” tab, and in the “Environment” tab select “Custom environment” as the environment type, then pick the environment we created earlier. Once that’s done, click “Next.”
Select the “compute instance”. Based on my observation, Standard_E2_v3 with 2 cores and 16 GB of RAM works fine with this model, so we are going with that. You can choose a different size if you want.
Click “Next,” leave the traffic configuration as default, and then click “Create.” Your endpoint should be ready in a few minutes.
🚅 Inference
Our endpoint is ready, and now it’s time to run a test.
From the sidebar, select “Endpoints,” and you should see your endpoint listed. Select it, and retrieve the URL and the authentication token/key for inference.
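Rather than pasting the URL and key directly into a notebook, you may prefer to read them from environment variables. The sketch below is one way to do that. The variable names DONUT_ENDPOINT_URL and DONUT_ENDPOINT_KEY and the helper endpoint_settings are my own inventions, not something Azure defines.

```python
import os


def endpoint_settings(env=os.environ):
    """Read the endpoint URL and key and build the request headers.

    The environment variable names are hypothetical; use whatever fits
    your setup.
    """
    url = env.get("DONUT_ENDPOINT_URL")
    key = env.get("DONUT_ENDPOINT_KEY")
    if not url or not key:
        raise RuntimeError("set DONUT_ENDPOINT_URL and DONUT_ENDPOINT_KEY first")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    return url, headers


# Example usage:
# url, headers = endpoint_settings()
```

This keeps the key out of the notebook, which is handy if you share or commit it.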
We are going to send images as base64, so make sure to convert your images before sending.
Here’s an example of sending a request to the endpoint that we have just created, using the requests package.
# inference.ipynb
import base64
import requests
import json
from PIL import Image
import io


# image to base64
def convert_to_base64(img_path):
    with open(img_path, "rb") as f:
        image_file = base64.b64encode(f.read()).decode("utf-8")
    return image_file


url = "<endpoint-url>"


def extract(img_base64):
    payload = {}
    headers = {
        "Content-type": "application/json",
        "Authorization": "Bearer <key-token>",
    }
    payload["image"] = img_base64
    response = requests.post(url=url, headers=headers, data=json.dumps(payload))
    return json.dumps(response.json(), indent=4, sort_keys=True)


img_base64 = convert_to_base64("image.png")
extract(img_base64)
"""
{'menu': {'nm': 'PKT AYAM', 'price': '33,000'},
 'sub_total': {'subtotal_price': '33,000', 'tax_price': '3,300'},
 'total': {'total_price': '36,300',
           'cashprice': '50,000',
           'changeprice': '13,700',
           'menuqty_cnt': '1.00xITEMS'}}
"""
Once you’ve completed testing, make sure to remove the endpoint to prevent any unwanted cost accumulation.
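If you would rather clean up from code than from the Studio UI, the Azure ML Python SDK v2 can delete an endpoint as well. This is a minimal sketch: delete_endpoint is a hypothetical wrapper of mine, and it assumes ml_client is an azure.ai.ml.MLClient already connected to your workspace. The placeholders in the usage comment are yours to fill in.

```python
def delete_endpoint(ml_client, endpoint_name):
    """Delete a managed online endpoint and wait until the operation finishes.

    ml_client is assumed to be an azure.ai.ml.MLClient for your workspace.
    """
    poller = ml_client.online_endpoints.begin_delete(name=endpoint_name)
    poller.result()  # block until the deletion completes


# Example usage (assumes azure-ai-ml and azure-identity are installed
# and you are signed in to Azure):
# from azure.ai.ml import MLClient
# from azure.identity import DefaultAzureCredential
# ml_client = MLClient(
#     DefaultAzureCredential(),
#     "<subscription-id>",
#     "<resource-group>",
#     "<workspace-name>",
# )
# delete_endpoint(ml_client, "<endpoint-name>")
```

Deleting the endpoint also removes its deployments, so the compute stops accruing charges.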
In this guide, we’ve walked through the process of deploying a custom model on Azure Machine Learning Studio endpoints. From setting up the workspace to creating the environment, configuring dependencies, and finally creating and testing the endpoint, we’ve covered every step to ensure a smooth deployment process.
Remember, while Azure offers flexibility and scalability, it’s essential to monitor usage and costs. Always remove endpoints and other resources when they’re no longer needed to avoid unnecessary expenses.
I hope this guide has been informative and helpful in your journey to deploying AI models on Azure. With the right tools and knowledge, you can unlock the full potential of your AI projects and drive innovation in your organization.
🤗 Happy LLM Deploying!
Follow🔔 Dipankar Medhi for more such awesome AI and LLM related content.