What’s next after Machine Learning application prototyping.

A sneak peek at what ML application deployment looks like.

Sebastien Sime
7 min read · Jul 14, 2021
The whole world hidden beneath the ML prototype code

Most junior data scientists, as is indeed expected of them, spend a significant amount of time on ML prototyping, but it may come as a surprise that ML code only represents about 5 to 10% of all the code required, even for a proof of concept alone.

The general process when developing an ML application can be depicted as follows:

Even if considerable effort generally goes into the early stages of the process, the deployment stage alone implies maintaining, testing, versioning, monitoring, and so on.

Moreover, lots of questions about the desired degree of automation and the deployment strategy have to be answered when deploying. The deployment strategy will differ depending on whether we want to launch a new product, automate a process, or replace a previous ML application.

So without going deeper into the process (which would be totally out of scope), we can ask ourselves what deploying an ML application looks like at a lower level. To answer this question, we will deploy a computer vision model trained to detect common objects in pictures.

A little bit of strategy first.

As is generally advised, let’s quickly define our strategy before digging deeper, simply to state what we are trying to achieve and explain some general concepts for those not familiar with what will follow.

Deploying an ML model or application means making the model ready to be served, i.e. used by other applications or by users like ourselves. To focus only on deployment, we will assume we already have a functional machine learning model or app that takes inputs (pictures in our case) and provides outputs (the objects detected in those pictures).

Right after the prototyping stage, the model only lives on our PC, and we are interested in allowing others (users or applications) to use it without interfering with the process. This generally implies the client-server architecture you have perhaps heard about: what we need is to host our model on a server (like in a restaurant) where clients (other users or applications) will send their requests, either to retrieve information or to provide data to be processed.

Server — Client abstraction

Requests will be sent through a URL, or an endpoint, acting like a connection plug to our system. We can then see that, in addition to our ML model, we will need something to define and handle URLs (uniform resource locators). So right after the prototyping stage, the deployment strategy would generally be the following:

  1. Save the model and its dependencies (the requirements, i.e. the list of all the libraries used during prototyping); see the short sketch after this list.
  2. Define functions that will load the model and the requirements.
  3. Wrap the model into an application (Flask or FastAPI in Python) that will handle communication, i.e. provide the connection plugs and handle the requests.
  4. Use a technology to host the application and its requirements (e.g. Docker, a containerization tool, or simply your local machine).
  5. Make it available to the world, either directly from your PC or using a cloud solution (AWS, GCP, Azure, etc.) to make it automated and scalable.
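
As a minimal sketch of the first step, serializing a trained model and freezing its dependencies could look like the snippet below (the model object, the joblib library and the file names are illustrative assumptions, not part of the stack used later in this post):

import joblib

# Persist the trained model to disk ("model" and "model.pkl" are placeholder names)
joblib.dump(model, "model.pkl")

# From a terminal, freeze the exact library versions used during prototyping:
#   pip freeze > requirements.txt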

The stack that we will use.

Now that the buzzwords are no longer just buzzwords, we can take a step further by defining our tools for the task. We will use Python as the main language. For the computer vision model, we will use cvlib, a really powerful open-source Python library that uses OpenCV and TensorFlow behind the scenes.

For simplicity, a local PC will be used to host the model and the application. The latter will be built using FastAPI, which makes it very easy to create web servers to host your models. Additionally, this framework is quite fast and comes with a built-in client (the interactive docs) that can be used to interact with the server.

Finally, the technical stuff.

Just to be clear, the application that we want to build will take a picture and return it with a bounding box framing each detected object, in addition to the label and the confidence level of the prediction.

Application input and outputs
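
For reference, cvlib returns three parallel lists (bounding boxes, class labels and confidence scores). The values below are purely illustrative:

# Illustrative output for a picture containing a single dog
bbox = [[32, 48, 211, 230]]   # one box per detected object: [x_min, y_min, x_max, y_max]
label = ["dog"]               # the predicted class name
conf = [0.87]                 # the confidence score of the prediction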

When deploying an application, it is customary to store the uploaded pictures as well as the predicted ones for monitoring. Monitoring is a major concept when it comes to continuously assessing the performance of the system over time.

For the demonstration, a local PC will be used as the server and the application will be created using FastAPI.

Loading the Python libraries.

First we will load the libraries into our environment:

import io
import os
from enum import Enum

import cv2                                    # image decoding and saving
import cvlib as cv                            # object detection (uses OpenCV and TensorFlow)
from cvlib.object_detection import draw_bbox  # drawing bounding boxes and labels on images
import numpy as np
import requests                               # used later on the client side
import nest_asyncio
import uvicorn
from fastapi import FastAPI, UploadFile, File, HTTPException
from fastapi.responses import StreamingResponse
from IPython.display import Image, display

Define the application on the server.

Let’s just have a look at the code and comment on it afterwards.

# Create the folder to store uploaded pictures
dir_name = "images_uploaded"
if not os.path.exists(dir_name):
    os.mkdir(dir_name)

# You will interact with the API using an instance of the FastAPI class assigned to a variable called "app".
app = FastAPI(title='Deployment ML model 101')

# Useful when the options are pre-defined: a list of available models using Enum
class Model(str, Enum):
    yolov3tiny = "yolov3-tiny"
    yolov3 = "yolov3"

# By using @app.get("/") you are allowing the GET method to work for the / endpoint.
@app.get("/")
def home():
    return "Congratulations! Your API is working as expected!"

# This endpoint handles all the logic necessary for the object detection to work.
# It requires the desired model, the confidence threshold to use and the image on which to perform object detection.
@app.post("/predict")
def prediction(model: Model, confidence: float, file: UploadFile = File(...)):

    # 1. VALIDATE INPUT FILE: will produce False if the extension is not in the list
    filename = file.filename
    fileExtension = filename.split(".")[-1] in ("jpg", "jpeg", "png")
    if not fileExtension:
        raise HTTPException(status_code=415, detail="Unsupported file provided.")

    # 2. TRANSFORM THE RAW IMAGE INTO A CV2 IMAGE

    # Read the image as a stream of bytes
    image_stream = io.BytesIO(file.file.read())

    # Start the stream from the beginning (position zero)
    image_stream.seek(0)

    # Write the stream of bytes into a numpy array
    file_bytes = np.asarray(bytearray(image_stream.read()), dtype=np.uint8)

    # Decode the numpy array as an image
    image = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)

    # 3. RUN THE OBJECT DETECTION MODEL
    bbox, label, conf = cv.detect_common_objects(image, confidence=confidence, model=model)

    # Create an image that includes the bounding boxes and labels
    output_image = draw_bbox(image, bbox, label, conf)

    # Save it in a folder within the server
    cv2.imwrite(f'images_uploaded/{filename}', output_image)

    # 4. STREAM THE RESPONSE BACK TO THE CLIENT

    # Open the saved image for reading in binary mode
    file_image = open(f'images_uploaded/{filename}', mode="rb")

    # Return the image as a stream, specifying the media type
    return StreamingResponse(file_image, media_type="image/jpeg")

To start the server, we can execute the following code:

# Code to launch the server
# nest_asyncio allows uvicorn's event loop to run inside the notebook
nest_asyncio.apply()

# Local host address
host = "127.0.0.1"

# Spin up the server!
uvicorn.run(app, host=host, port=8000)

This is basically all the code that needs to be hosted on the server side. Running it will spin up the server, but it causes the notebook to block until you manually interrupt the kernel.
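
As a side note, if the same code is saved in a regular Python script (say main.py, an illustrative name) instead of a notebook, nest_asyncio is not needed and the server can be started from a standard entry point:

# Sketch of the end of main.py: start the server when the script is executed directly
if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)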

Going to http://127.0.0.1:8000 will let you know if the API is up and running. To interact with the API directly, go to http://localhost:8000/docs to access the API web interface (see the picture below).

API web interface

Under the predict endpoint, you will be able to provide the required parameters and inputs to run a prediction (by hitting the “Try it out” button).

Predict button

Call the application from the client.

Without using the above interface, it is possible to interact with the model API through a client. At this point, the client could be any piece of technology that can send requests to our server and decode the response it gets back.

Using Python and a Jupyter Notebook as a minimal client interface, we need to:

  1. Build the URL that will connect to our API and carry the parameters and inputs.
  2. Use the requests Python module to send POST requests.
  3. Design a function that gets the response, decodes the result and plots the picture with the predictions.

As an example, we have the code below:

# Predicted file directory:
dir_name = "images_predicted"
if not os.path.exists(dir_name):
    os.mkdir(dir_name)

# POST URL:
url = 'http://localhost:8000/predict?model=yolov3-tiny&confidence=0.5'

# Request function:
def server_request(url, image_file):
    """Send the picture to the server and return the response."""
    files = {'file': image_file}
    response = requests.post(url, files=files)
    status_code = response.status_code  # could be checked here to handle errors
    return response

# Decoding function:
def display_image_from_response(response):
    """Decode the response and plot the result."""
    image_stream = io.BytesIO(response.content)
    image_stream.seek(0)
    file_bytes = np.asarray(bytearray(image_stream.read()), dtype=np.uint8)
    image = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)
    filename = "image_with_objects.jpeg"
    cv2.imwrite(f'images_predicted/{filename}', image)
    display(Image(f'images_predicted/{filename}'))
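
As a minimal usage sketch (assuming a picture named car.jpg sits next to the notebook; the file name is purely illustrative), the two functions can be chained like this:

# Send a local picture to the API and display the annotated result
with open("car.jpg", "rb") as image_file:
    response = server_request(url, image_file)
display_image_from_response(response)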

And then what’s next?

Well, needless to say, by doing all this we have only scratched the surface of what deployment looks like. However, we now have a glimpse of what MLOps looks like. There is still much to do and to learn from here, but I hope this post will be helpful for AI enthusiasts like myself. I’m eager to learn and to share more!
