Using Azure Computer Vision and Azure functions to import paper documents at scale


The process of converting a collection of paper-based documents into a digital version can be a very daunting and time-consuming task. Data entry workers must gather paper documents, sort them, read them line by line, type necessary details into a computer, review their work for mistakes, and then deal with disposing of the original papers afterward to avoid further buildup.

Thankfully, there is no need for things to be so cumbersome in 2022! Using Azure Computer Vision, you can quickly analyze an image and extract text from it to create a digitized version. All your data entry workers need to do is scan those documents as images and leave the rest to Azure Computer Vision.

In this tutorial, you'll learn how to build a paper digitization solution that works at scale using Azure Functions and Azure Computer Vision.

Azure Cognitive Services and Azure Computer Vision

If you've had any experience with machine learning or deep learning, you'll know that building a solution that performs cognitive operations, such as understanding text, detecting objects in an image, or transcribing audio, requires significant know-how.

This know-how is typically outside the developer's skill set and requires specialized machine learning engineers who know about different image and audio processing techniques.

With Azure Cognitive Services, however, there is no need to master all these complex skills. Azure encapsulates all cognitive operations into ready-made APIs that developers can use immediately without needing to understand the underlying machine learning/deep learning science.

One of the many services offered under Azure Cognitive Services is Azure Computer Vision. It enables a wide range of functionalities, such as:

  • Optical character recognition (OCR): used to extract printed and written text from images and documents.
  • Image understanding: used to extract a wide variety of visual features from an image.
  • Spatial analysis: used to understand people's movement in space in real-time.

Optical Character Recognition

Optical character recognition, which you'll focus on heavily in this article, is the process of identifying handwritten or printed text in images and converting it into machine-readable text. There is a wide range of applications for OCR, such as:

  • Vehicle plate recognition.
  • Digitizing books and unstructured documents.
  • Hand-written signature verification in banking systems.

You'll use the OCR service from the Azure Computer Vision resource when you work through the tutorial below.

Azure Functions

Traditionally, deploying a software application meant maintaining a full stack of infrastructure resources. This stack spans everything from networking, storage, and servers up to the application itself. As you can imagine, managing this complex stack of components consumes valuable time and resources from businesses.

Cloud computing, however, aims to reduce the amount of infrastructure a business must manage itself by shifting responsibility for particular infrastructure pieces onto the cloud vendor. One example of this is serverless functions.

By using serverless functions, developers can quickly implement individual pieces of business logic without worrying about the underlying infrastructure stack. Serverless functions speed up both development and deployment, since developers own only a small part of the application's responsibility; the rest is left to the cloud vendor. Azure Functions (deployed as Function Apps) is Microsoft's implementation of serverless computing.

In this tutorial, you'll use an Azure Function as the solution's entry point: it will receive HTTP requests containing images and hand them to the OCR service for analysis and processing.

Solution Architecture

The solution you’ll be implementing in this tutorial consists of three main elements:

  • An Azure Function triggered through an HTTP request (REST API).
  • An Azure Cognitive Service that performs the computer vision operations.
  • A client that consumes the Azure Function REST API.

Here’s a visual of the architecture for the project:

a diagram demonstrating the flow of the solution being built. The illustration corresponds with the steps below, but also has a sample block of text being passed in and a sample of the text output on the bottom. The only difference between the two blocks of text is the formatting.

As demonstrated in the image above, the solution you'll be building works as follows:

  1. A client sends an image containing the text to be recognized to the Azure Function's HTTP endpoint.
  2. The Azure Function forwards the image to Azure Cognitive Services (specifically, the Computer Vision service) to recognize the text using the OCR capability.
  3. The Azure Cognitive Service returns the recognition result to the Azure Function.
  4. The Azure Function parses the returned result and sends the client a JSON response containing the recognized lines.
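To make step 4 concrete: the JSON body the function returns is an array containing one array of text lines per uploaded image. Here's a hypothetical client-side helper in Python (the names and sample payload are made up, not real service output) that flattens that shape back into per-document text:

```python
# Shape of the function's JSON response: one list of lines per uploaded image.
# This sample payload is illustrative, not real service output.
sample_response = [
    ["Invoice #1024", "Total: $99.00"],
    ["Meeting notes", "Action items follow"],
]

def to_documents(response):
    """Join each image's recognized lines into a single text block."""
    return ["\n".join(lines) for lines in response]

# Prints the first document's text: the invoice header line, then the total line.
print(to_documents(sample_response)[0])
```

What the consuming client does beyond this, such as persisting or indexing the text, is entirely up to you.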

Implementing Scalable OCR with Azure

This section provides step-by-step instructions to create your Azure Cognitive Services resource and Azure Function, develop the OCR recognition code, deploy your solution to Microsoft Azure, and test it using an API client.


Prerequisites

Before diving in, here are a few prerequisites you'll need to get started:

  • A Microsoft Azure account with a paid or trial subscription.
  • Visual Studio 2022 with the Azure development workload installed.
  • The Postman API client, which you'll use to test the solution at the end.


Step 1: Create Your Azure Cognitive Service

In Microsoft Azure, all cognitive services are accessible from the Cognitive Services resource, which is hosted as an API endpoint accessible by using a specific key.

In the Azure dashboard, click Create a resource. In the search bar, type "Cognitive Services." You'll get information about the cognitive services resource and a legal notice. Click Create.

You'll need to specify the following details about the cognitive service (refer to the image below for a completed example of this page):

  • Subscription: choose your paid or trial subscription, depending on how you created your Azure account.
  • Resource group: click create new to create a new resource group or choose an existing one. For this tutorial, I'll create a new one and call it "ocr-rg."
  • Region: choose the Azure region for your cognitive service. For best performance, choose the closest region to your geographical area. I'll select "West Europe."
  • Name: choose a name for your cognitive service. I'll put "ocr-cognitive-service." This name should be unique across Azure.
  • Pricing Tier: choose "Standard S0" as the pricing tier, or you can choose "Free F0" for the free tier.

Then click Review + create. You'll get a screen with your choices for validation, as shown in the image below. Make sure the validation passes and then click Create.

a screenshot of the final review screen when creating a resource in Microsoft Azure.

Azure will take a few seconds to create the resource. After you see "Your deployment is complete," click Go to resource. Then click on Keys and Endpoint on the left. Here, you'll need to take note of Key 1 or Key 2 (either is fine, just be sure to keep them secure) and the endpoint URL:

a screenshot of the Keys and Endpoints page you land on after creating an Azure resource.

Step 2: Create a Function App

Next, in your Azure dashboard, click Create a resource once more. This time, search for "Function App" in the search bar. You'll get some information about the function app resource and a legal notice. Click Create.

Here, you'll need to specify the following details (refer to image below for completed version):

  • Subscription: choose the subscription where you created the resource group "ocr-rg."
  • Resource Group: select "ocr-rg."
  • Name: choose a name for your function app. I'll choose "ocr-function-app-0." This name should be unique across Azure.
  • Publish as: choose "Code."
  • Runtime stack: choose ".NET."
  • Version: choose "6." This is the latest .NET version, as of the time of writing the article.
  • Region: for best performance, choose the closest region to your geographical area. I'll select "West Europe."

Then, click Review + create. You'll see a screen with your choices for your review, as shown in the image below. Make sure the configurations are correct, and then click Create.

 a screenshot of the final review screen when creating an Azure Function.

Again, Azure will take a few seconds to create the resource. After it's finished, you'll see a message reading, "Your deployment is complete."


Step 3: Create a Function App Solution in Visual Studio

Now, you'll create a Function App in Visual Studio.

Launch Visual Studio 2022 and click on Create a new project. In the search box, type "Azure Functions." Choose the "Azure Functions" project template and then hit Next:

a screenshot of a new Azure Functions project being created in Visual Studio. The author has selected the "Azure Functions" template before creating the project.

Give your project a name such as "OCRSolution" and then click Create.


Now, you'll need to choose a trigger to execute your Azure function. Since you want your Azure function to trigger whenever an image is sent to it using an API, you'll select "HTTP Trigger."

Then, change the authorization level on the left to "Anonymous" to keep things simple.

Note: do not do this in a production solution, since you'll need to authorize your consumers before using the function.


Make sure your settings match the ones pictured here:

the setup screen for creating a new Azure Function inside Visual Studio with .NET 6 in the dropdown and “http trigger” selected from the list.

Click Create. In a few seconds, Visual Studio will generate your Azure Function.


Step 4: Develop the OCR Function App

Next, rename the auto-generated function in the project to "OCRFunction" (and remember to rename the function class file as well). Also, delete everything in the OCRFunction.cs file (make it a blank file).

You'll need to add a NuGet package to use the cognitive services Computer Vision client from your computer. To do so, follow these steps:

  1. Right-click on the Solution Explorer (this may differ in Visual Studio for Mac)
  2. Select Manage NuGet Packages for Solution (this may differ in Visual Studio for Mac)
  3. Switch to the Browse tab
  4. Type `Microsoft.Azure.CognitiveServices.Vision.ComputerVision`
  5. Select the NuGet package from the list
  6. Select your project to install your package, "OCRSolution"
  7. Choose version "7.0.1"
  8. Click Install
a screenshot of the Computer Vision NuGet package being added to the OCRSolution project in this article. Each step from the section above has a red rectangle around the corresponding part of the user interface in Visual Studio.


Then, you'll add the following namespaces in the OCRFunction.cs file:

	
      using System.IO;
      using System.Threading.Tasks;
      using Microsoft.AspNetCore.Mvc;
      using Microsoft.Azure.WebJobs;
      using Microsoft.Azure.WebJobs.Extensions.Http;
      using Microsoft.AspNetCore.Http;
      using Microsoft.Extensions.Logging;
      using Newtonsoft.Json;
      using System;
      using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
      using System.Threading;
      using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
      using System.Collections.Generic;
      using System.Text;
	

These namespaces include ordinary C# namespaces as well as the `Microsoft.Azure.CognitiveServices.Vision.ComputerVision` namespace for Azure Computer Vision.


In the OCRFunction class, you'll define variables for your cognitive service `endpoint` and `key`, which you noted earlier when you created the service in the Azure portal (refer to “Step 1: Create Your Azure Cognitive Service” for reference). You'll also define an instance of `ComputerVisionClient`, representing the Computer Vision client in C#:

	
      static string endpoint = "YOUR-END-POINT";
      static string key = "YOUR-KEY";
      static ComputerVisionClient client; 
	

Next, you'll define a private function and call it `Authenticate`. This function simply takes an instance of `ComputerVisionClient` and ties it to a key and endpoint.

	
 private static ComputerVisionClient Authenticate(string endpoint, string key)
 {
 	ComputerVisionClient client = new ComputerVisionClient(
 		new ApiKeyServiceClientCredentials(key)) { Endpoint = endpoint };
 	return client;
 }
	


Then, you'll define another function and name it `ReadImage`. This function takes a `ComputerVisionClient` instance and `IFormFile` (standard .NET file format in HTTP requests) and does the following:

  • Creates a Stream from the IFormFile.
  • Uses the Computer Vision client to call the `RecognizePrintedTextInStreamAsync` function, which calls the Azure Cognitive Service to perform the OCR operation. (Note: there are many other functions under `ComputerVisionClient` that you can explore yourself in Microsoft's docs).
  • The `RecognizePrintedTextInStreamAsync` returns `ocrResult`, representing the result of the OCR operation from the image.
  • The `ocrResult` object consists of multiple regions in a picture. For each region, there are multiple lines and, in each line, there are multiple words.
  • The function calls another function (which you'll define soon) called `ExtractWordsFromOcrResult`. This function converts the `ocrResult` objects to a list of strings. Each string represents a line.
	
 public static async Task<List<string>> ReadImage(ComputerVisionClient client, IFormFile file)
 {
 	List<string> extractedLines;
 	using (var stream = file.OpenReadStream())
 	{
 		var ocrResult = await client.RecognizePrintedTextInStreamAsync(true, stream);
 		extractedLines = ExtractWordsFromOcrResult(ocrResult);
 	}
 	return extractedLines;
 } 
	

Next, you'll define a function called `ExtractWordsFromOcrResult`, which simply takes an `ocrResult` and iterates the region to extract each line. Then, from each line, it extracts each word. Words are concatenated together to form lines, and those lines are added to a `List<string>` object to represent the lines in a region.


Note: for the sake of simplicity, this tutorial assumes that the image contains only one region.

	
private static List<string> ExtractWordsFromOcrResult(OcrResult ocrResult)
{
	var result = new List<string>();
	foreach (var line in ocrResult.Regions[0].Lines) // Assume only one region
	{
		var lineText = new StringBuilder();
		foreach (var word in line.Words)
		{
			lineText.Append(word.Text + " ");
		}
		result.Add(lineText.ToString());
	}
	return result;
}
	
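The region → line → word traversal above can be mirrored in other languages for client-side post-processing. Here's a hypothetical Python version that walks a plain dict shaped like the OCR result (the field names echo the SDK's `OcrResult`, but the sample data is invented):

```python
def extract_lines(ocr_result):
    """Concatenate the words of each line in the first region."""
    result = []
    for line in ocr_result["regions"][0]["lines"]:  # assume only one region
        # join() avoids the trailing space the StringBuilder version leaves behind
        result.append(" ".join(word["text"] for word in line["words"]))
    return result

mock_result = {
    "regions": [{
        "lines": [
            {"words": [{"text": "Hello"}, {"text": "world"}]},
            {"words": [{"text": "OCR"}, {"text": "demo"}]},
        ]
    }]
}
print(extract_lines(mock_result))  # → ['Hello world', 'OCR demo']
```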

Finally, you'll develop the actual Azure Function. The function does the following:

  • Receives the images through an HTTP POST request.
  • Parses the HTTP request, converts each attached file to a .NET `IFormFile`, and uses the `ReadImage` function you defined earlier to perform the OCR operation. (Note: the function does this for every file in the request, which enables batch operations.)
  • Returns a JSON response containing the recognized lines to the client.

	
[FunctionName("OCRFunction")]
public static async Task<IActionResult> Run(
	[HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)] HttpRequest req,
	ILogger log)
{
	client = Authenticate(endpoint, key);
	try
	{
		var formdata = await req.ReadFormAsync();
		var files = req.Form.Files;
		var resultCollection = new List<List<string>>();
		foreach (var file in files)
		{
			var fileResult = await ReadImage(client, file);
			resultCollection.Add(fileResult);
		}
		return new OkObjectResult(resultCollection);
	}
	catch (Exception ex)
	{
		return new BadRequestObjectResult(ex);
	}
}
	

Step 5: Deploy the OCR Function App to Azure

After you've developed the OCR Function, it is now time to deploy it to Microsoft Azure.

To start, right-click on the OCRSolution project in the Solution Explorer:

 a screenshot of the OCRSolution project being selected in the Solution Explorer of Visual Studio.


Click Publish to launch the publish wizard and choose Azure as your publish target. Then, click Next. Here, choose whether you want your Azure Function to run on Windows or Linux. Select "Windows" and click Next. Note: in other scenarios, your choice may depend on whether your code uses platform-specific functionality; it does not matter in this case.


Next, you'll choose the subscription, resource group, and the name of the Azure function you created in Microsoft Azure earlier (Note: make sure to log in using the same Microsoft account you used in the Microsoft Azure portal). This is where Visual Studio will deploy your Azure function. Then, click Finish:

a screenshot of the Publish screen for the Azure Function being selected from Visual Studio. There are red rectangles around areas to emphasize that the Azure Function being selected is the one created in the preceding steps.


After a short delay, you'll get a "Ready to publish" message, indicating that the wizard successfully created a publish profile for your Azure Function. Confirm the details and then hit Publish. Note: using the "Site" link provided at the bottom of the screen, you can double-check your publish configuration and the deployment URL. You'll want to make a note of the URL now, as you will use it again later.

a screenshot of the final review screen for publishing a new Azure Function. Red rectangles emphasize the function's meta information as well as the name and link for the function.


Soon, you'll get an output in the Output window in Visual Studio to indicate that the Azure function was published successfully to Azure:

a screenshot of the console in Visual Studio with a log indicating that the function was successfully published to Azure.


Step 6: Test the OCR Function App Using Postman

Finally, you'll test your OCR Function using the Postman client you downloaded earlier. Follow these steps to do so:

  1. Launch Postman and click +New Collection. Name your collection "OCRFunction."
  2. Under that collection, click Add requests and name your request "OCRFunction" as well.
  3. Click on the request on the left and change the request type to "POST."
  4. Paste your API URL, which you noted during deployment, appended with "/api/OCRFunction." For example: `https://ocr-function-app-0.azurewebsites.net/api/OCRFunction`.
  5. Click on the Body tab.
  6. Select "form-data" as your body type. This enables you to attach files in your HTTP request.
  7. Add a key named "File" and make sure the type is "File" as well.
  8. In the "VALUE" field, you can select one or multiple files. In this scenario, each file corresponds to an image. You'll upload three now, as found in the URLs below. Go ahead and download these images, inspect them, and upload them in the values field:
a screenshot of postman with red rectangles around places that correspond with the numbered steps above this image.
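If you'd rather script the test than use Postman, the same multipart/form-data request can be assembled with Python's standard library. This is only a sketch: the "File" field name matches the steps above, but the helper itself is illustrative, and the actual POST (e.g., via `urllib.request` against your deployment URL) is omitted so the snippet runs without a live deployment:

```python
import mimetypes
import pathlib
import tempfile
import uuid

def build_multipart(paths, field="File"):
    """Build a multipart/form-data body with one 'File' part per image."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for path in paths:
        p = pathlib.Path(path)
        ctype = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{p.name}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        ).encode()
        body += p.read_bytes() + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"

# Demo with a throwaway file; in a real test you'd pass your scanned images
# and send the body with a Content-Type header set to the returned value.
demo = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
demo.write(b"\x89PNG")
demo.close()
body, content_type = build_multipart([demo.name])
print(content_type.split(";")[0])  # → multipart/form-data
```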


Finally, click Send to test your Azure function. Your OCR result should look like this:

a screenshot of the response from the Azure Function we just deployed. The text has been pulled from an image using OCR.

Of course, you're free to process the OCR results however your business case requires. You might store them in a database, send them using email APIs, or upload them to a storage account.
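As one hypothetical example of such downstream processing, you could persist the per-image line lists into a local SQLite database (a minimal standard-library sketch; the table and column names are invented):

```python
import sqlite3

def store_results(db_path, response):
    """Persist one row per recognized line, keyed by image index."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS ocr_lines (image_idx INTEGER, line TEXT)")
    con.executemany(
        "INSERT INTO ocr_lines VALUES (?, ?)",
        [(i, line) for i, lines in enumerate(response) for line in lines],
    )
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM ocr_lines").fetchone()[0]
    con.close()
    return count

print(store_results(":memory:", [["hello", "world"], ["ocr"]]))  # → 3
```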


You can find the final solution, along with the test images, in this GitHub repo. You can also find the full source code in the Appendix at the end of this article for easy reference.

Note: Once you’re finished, you’ll want to clean up the Azure resources you created in order to avoid unnecessary charges. To do so, complete the following steps:

  • Go to the Microsoft Azure Portal.
  • In the top search bar, type “ocr-rg” (or the name of the resource group you used, if you used a different name).
  • Click on the resource group that pops up in the search result.
  • Click on “Delete resource group.”
  • You’ll get a prompt to type the resource group name to confirm; type “ocr-rg” (or the name of the resource group you used, if you used a different name).
  • Click “Delete,” and in a few seconds, Microsoft Azure will delete the resources.

Conclusion

In this tutorial, you learned about Azure Cognitive Services, Azure Computer Vision, and Optical Character Recognition. You also learned about Azure Functions and how to use them to build serverless applications quickly and easily.

Next, you learned the architecture of the solution you built during the tutorial, which consists of an Azure Function API that receives an image, sends it to Azure Cognitive Services for OCR recognition, and returns the JSON response to the client.

Finally, you learned how to create the solution components in Microsoft Azure and how to develop, deploy, and test Azure Functions.

It is worth mentioning that Azure Computer Vision has a rich SDK that can perform many operations. To learn more about it, visit Microsoft MSDN.

Since you are working with digital documents, an e-signature tool is a handy solution to easily and quickly sign your documents. Be sure to check out HelloSign to take your signatures to the next level!
