Handwriting Recognition using Logic Apps, Cognitive Services, E-mail, SharePoint and Teams
We have a lot of Surface Hubs in our new office, which we often use as digital whiteboards. After a meeting, we send the whiteboard contents to our mailboxes, where they arrive as image attachments. The challenge is that the text in these image attachments isn’t searchable. So we thought, hey… isn’t this really simple to solve using Cognitive Services and Logic Apps? Yes, it really is. We could:
- Use Azure Logic Apps (or Microsoft Flow) to watch an inbox for e-mails with image attachments
- Send the image to Microsoft Cognitive Services – Computer Vision
- Extract the text and save it in a SharePoint list, so we can use SharePoint’s built-in search to find all our whiteboards
- Send a notification to a Teams channel
Actual Implementation
Most of the actions in this diagram are already self-descriptive, so I’ll just expand on a few actions.
E-mail Trigger
The flow triggers when an e-mail arrives in our inbox with a subject containing the word “ocr”. We wanted to keep it simple so that it’s easy to type on the Surface Hub’s on-screen keyboard.
Computer Vision Cognitive Service
We used the Computer Vision Recognize Text API, which requires the image file to be reachable via a URL. So we first created/uploaded an image blob in an Azure Storage account.
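In the Logic App this upload is just the built-in Azure Blob Storage connector, but for reference the equivalent step in code looks roughly like the sketch below. It assumes the Azure.Storage.Blobs SDK, a hypothetical container named "whiteboards", and a connection string in an app setting; treat all of those names as placeholders.

using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

public static class WhiteboardUpload
{
    // Upload the e-mail attachment bytes to blob storage and return a URL
    // that the Computer Vision service can fetch. Container and connection
    // string names are placeholders for this sketch.
    public static async Task<Uri> UploadAttachmentAsync(byte[] imageBytes, string fileName)
    {
        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("StorageConnectionString"),
            "whiteboards");
        await container.CreateIfNotExistsAsync();

        var blob = container.GetBlobClient(fileName);
        using (var stream = new MemoryStream(imageBytes))
        {
            await blob.UploadAsync(stream, overwrite: true);
        }

        // Note: the container (or a SAS token) must allow the Computer Vision
        // service to read this URL.
        return blob.Uri;
    }
}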
Handling the Service Response
One of the things that caught us out is when the e-mail contains images that don’t adhere to the service’s input requirements:
- Supported image formats: JPEG, PNG and BMP.
- Image file size must be less than 4MB.
- Image dimensions must be at least 50 x 50, at most 4200 x 4200.
To handle this, we created a condition that checks the status code of the HTTP POST request. In addition, we configured the condition to run even if the previous action fails, so an unsupported image doesn’t stop the whole flow.
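As a rough illustration of the same guard rails in code, the sketch below pre-checks an attachment against the format and size limits before submitting it (a dimension check would need an imaging library, so only format and size are shown). The extension list and 4 MB limit come straight from the requirements above; everything else is an assumption of this sketch.

using System.IO;
using System.Linq;

public static class ImagePreCheck
{
    private static readonly string[] SupportedExtensions = { ".jpg", ".jpeg", ".png", ".bmp" };
    private const long MaxBytes = 4 * 1024 * 1024; // 4 MB limit from the service requirements

    // Returns false (with a reason) if the attachment clearly violates the
    // Recognize Text input requirements. Dimensions are not checked here.
    public static bool LooksAcceptable(string fileName, byte[] content, out string reason)
    {
        string ext = Path.GetExtension(fileName)?.ToLowerInvariant();
        if (!SupportedExtensions.Contains(ext))
        {
            reason = $"Unsupported format: {ext}";
            return false;
        }
        if (content.LongLength >= MaxBytes)
        {
            reason = $"File too large: {content.LongLength} bytes";
            return false;
        }
        reason = null;
        return true;
    }
}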
Extracting the Text from the Response
We also learned that the Recognize Text API does not return the extracted text immediately. The first call responds with an “Operation-Location” URL, which we then need to call with an HTTP GET request. The result at that URL is also not immediately available, so we added a 10-second delay before making the GET call.
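To make the two-step flow concrete, here is a minimal C# sketch of what the Logic App is doing: POST the image URL, grab the Operation-Location header, wait, then GET the result. It assumes the v2.0 Recognize Text API in handwritten mode; the region, key, and fixed 10-second delay are placeholders from our setup, and polling the result’s "status" field would be more robust than a fixed delay.

using System;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class RecognizeTextClient
{
    // Submits an image URL to the Recognize Text API, then fetches the
    // result JSON from the Operation-Location URL after a short delay.
    public static async Task<string> RecognizeAsync(string imageUrl, string subscriptionKey)
    {
        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

            // Step 1: submit the image; the interesting part of the response
            // is the Operation-Location header, not the (empty) body.
            var body = new StringContent(
                "{\"url\":\"" + imageUrl + "\"}", Encoding.UTF8, "application/json");
            var submit = await http.PostAsync(
                "https://westeurope.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=Handwritten",
                body);
            submit.EnsureSuccessStatusCode();
            string operationLocation = submit.Headers.GetValues("Operation-Location").First();

            // Step 2: wait for the operation to finish, then fetch the result,
            // mirroring the delay action in the Logic App.
            await Task.Delay(TimeSpan.FromSeconds(10));
            var result = await http.GetAsync(operationLocation);
            result.EnsureSuccessStatusCode();
            return await result.Content.ReadAsStringAsync();
        }
    }
}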
Finally, we created an Azure Function to parse the response:
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class ParseRecognizeTextJsonResponse
{
    [FunctionName("ParseRecognizeTextJsonResponse")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
        ILogger log)
    {
        log.LogInformation("C# HTTP trigger function processed a request.");

        // Read the Recognize Text result JSON from the request body
        string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
        dynamic data = JsonConvert.DeserializeObject(requestBody);

        // The recognized lines live under recognitionResult.lines
        JArray linesData = data?["recognitionResult"]?["lines"];
        if (linesData == null)
        {
            return new BadRequestObjectResult("Please pass a Recognize Text result in the request body");
        }

        // Append each recognized line to a single output string
        string returnString = "";
        foreach (JObject item in linesData)
        {
            string text = (string)item["text"];
            returnString = returnString + text + "\n";
        }

        return new OkObjectResult(returnString);
    }
}
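In the Logic App we simply pass the body of the final GET response to this function and store the plain-text result. As a hypothetical usage example, the sketch below posts a hand-crafted payload (trimmed to the fields the function actually reads) to the function; the function app URL and key are placeholders.

using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class ParseFunctionExample
{
    // Posts a sample Recognize Text result to the function and returns the
    // newline-separated text it extracts.
    public static async Task<string> CallAsync()
    {
        const string samplePayload = @"{
          ""status"": ""Succeeded"",
          ""recognitionResult"": {
            ""lines"": [
              { ""text"": ""Project kickoff notes"" },
              { ""text"": ""Next review: Friday"" }
            ]
          }
        }";

        using (var http = new HttpClient())
        {
            var response = await http.PostAsync(
                "https://<your-function-app>.azurewebsites.net/api/ParseRecognizeTextJsonResponse?code=<function-key>",
                new StringContent(samplePayload, Encoding.UTF8, "application/json"));
            // Expected result: "Project kickoff notes\nNext review: Friday\n"
            return await response.Content.ReadAsStringAsync();
        }
    }
}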
SharePoint and Teams
The rest is easy: we added connectors for our SharePoint list and MS Teams channel. If there’s anything complex here, it’s building a nice custom adaptive card for Teams. The adaptive card samples site helped a lot.
Output
It works!