AN INTRODUCTION TO MICROSOFT COGNITIVE SERVICES – VISION API

Microsoft’s tag line for Cognitive Services, “GIVE YOUR APPS A HUMAN SIDE”, best describes what these services can do. In short, Microsoft’s Cognitive Services provide pre-written, well-tested artificial intelligence algorithms that can be integrated into your apps by adding just a few lines of code.

Once you integrate Cognitive Services, your app gains the ability to SEE, RECOGNIZE, HEAR and even understand the SENTIMENT in your words.

Synegrate wanted to test these services, so we got our hands dirty with a simple proof of concept (POC) exercising the Vision API’s Optical Character Recognition (OCR) capabilities.

The Vision API extracts actionable information from the image provided to it. In our case we fed it images containing text, to exercise its character recognition capabilities, and then saved both the recognized text and the image in an Azure SQL database. The database provides full-text search over the saved text, allowing us to query for images based on the words they contain.

Our Synegrate Vision API Proof of Concept

For our POC we will create two console apps:

The first console app simulates a service that processes images / documents via the Vision API and uploads them to our Azure SQL database. Full-text indexing indexes the recognized text and links it to the associated image / document.

The second console app is a simple query tool that uses SQL full-text indexing to search for the stored images.

Signing Up

The first thing you need to do is subscribe to the Vision API service in Azure. You will need a valid Microsoft account; a school or personal Outlook account will suffice.

To sign up for the Vision API, browse to the following link: https://www.microsoft.com/cognitive-services/en-us/sign-up

Once you have signed in to your Microsoft account and subscribed to the Vision API, you will be issued an account key as shown in the screenshot below. Copy the key and save it somewhere; you will need it later on to call the Vision API.

Creating the apps

1. Create a console app in Visual Studio 2015. You will need to add the Microsoft.ProjectOxford.Vision assembly to your project. You can get this assembly from NuGet by running this command in the Package Manager Console:

Install-Package Microsoft.ProjectOxford.Vision

2. Next, add the following using directives to your Program.cs file:

using Microsoft.ProjectOxford.Vision;

using Microsoft.ProjectOxford.Vision.Contract;

3. To call the Vision API you simply need a POST request. You can either pass the image URL or the whole image as raw bytes (a rough sketch of such a request is shown a little further below). Just make sure that your image meets the requirements below.

Image requirements:

  • Supported image formats: JPEG, PNG, GIF, BMP.

  • Image file size must be less than 4MB.

  • Image dimensions must be between 40 x 40 and 3200 x 3200 pixels, and the image cannot be larger than 10 megapixels.

The Microsoft.ProjectOxford.Vision assembly provides us with a VisionServiceClient class which takes care of creating the POST request.
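To illustrate, here is a minimal sketch of making that POST request directly with HttpClient, bypassing the SDK. The regional endpoint URL and the method name here are our own assumptions for illustration; use the endpoint listed alongside your subscription key in the portal. In the rest of this walkthrough we let VisionServiceClient do this work for us.

// Rough, illustrative sketch of calling the Vision API OCR endpoint with a raw POST.
// The regional endpoint below is an assumption – use the endpoint listed with your own subscription key.
// Requires System.Net.Http, System.Net.Http.Headers and System.Threading.Tasks.
private static async Task<string> OcrViaRawPostAsync(string subscriptionKey, byte[] imageBytes)
{
    string endpoint = "https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr?language=unk";

    using (var client = new HttpClient())
    using (var content = new ByteArrayContent(imageBytes))
    {
        // The subscription key goes into the Ocp-Apim-Subscription-Key header.
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        HttpResponseMessage response = await client.PostAsync(endpoint, content);
        response.EnsureSuccessStatusCode();

        // The OCR result is returned as JSON (regions -> lines -> words).
        return await response.Content.ReadAsStringAsync();
    }
}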

4. Initialize the VisionServiceClient instance, passing in the subscription key you saved earlier:

VisionServiceClient visionServiceClient = new VisionServiceClient(apiSubscriptionKey);

5. Read the image from a file and then call the RecognizeTextAsync method:

using (Stream imageFileStream = File.OpenRead(imageFilePath))
{
    // Call Vision API
    Console.WriteLine("Analyzing text in the image....");
    OcrResults ocrResult = await visionServiceClient.RecognizeTextAsync(imageFileStream, language);
}
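Because RecognizeTextAsync is awaited, the call must live inside an async method. Below is a minimal sketch, with our own method and variable names (not part of the SDK), showing how steps 4 and 5 can be wrapped together:

// Minimal sketch (our own helper, not part of the SDK) wrapping steps 4 and 5 in an async method.
// Requires the using directives from step 2 plus System.IO and System.Threading.Tasks.
private static async Task<OcrResults> AnalyzeImageAsync(string apiSubscriptionKey, string imageFilePath, string language)
{
    VisionServiceClient visionServiceClient = new VisionServiceClient(apiSubscriptionKey);

    using (Stream imageFileStream = File.OpenRead(imageFilePath))
    {
        // Passing "unk" as the language lets the service detect the language automatically.
        Console.WriteLine("Analyzing text in the image....");
        return await visionServiceClient.RecognizeTextAsync(imageFileStream, language);
    }
}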

6. Read and log the result

private static string GetOcrText(OcrResults results)
{
    StringBuilder stringBuilder = new StringBuilder();

    if (results != null && results.Regions != null)
    {
        stringBuilder.Append("Text: ");
        stringBuilder.AppendLine();

        foreach (var item in results.Regions)
        {
            foreach (var line in item.Lines)
            {
                foreach (var word in line.Words)
                {
                    stringBuilder.Append(word.Text);
                    stringBuilder.Append(" ");
                }

                stringBuilder.AppendLine();
            }

            stringBuilder.AppendLine();
        }
    }

    return stringBuilder.ToString();
}


7. The next step is to save the image and the text to our Azure SQL database. For this we first have to create a SQL database in Azure; the link below explains how.

https://azure.microsoft.com/en-in/documentation/articles/sql-database-get-started/

Create a table that will contain the image, the recognized text and the file name of the uploaded image / document.

CREATE TABLE [dbo].[ImageData](
       [id_num] [int] IDENTITY(1,1) NOT NULL,
       [img_name] [nvarchar](max) NULL,
       [picture] [image] NULL,
       [text_content] [text] NULL
)

Next, to make full-text search available on the text extracted from the images, we create a full-text index on the text_content field of our table. Run the following SQL commands:

CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;

CREATE UNIQUE INDEX ui_ukImageData ON ImageData (id_num);

CREATE FULLTEXT INDEX ON ImageData (text_content) KEY INDEX ui_ukImageData ON ftCatalog;

The two code snippets below show how to interact with the database to store and query the data:

Store Image / Document:

using (var connection = new SqlConnection(connectionString))
{
   connection.Open();
   Console.WriteLine("Image Uploading to the database....");
   SqlParameter parameter;

   using (var command = new SqlCommand())
   {
      command.Connection = connection;
      command.CommandType = CommandType.Text;
      command.CommandText = @"
         INSERT INTO ImageData
         (img_name,
         picture,
         text_content)
         OUTPUT
         INSERTED.id_num
         VALUES
         (@img_name,
         @picture,
         @text_content); ";

      parameter = new SqlParameter("@img_name", SqlDbType.NVarChar);
      parameter.Value = filInfo.Name;
      command.Parameters.Add(parameter);

      parameter = new SqlParameter("@picture", SqlDbType.Image);
      parameter.Value = imgBytes;
      command.Parameters.Add(parameter);

      parameter = new SqlParameter("@text_content", SqlDbType.Text);
      parameter.Value = GetOcrText(ocrResult);
      command.Parameters.Add(parameter);

      int rowId = (int)command.ExecuteScalar();
      Console.WriteLine("Image Upload Complete");
   }
}
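The snippet above assumes a few variables prepared elsewhere in the console app: filInfo, imgBytes and connectionString. A rough sketch of how they might be set up is shown below; the connection string values are placeholders, not real credentials:

// Sketch of the supporting variables used by the insert snippet above.
// The connection string values are placeholders – substitute your own Azure SQL server, database and credentials.
FileInfo filInfo = new FileInfo(imageFilePath);        // file name stored in img_name
byte[] imgBytes = File.ReadAllBytes(imageFilePath);    // raw bytes stored in the picture column

string connectionString =
    "Server=tcp:<your-server>.database.windows.net,1433;" +
    "Database=<your-database>;User ID=<user>;Password=<password>;Encrypt=True;";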

Query Image / Document Text:

The following code will query the indexed text, retrieve and open the images in the default image viewer:

using (var connection = new SqlConnection(connectionString))
{
   connection.Open();

   SqlDataReader rdr = null;
   using (var command = new SqlCommand())
   {
      command.Connection = connection;
      command.CommandType = CommandType.Text;

      // Pass the search words as a parameter rather than concatenating them into the SQL text.
      command.CommandText = @"
         SELECT *
         FROM ImageData
         WHERE FREETEXT (text_content, @words); ";
      command.Parameters.Add(new SqlParameter("@words", SqlDbType.NVarChar) { Value = words });

      rdr = command.ExecuteReader();

      while (rdr.Read())
      {
         imgbytes = (byte[])rdr["picture"];
         filename = rdr["img_name"].ToString();
         text = rdr["text_content"].ToString();
         Guid guid = Guid.NewGuid();

         // Write the image to a uniquely named file and open it in the default viewer.
         FileStream fs = new FileStream(guid.ToString() + filename,
            FileMode.CreateNew, FileAccess.Write);
         fs.Write(imgbytes, 0, imgbytes.Length);
         fs.Flush();
         fs.Close();
         Process.Start(guid.ToString() + filename);
      }
   }
}
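For completeness, here is a rough sketch of how the query tool’s entry point might collect the search words used above. The QueryImages method is a hypothetical wrapper around the snippet just shown, and the prompt text is our own:

// Rough sketch of the query tool's entry point (our own wiring, not from the SDK).
static void Main(string[] args)
{
    // Placeholder – the same Azure SQL database the upload app writes to.
    string connectionString = "<your-azure-sql-connection-string>";

    Console.Write("Enter the word(s) to search for: ");
    string words = Console.ReadLine();

    // QueryImages is a hypothetical wrapper around the query snippet shown above.
    QueryImages(connectionString, words);
}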

Conclusion

Clearly, Microsoft’s Cognitive Services are easy to use. The Vision API is just one of many artificial intelligence APIs provided by Microsoft. We look forward to seeing how this new platform matures and what kinds of apps leveraging this technology appear in the marketplace.
