Skills Measured

Most heavily weighted: CV and NLP

Building, Managing and Deploying Azure AI Solutions

Requirements :

Study and Prep

Details

AI-102 => Designing and Implementing a Microsoft Azure AI Solution
Certification - Azure AI Engineer Associate

Exam time : 2 hours (Including 20 mins survey)
About 42 Questions

There is no hands-on coding, but you need to understand the code. You can access the hands-on labs, and you can choose the language (C# or Python).

Tips

Tested Skills

Responsible AI

Considerations for Responsible AI

What is AI?

Software that mimics human-like capabilities:

  1. Visual Perception / Computer Vision - Image and Video Understanding, Extracting text from a picture etc
  2. Language / NLP - Semantic Meaning
  3. Speech - Conversational AI
  4. Decision Making - Anomalies in trends

AI vs ML vs DS

AI is built on top of ML, and ML is all data and algorithms. This data is used to train prediction models.

After model testing and tuning, we can deploy the models; given a new input (e.g., an image), we can run inference against them.

ML is built on top of DS, which uses Math and statistics to analyse data.

Azure ML Service

You have labelled data that you split into train and test sets. You use that data to train the model, which can then be deployed from there itself. When you provide new data for inference, you can then use it to retrain and fine-tune the model.

Azure Cognitive Services (Now AI Services)

For Visual Perception (CV)

  1. Image Analysis - Information about the picture (Object Recognition, Text recognition, Segmentation etc)
  2. Video Analysis - Breaking up into frames
    1. AI Video Indexer
  3. Image Classification
  4. Object Detection - Bounding Boxes
  5. Facial Analysis - Information of the face
  6. OCR - Text extraction from images

For Language

  1. Language Understanding
    1. Question Answering
    2. Text Analysis
  2. Translation
  3. Named Entity recognition
  4. Custom text classification

For Speech

  1. Speech to Text
  2. Text to Speech
  3. Speech Translation - machine translation of spoken language
  4. Speaker Recognition

For Decision Making

  1. Anomaly Detection (Content Safety)
  2. Content Moderation
  3. Content Personalization - Recommendation Systems

For Knowledge Mining and Document Intelligence

  1. AI search
  2. Document intelligence
  3. Custom document intelligence
  4. Custom skills

For GenAI

  1. Azure OpenAI service
  2. DALL-E image generation

Using them Together

All of these services are designed to be used with each other. They can run separately, but usually they don't.

Pre-Built Solutions on Azure

These are built using other services, and trained on large data

  1. Form Recognizer - Recognizes form inputs, reads them, and writes the values back into a database
  2. Metrics Advisor - Respond to critical metrics
  3. Video Analyzer for Media
  4. Immersive Reader - Making text more accessible
  5. Bot Service - Conversational Interaction
  6. Azure Cognitive Search - You ingest some data, and the indexer creates an index from it. Then, you can do different types of search against that index.

Deploying to Azure

To leverage a service, we need an Azure Subscription. A subscription is a billing boundary for many things.

Provisioning an Azure AI Service Resource

Deploying Services

You provision an instance of your service into your Azure subscription. Within the subscription, you can use many of the different regions.
This gives you your Cognitive Services resource.

Resources

When we deploy a service and get the resource provisioned, we have a choice. The resource can be :

  1. A single-service resource - its own endpoint and keys per service; many services offer a free tier
  2. A multi-service resource - one endpoint and key covering several services; consumption billing only

Eg : Custom Vision - gives you the option to break up prediction and training into different resources.

Aspects of Resource (Consuming a Service)

A resource has the following : an Endpoint URI, one or more subscription Keys, and a Location (region).

Securing AI Services

Securing Keys

Network Protection

Remember that the service is talking to the Endpoint URI.
This means that this URI can have its own Firewall

Uses the CognitiveServicesManagement service tag (services like Anomaly Detection and Azure OpenAI support these)

RestAPIs and SDKs (Using a Service)

How do you use these services? Assume that you have the Region, the Endpoint URI and the Key.

REST APIs HTTP Endpoints

You need to mention the following in your request (App settings) :

{
	"CognitiveServicesEndpoint" : "Your Endpoint",
	"CognitiveServicesKey" : "Your Key"
}

Your program will take these settings and construct a request payload from them.
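
A minimal sketch of such a call (assuming the Text Analytics language-detection REST operation and the standard Ocp-Apim-Subscription-Key header; the path differs per service):

import json
import requests

# Load the app settings shown above
with open("appsettings.json") as f:
    settings = json.load(f)

# The key goes in the Ocp-Apim-Subscription-Key header;
# the endpoint plus path identifies the specific operation
url = settings["CognitiveServicesEndpoint"] + "/text/analytics/v3.1/languages"
headers = {
    "Ocp-Apim-Subscription-Key": settings["CognitiveServicesKey"],
    "Content-Type": "application/json",
}
body = {"documents": [{"id": "1", "text": "Hello world"}]}

response = requests.post(url, headers=headers, json=body)
print(response.json())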

SDKs (Software Development Kits)

Pricing and Monitoring

Outside of the free tier on single-service resources, Azure AI services are consumption-based: you pay per transaction.

Multi-service resources do not have any free option.

Alerts and Budgets

Can be set on :

  1. Actual Spend you've done so far
  2. Forecasted spending
    1. Based on the current trend line
    2. Can be alerted by call, webhook, email, etc.

Key Point

The key point is that we are creating an Azure cloud resource. We can call it from anywhere (on-premises, or from any other service), but the service itself runs in the cloud and we pay per use.

Deploying to Containers

If you have a car, can you reliably assume that it will always have a high-speed connection to call the service and get a response?

Important

Sometimes, you need the service running on the edge / on-premises. The cloud is not always a good fit

The other option for provisioning is using Containers

Containers

Azure AI Containers

Container Deployment

  1. Docker server
  2. Azure Container Instance
  3. Azure Kubernetes Service cluster

Deploying and Using the AI Services Container

  1. Download the specific container for the Azure AI services API and deploy it to a container host.
  2. Client applications submit data to the endpoint and get results just like they would through the Azure endpoints.
  3. Some usage metrics are periodically sent to Azure AI services to calculate billing for the service.

Some Containers

Example : docker pull <URI>
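
After pulling, running the container typically requires accepting the EULA and pointing it at an Azure resource for billing (parameter names follow the documented container pattern; memory/CPU values are illustrative):

docker run --rm -it -p 5000:5000 --memory 4g --cpus 1 \
  mcr.microsoft.com/azure-cognitive-services/textanalytics/language \
  Eula=accept \
  Billing=<your-endpoint> \
  ApiKey=<your-key>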

Some Language Containers

| Feature | Container image |
| --- | --- |
| Key Phrase Extraction | mcr.microsoft.com/azure-cognitive-services/textanalytics/keyphrase |
| Language Detection | mcr.microsoft.com/azure-cognitive-services/textanalytics/language |
| Sentiment Analysis | mcr.microsoft.com/azure-cognitive-services/textanalytics/sentiment |
| Named Entity Recognition | mcr.microsoft.com/product/azure-cognitive-services/textanalytics/language/about |
| Text Analytics for health | mcr.microsoft.com/product/azure-cognitive-services/textanalytics/healthcare/about |
| Translator | mcr.microsoft.com/product/azure-cognitive-services/translator/text-translation/about |
| Summarization | mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization |

Speech Containers

| Feature | Container image |
| --- | --- |
| Speech to text | mcr.microsoft.com/product/azure-cognitive-services/speechservices/speech-to-text/about |
| Custom Speech to text | mcr.microsoft.com/product/azure-cognitive-services/speechservices/custom-speech-to-text/about |
| Neural Text to speech | mcr.microsoft.com/product/azure-cognitive-services/speechservices/neural-text-to-speech/about |
| Speech language detection | mcr.microsoft.com/product/azure-cognitive-services/speechservices/language-detection/about |

Vision Containers

| Feature | Container image |
| --- | --- |
| Read OCR | mcr.microsoft.com/product/azure-cognitive-services/vision/read/about |
| Spatial analysis | mcr.microsoft.com/product/azure-cognitive-services/vision/spatial-analysis/about |

Now, we'll see the services in detail

Visual Perception Service (Image Analysis)

Send an image and get a JSON response (with a confidence level for each result)

Image Analysis

Analyzing an Image

import os

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint=os.environ["ENDPOINT"],
    credential=AzureKeyCredential(os.environ["KEY"])
)

result = client.analyze(
    image_url="<url>",
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],
    gender_neutral_caption=True,
    language="en",
)
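
A sketch of reading the results of this call (attribute names per the azure-ai-vision-imageanalysis SDK; verify against your SDK version):

# Caption (present only if CAPTION was requested and returned)
if result.caption is not None:
    print(f"Caption: '{result.caption.text}' (confidence {result.caption.confidence:.2f})")

# Text extracted by the READ feature
if result.read is not None:
    for block in result.read.blocks:
        for line in block.lines:
            print(line.text)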

Available visual features are contained in the VisualFeatures enum: TAGS, OBJECTS, CAPTION, DENSE_CAPTIONS, PEOPLE, SMART_CROPS, and READ.

Output :

{
  "apim-request-id": "abcde-1234-5678-9012-f1g2h3i4j5k6",
  "modelVersion": "<version>",
  "denseCaptionsResult": {
    "values": [
      {
        "text": "a house in the woods",
        "confidence": 0.7055229544639587,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 640,
          "h": 640
        }
      },
      {
        "text": "a trailer with a door and windows",
        "confidence": 0.6675070524215698,
        "boundingBox": {
          "x": 214,
          "y": 434,
          "w": 154,
          "h": 108
        }
      }
    ]
  },
  "metadata": {
    "width": 640,
    "height": 640
  }
}

Video Analysis

Service : Azure Video Indexer and Video Analyzer Service

The Video Analyzer service provides a portal website that you can use to upload, view, and analyze videos interactively

Custom Insights

Azure Video Indexer includes predefined models that can recognize well-known celebrities, do OCR, and transcribe spoken phrases into text

Custom models :

  1. People - add images of faces of people you want to recognize in your video.
  2. Language - If your organization uses specific terminology that may not be in common usage, you can train a custom model to detect and transcribe it.
  3. Brands - You can train a model to recognize specific names as brands, for example to identify products, projects, or companies that are relevant to your business.

Video Analyzer Widgets and APIs

2 ways to integrate with custom applications

  1. Azure Video Indexer Widgets - The widgets used in the Azure Video Indexer portal to play, analyze, and edit videos can be embedded in your own custom HTML interfaces. You can use this technique to share insights from specific videos with others without giving them full access to your account in the Azure Video Indexer portal.
  2. Azure Video Indexer API
    1. https://api.videoindexer.ai/Auth/<location>/Accounts/<accountId>/AccessToken
    2. Sample Response
      {
          "accountId": "SampleAccountId",
          "id": "30e66ec1b1",
          "partition": null,
          "externalId": null,
          "metadata": null,
          "name": "test3",
          "description": null,
          "created": "2018-04-25T16=50=00.967+00=00",
          "lastModified": "2018-04-25T16=58=13.409+00=00",
          "lastIndexed": "2018-04-25T16=50=12.991+00=00",
          "privacyMode": "Private",
          "userName": "SampleUserName",
          "isOwned": true,
          "isBase": true,
          "state": "Processing",
          "processingProgress": "",
          "durationInSeconds": 13,
          "thumbnailVideoId": "30e66ec1b1",
          "thumbnailId": "55848b7b-8be7-4285-893e-cdc366e09133",
          "social": {
              "likedByUser": false,
              "likes": 0,
              "views": 0
          },
          "searchMatches": [],
          "indexingPreset": "Default",
          "streamingPreset": "Default",
          "sourceLanguage": "en-US"
      }
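
A minimal sketch of the token flow with requests (per the classic Video Indexer API: exchange your API key for a short-lived access token, then call the data-plane endpoints; angle-bracket values are placeholders):

import requests

location, account_id, api_key = "<location>", "<accountId>", "<api-key>"

# 1. Exchange the subscription key for an access token
token = requests.get(
    f"https://api.videoindexer.ai/Auth/{location}/Accounts/{account_id}/AccessToken",
    headers={"Ocp-Apim-Subscription-Key": api_key},
).json()

# 2. Use the token to list the videos in the account
videos = requests.get(
    f"https://api.videoindexer.ai/{location}/Accounts/{account_id}/Videos",
    params={"accessToken": token},
).json()
print(videos)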
      

Image Classification

Predict a class label for an image based on its main subject.

Object Detection

Gets the bounding-box coordinates along with the class

Facial Analysis (Detect, Analyse and Recognize)

You need to ensure privacy, transparency, and fairness for everyone (note that the gender and age attributes have been removed from the service)

Two ways : the dedicated Face service, or the Azure AI Vision image analysis service (People detection, shown below)

!faceservice.png
!moreface.png

Face Service

The Face service provides functionality that you can use for: face detection, face attribute analysis, face comparison and verification, and facial recognition (identification).

If you want to use the identification, recognition, and verification features of Face, you'll need to apply for the Limited Access policy and get approval before these features are available.

More on Face Services

So you get :

Considerations for Face Analysis
When building a solution that uses facial data, considerations include (but aren't limited to): data privacy and security, transparency about how facial data is used and stored, and fairness and inclusiveness.

Detecting with the Vision Service

Call the Analyze Image function (SDK or equivalent REST method), specifying People as one of the visual features to be returned.

{ 
  "modelVersion": "2023-10-01",
  "metadata": {
    "width": 400,
    "height": 600
  },
  "peopleResult": {
    "values": [
      {
        "boundingBox": {
          "x": 0,
          "y": 56,
          "w": 101,
          "h": 189
        },
        "confidence": 0.9474349617958069
      },
      {
        "boundingBox": {
          "x": 402,
          "y": 96,
          "w": 124,
          "h": 156
        },
        "confidence": 0.9310565276194865
      },
    ...
    ]
  }
}

OCR

Extracts text from images and documents, both printed and handwritten.
It has 2 APIs :

  1. READ (Document Intelligence) API - Read from Images and PDF Documents
    1. Can be huge volumes of data (Entire books)
    2. Handles images and PDFs
    3. Uses context and structure of the document to improve accuracy
    4. The initial function call returns an asynchronous operation ID, which must be used in a subsequent call to retrieve the results (see the sketch after this list)
      1. Async call - for each page or image, you send an async call and then retrieve results for that call. This helps handle volume
    5. Examples include: receipts, articles, and invoices
  2. Image Analysis (AI Vision Service) API
    1. Can do OCR, but it's not as capable as Read
    2. Used for general, unstructured documents
    3. Handles low volume
    4. Synchronous call (Single API)
    5. Examples include: street signs, handwritten notes, and store signs.
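
A sketch of the asynchronous Read pattern, using the older azure-cognitiveservices-vision-computervision package (submit the image, take the operation ID from the Operation-Location header, poll for results):

import time

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient("<endpoint>", CognitiveServicesCredentials("<key>"))

# The initial call returns immediately; the Operation-Location header
# contains the ID of the asynchronous operation
read_response = client.read("<image-url>", raw=True)
operation_id = read_response.headers["Operation-Location"].split("/")[-1]

# Poll until the operation completes
while True:
    read_result = client.get_read_result(operation_id)
    if read_result.status not in [OperationStatusCodes.running, OperationStatusCodes.not_started]:
        break
    time.sleep(1)

if read_result.status == OperationStatusCodes.succeeded:
    for page in read_result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)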

Language Understanding

To interpret human intent from natural language. Uses Azure AI Language Service

Two types :

  1. Learned - Needs to be trained
    1. CLU (Conversational Language Understanding) - Trained on certain Intents with different Utterances for each intent. Optionally, you can add some Entities (Age, Currency, Temperature, Email, etc. are all prebuilt).
      1. You need to train, test, deploy, review and retrain
    2. Custom Named Entities - Person, Names, Places, Things etc.
      1. You need to custom label this
      2. You need a diverse set of labels
    3. Custom Text Classification - Classify Documents into custom groups
      1. Can be Single and Multiple Labels (Single -> One document, one label only)
  2. Preconfigured
    1. Summarization - Given some text, find the key sentences that carry the overall meaning
    2. PII (Personally Identifiable Information) Detection - finds IP addresses, emails, home and street addresses, names, health data, etc.
    3. QnA
    4. Text Analysis

AI Language Project Lifecycle

!extraction-development-lifecycle.png

Creating an entity extraction model typically follows a similar path to most Azure AI Language service features:

  1. Define entities: Understanding the data and entities you want to identify, and try to make them as clear as possible. For example, defining exactly which parts of a bank statement you want to extract.
  2. Tag data: Label, or tag, your existing data, specifying what text in your dataset corresponds to which entity. This step is important to do accurately and completely, as any wrong or missed labels will reduce the effectiveness of the trained model. A good variation of possible input documents is useful. For example, label bank name, customer name, customer address, specific loan or account terms, loan or account amount, and account number.
  3. Train model: Train your model once your entities are labeled. Training teaches your model how to recognize the entities you label.
  4. View model: After your model is trained, view the results of the model. This page includes a score of 0 to 1 that is based on the precision and recall of the data tested. You can see which entities worked well (such as customer name) and which entities need improvement (such as account number).
  5. Improve model: Improve your model by seeing which entities failed to be identified, and which entities were incorrectly extracted. Find out what data needs to be added to your model's training to improve performance. This page shows you how entities failed, and which entities (such as account number) need to be differentiated from other similar entities (such as loan amount).
  6. Deploy model: Once your model performs as desired, deploy your model to make it available via the API. In our example, you can send requests to the model when it's deployed to extract bank statement entities.
  7. Extract entities: Use your model for extracting entities. The lab covers how to use the API, and you can view the API reference for more details.

Considerations for Data Selection and Refining Entities

For the best performance, you'll need to use both high quality data to train the model and clearly defined entity types.

High quality data will let you spend less time refining and yield better results from your model.

Entities need to also be carefully considered, and defined as distinctly as possible. Avoid ambiguous entities (such as two names next to each other on a bank statement), as it will make the model struggle to differentiate. If having some ambiguous entities is required, make sure to have more examples for your model to learn from so it can understand the difference.

Keeping your entities distinct will also go a long way in helping your model's performance. For example, trying to extract something like "Contact info" that could be a phone number, social media handle, or email address would require several examples to correctly teach your model. Instead, try to break them down into more specific entities such as "Phone", "Email", and "Social media" and let the model label whichever type of contact information it finds.

Learned - Evaluation Metrics

You need to tag data, train it and test it. Remember that you always need to train-test-split

Recall - TP/(TP+FN) => Eg : you identified 3 correctly, 1 wrong, and missed 3, so recall is 3/6 = 0.5
Precision - TP/(TP+FP) => In the above case, you identified 3 correct and 1 wrong, so precision is 3/4 = 0.75
F1 Score - 2·P·R/(P+R) => Here, 2 × 0.75 × 0.5 / (0.75 + 0.5) = 0.6
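
The same numbers as a quick Python sketch:

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)  # 3 / 4 = 0.75
    recall = tp / (tp + fn)     # 3 / 6 = 0.5
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=3, fp=1, fn=3))  # 0.6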

Question and Answering

You need to create a Knowledge Base of question and answer pairs that can be queried using natural language input.

You can pass in FAQs, Files, Built in Chit-Chat QnA pairs etc. This will have the questions and the answers. You can also pass in synonyms of the words. From here, you can create a bot that will query the base and get the answer.

QnA vs Language Understanding
!qnaservice.png

Using a KB

{
  "question": "What do I need to do to cancel a reservation?",
  "top": 2,
  "scoreThreshold": 20,
  "strictFilters": [
    {
      "name": "category",
      "value": "api"
    }
  ]
}
| Property | Description |
| --- | --- |
| question | Question to send to the knowledge base. |
| top | Maximum number of answers to be returned. |
| scoreThreshold | Score threshold for answers returned. |
| strictFilters | Limit to only answers that contain the specified metadata. |

Output

{
  "answers": [
    {
      "score": 27.74823341616769,
      "id": 20,
      "answer": "Call us on 555 123 4567 to cancel a reservation.",
      "questions": [
        "How can I cancel a reservation?"
      ],
      "metadata": [
        {
          "name": "category",
          "value": "api"
        }
      ]
    }
  ]
}
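
A minimal sketch of the same query via the azure-ai-language-questionanswering SDK (project and deployment names are placeholders):

from azure.core.credentials import AzureKeyCredential
from azure.ai.language.questionanswering import QuestionAnsweringClient

client = QuestionAnsweringClient("<endpoint>", AzureKeyCredential("<key>"))

response = client.get_answers(
    question="What do I need to do to cancel a reservation?",
    top=2,
    project_name="<project-name>",
    deployment_name="production",
)
for answer in response.answers:
    print(answer.answer, answer.confidence)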

Improving QnA

  1. Use Active Learning
  2. Synonyms
    {
        "synonyms": [
            {
                "alterations": [
                    "reservation",
                    "booking"
                    ]
            }
        ]
    }
    

Text Analysis

Translation

#More Language Translation Tasks

Speech

#Creating Speech Enabled Apps with Azure

Use Azure AI Speech Service

Speech to Text

!translate-speech-small.png

2 ways to detect :

Configuration Files :

  1. SpeechConfig - location, key
  2. AudioConfig - overrides the default input

These are used to create a SpeechRecognizer object, which returns things like the duration and the recognized text.

Requirements

  1. Use a SpeechTranslationConfig object to encapsulate the information required to connect to your Azure AI Speech resource. Specifically, its location and key.
  2. The SpeechTranslationConfig object is also used to specify the speech recognition language (the language in which the input speech is spoken) and the target languages into which it should be translated.
  3. Optionally, use an AudioConfig to define the input source for the audio to be transcribed. By default, this is the default system microphone, but you can also specify an audio file.
  4. Use the SpeechTranslationConfig, and AudioConfig to create a TranslationRecognizer object. This object is a proxy client for the Azure AI Speech translation API.
  5. Use the methods of the TranslationRecognizer object to call the underlying API functions. For example, the RecognizeOnceAsync() method uses the Azure AI Speech service to asynchronously translate a single spoken utterance.
  6. Process the response from Azure AI Speech. In the case of the RecognizeOnceAsync() method, the result is a SpeechRecognitionResult object that includes the following properties:

If the operation was successful, the Reason property has the enumerated value RecognizedSpeech, the Text property contains the transcription in the original language. You can also access a Translations property which contains a dictionary of the translations (using the two-character ISO language code, such as "en" for English, as a key).
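
A sketch of this flow in Python with the azure-cognitiveservices-speech package (key, region, and languages are placeholders):

import azure.cognitiveservices.speech as speechsdk

translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="<key>", region="<region>")
translation_config.speech_recognition_language = "en-US"
translation_config.add_target_language("fr")

# Uses the default microphone; pass an AudioConfig to override
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)

result = recognizer.recognize_once_async().get()
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print("Recognized:", result.text)
    print("French:", result.translations["fr"])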

Text to Speech

The same config objects as above.
2 APIs :

  1. Text to speech API - the primary way to perform speech synthesis
  2. Batch synthesis API - designed for batch operations that convert large volumes of text to audio

You can also add Audio Format, Voices etc to the configs.

This creates a SpeechSynthesizer object.

Inputs are given in two forms :

  1. Text
  2. SSML (Speech Synthesis Markup Language) - Includes pauses, pronunciations, pitch, rate, speaking styles etc.
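
A sketch of text-to-speech with the same SDK (the voice name is just an example):

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
speech_config.speech_synthesis_voice_name = "en-US-AriaNeural"

# Uses the default system speaker; pass an AudioConfig to override
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Hello, world!").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized")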

Speech / Machine Translation

Use the Azure AI Translator service for text translation; speech translation (below) goes through the Azure AI Speech service.

Speech to text, but you have a source language and one or more target languages.
You need to use the short language codes here (e.g., "en", "fr").

To use translation via the SDK, you need the location and a key of your Azure AI Speech resource.

Speech Synthesis from Translations

The TranslationRecognizer returns translated transcriptions of spoken input - essentially translating audible speech to text.

You can also synthesize the translation as speech to create speech-to-speech translation solutions. There are two ways you can accomplish this.

You can only use event-based synthesis when translating to a single target language.

Event-based synthesis

When you want to perform 1:1 translation (translating from one source language into a single target language), you can use event-based synthesis to capture the translation as an audio stream. To do this, you need to:

Specify the desired voice for the translated speech in the TranslationConfig. Create an event handler for the TranslationRecognizer object's Synthesizing event. In the event handler, use the GetAudio() method of the Result parameter to retrieve the byte stream of translated audio. The specific code used to implement an event handler varies depending on the programming language you're using. See the C# and Python examples in the Speech SDK documentation.

Manual synthesis

Manual synthesis is an alternative approach to event-based synthesis that doesn't require you to implement an event handler. You can use manual synthesis to generate audio translations for one or more target languages.

Manual synthesis of translations is essentially just the combination of two separate operations in which you:

  1. Use a TranslationRecognizer to translate spoken input into text transcriptions in one or more target languages.
  2. Iterate through the Translations dictionary in the result of the translation operation, using a SpeechSynthesizer to synthesize an audio stream for each language.

Other Speech Services

Decision Making

Anomaly Detection

Can be of two types :

  1. Single Signal : Some large variance in a signal
  2. Multi-Signal : When we are trying to find out something based on a set of signals together (if there is some correlation between them)

Content Moderation

  1. Images - is it classified as adult, racy, etc.?
  2. Text
    1. Profanity (offensive, sexually explicit, sexually suggestive)
    2. Data leakage (names and phone numbers)

Content Personalization

Best idea for individual users.

Eg : You can use Reinforcement Learning here (if you rate a recommended movie as disliked, the recommendations improve next time)

Note

Until now, everything we have done includes something that we need to build before we use it. Now, we will look at solutions that can be used directly.

Prebuilt Solutions

These perform a function without you having to build anything.

Uses Azure Document Intelligence Service

Forms Recognizer

Prebuilt models to recognize forms

Custom Model Building :

Several Models exist, based on the specific form types

If you don't have good structured data :

What Prebuilt Models Extract

Input Requirements

Using the Model

You need the resource's endpoint and key.

# Assumes document_analysis_client has already been created
# (see the full connection example later in these notes)
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-document", docUrl)
result: AnalyzeResult = poller.result()

General Document Model

Types of Entities you can detect

Other Resources

Using Financial Data and Receipts
More Resources

Metric Advisor

Discussed in Anomaly detection, just a prebuilt model.
Data Monitoring and Anomaly detection in Time Series Data (Tune model and alerts for metrics)

Video Analyzer for Media

Similar to Vision Service, does everything in one (Face detection, people tracking, shot detection, content labelling, emotion detection etc)

Immersive Reader

Make it easier to read

Bot Service

Conversational Interactions

Layers to this solution : Azure Bot Service -> exposed as a Bot Framework Service (API) -> called by the Bot Framework SDK

Available Templates : Echo Bot, Core Bot, Empty Bot

Activity Handlers : There are different event methods that you need to override to handle different types of activities

Others : Bot Framework Composer, Power Virtual Agents

Azure Cognitive / AI Services

ACS

We have huge amounts of data. To extract from it and make it available in different ways for us to use, we need ACS.
ACS is a cloud-based solution for indexing and querying a wide range of data sources.

Data can be in : Azure Blob Storage, Azure SQL Database, Azure Cosmos DB, and other supported sources

You can use Cognitive Services to enhance data (Like extracting text from image, or translation etc)

This is its own complete solution, not part of a multi-resource service

Tiers : Free, Basic, Standard (S1, S2, S3, S3 HD), and Storage Optimized (L1, L2)

Index

What this service does is create an index. From there, you need two things : Storage and Searching

According to the guide, if you have read-only workloads, you need two replicas for 99.9% availability; if you have read-write workloads, you need three.

You can have a maximum of 36 Search Units

Enrichment

Not just the raw data, but you can extract information from it.

You can also create Custom Skills (#Creating Custom Skills for Azure AI Search) using Azure Functions :

Create an Azure AI Search Solution

#Azure AI Search Module

Azure OpenAI Services

LLMs are trained on vast amounts of data, with a lot of training time and resources

It is token based payment (Pay as you go)

Models are Read-Only

Use Cases : natural language generation (content creation, summarization), code generation, and image generation (DALL-E)

The more specific the prompt, the better the response. So, apart from user prompt, you may also add a System Prompt.

Using Current information

LLMs are trained on data only up to a cutoff (September 2021 for the GPT models available when these notes were written), so how do we access anything beyond that, or custom datasets?

You can describe an API and make it available to the model as part of the grounding process.

Hence, you can upload and use your own data. Azure OpenAI will then bring it in (as a knowledge base) that the model can look up to give better results.

Summary

!AI-102-Whiteboard.png


Azure Document Intelligence

Uses Azure Document Intelligence - a vision-based API that extracts key-value pairs and table data from form documents.

!Screenshot 2024-05-25 at 5.33.05 PM.png
!Screenshot 2024-05-25 at 5.33.09 PM.png

Using the Document Intelligence Models - APIs

endpoint = "YOUR_DOC_INTELLIGENCE_ENDPOINT"
key = "YOUR_DOC_INTELLIGENCE_KEY"

model_id = "YOUR_CUSTOM_BUILT_MODEL_ID"
formUrl = "YOUR_DOCUMENT"

document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

# Make sure your document's type is included in the list of document types the custom model can analyze
task = document_analysis_client.begin_analyze_document_from_url(model_id, formUrl)
result = task.result()

Sample output: Successful response has the analyzeResult key that contains the contents extracted and an array of pages containing info about the doc content.

{
	"status": "succeeded",
	"createdDateTime": "2023-10-18T23:39:50Z",
	"lastUpdatedDateTime": "2023-10-18T23:39:54Z",
	"analyzeResult": {
		"apiVersion": "2022-08-31",
		"modelId": "DocIntelModel",
		"stringIndexType": "utf16CodeUnit",
		"content": "Purchase Order\nHero Limited\nCompany Phone: 555-348-6512 Website: www.herolimited.com Email: accounts@herolimited.com\nPurchase Order\nDated As: 12/20/2020 Purchase Order #: 948284\nShipped To Vendor Name: Balozi Khamisi Company Name: Higgly Wiggly Books Address: 938 NE Burner Road Boulder City, CO 92848 Phone: 938-294-2949\nShipped From Name: Kidane Tsehaye Company Name: Jupiter Book Supply Address: 383 N Kinnick Road Seattle, WA 38383\nPhone: 932-299-0292\nDetails\nQuantity\nUnit Price\nTotal\nBindings\n20\n1.00\n20.00\nCovers Small\n20\n1.00\n20.00\nFeather Bookmark\n20\n5.00\n100.00\nCopper Swirl Marker\n20\n5.00\n100.00\nSUBTOTAL\n$140.00\nTAX\n$4.00\nTOTAL\n$144.00\nKidane Tsehaye\nManager\nKidane Tsehaye\nAdditional Notes: Do not Jostle Box. Unpack carefully. Enjoy. Jupiter Book Supply will refund you 50% per book if returned within 60 days of reading and offer you 25% off you next total purchase.",
		"pages": [
			{
				"pageNumber": 1,
				"angle": 0,
				"width": 1159,
				"height": 1486,
				"unit": "pixel",
				"words": [
					{
						"content": "Purchase",
						"polygon": [
							89,
							90,
							174,
							91,
							174,
							112,
							88,
							112
						],
						"confidence": 0.996,
						"span": {
							"offset": 0,
							"length": 8
						}
					},
					{
						"content": "Order",
						"polygon": [
							178,
							91,
							237,
							91,
							236,
							113,
							178,
							112
						],
						"confidence": 0.997,
						"span": {
							"offset": 9,
							"length": 5
						}
					},
                    ...

Planning a Solution with Document Intelligence

To create an Azure AI Document Intelligence resource in Azure and obtain connection details, complete these steps:

  1. In the Azure portal, select Create a resource.
  2. In the Search services and marketplace box, type Document Intelligence and then press Enter.
  3. In the Document intelligence page, select Create.
  4. In the Create Document intelligence page, under Project Details, select your Subscription and either select an existing Resource group or create a new one.
  5. Under Instance details, select a Region near your users.
  6. In the Name textbox, type a unique name for the resource.
  7. Select a Pricing tier and then select Review + create.
  8. If the validation tests pass, select Create. Azure deploys the new Azure AI Document Intelligence resource.

Connect to Azure AI Document Intelligence

When you write an application that uses Azure AI Document Intelligence, you need two pieces of information to connect to the resource:

To obtain these details:

  1. In the Azure portal, navigate to the Azure AI Document Intelligence resource.
  2. Under Resource Management, select Keys and Endpoint.
  3. Copy either KEY 1 or KEY 2 and the Endpoint values and store them for use in your application code.

The following code shows how to use these connection details to connect your application to Azure AI Document Intelligence. In this example, a sample document at a specified URL is submitted for analysis to the general document model. Replace <endpoint> and <access-key> with the connection details you obtained from the Azure portal:

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult

endpoint = "<your-endpoint>"
key = "<your-key>"

docUrl = "<url-of-document-to-analyze>"

document_analysis_client = DocumentIntelligenceClient(endpoint=endpoint, 
    credential=AzureKeyCredential(key))

poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-document", docUrl)
result: AnalyzeResult = poller.result()
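
A sketch of walking the result (attribute names per the prebuilt-document model's AnalyzeResult; verify against your SDK version):

# Key-value pairs extracted by the general document model
for kv in result.key_value_pairs:
    if kv.key and kv.value:
        print(f"{kv.key.content}: {kv.value.content}")

# Tables, cell by cell
for table in result.tables:
    for cell in table.cells:
        print(f"[{cell.row_index},{cell.column_index}] {cell.content}")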

Summary :

  1. Read - gets all text
  2. General Document - Extract key-value pairs and tables
  3. Layout - extract text, tables and structured information from forms

If you're using the Standard pricing tier, you can add up to 100 custom models into a single composed model. If you're using the Free pricing tier, you can only add up to 5 custom models.


Knowledge Stores

Azure AI Search supports these scenarios by enabling you to define a knowledge store in the skillset that encapsulates your enrichment pipeline. The knowledge store consists of projections of the enriched data, which can be JSON objects, tables, or image files. When an indexer runs the pipeline to create or update an index, the projections are generated and persisted in the knowledge store

Note

Projections in Azure AI Search are ways to format and store the enriched data produced by AI skills during the indexing process

Define Projections

The projections of data to be stored in your knowledge store are based on the document structures generated by the enrichment pipeline in your indexing process. Each skill in your skillset iteratively builds a JSON representation of the enriched data for the documents being indexed, and you can persist some or all of the fields in the document as projections.

Using the Shaper Skill

The process of indexing incrementally creates a complex document that contains the various output fields from the skills in the skillset. This can result in a schema that is difficult to work with, and which includes collections of primitive data values that don't map easily to well-formed JSON.

To simplify the mapping of these field values to projections in a knowledge store, it's common to use the Shaper skill to create a new field containing a simpler structure for the fields you want to map to projections.

For example, consider the following Shaper skill definition :

{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "define-projection",
  "description": "Prepare projection fields",
  "context": "/document",
  "inputs": [
    {
      "name": "file_name",
      "source": "/document/metadata_content_name"
    },
    {
      "name": "url",
      "source": "/document/url"
    },
    {
      "name": "sentiment",
      "source": "/document/sentimentScore"
    },
    {
      "name": "key_phrases",
      "source": null,
      "sourceContext": "/document/merged_content/keyphrases/*",
      "inputs": [
        {
          "name": "phrase",
          "source": "/document/merged_content/keyphrases/*"
        }
      ]
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "projection"
    }
  ]
}

Output Projection :

{
    "file_name": "file_name.pdf",
    "url": "https://<storage_path>/file_name.pdf",
    "sentiment": 1.0,
    "key_phrases": [
        {
            "phrase": "first key phrase"
        },
        {
            "phrase": "second key phrase"
        },
        {
            "phrase": "third key phrase"
        },
        ...
    ]
}

The resulting JSON document is well-formed, and easier to map to a projection in a knowledge store than the more complex document that has been built iteratively by the previous skills in the enrichment pipeline.

Defining a KS

Create a knowledgeStore object in the skillsets that specifies the Azure Storage connection string for the storage account where you want to create projections, and the definitions of the projections themselves.

You can define object projections, table projections, and file projections depending on what you want to store; however note that you must define a separate projection for each type of projection, even though each projection contains lists for tables, objects, and files. Projection types are mutually exclusive in a projection definition, so only one of the projection type lists can be populated. If you create all three kinds of projection, you must include a projection for each type; as shown here:

"knowledgeStore": { 
      "storageConnectionString": "<storage_connection_string>", 
      "projections": [
        {
            "objects": [
                {
                "storageContainer": "<container>",
                "source": "/projection"
                }
            ],
            "tables": [],
            "files": []
        },
        {
            "objects": [],
            "tables": [
                {
                "tableName": "KeyPhrases",
                "generatedKeyName": "keyphrase_id",
                "source": "projection/key_phrases/*",
                },
                {
                "tableName": "docs",
                "generatedKeyName": "document_id", 
                "source": "/projection" 
                }
            ],
            "files": []
        },
        {
            "objects": [],
            "tables": [],
            "files": [
                {
                "storageContainer": "<container>",
                "source": "/document/normalized_images/*"
                }
            ]
        }
    ]
 }

For object and file projections, the specified container will be created if it does not already exist. An Azure Storage table will be created for each table projection, with the mapped fields and a unique key field with the name specified in the generatedKeyName property. These key fields can be used to define relational joins between the tables for analysis and reporting.

Summary

The input schema for a custom skill defines a JSON structure containing a record for each document to be processed. Each document has a unique identifier, and a data payload with one or more inputs, like this:

{
    "values": [
      {
        "recordId": "<unique_identifier>",
        "data":
           {
             "<input1_name>":  "<input1_value>",
             "<input2_name>": "<input2_value>",
             ...
           }
      },
      {
        "recordId": "<unique_identifier>",
        "data":
           {
             "<input1_name>":  "<input1_value>",
             "<input2_name>": "<input2_value>",
             ...
           }
      },
      ...
    ]
}

Output Schema :

{
    "values": [
      {
        "recordId": "<unique_identifier_from_input>",
        "data":
           {
             "<output1_name>":  "<output1_value>",
              ...
           },
         "errors": [...],
         "warnings": [...]
      },
      {
        "recordId": "< unique_identifier_from_input>",
        "data":
           {
             "<output1_name>":  "<output1_value>",
              ...
           },
         "errors": [...],
         "warnings": [...]
      },
      ...
    ]
}

Adding custom skills to a Skillset

To integrate a custom skill into your indexing solution, you must add a skill for it to a skillset using the Custom.WebApiSkill skill type.

The skill definition must follow this pattern:

{
    "skills": [
      ...,
      {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "description": "<custom skill description>",
        "uri": "https://<web_api_endpoint>?<params>",
        "httpHeaders": {
            "<header_name>": "<header_value>"
        },
        "context": "/document/<where_to_apply_skill>",
        "inputs": [
          {
            "name": "<input1_name>",
            "source": "/document/<path_to_input_field>"
          }
        ],
        "outputs": [
          {
            "name": "<output1_name>",
            "targetName": "<optional_field_name>"
          }
        ]
      }
  ]
}
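
A minimal sketch of a custom skill implemented as a Python Azure Function honoring the input/output schema above (the word-count logic is purely illustrative):

import json

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    body = req.get_json()
    results = {"values": []}
    for record in body["values"]:
        text = record["data"].get("text", "")
        results["values"].append({
            "recordId": record["recordId"],  # must echo the input recordId
            "data": {"wordCount": len(text.split())},
            "errors": [],
            "warnings": [],
        })
    return func.HttpResponse(json.dumps(results), mimetype="application/json")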

Azure AI Search Module

Capacity Management

Service Tiers and Capacity Management

Replicas and Partitions

Depending on the pricing tier you select, you can optimize your solution for scalability and availability by creating replicas and partitions.

Understanding Search Components

4 main components

  1. Data Source
  2. Skillset
  3. Indexer
  4. Index

Indexing Process

The indexing process works by creating a document for each indexed entity. During indexing, an enrichment pipeline iteratively builds the documents that combine metadata from the data source with enriched fields extracted by cognitive skills. You can think of each indexed document as a JSON structure, which initially consists of a document with the index fields you have mapped to fields extracted directly from the source data, like this:

When the documents in the data source contain images, you can configure the indexer to extract the image data and place each image in a normalized_images collection, like this:

For example, you could run an optical character recognition (OCR) skill for each image in the normalized images collection to extract any text they contain:

The fields in the final document structure at the end of the pipeline are mapped to index fields by the indexer in one of two ways:

  1. Fields extracted directly from the source data are all mapped to index fields. These mappings can be implicit (fields are automatically mapped to index fields with the same name) or explicit (a mapping is defined to match a source field to an index field, often to rename the field to something more useful or to apply a function to the data value as it is mapped).
  2. Output fields from the skills in the skillset are explicitly mapped from their hierarchical location in the output to the target field in the index.

Applying Filtering and Sorting (On Results)

Filter

Suppose you want to find documents containing the text London that have an author field value of Reviewer

search=London+author='Reviewer'
queryType=Simple

Using OData : (OData $filter expressions are case sensitive)

search=London
$filter=author eq 'Reviewer'
queryType=Full

Filtering by fields:

search=*
facet=author

or

search=*
$filter=author eq 'selected-facet-value-here'

Sorting Results

search=*
$orderby=last_modified desc

You need to make use of the $orderby parameter.

Enhancing the Index

  1. Search as you Type
    1. Adding a suggester to an index
    2. Two forms :
      1. Suggestions - retrieve and display a list of suggested results as the user types into the search box, without needing to submit the search query.
      2. Autocomplete - complete partially typed search terms based on values in index fields.
    3. To implement one or both of these capabilities, create or update an index, defining a suggester for one or more fields.

After you've added a suggester, you can use the suggestion and autocomplete REST API endpoints or the .NET DocumentsOperationsExtensions.Suggest and DocumentsOperationsExtensions.Autocomplete methods to submit a partial search term and retrieve a list of suggested results or autocompleted terms to display in the user interface.
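
For example, a suggester in the index definition might look like this (field names are placeholders; analyzingInfixMatching is the supported search mode):

"suggesters": [
  {
    "name": "sg",
    "searchMode": "analyzingInfixMatching",
    "sourceFields": ["title", "author"]
  }
]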

  2. Custom Scoring and Result Boosting

    1. Relevance score is calculated based on term-frequency/inverse-document-frequency (TF/IDF) algorithm.
    2. You can customize the way this score is calculated by defining a scoring profile that applies a weighting value to specific fields - essentially increasing the search score for documents when the search term is found in those fields. Additionally, you can boost results based on field values - for example, increasing the relevancy score for documents based on how recently they were modified or their file size.
  3. Synonyms

    1. To help users find the information they need, you can define synonym maps that link related terms together. You can then apply those synonym maps to individual fields in an index, so that when a user searches for a particular term, documents with fields that contain the term or any of its synonyms will be included in the results.

Additional Points

To enable a field to be included in the results, you must make it retrievable.

Creating Speech Enabled Apps with Azure

Azure AI Speech provides APIs that you can use to build speech-enabled applications. This includes: speech to text, text to speech, speech translation, and speaker recognition.

Provisioning a Resource : Create a resource. To use the SDK you need the following:

  1. Location in which the resource is deployed
  2. One of the keys assigned to your resource

Using the Speech to Text API

The Azure AI Speech service supports speech recognition through two REST APIs:

  1. The Speech to text API - the primary way to perform speech recognition
  2. The Speech to text Short Audio API - optimized for short streams of audio (up to 60 seconds)

You can use either API for interactive speech recognition, depending on the expected length of the spoken input. You can also use the Speech to text API for batch transcription, transcribing multiple audio files to text as a batch operation.

You can learn more about the REST APIs in the Speech to text REST API documentation. In practice, most interactive speech-enabled applications use the Speech service through a (programming) language-specific SDK.

Using the Speech to Text SDK

!speech-to-text.png

  1. Use a SpeechConfig object to encapsulate the information required to connect to your Azure AI Speech resource. Specifically, its location and key.
  2. Optionally, use an AudioConfig to define the input source for the audio to be transcribed. By default, this is the default system microphone, but you can also specify an audio file.
  3. Use the SpeechConfig and AudioConfig to create a SpeechRecognizer object. This object is a proxy client for the Speech to text API.
  4. Use the methods of the SpeechRecognizer object to call the underlying API functions. For example, the RecognizeOnceAsync() method uses the Azure AI Speech service to asynchronously transcribe a single spoken utterance.
  5. Process the response from the Azure AI Speech service. In the case of the RecognizeOnceAsync() method, the result is a SpeechRecognitionResult object that includes the following properties:

If the operation was successful, the Reason property has the enumerated value RecognizedSpeech, and the Text property contains the transcription. Other possible values for Result include NoMatch (indicating that the audio was successfully parsed but no speech was recognized) or Canceled, indicating that an error occurred (in which case, you can check the Properties collection for the CancellationReason property to determine what went wrong).
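
A sketch of this recognition flow in Python:

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
audio_config = speechsdk.audio.AudioConfig(filename="speech.wav")  # omit to use the microphone

recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, audio_config=audio_config)
result = recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized")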

Azure AI Speech SDK

As with speech recognition, in practice most interactive speech-enabled applications are built using the Azure AI Speech SDK.

The pattern for implementing speech synthesis is similar to that of speech recognition:

https://learn.microsoft.com/en-us/training/wwl-data-ai/create-speech-enabled-apps/media/text-to-speech.png

  1. Use a SpeechConfig object to encapsulate the information required to connect to your Azure AI Speech resource. Specifically, its location and key.
  2. Optionally, use an AudioConfig to define the output device for the speech to be synthesized. By default, this is the default system speaker, but you can also specify an audio file, or by explicitly setting this value to a null value, you can process the audio stream object that is returned directly.
  3. Use the SpeechConfig and AudioConfig to create a SpeechSynthesizer object. This object is a proxy client for the Text to speech API.
  4. Use the methods of the SpeechSynthesizer object to call the underlying API functions. For example, the SpeakTextAsync() method uses the Azure AI Speech service to convert text to spoken audio.
  5. Process the response from the Azure AI Speech service. In the case of the SpeakTextAsync method, the result is a SpeechSynthesisResult object that contains the following properties:

When speech has been successfully synthesized, the Reason property is set to the SynthesizingAudioCompleted enumeration and the AudioData property contains the audio stream (which, depending on the AudioConfig may have been automatically sent to a speaker or file).

Configuring Audio Formats and Voices

Audio Format

The Azure AI Speech service supports multiple output formats for the audio stream that is generated by speech synthesis. Depending on your specific needs, you can choose a format based on the required audio file type, sample-rate, and bit depth.

The supported formats are indicated in the SDK using the SpeechSynthesisOutputFormat enumeration. For example, SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm.

To specify the required output format, use the SetSpeechSynthesisOutputFormat method of the SpeechConfig object:

speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);

For a full list of supported formats and their enumeration values, see the Azure AI Speech SDK documentation.

Voices

The Azure AI Speech service provides multiple voices that you can use to personalize your speech-enabled applications. There are two kinds of voice that you can use: standard voices (synthetic speech) and neural voices (more natural-sounding speech created using deep neural networks).

Voices are identified by names that indicate a locale and a person's name - for example en-GB-George.

To specify a voice for speech synthesis in the SpeechConfig, set its SpeechSynthesisVoiceName property to the voice you want to use:

speechConfig.SpeechSynthesisVoiceName = "en-GB-George";

For information about voices, see the Azure AI Speech SDK documentation.

Speech Synthesis Markup Language

While the Azure AI Speech SDK enables you to submit plain text to be synthesized into speech (for example, by using the SpeakTextAsync() method), the service also supports an XML-based syntax for describing characteristics of the speech you want to generate. This Speech Synthesis Markup Language (SSML) syntax offers greater control over how the spoken output sounds, enabling you to specify speaking styles, insert pauses, and control pronunciation, pitch, and rate (see the SSML notes earlier).

For example, consider the following SSML:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
                     xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        <mstts:express-as style="cheerful">
          I say tomato
        </mstts:express-as>
    </voice>
    <voice name="en-US-GuyNeural">
        I say <phoneme alphabet="sapi" ph="t ao m ae t ow"> tomato </phoneme>.
        <break strength="weak"/>Lets call the whole thing off!
    </voice>
</speak>

This SSML specifies a spoken dialog between two different neural voices, like this:

To submit an SSML description to the Speech service, you can use the SpeakSsmlAsync() method, like this:

speechSynthesizer.SpeakSsmlAsync(ssml_string);

For more information about SSML, see the Azure AI Speech SDK documentation.

More Language Translation Tasks

Define Intents, utterances and entities

Utterances are the phrases that a user might enter when interacting with an application that uses your language model. An intent represents a task or action the user wants to perform, or more simply the meaning of an utterance. You create a model by defining intents and associating them with one or more utterances.

For example, consider the following list of intents and associated utterances:

After you've identified the intents your model must support, it's important to capture various different example utterances for each intent. Collect utterances that you think users will enter; including utterances meaning the same thing but that are constructed in different ways. Keep these guidelines in mind:

Entities are used to add specific context to intents. For example, you might define a TurnOnDevice intent that can be applied to multiple devices, and use entities to define the different devices.

Consider the following utterances, intents, and entities:

| Utterance | Intent | Entities |
| --- | --- | --- |
| What is the time? | GetTime | |
| What time is it in London? | GetTime | Location (London) |
| What's the weather forecast for Paris? | GetWeather | Location (Paris) |
| Will I need an umbrella tonight? | GetWeather | Time (tonight) |
| What's the forecast for Seattle tomorrow? | GetWeather | Location (Seattle), Time (tomorrow) |
| Turn the light on. | TurnOnDevice | Device (light) |
| Switch on the fan. | TurnOnDevice | Device (fan) |

You can split entities into a few different component types: learned components, list components, and prebuilt components.

Custom NER

Custom vs In-Built NER

<YOUR-ENDPOINT>/language/analyze-text/jobs?api-version=<API-VERSION>

| Placeholder | Value | Example |
| --- | --- | --- |
| <YOUR-ENDPOINT> | The endpoint for your API request | https://<your-resource>.cognitiveservices.azure.com |
| <API-VERSION> | The version of the API you are calling | 2023-05-01 |

The body contains several documents.
Sample Response :

<...>
"entities":[
    {
        "text":"Seattle",
        "category":"Location",
        "subcategory":"GPE",
        "offset":45,
        "length":7,
        "confidenceScore":0.99
    },
    {
        "text":"next week",
        "category":"DateTime",
        "subcategory":"DateRange",
        "offset":104,
        "length":9,
        "confidenceScore":0.8
    }
]
<...>

A full list of recognized entity categories is available in the NER docs. Examples of when you'd want custom NER include specific legal or bank data, knowledge mining to enhance catalog search, or looking for specific text for audit policies. Each one of these projects requires a specific set of entities and data it needs to extract.

Extracting Entities

To submit an extraction task, the API requires the JSON body to specify which task to execute. For custom NER, the task for the JSON payload is CustomEntityRecognition.

Your payload will look similar to the following JSON:

{
    "displayName": "string",
    "analysisInput": {
        "documents": [
            {
                "id": "doc1", 
                "text": "string"
            },
            {
                "id": "doc2",
                "text": "string"
            }
        ]
    },
    "tasks": [
        {
            "kind": "CustomEntityRecognition",
            "taskName": "MyRecognitionTaskName",
            "parameters": {
            "projectName": "MyProject",
            "deploymentName": "MyDeployment"
            }
        }
    ]
}
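
A sketch of submitting this payload and polling the asynchronous job (the operation-location response header is the documented way to find the results URL; endpoint and key are placeholders):

import time

import requests

endpoint, key = "<your-endpoint>", "<your-key>"
headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}

payload = {
    "displayName": "extraction",
    "analysisInput": {"documents": [{"id": "doc1", "text": "..."}]},
    "tasks": [{
        "kind": "CustomEntityRecognition",
        "taskName": "MyRecognitionTaskName",
        "parameters": {"projectName": "MyProject", "deploymentName": "MyDeployment"},
    }],
}

# Submit the job; the response header points at the results URL
submit = requests.post(
    f"{endpoint}/language/analyze-text/jobs?api-version=2023-05-01",
    headers=headers, json=payload)
job_url = submit.headers["operation-location"]

# Poll until the job finishes
while True:
    job = requests.get(job_url, headers=headers).json()
    if job["status"] not in ("notStarted", "running"):
        break
    time.sleep(2)
print(job)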

Project Limits

The Azure AI Language service enforces the following restrictions:

See the Service limits for Azure AI Language page for detailed information.

Label your Data

Labeling, or tagging, your data correctly is an important part of the process to create a custom entity extraction model. Labels identify examples of specific entities in text used to train the model. Three things to focus on are: consistency, precision, and completeness of your labels.

How to Label your Data

Language Studio is the most straightforward method for labeling your data. Language Studio allows you to see the file, select the beginning and end of your entity, and specify which entity it is.

Each label that you identify gets saved into a file that lives in your storage account with your dataset, in an auto-generated JSON file. This file then gets used by the model to learn how to extract custom entities. It's possible to provide this file when creating your project (if you're importing the same labels from a different project, for example) however it must be in the Accepted custom NER data formats. For example:


{
  "projectFileVersion": "{DATE}",
  "stringIndexType": "Utf16CodeUnit",
  "metadata": {
    "projectKind": "CustomEntityRecognition",
    "storageInputContainerName": "{CONTAINER-NAME}",
    "projectName": "{PROJECT-NAME}",
    "multilingual": false,
    "description": "Project-description",
    "language": "en-us",
    "settings": {}
  },
  "assets": {
    "projectKind": "CustomEntityRecognition",
    "entities": [
      {
        "category": "Entity1"
      },
      {
        "category": "Entity2"
      }
    ],
    "documents": [
      {
        "location": "{DOCUMENT-NAME}",
        "language": "{LANGUAGE-CODE}",
        "dataset": "{DATASET}",
        "entities": [
          {
            "regionOffset": 0,
            "regionLength": 500,
            "labels": [
              {
                "category": "Entity1",
                "offset": 25,
                "length": 10
              },
              {
                "category": "Entity2",
                "offset": 120,
                "length": 8
              }
            ]
          }
        ]
      },
      {
        "location": "{DOCUMENT-NAME}",
        "language": "{LANGUAGE-CODE}",
        "dataset": "{DATASET}",
        "entities": [
          {
            "regionOffset": 0,
            "regionLength": 100,
            "labels": [
              {
                "category": "Entity2",
                "offset": 20,
                "length": 5
              }
            ]
          }
        ]
      }
    ]
  }
}
Field Description
documents Array of labeled documents
location Path to the file within the container connected to the project
language Language of the file
entities Array of entities present in the current document
regionOffset Inclusive character position for the start of the text
regionLength Length in characters of the data used in training
category Name of the entity to extract
labels Array of labeled entities in the files
offset Inclusive character position for the start of the entity
length Length in characters of the entity
dataset Which dataset the file is assigned to
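
As a small sketch of how these fields fit together, the snippet below loads a labels file like the one above and prints each labeled span. The labels.json and docs/ paths are hypothetical, and the offsets are treated here as document-relative UTF-16 code units, which line up with Python string indices for plain ASCII text:

# Hedged sketch: print every labeled entity span from an exported labels file.
import json
from pathlib import Path

project = json.loads(Path("labels.json").read_text(encoding="utf-8"))

for doc in project["assets"]["documents"]:
    # location is the file's path within the connected storage container;
    # here we assume a local copy of that container under docs/.
    text = Path("docs", doc["location"]).read_text(encoding="utf-8")
    for region in doc["entities"]:
        for label in region["labels"]:
            span = text[label["offset"] : label["offset"] + label["length"]]
            print(f'{doc["location"]}: {label["category"]} -> {span!r}')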

Train and Evaluate your Model

Training and evaluating your model is an iterative process of adding data and labels to your training dataset so the model learns to extract entities more accurately. To know what types of data and labels need to be improved, Language Studio provides scoring on the View model details page in the left-hand pane.

https://learn.microsoft.com/en-us/training/wwl-data-ai/custom-name-entity-recognition/media/model-scoring-new.png

Individual entities and your overall model score are broken down into three metrics (precision, recall, and F1 score) that explain how they're performing and where they need to improve.

Interpreting Metrics

Ideally, we want our model to score well in both precision and recall, which means the entity recognition works well. If both metrics have a low score, the model is struggling both to recognize entities in the document and to assign the correct label with high confidence to the entities it does extract.

If precision is low but recall is high, it means that the model recognizes the entity well but doesn't label it as the correct entity type.

If precision is high but recall is low, it means that the model doesn't always recognize the entity, but when the model extracts the entity, the correct label is applied.
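
A quick worked example with made-up counts may help anchor the two metrics, plus the F1 score that combines them:

# Hedged sketch with invented counts: 40 correctly extracted-and-labeled entities,
# 10 wrong or spurious labels, and 10 real entities the model missed entirely.
tp, fp, fn = 40, 10, 10

precision = tp / (tp + fp)   # 0.8 -> of the entities the model labeled, how many were right
recall = tp / (tp + fn)      # 0.8 -> of the real entities, how many the model found
f1 = 2 * precision * recall / (precision + recall)   # 0.8 -> harmonic mean of the two

print(precision, recall, f1)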

Confusion Matrix

On the same View model details page, there's another tab on the top for the Confusion matrix. This view provides a visual table of all the entities and how each performed, giving a complete view of the model and where it's falling short.

https://learn.microsoft.com/en-us/training/wwl-data-ai/custom-name-entity-recognition/media/model-confusion-matrix-new.png

The confusion matrix allows you to visually identify where to add data to improve your model's performance.

Custom Text Classification Model

Types of Classification Projects

Custom text classification assigns labels (in the Azure AI Language service, a label is a class that the developer defines) to text files. For example, a video game summary might be classified as "Adventure", "Strategy", "Action" or "Sports".

Custom text classification falls into two types of projects:

  1. Single label classification - you can assign only one class to each file in your dataset.
  2. Multiple label classification - you can assign multiple classes to each file in your dataset.

When creating your custom text classification project, you can specify which type of project you want to build.

Single vs Multiple Label Projects

Beyond the ability to put files into multiple classifications, the key differences with multiple label classification projects are labeling, considerations for improving your model, and the API payload for classification tasks.

Labeling data

In single label projects, each file is assigned one class during the labeling process; class assignment in Azure AI Language only allows you to select one class.

When labeling multiple label projects, you can assign as many classes as you want per file. The impact of the added complexity is that your data has to remain clear and provide a good distribution of possible inputs for your model to learn from.

https://learn.microsoft.com/en-us/training/wwl-data-ai/custom-text-classification/media/single-multiple-graphic-small.png

Labeling data correctly, especially for multiple label projects, is directly correlated with how well your model performs. The higher the quality, clarity, and variation of your data set is, the more accurate your model will be.

Evaluating and Improving Model

Measuring the predictive performance of your model goes beyond counting how many predictions were correct. A correct classification is when the actual label is x and the model predicts label x. In the real world, documents produce two kinds of errors when a classification isn't correct:

  1. False positive - the model predicts label x, but the file isn't labeled x.
  2. False negative - the model doesn't predict label x, but the file is in fact labeled x.

These errors are translated into three measures provided by Azure AI Language: precision, recall, and F1 score.

Tip

Learn more about the Azure AI Language evaluation metrics, including exactly how these metrics are calculated.

With a single label project, you can identify which classes aren't classified as well as others and find more quality data to use in training your model. For multiple label projects, figuring out quality data becomes more complex due to the matrix of possible permutations of combined labels.

For example, say your model correctly classifies "Action" games and some "Action and Strategy" games, but fails at "Strategy" games. To improve your model, you'll want to find more high quality and varied summaries for both "Action and Strategy" games and "Strategy" games, to teach your model how to differentiate the two. This challenge increases exponentially with the number of possible classes your model classifies into.

API Payload

Azure AI Language provides a REST API to build and interact with your model, using a JSON body to specify the request. This API is abstracted into multiple language-specific SDKs; however, for this module we'll focus our examples on the base REST API.

To submit a classification task, the API requires a JSON body that specifies which task to execute. You'll learn more about the REST API in the next unit, but it's worth familiarizing yourself with parts of the required body.

Single label classification models specify a project type of customSingleLabelClassification

{
  "projectFileVersion": "<API-VERSION>",
  "stringIndexType": "Utf16CodeUnit",
  "metadata": {
    "projectName": "<PROJECT-NAME>",
    "storageInputContainerName": "<CONTAINER-NAME>",
    "projectKind": "customSingleLabelClassification",
    "description": "Trying out custom multi label text classification",
    "language": "<LANGUAGE-CODE>",
    "multilingual": true,
    "settings": {}
  },
  "assets": {
    "projectKind": "customSingleLabelClassification",
        "classes": [
            {
                "category": "Class1"
            },
            {
                "category": "Class2"
            }
        ],
        "documents": [
            {
                "location": "<DOCUMENT-NAME>",
                "language": "<LANGUAGE-CODE>",
                "dataset": "<DATASET>",
                "class": {
                    "category": "Class2"
                }
            },
            {
                "location": "<DOCUMENT-NAME>",
                "language": "<LANGUAGE-CODE>",
                "dataset": "<DATASET>",
                "class": {
                    "category": "Class1"
                }
            }
        ]
    }
}

Multiple label classification models specify a project type of customMultiLabelClassification

{
  "projectFileVersion": "<API-VERSION>",
  "stringIndexType": "Utf16CodeUnit",
  "metadata": {
    "projectName": "<PROJECT-NAME>",
    "storageInputContainerName": "<CONTAINER-NAME>",
    "projectKind": "customMultiLabelClassification",
    "description": "Trying out custom multi label text classification",
    "language": "<LANGUAGE-CODE>",
    "multilingual": true,
    "settings": {}
  },
  "assets": {
    "projectKind": "customMultiLabelClassification",
    "classes": [
      {
        "category": "Class1"
      },
      {
        "category": "Class2"
      }
    ],
    "documents": [
      {
        "location": "<DOCUMENT-NAME>",
        "language": "<LANGUAGE-CODE>",
        "dataset": "<DATASET>",
        "classes": [
          {
            "category": "Class1"
          },
          {
            "category": "Class2"
          }
        ]
      },
      {
        "location": "<DOCUMENT-NAME>",
        "language": "<LANGUAGE-CODE>",
        "dataset": "<DATASET>",
        "classes": [
          {
            "category": "Class2"
          }
        ]
      }
    ]
  }
}

Splitting Datasets

When labeling your data, you can specify which dataset you want each file to be in: training or testing.

During the Train model step, there are two options for how to split your data: an automatic split, where the service randomly divides your files between training and testing sets, or a manual split, where you assign each file yourself.

To use the automatic split, put all files into the training dataset when labeling your data (this option is the default). To use the manual split, specify which files should be in testing versus training during the labeling of your data.

Deployment Options

Azure AI Language allows each project to create both multiple models and multiple deployments, each with their own unique name. This lets you, for example, keep a stable model in a production deployment while testing a newer model under a separate deployment name before swapping it in.

Note

Each project has a limit of ten deployment names.

During deployment you can choose the name for the deployed model, which can then be selected when submitting a classification task:

<...>
  "tasks": [
    {
      "kind": "CustomSingleLabelClassification",
      "taskName": "MyTaskName",
      "parameters": {
        "projectName": "MyProject",
        "deploymentName": "MyDeployment"
      }
    }
  ]
<...>

Conversational Language Understanding Bot (Speech)

Prebuilt Capabilities

Preconfigured

  1. Summarization
  2. Named Entity Recognition
  3. Personally Identifiable Information
  4. Key Phrase Extraction
  5. Sentiment Analysis
  6. Language Detection

Learned Features

  1. Conversational Language Understanding (CLU) - interpret a user's intent and extract key information (entities) from their utterances
  2. Custom Named Entity Recognition
  3. Custom Text Classification
  4. Question Answering

Resources for Building your Model

First, you'll need to create your Azure AI Language resource in the Azure portal. Then:

  1. Search for Azure AI services.
  2. Find and select Language Service.
  3. Select Create under the Language Service.
  4. Fill out the necessary details, choosing the region closest to you geographically (for best performance) and giving it a unique name.

Once that resource has been created, you'll need a key and the endpoint. You can find these on the left side under Keys and Endpoint on the resource overview page.

Alternatively, you can use Language Studio for the same purpose.

REST API

You'll need to submit a request to the appropriate URI for each step, and then send another request to get the status of that job.

For example, if you want to deploy a model for a conversational language understanding project, you'd submit the deployment job, and then check on the deployment job status.

Authentication
For each call to your Azure AI Language resource, you authenticate the request by providing the following header.

Key Value
Ocp-Apim-Subscription-Key The key to your resource

Request Deployment
Submit a POST request to the following endpoint.

{ENDPOINT}/language/authoring/analyze-conversations/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint of your Azure AI Language resource. https://.cognitiveservices.azure.com
{PROJECT-NAME} The name for your project. This value is case-sensitive. myProject
{DEPLOYMENT-NAME} The name for your deployment. This value is case-sensitive. staging
{API-VERSION} The version of the API you're calling. 2022-05-01

Include the following body with your request.

{
  "trainedModelLabel": "{MODEL-NAME}"
}
Placeholder Value
{MODEL-NAME} The model name that will be assigned to your deployment. This value is case-sensitive.

Successfully submitting your request will receive a 202 response, with a response header of operation-location. This header holds a URL for requesting the job status, formatted like this:

{ENDPOINT}/language/authoring/analyze-conversations/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version={API-VERSION}

Get Deployment Status
Submit a GET request to the URL from the response header above. The values will already be filled out based on the initial deployment request.

{ENDPOINT}/language/authoring/analyze-conversations/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version={API-VERSION}
Placeholder Value
{ENDPOINT} The endpoint for authenticating your API request
{PROJECT-NAME} The name for your project (case-sensitive)
{DEPLOYMENT-NAME} The name for your deployment (case-sensitive)
{JOB-ID} The ID for locating your deployment's status, found in the operation-location header of the deployment request
{API-VERSION} The version of the API you're calling

The response body will give the deployment status details. The status field will have the value succeeded when the deployment is complete.
{
    "jobId":"{JOB-ID}",
    "createdDateTime":"String",
    "lastUpdatedDateTime":"String",
    "expirationDateTime":"String",
    "status":"running"
}
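
Putting the two calls together, a minimal Python sketch of this deploy-then-poll flow might look like the following (the endpoint, key, and project/deployment/model names are all illustrative placeholders):

# Hedged sketch: submit the deployment, then poll the operation-location URL.
import time
import requests

ENDPOINT = "https://<resource-name>.cognitiveservices.azure.com"
KEY = "<your-resource-key>"
headers = {"Ocp-Apim-Subscription-Key": KEY}

url = f"{ENDPOINT}/language/authoring/analyze-conversations/projects/myProject/deployments/staging"
resp = requests.post(url, params={"api-version": "2022-05-01"},
                     headers=headers, json={"trainedModelLabel": "myModel"})
resp.raise_for_status()                          # expect 202 Accepted
status_url = resp.headers["operation-location"]  # already includes the api-version

while True:
    job = requests.get(status_url, headers=headers).json()
    if job["status"] in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(5)                                # poll every few seconds
print(job["status"])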

For a full walkthrough of each step with example requests, see the conversational understanding quickstart.

Querying Language Models

To query your model for a prediction, you can use SDKs in C# or Python, or use the REST API.

Using SDKs

To query your model using an SDK, you first need to create your client. Once you have your client, you then use it to call the appropriate endpoint.

# Assumes azure-ai-textanalytics is installed, and that endpoint, key, and
# documents are already defined for your Language resource.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

language_client = TextAnalyticsClient(endpoint=endpoint,
                                      credential=AzureKeyCredential(key))
response = language_client.extract_key_phrases(documents=documents)[0]

Other language features, such as conversational language understanding, require the request to be built and sent differently.

# Assumes client is a ConversationAnalysisClient from the
# azure-ai-language-conversations package.
result = client.analyze_conversation(
    task={
        "kind": "Conversation",
        "analysisInput": {
            "conversationItem": {
                "participantId": "1",
                "id": "1",
                "modality": "text",
                "language": "en",
                "text": query
            },
            "isLoggingEnabled": False
        },
        "parameters": {
            "projectName": cls_project,
            "deploymentName": deployment_slot,
            "verbose": True
        }
    }
)
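
The result comes back as a dictionary shaped like the ConversationResult JSON shown later in this section, so (as a sketch) the prediction can be read with plain dict access:

# Hedged sketch: pull the top intent and any entities out of the result.
prediction = result["result"]["prediction"]
print("Top intent:", prediction["topIntent"])
for entity in prediction["entities"]:
    print(entity["category"], "=", entity["text"], entity["confidenceScore"])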

Querying the REST API

To query your model using REST, create a POST request to the appropriate URL with the appropriate body specified. For built-in features such as language detection or sentiment analysis, you'll query the analyze-text endpoint.

Tip
Remember that each request needs to be authenticated with your Azure AI Language resource key in the Ocp-Apim-Subscription-Key header.

{ENDPOINT}/language/:analyze-text?api-version={API-VERSION}

Placeholder Value
{ENDPOINT} The endpoint for authenticating your API request
{API-VERSION} The version of the API you're calling

Within the body of that request, you must specify the kind parameter, which tells the service what type of language understanding you're requesting.

If you want to detect the language, for example, the JSON body would look something like the following.

{
    "kind": "LanguageDetection",
    "parameters": {
        "modelVersion": "latest"
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "text": "This is a document written in English."
            }
        ]
    }
}
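
As a sketch, the same language detection call made with Python's requests library (the endpoint and key are placeholders):

# Hedged sketch: POST the LanguageDetection body to the synchronous analyze-text endpoint.
import requests

ENDPOINT = "https://<resource-name>.cognitiveservices.azure.com"
KEY = "<your-resource-key>"

resp = requests.post(
    f"{ENDPOINT}/language/:analyze-text",
    params={"api-version": "2022-05-01"},
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json={
        "kind": "LanguageDetection",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [{"id": "1", "text": "This is a document written in English."}]
        },
    },
)
print(resp.json())   # the detected language is under results.documents[0]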

Other language features, such as conversational language understanding, require the request to be routed to a different endpoint. For example, the conversational language understanding request would be sent to the following.

{ENDPOINT}/language/:analyze-conversations?api-version={API-VERSION}
Placeholder Value
{ENDPOINT} The endpoint for authenticating your API request
{API-VERSION} The version of the API you're calling

That request would include a JSON body similar to the following.
{
  "kind": "Conversation",
  "analysisInput": {
    "conversationItem": {
      "id": "1",
      "participantId": "1",
      "text": "Sample text"
    }
  },
  "parameters": {
    "projectName": "{PROJECT-NAME}",
    "deploymentName": "{DEPLOYMENT-NAME}",
    "stringIndexType": "TextElement_V8"
  }
}
Placeholder Value
{PROJECT-NAME} The name of the project where you built your model
{DEPLOYMENT-NAME} The name of your deployment

Sample Response

The query response from an SDK is contained in the returned object, which varies depending on the feature (such as response.key_phrases or response.Value). The REST API returns JSON that would be similar to the following.

{
    "kind": "KeyPhraseExtractionResults",
    "results": {
        "documents": [{
            "id": "1",
            "keyPhrases": ["modern medical office", "Dr. Smith", "great staff"],
            "warnings": []
        }],
        "errors": [],
        "modelVersion": "{VERSION}"
    }
}

For other models like conversational language understanding, a sample response to your query would be similar to the following.

{
  "kind": "ConversationResult",
  "result": {
    "query": "String",
    "prediction": {
      "topIntent": "intent1",
      "projectKind": "Conversation",
      "intents": [
        {
          "category": "intent1",
          "confidenceScore": 1
        },
        {
          "category": "intent2",
          "confidenceScore": 0
        }
      ],
      "entities": [
        {
          "category": "entity1",
          "text": "text",
          "offset": 7,
          "length": 4,
          "confidenceScore": 1
        }
      ]
    }
  }
}

The SDKs for both Python and C# return JSON that is very similar to the REST response.

For full documentation on features, including examples and how-to guides, see the Azure AI Language documentation pages.

Using the REST API for Language Services

The REST API available for the Azure AI Language service allows for command-line development of Azure AI Language projects, in the same way that Language Studio provides a user interface for building projects. Language Studio is explored further in this module's lab.

Pattern of using the API

The API for the Azure AI Language service operates asynchronously for most calls. In each step we submit a request to the service first, then check back with the service via a subsequent call to get the status or result.

With each request, a header is required to authenticate your request:

Key Value
Ocp-Apim-Subscription-Key The key to your Azure AI Language resource
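
One way to capture this submit-then-poll pattern is a small helper like the sketch below. The function name and defaults are my own; note that most steps return the polling URL in an operation-location header, while the training step returns it in location:

# Hedged sketch: generic submit-then-poll helper for the asynchronous calls.
import time
import requests

def submit_and_wait(url, key, body, api_version="2022-05-01", interval=5):
    headers = {"Ocp-Apim-Subscription-Key": key}
    resp = requests.post(url, params={"api-version": api_version},
                         headers=headers, json=body)
    resp.raise_for_status()   # a successful submission returns 202
    # Most steps return the polling URL in operation-location; training uses location.
    status_url = resp.headers.get("operation-location") or resp.headers["location"]
    while True:
        job = requests.get(status_url, headers=headers).json()
        if job["status"] in ("succeeded", "failed", "cancelled"):
            return job
        time.sleep(interval)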

Submit initial request

The URL to submit the request to varies depending on which step you are on, but all are prefixed with the endpoint provided by your Azure AI Language resource.

For example, to train a model, you would create a POST to the URL that would look something like the following:

<YOUR-ENDPOINT>/language/analyze-text/projects/<PROJECT-NAME>/:train?api-version=<API-VERSION>
Placeholder Value Example
<YOUR-ENDPOINT> The endpoint for your API request. https://.cognitiveservices.azure.com
<PROJECT-NAME> The name for your project (value is case-sensitive). myProject

The following body would be attached to the request:

{
    "modelLabel": "<MODEL-NAME>",
    "trainingConfigVersion": "<CONFIG-VERSION>",
    "evaluationOptions": {
        "kind": "percentage",
        "trainingSplitPercentage": 80,
        "testingSplitPercentage": 20
    }
}
Key Value
modelLabel Your model name.
trainingConfigVersion The model version to use to train your model.
runValidation Boolean value to run validation on the test set.
evaluationOptions Specifies evaluation options.
kind Specifies the data split type. Can be percentage if you're using an automatic split, or set if you manually split your dataset.
testingSplitPercentage Required integer field only if type is percentage. Specifies the testing split.
trainingSplitPercentage Required integer field only if type is percentage. Specifies the training split.

The response to the above request will be a 202, meaning the request was successful. Grab the location value from the response headers, which will look similar to the following URL:

<ENDPOINT>/language/analyze-text/projects/<PROJECT-NAME>/train/jobs/<JOB-ID>?api-version=<API-VERSION>
Key Value
<JOB-ID> Identifier for your request

This URL is used in the next step to get the training status.

Get Training Status

To get the training status, use the URL from the header of the request response to submit a GET request, with the same header that provides your Azure AI Language service key for authentication. The response body will be similar to the following JSON:

{
  "result": {
    "modelLabel": "<MODEL-NAME>",
    "trainingConfigVersion": "<CONFIG-VERSION>",
    "estimatedEndDateTime": "2023-05-18T15:47:58.8190649Z",
    "trainingStatus": {
      "percentComplete": 3,
      "startDateTime": "2023-05-18T15:45:06.8190649Z",
      "status": "running"
    },
    "evaluationStatus": {
      "percentComplete": 0,
      "status": "notStarted"
    }
  },
  "jobId": "<JOB-ID>",
  "createdDateTime": "2023-05-18T15:44:44Z",
  "lastUpdatedDateTime": "2023-05-18T15:45:48Z",
  "expirationDateTime": "2023-05-25T15:44:44Z",
  "status": "running"
}

Training a model can take some time, so periodically check back at this status URL until the response status returns succeeded. Once the training has succeeded, you can view, verify, and deploy your model.

Consuming a deployed model

Using the model to classify text follows the same pattern as outlined above, with a POST request submitting the job and a GET request to retrieve the results.

Submit text for classification

To use your model, submit a POST to the analyze endpoint at the following URL:

<ENDPOINT>/language/analyze-text/jobs?api-version=<API-VERSION>

Placeholder Value Example
<ENDPOINT> The endpoint for your API request. https://.cognitiveservices.azure.com

Important
Remember to include your resource key in the header for Ocp-Apim-Subscription-Key.

The following JSON structure would be attached to the request:

{
  "displayName": "Classifying documents",
  "analysisInput": {
    "documents": [
      {
        "id": "1",
        "language": "<LANGUAGE-CODE>",
        "text": "Text1"
      },
      {
        "id": "2",
        "language": "<LANGUAGE-CODE>",
        "text": "Text2"
      }
    ]
  },
  "tasks": [
     {
      "kind": "<TASK-REQUIRED>",
      "taskName": "<TASK-NAME>",
      "parameters": {
        "projectName": "<PROJECT-NAME>",
        "deploymentName": "<DEPLOYMENT-NAME>"
      }
    }
  ]
}

Key Value
<TASK-REQUIRED> Which task you're requesting. The task is CustomMultiLabelClassification for multiple label projects, or CustomSingleLabelClassification for single label projects.
<LANGUAGE-CODE> The language code, such as en-us.
<TASK-NAME> Your task name.
<PROJECT-NAME> Your project name.
<DEPLOYMENT-NAME> Your deployment name.

The response to the above request will be a 202, meaning the request was successful. Look for the operation-location value in the response headers, which will look something like the following URL:

<ENDPOINT>/language/analyze-text/jobs/<JOB-ID>?api-version=<API-VERSION>
Key Value
<ENDPOINT> The endpoint for your API request
<JOB-ID> Identifier for your request

This URL is used to get your task results.
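
As a sketch, the whole submit-and-retrieve cycle can reuse the hypothetical submit_and_wait helper from earlier, which performs the POST and the polling described above. ENDPOINT, KEY, and the names in the body are placeholders:

# Hedged sketch: classify two documents with a deployed single label model.
body = {
    "displayName": "Classifying documents",
    "analysisInput": {
        "documents": [
            {"id": "1", "language": "en-us", "text": "Text1"},
            {"id": "2", "language": "en-us", "text": "Text2"},
        ]
    },
    "tasks": [{
        "kind": "CustomSingleLabelClassification",
        "taskName": "Classify documents",
        "parameters": {"projectName": "<PROJECT-NAME>", "deploymentName": "<DEPLOYMENT-NAME>"},
    }],
}

job = submit_and_wait(f"{ENDPOINT}/language/analyze-text/jobs", KEY, body)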

Get classification results

Submit a GET request to the endpoint from the previous request, with the same header for authentication. The response body will be similar to the following JSON:

{
  "createdDateTime": "2023-05-19T14:32:25.578Z",
  "displayName": "MyJobName",
  "expirationDateTime": "2023-05-19T14:32:25.578Z",
  "jobId": "xxxx-xxxxxx-xxxxx-xxxx",
  "lastUpdateDateTime": "2023-05-19T14:32:25.578Z",
  "status": "succeeded",
  "tasks": {
    "completed": 1,
    "failed": 0,
    "inProgress": 0,
    "total": 1,
    "items": [
      {
        "kind": "customSingleClassificationTasks",
        "taskName": "Classify documents",
        "lastUpdateDateTime": "2022-10-01T15:01:03Z",
        "status": "succeeded",
        "results": {
          "documents": [
            {
              "id": "<DOC-ID>",
              "class": [
                  {
                      "category": "Class_1",
                      "confidenceScore": 0.0551877357
                  }
              ],
              "warnings": []
            }
          ],
          "errors": [],
          "modelVersion": "2022-04-01"
        }
      }
    ]
  }
}

The classification result is within the items array's results object, for each document submitted.
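
To finish the sketch, each document's predicted class can be read out of that structure; the dict keys follow the sample response above, and job is the dictionary returned by the earlier submit_and_wait sketch:

# Hedged sketch: print the predicted class and confidence for every document.
for task in job["tasks"]["items"]:
    for doc in task["results"]["documents"]:
        for cls in doc["class"]:
            print(doc["id"], "->", cls["category"], cls["confidenceScore"])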

Sources