
GenAI Azure OpenAI Inference API - Getting Started

Overview

Welcome to the GenAI Azure OpenAI Inference API documentation. This API provides robust Generative AI inference capabilities, exposing embeddings and chat completions through TrustNest's integration with Azure OpenAI. This document outlines the information you need to understand the API, interact with it, and execute your first call using the Developer Portal.

warning

GenAI Azure OpenAI Inference API is protected by MFA and requires a subscription key.

Requirements

tip

If you are using a thalesgroup.com identity, please have a look at the two troubleshooting documents.

Access to Trustnest APIM

First, open your favorite browser and go to https://trustnest.developer.azure-api.net/

You should see:

img

Sign in for the first time

Click the "Sign in" button or "Sign in with AAD" in the top menu:

img

Click on "Azure Active Directory"; you should be redirected to the thalesdigital.io SSO:

img

note

During the first access, an additional step will ask you to fill in or confirm your email address. This is normal! Enter it and continue with the remaining steps.

Get a subscription key for Beta API Product

To access an API, you must have a valid subscription key. Go to Product (in the top menu).

img

Select Beta API,

img

Click "Subscribe" (yellow button); you should then be redirected to the PostIT item "Subscribe to Trustnest APIM":

img

Fill in:

  • TDFaccountID
  • APIM offer: for OpenAI, choose Openai API (APIM Beta API)

Click on Request

info

This item follows the approval process and is escalated to support level 2, which will configure the subscription key directly in APIM.

Once the subscription key is created, you should see it on the Product page. Click on the Beta API product page:

img

API Schema: GenAI Azure OpenAI Inference v1

Unavailable Endpoints

  • /deployments/{deployment-id}/completions (This endpoint is unavailable, consider using /deployments/{deployment-id}/chat/completions instead.)

Active Endpoints

  • /deployments/{deployment-id}/embeddings
    Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

  • /deployments/{deployment-id}/chat/completions
    Creates a completion for the chat message.

Available Azure OpenAI Models

| Model | Deployment Name | Description | Version | Max Context Window |
|---|---|---|---|---|
| gpt-35-turbo | gpt35-4k | General-purpose model of the GPT-3 family | 0613 | 4k token limit |
| gpt-35-turbo-16k | gpt35-16k | General-purpose model of the GPT-3 family | 0613 | 16k token limit |
| gpt-4 | gpt4-8k | General-purpose model of the GPT-4 family | 0613 | 8k token limit |
| gpt-4-32k | gpt4-32k | General-purpose model of the GPT-4 family | 0613 | 32k token limit |
| text-embedding-ada-002 | text-embedding-ada-002 | Second-generation embedding model (denoted by -002 in the model ID) | 2 | 8k token limit |

Versioning

warning

We support only the stable versions of Azure OpenAI API.

API versioning is handled via the api-version query parameter. The format is YYYY-MM-DD. The current supported Azure OpenAI API version is 2023-05-15.
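For illustration, the `api-version` parameter is appended to the endpoint path as a query string. The sketch below assembles a full request URL in Python, using the gpt35-4k deployment from the table above as an example:

```python
from urllib.parse import urlencode

# Base path of the inference API behind TrustNest APIM.
base = "https://trustnest.azure-api.net/genai-aoai-inference/v1"
endpoint = f"{base}/deployments/gpt35-4k/chat/completions"

# The currently supported stable Azure OpenAI API version.
url = f"{endpoint}?{urlencode({'api-version': '2023-05-15'})}"
print(url)
```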

API Endpoints

Utilizing Azure API Management Developer Portal

The Azure API Management Developer Portal is a web-based interface that allows you to interact with the API without having to write any code. This section will walk you through the steps to create a chat completion using the Developer Portal.

Choose GenAI Azure OpenAI Inference API from the top menu:

GenAI Azure OpenAI Inference API Portal

Authorization section:

  • Authorization (Oauth-AuthorizationCodeFlow): select "authorization_code". A popup window opens and retrieves an AAD token using your active session (this is transparent to you).
  • TrustNest-Apim-Subscription-Key: your subscription key for the API (see Get a subscription key for Beta API Product).

Deployment ID: The deployment name of the model you wish to use. This can be found in the Available Azure OpenAI Models table under the "Deployment Name" column.

API Version: The version of the API you wish to use. This can be found in the Versioning section.

1. Create Chat Completion

Endpoint: /deployments/{deployment-id}/chat/completions

Creates a chat completion based on the provided prompt, parameters, and chosen model.

Request

You can use the following example to create a chat completion:

POST /deployments/{deployment-id}/chat/completions?api-version={api-version} HTTP/1.1
Host: trustnest.azure-api.net/genai-aoai-inference/v1
Content-Type: application/json
Cache-Control: no-cache
TrustNest-Apim-Subscription-Key: <YOUR_SUBSCRIPTION_KEY>
Authorization: Bearer <YOUR_AAD_TOKEN>

{
  "messages": [{
    "role": "user",
    "content": "Hello!"
  }],
  "model": "gpt-35-turbo",
  "max_tokens": 25,
  "temperature": 0.3,
  "top_p": 1
}
Response

You should receive a response similar to the following:

HTTP/1.1 200 OK
{
  "id": "chatcmpl-88bRZB7bJjhiViAvAsB538aZ1THw3",
  "object": "chat.completion",
  "created": 1697061249,
  "model": "gpt-35-turbo",
  "choices": [{
    "index": 0,
    "finish_reason": "length",
    "message": {
      "role": "assistant",
      "content": "Hi there! How can I assist you today?"
    }
  }],
  "usage": {
    "completion_tokens": 5,
    "prompt_tokens": 9,
    "total_tokens": 14
  }
}
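
To illustrate consuming this response, the sketch below parses the sample body above with Python's standard `json` module and extracts the assistant's reply and the token usage:

```python
import json

# The sample chat-completions response body shown above.
body = '''{
  "id": "chatcmpl-88bRZB7bJjhiViAvAsB538aZ1THw3",
  "object": "chat.completion",
  "created": 1697061249,
  "model": "gpt-35-turbo",
  "choices": [{"index": 0, "finish_reason": "length",
               "message": {"role": "assistant",
                           "content": "Hi there! How can I assist you today?"}}],
  "usage": {"completion_tokens": 5, "prompt_tokens": 9, "total_tokens": 14}
}'''

resp = json.loads(body)
answer = resp["choices"][0]["message"]["content"]   # the assistant's reply
tokens_used = resp["usage"]["total_tokens"]         # useful for tracking quota
print(answer)
print(tokens_used)
```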

2. Get Embeddings

Endpoint: /deployments/{deployment-id}/embeddings

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

Request

You can use the following example to get embeddings:


POST /deployments/{deployment-id}/embeddings?api-version={api-version} HTTP/1.1
Host: trustnest.azure-api.net/genai-aoai-inference/v1
Content-Type: application/json
Cache-Control: no-cache
Authorization: Bearer <YOUR_AAD_TOKEN>
TrustNest-Apim-Subscription-Key: <YOUR_SUBSCRIPTION_KEY>

{
  "input": "This is a test.",
  "user": "string",
  "input_type": "query",
  "model": "text-embedding-ada-002"
}
Response

You should receive a response similar to the following:

HTTP/1.1 200 OK

{
  "object": "list",
  "data": [{
    "object": "embedding",
    "index": 0,
    "embedding": [
      -0.003542061,
      -0.0042601773,
      0.0010812181,
      ...
    ]
  }],
  "model": "ada",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}

Note: text-embedding-ada-002 embeddings have 1536 dimensions.
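
As an illustration of how the returned vectors are typically used, here is a stdlib-only cosine-similarity sketch. The three-dimensional vectors are toy values for the example; real text-embedding-ada-002 vectors have 1536 dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors for illustration only.
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
print(cosine_similarity(v1, v2))  # identical vectors give similarity 1.0
```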

Quota Limits

Please refer to the table below for quota limits for each model. It is important to note that the quota limits are for all the users of the model, not just your organization.

| Model | Deployment Name | Quota (Tokens per minute) |
|---|---|---|
| gpt-35-turbo | gpt35-4k | 300K |
| gpt-35-turbo-16k | gpt35-16k | 300K |
| gpt-4 | gpt4-8k | 40K |
| gpt-4-32k | gpt4-32k | 80K |
| text-embedding-ada-002 | text-embedding-ada-002 | 350K |
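
Because the quota is shared by all users of a model, individual requests may be throttled with HTTP 429. As a hedged illustration (not part of the official API), a simple exponential-backoff wrapper might look like the sketch below; the `send` callable and its `status_code` attribute are assumptions made for the example:

```python
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0):
    """Retry send() on HTTP 429, doubling the delay after each attempt.

    `send` is any callable returning an object with a `status_code`
    attribute (a hypothetical interface used only for this sketch).
    """
    delay = base_delay
    for _ in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        time.sleep(delay)  # wait before retrying on throttling
        delay *= 2
    return resp  # give up and return the last (throttled) response
```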

Changelog

| Date | Version | Description |
|---|---|---|
| 2021-10-01 | 1.0.0 | Initial |

Code Examples

To use the API in your favorite language, please refer to the following code examples:
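
No language-specific snippets are bundled with this page yet; as a starting point, here is a hedged Python sketch that builds the chat-completions request shown earlier using only the standard library. The `build_chat_request` helper is illustrative, not an official client, and the key/token placeholders are yours to fill in:

```python
import json
import urllib.request

def build_chat_request(deployment_id, api_version, subscription_key,
                       aad_token, messages):
    """Build (but do not send) a chat-completions request for TrustNest APIM."""
    url = (
        "https://trustnest.azure-api.net/genai-aoai-inference/v1"
        f"/deployments/{deployment_id}/chat/completions"
        f"?api-version={api_version}"
    )
    body = json.dumps({
        "messages": messages,
        "max_tokens": 25,
        "temperature": 0.3,
        "top_p": 1,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "TrustNest-Apim-Subscription-Key": subscription_key,
            "Authorization": f"Bearer {aad_token}",
        },
    )

req = build_chat_request("gpt35-4k", "2023-05-15",
                         "<YOUR_SUBSCRIPTION_KEY>", "<YOUR_AAD_TOKEN>",
                         [{"role": "user", "content": "Hello!"}])
# To actually send it (requires valid credentials):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```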

Troubleshooting

Should you encounter issues while interacting with the Azure OpenAI services via TrustNest APIM, please refer to the Troubleshooting section for guidance on identifying and resolving common problems.
