GenAI Azure OpenAI Inference API - Getting Started

Overview

Welcome to the GenAI Azure OpenAI Inference API documentation. This API is designed to provide robust Generative AI Inference capabilities, amplifying the embeddings and completions within our system through TrustNest's integration with Azure OpenAI. This document outlines the necessary information for understanding, interacting with, and execute your first API call using the Developer Portal.

warning

GenAI Azure OpenAI Inference API is protected by MFA and requires a subscription key.

Requirements

A valid subscription key for the GenAI Azure OpenAI Inference API (see Get a subscription key for Beta API Product).

tip

If you are using a thalesgroup.com identity, please have look to the 2 troubleshooting doc:

Access to Trustnest APIM

First, open your favorite browser and access to https://trustnest.developer.azure-api.net/

You should see :

Click on "Sign in" buttun or "Sign in with AAD" in the top menu:

Click on "Azure Active Directory", you should be redirected to thalesdigital.io SSO:

note

During the first access, an additional step will ask you to fill/confirm your email. this is normal ! Give it and continue the steps.

Get a subscription key for Beta API Product

To access to an API, you should have a valid subscription key. Go to Product (on the top menu).

Select Beta API,

Click on "Subscribe" (Yellow Button), then you should be redirected to the postIT item: "Subscribe to Trustnest APIM":

Fill with:

TDFaccountID
APIM offer. For OpenAI: choose Openai API (APIM Beta API)

Click on Request

info

This item will follow the approval process and escalated to support level 2. A subscription key will be configured directly to APIM by level 2.

Once the subscription key is created, you should see it in the Product Page. Click on Beta API Product Page:

API Schema: GenAI Azure OpenAI Inference v1

Unavailable Endpoints

/deployments/{deployment-id}/completions (This endpoint is unavailable, consider using /deployments/{deployment-id}/chat/completions instead.)

Active Endpoints

/deployments/{deployment-id}/embeddings
Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.
/deployments/{deployment-id}/chat/completions
Creates a completion for the chat message.

Available Azure OpenAI Models

Model	Deployment Name	Decriptions	Version	Max Context Window
gpt-35-turbo	gpt35-4k	General-purpose model of the GPT-3 family	0613	4k token limit
gpt-35-turbo-16k	gpt35-16k	General-purpose model of the GPT-3 family	0613	16k token limit
gpt-4	gpt4-8k	General-purpose model of the GPT-4 family	0613	8k token limit
gpt-4-32k	gpt4-32k	General-purpose model of the GPT-4 family	0613	32k token limit
text-embedding-ada-002	text-embedding-ada-002	Second-generation embedding model (denoted by -002 in the model ID)	2	8k token limit

Versioning

warning

We support only the stable versions of Azure OpenAI API.

API versioning is handled via the api-version query parameter. The format is YYYY-MM-DD. The current supported Azure OpenAI API version is 2023-05-15.

API Endpoints

Utilizing Azure API Management Developer Portal

The Azure API Management Developer Portal is a web-based interface that allows you to interact with the API without having to write any code. This section will walk you through the steps to create a chat completion using the Developer Portal.

Choose GenAI Azure OpenAI Inference API from the top menu:

GenAI Azure OpenAI Inference API Portal

Authorization section:

Authorization: Oauth-AuthorizationCodeFlow: Select "authorization_code" it will open a popup windows and get an AAD token using your active session (it's transparent for you).
TrustNest-Apim-Subscription-Key: Your subscription key for the API (see Get a subscription key for Beta API Product)

Deployment ID: The deployment ID of the model you wish to use. This can be found in the Available Azure OpenAI Models under the "Deployment ID" column.

Api Version: The version of the API you wish to use. This can be found in the Versioning section.

1. Create Chat Completion

Endpoint: /deployments/{deployment-id}/chat/completions

Creates a chat completion based on the provided prompt, parameters, and chosen model.

Request

You can use the following example to create a chat completion:

POST /deployments/{deployment-id}/chat/completions?api-version={api-version} HTTP/1.1
Host: trustnest.azure-api.net/genai-aoai-inference/v1
Content-Type: application/json
Cache-Control: no-cache
TrustNest-Apim-Subscription-Key: <YOUR_SUBSCRIPTION_KEY>
Authorization: Bearer <YOUR_AAD_TOKEN>

{
  "messages": [{
      "role": "user",
      "content": "Hello!"
  }],
  "model": "gpt-35-turbo",
  "max_tokens": 25,
  "temperature": 0.3,
  "top_p": 1
}

Response

You should receive a response similar to the following:

HTTP/1.1 200 OK
{
    "id": "chatcmpl-88bRZB7bJjhiViAvAsB538aZ1THw3",
    "object": "chat.completion",
    "created": 1697061249,
    "model": "gpt-35-turbo",
    "choices": [{
        "index": 0,
        "finish_reason": "length",
        "message": {
            "role": "assistant",
            "content": "Hi there! How can I assist you today?"
        }
    }],
    "usage": {
        "completion_tokens": 5,
        "prompt_tokens": 9,
        "total_tokens": 14
    }
}

2. Get Embeddings

Endpoint: /deployments/{deployment-id}/embeddings

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

Request

You can use the following example to get embeddings:

POST /deployments/{deployment-id}/embeddings?api-version={api-version} HTTP/1.1
Host: trustnest.azure-api.net/genai-aoai-inference/v1
Content-Type: application/json
Cache-Control: no-cache
Authorization: Bearer <YOUR_AAD_TOKEN>
TrustNest-Apim-Subscription-Key: <YOUR_SUBSCRIPTION_KEY>

{
    "input": "This is a test.",
    "user": "string",
    "input_type": "query",
    "model": "text-embedding-ada-002"
}

Response

Yopu should receive a response similar to the following:

HTTP/1.1 200 OK

{
    "object": "list",
    "data": [{
        "object": "embedding",
        "index": 0,
        "embedding": [
            -0.003542061,
            -0.0042601773,
            0.0010812181,
            ...
        ]
    }],
    "model": "ada",
    "usage": {
        "prompt_tokens": 5,
        "total_tokens": 5
    }
}

Note: The text-embedding-ada-002 output dimensions are 1536.

Quota Limits

Please refer to the table below for quota limits for each model. It is important to note that the quota limits are for all the users of the model, not just your organization.

Model	Deployment Name	Quota (Tokens per minute)
gpt-35-turbo	gpt35-4k	300 k
gpt-35-turbo-16k	gpt35-16k	300 k
gpt-4	gpt4-8k	40 K
gpt-4-32k	gpt4-32k	80 K
text-embedding-ada-002	text-embedding-ada-002	350 K

Changelog

Date	Version	Description
2021-10-01	1.0.0	Initial

Code Examples

To use the API in your favorite language, please refer to the following code examples:

Troubleshooting

Should you encounter issues while interacting with the Azure OpenAI services via TrustNest APIM, please refer to the Troubleshooting section for guidance on identifying and resolving common problems.

Overview​

Requirements​

Access to Trustnest APIM​

Sign in for the first time​

Get a subscription key for Beta API Product​

API Schema: GenAI Azure OpenAI Inference v1​

Unavailable Endpoints​

Active Endpoints​

Available Azure OpenAI Models​

Versioning​

API Endpoints​

Utilizing Azure API Management Developer Portal​

1. Create Chat Completion​

Request​

Response​

2. Get Embeddings​

Request​

Response​

Quota Limits​

Changelog​

Code Examples​

Troubleshooting​

References​

Overview

Requirements

Access to Trustnest APIM

Sign in for the first time

Get a subscription key for Beta API Product

API Schema: GenAI Azure OpenAI Inference v1

Unavailable Endpoints

Active Endpoints

Available Azure OpenAI Models

Versioning

API Endpoints

Utilizing Azure API Management Developer Portal

1. Create Chat Completion

Request

Response

2. Get Embeddings

Request

Response

Quota Limits

Changelog

Code Examples

Troubleshooting

References