Chat Completions

Create conversational responses using various AI models through a unified API.

Create Chat Completion

POST https://aiberm.com/v1/chat/completions

```bash
curl https://aiberm.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }'
```

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum number of tokens to generate |
| top_p | number | No | Nucleus sampling parameter |
| stream | boolean | No | Whether to stream responses |
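To make the parameter table concrete, here is a sketch of a request body as a Python dictionary. The values shown (temperature 0.7, a 256-token cap) are illustrative choices, not defaults of the API.

```python
# Minimal request body: required fields (model, messages) plus two of
# the optional parameters from the table above. Values are illustrative.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,   # optional: 0-2, default 1
    "max_tokens": 256,    # optional: caps the length of the response
}
```

This dictionary can be serialized with `json.dumps` and sent as the POST body, exactly as the curl example above does.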

Message Roles

Each message object must include a role and content:

  • system - Sets the behavior/personality of the assistant
  • user - Messages from the end user
  • assistant - Previous responses from the AI
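The three roles typically appear together in a multi-turn conversation: a prior assistant reply is sent back so the model has context for the follow-up. A sketch of such a messages array:

```python
# A multi-turn conversation combining all three roles. The assistant's
# earlier answer is included so the follow-up question has context.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},
]

roles = [m["role"] for m in messages]
```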

Streaming Responses

Enable streaming to receive responses incrementally:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://aiberm.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

# Print each token as it arrives instead of waiting for the full reply
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Best Practices

Optimize Your Requests

  • Set max_tokens to limit costs
  • Use appropriate temperature values (lower for factual, higher for creative)
  • Include system messages to guide behavior
  • Stream responses for better UX
Warning

Be mindful of each model's token limit. Long conversations can exceed the context window, so you may need to manage conversation history before sending a request.
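One simple history-management strategy (an illustration, not part of the API) is to keep the system message and only the most recent messages. The `trim_history` helper and the `keep_last` cutoff below are hypothetical names chosen for this sketch:

```python
# Illustrative trimming strategy: keep the system message plus the most
# recent N non-system messages so the request stays within the model's
# context window. A production version might count tokens instead.
def trim_history(messages, keep_last=6):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# Build a long conversation, then trim it before the next request.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
```

Counting messages is a crude proxy for counting tokens; for tighter control, measure the actual token length of each message with a tokenizer for your model.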