Chat Completions
Create conversational responses using various AI models through a unified API.
Create Chat Completion
POST https://aiberm.com/v1/chat/completions
```bash
curl https://aiberm.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }'
```
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum number of tokens to generate |
| top_p | number | No | Nucleus sampling parameter |
| stream | boolean | No | Whether to stream the response |
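The request body can be assembled and sent without any SDK. Below is a minimal sketch using only the Python standard library; the endpoint, headers, and parameter values mirror the curl example above (YOUR_API_KEY is a placeholder).

```python
import json
import urllib.request

# Request body using the parameters from the table above.
# Values here are illustrative, not recommended defaults.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,  # 0-2; lower values are more deterministic
    "max_tokens": 100,   # cap on generated tokens
    "stream": False,
}

req = urllib.request.Request(
    "https://aiberm.com/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
)
# With a real API key, send the request with:
# response = json.load(urllib.request.urlopen(req))
```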
Message Roles
Each message object must include a role and content:
- system - Sets the behavior/personality of the assistant
- user - Messages from the end user
- assistant - Previous responses from the AI
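The API is stateless, so multi-turn context is carried by replaying prior turns in the messages array: earlier model replies are sent back with the assistant role. A short sketch (the conversation content is illustrative):

```python
# A multi-turn messages array: the assistant's earlier reply is included
# so the model can resolve the follow-up question's "its" from history.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},
]
```

Each new request should append the latest assistant reply and the next user message to this list.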
Streaming Responses
Enable streaming to receive responses incrementally:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://aiberm.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Best Practices
Optimize Your Requests
- Set max_tokens to limit costs
- Use appropriate temperature values (lower for factual, higher for creative)
- Include system messages to guide behavior
- Stream responses for better UX
Warning
Be mindful of each model's token limit. Long conversations may exceed it, so you may need to manage history by trimming or summarizing earlier messages.
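One simple history-management strategy is to keep the system message and drop the oldest turns until the prompt fits a budget. The sketch below is an assumption-laden illustration: the 4-characters-per-token estimate is a crude heuristic, not a real tokenizer, and the budget value is arbitrary.

```python
# Trim conversation history to a rough token budget.
# Keeps the system message; drops the oldest user/assistant turns first.
def trim_history(messages, max_prompt_tokens=3000):
    def estimate(msg):
        # ~4 characters per token is a rough heuristic (assumption),
        # not a substitute for the model's actual tokenizer.
        return len(msg["content"]) // 4

    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while turns and sum(map(estimate, system + turns)) > max_prompt_tokens:
        turns.pop(0)  # drop the oldest turn first
    return system + turns
```

For higher-fidelity budgeting, replace the heuristic with a count from the model's own tokenizer, or summarize dropped turns into a single message instead of discarding them.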