Chat Completions

Create conversational responses using various AI models through a unified API.

Create Chat Completion

POST https://aiberm.com/v1/chat/completions

```bash
curl https://aiberm.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }'
```

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum number of tokens to generate |
| top_p | number | No | Nucleus sampling parameter |
| stream | boolean | No | Whether to stream responses |
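To make the parameter table concrete, here is a sketch of a request body as a Python dictionary. The values shown (temperature 0.7, a 256-token cap) are illustrative choices, not defaults of the API.

```python
# Minimal request body: required fields (model, messages) plus two of
# the optional parameters from the table above. Values are illustrative.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,   # optional: 0-2, default 1
    "max_tokens": 256,    # optional: caps the length of the response
}
```

This dictionary can be serialized with `json.dumps` and sent as the POST body, exactly as the curl example above does.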

Message Roles

Each message object must include a role and content:

  • system - Sets the behavior/personality of the assistant
  • user - Messages from the end user
  • assistant - Previous responses from the AI
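The three roles typically appear together in a multi-turn conversation: a prior assistant reply is sent back so the model has context for the follow-up. A sketch of such a messages array:

```python
# A multi-turn conversation combining all three roles. The assistant's
# earlier answer is included so the follow-up question has context.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},
]

roles = [m["role"] for m in messages]
```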

Streaming Responses

Enable streaming to receive responses incrementally:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://aiberm.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

# Print each token as it arrives instead of waiting for the full reply
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Best Practices

Optimize Your Requests

  • Set max_tokens to limit costs
  • Use appropriate temperature values (lower for factual, higher for creative)
  • Include system messages to guide behavior
  • Stream responses for better UX
Warning

Be mindful of each model's token limit. Long conversations can exceed the context window, so you may need to manage conversation history before sending a request.
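One simple history-management strategy (an illustration, not part of the API) is to keep the system message and only the most recent messages. The `trim_history` helper and the `keep_last` cutoff below are hypothetical names chosen for this sketch:

```python
# Illustrative trimming strategy: keep the system message plus the most
# recent N non-system messages so the request stays within the model's
# context window. A production version might count tokens instead.
def trim_history(messages, keep_last=6):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# Build a long conversation, then trim it before the next request.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
```

Counting messages is a crude proxy for counting tokens; for tighter control, measure the actual token length of each message with a tokenizer for your model.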