Introduction
The Gemini API offers a suite of text chat models for a wide range of natural language processing tasks. This guide lists the available models, describes their capabilities, and shows how to integrate them into your applications through the chat completions endpoint.
Available Models
The Gemini API offers several model families that differ in capability, performance, and intended use:
Gemini 2.5 Pro Models
gemini-2.5-pro-preview-06-05 (Latest)
gemini-2.5-pro-preview-05-06
gemini-2.5-pro-preview-03-25
gemini-2.5-pro-exp-03-25
The Gemini 2.5 Pro models are the most capable in the lineup, offering the strongest reasoning and context understanding for complex tasks.
Gemini 2.5 Flash Models
gemini-2.5-flash-preview-05-20-thinking
gemini-2.5-flash-preview-05-20-nothinking
gemini-2.5-flash-preview-05-20
gemini-2.5-flash-preview-04-17-thinking
gemini-2.5-flash-preview-04-17-nothinking
gemini-2.5-flash-preview-04-17
The Flash variants provide faster response times while maintaining high quality, making them ideal for applications that require quick interactions.
Gemini 2.0 Models
gemini-2.0-pro-exp-02-05
gemini-2.0-flash-thinking-exp-1219
gemini-2.0-flash-thinking-exp-01-21
gemini-2.0-flash-lite-preview-02-05
gemini-2.0-flash-lite-001
gemini-2.0-flash-lite
gemini-2.0-flash-exp
gemini-2.0-flash-001
gemini-2.0-flash
These models balance performance and efficiency, with specialized variants such as "thinking" and "lite" for different use cases.
Making API Requests
The Gemini API uses a REST interface for chat completions. Here's how to structure your requests:
Endpoint
POST https://ai.burncloud.com/v1/chat/completions
Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Body Parameters
Parameter | Type | Description |
---|---|---|
messages | array | Array of message objects, each with a role and content |
model | string | The specific model to use (from the list above) |
group | string | Optional grouping parameter (e.g., "default") |
stream | boolean | Whether to stream the response (default: false) |
stream_options | object | Options for streaming (e.g., {"include_usage": true}) |
return_reasoning | boolean | Whether to include reasoning in the response |
Example Request
curl --location 'https://ai.burncloud.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
"messages": [
{
"role": "user",
"content": "Some people say pigs can climb, others say they cannot. What philosophical question does this reflect? Please analyze it in detail."
}
],
"model": "gemini-2.5-pro-preview-05-06",
"group": "default",
"stream": false,
"stream_options": {"include_usage": true},
"return_reasoning": true
}'
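The same request can be sketched in Python using only the standard library. The endpoint URL, headers, and body fields come from the curl example above; the function names `build_chat_request` and `send_chat_request` are hypothetical helpers introduced here for illustration, not part of any SDK.

```python
import json
import urllib.request

API_URL = "https://ai.burncloud.com/v1/chat/completions"


def build_chat_request(api_key, model, user_text, return_reasoning=False):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "messages": [{"role": "user", "content": user_text}],
        "model": model,
        "group": "default",
        "stream": False,
        "return_reasoning": return_reasoning,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the build step from the send step keeps the payload easy to inspect or log before any network call is made.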
Response Format
The API returns a JSON response with the following structure:
{
"id": "chat-xxxxxxxx",
"object": "chat.completion",
"created": 1717000000,
"model": "gemini-2.5-pro-preview-05-06",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The model's response text..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 123,
"completion_tokens": 456,
"total_tokens": 579
},
"reasoning": "The model's reasoning process..."
}
The reasoning field is present only when return_reasoning is true.
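Reading the fields of this structure can be sketched as follows. The helper name `summarize_completion` is hypothetical; the field names are the ones shown in the sample response, and the code assumes `reasoning` may be absent when it was not requested.

```python
def summarize_completion(response):
    """Pull the commonly used fields out of a decoded chat completion."""
    first_choice = response["choices"][0]
    usage = response.get("usage") or {}
    return {
        "text": first_choice["message"]["content"],
        "finish_reason": first_choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
        # None when the response carries no reasoning field.
        "reasoning": response.get("reasoning"),
    }
```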
Advanced Features
Streaming Responses
For applications requiring real-time interactions, set the stream parameter to true to receive chunks of the response as they're generated:
{
"stream": true,
"stream_options": {
"include_usage": true
}
}
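This guide does not describe the wire format of streamed chunks. The sketch below assumes the common server-sent events convention used by chat-completions-style APIs: each line looks like `data: {json}`, text arrives in `choices[0].delta.content`, the stream ends with `data: [DONE]`, and a usage chunk may appear when include_usage is set. All of that is an assumption to verify against real output, and `collect_stream_text` is a hypothetical helper.

```python
import json


def collect_stream_text(lines):
    """Join text deltas from an iterable of already-decoded stream lines.

    Assumes an SSE-style stream ("data: {...}" lines ending with
    "data: [DONE]"); this format is an assumption, not confirmed here.
    """
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts), usage
```

In an interactive application each delta would typically be displayed as it arrives rather than joined at the end; joining is used here only to keep the example small.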
Reasoning Traces
The return_reasoning parameter provides insight into the model's thought process:
{
"return_reasoning": true
}
This is particularly useful for:
Debugging model responses
Educational applications
Transparency in decision-making processes
Fine-tuning prompts based on model reasoning
Model Selection Guide
For complex reasoning tasks: Use the latest Gemini 2.5 Pro models
For quick responses with good quality: Choose Gemini 2.5 Flash models
For cost-efficient operations: Consider Gemini 2.0 Flash Lite variants
For explicit reasoning traces: Use models with the "-thinking" suffix
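The guidance above can be captured in a small lookup table. The model identifiers are taken from the lists earlier in this guide; the category keys and the `pick_model` helper are hypothetical, and the specific model chosen for each category is just one reasonable pick, not a recommendation from the API provider.

```python
# Illustrative mapping from a need to one model from the lists in this guide.
MODEL_BY_NEED = {
    "complex_reasoning": "gemini-2.5-pro-preview-06-05",
    "fast": "gemini-2.5-flash-preview-05-20",
    "low_cost": "gemini-2.0-flash-lite",
    "reasoning_trace": "gemini-2.5-flash-preview-05-20-thinking",
}


def pick_model(need):
    """Return the model mapped to a need, or raise ValueError if unknown."""
    try:
        return MODEL_BY_NEED[need]
    except KeyError:
        known = ", ".join(sorted(MODEL_BY_NEED))
        raise ValueError(f"unknown need {need!r}; expected one of: {known}") from None
```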
Best Practices
Start with clear instructions in your prompts to guide the model's responses
Test different models to find the best fit for your specific use case
Use streaming for interactive applications to improve perceived responsiveness
Monitor token usage to optimize costs and performance
Implement retry logic for handling rate limits and temporary errors
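As an illustration of the retry advice, here is a minimal sketch of exponential backoff with jitter around any callable that performs the request with `urllib`. The set of retryable status codes (429 and common 5xx values) is an assumption, since this guide does not list the error codes the API returns, and `call_with_retries` is a hypothetical helper.

```python
import random
import time
import urllib.error

# Assumption: rate limits and transient server errors use these codes.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_retries(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as error:
            # Give up on non-retryable codes or when attempts run out.
            if error.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
        except urllib.error.URLError:
            # Network-level failure (DNS, refused connection, timeout).
            if attempt == max_attempts:
                raise
        delay = base_delay * 2 ** (attempt - 1)
        # Jitter spreads out retries from clients that failed together.
        sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests; in normal use the default `time.sleep` applies.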
Conclusion
The Gemini API text chat models provide flexible natural language processing capabilities for a wide range of applications. By understanding the model options and how to structure your requests, you can use these models to build sophisticated AI-powered experiences.
For more information, refer to the official documentation or contact support for specific implementation questions.