Gemini API Text Chat Models Guide

Introduction

The Gemini API offers a suite of text chat models for a wide range of natural language processing tasks. This guide covers the available models, their capabilities, and how to integrate them into your applications through the chat completions endpoint.

Available Models

The Gemini API offers several model families that differ in capability, performance, and intended use:

Gemini 2.5 Pro Models

gemini-2.5-pro-preview-06-05 (Latest)
gemini-2.5-pro-preview-05-06
gemini-2.5-pro-preview-03-25
gemini-2.5-pro-exp-03-25

The Gemini 2.5 Pro models represent the most advanced and capable models in the lineup, offering superior reasoning, context understanding, and nuanced responses for complex tasks.

Gemini 2.5 Flash Models

gemini-2.5-flash-preview-05-20-thinking
gemini-2.5-flash-preview-05-20-nothinking
gemini-2.5-flash-preview-05-20
gemini-2.5-flash-preview-04-17-thinking
gemini-2.5-flash-preview-04-17-nothinking
gemini-2.5-flash-preview-04-17

The Flash variants provide faster response times while maintaining high quality, making them ideal for applications that require quick interactions.

Gemini 2.0 Models

gemini-2.0-pro-exp-02-05
gemini-2.0-flash-thinking-exp-1219
gemini-2.0-flash-thinking-exp-01-21
gemini-2.0-flash-lite-preview-02-05
gemini-2.0-flash-lite-001
gemini-2.0-flash-lite
gemini-2.0-flash-exp
gemini-2.0-flash-001
gemini-2.0-flash

These models offer a balance of performance and efficiency, with specialized variants for different use cases.

Making API Requests

The Gemini API uses a REST interface for chat completions. Here's how to structure your requests:

Endpoint

POST https://ai.burncloud.com/v1/chat/completions

Headers

Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

Request Body Parameters

Parameter	Type	Description
`messages`	array	Array of message objects with `role` and `content`
`model`	string	The specific model to use (from the list above)
`group`	string	Optional grouping parameter (e.g., "default")
`stream`	boolean	Whether to stream the response (default: false)
`stream_options`	object	Options for streaming (e.g., `include_usage`)
`return_reasoning`	boolean	Whether to include reasoning in response
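
The parameters in the table can be assembled into a request body with a small helper. The following is a minimal sketch: the function name `build_chat_payload` and its defaults are illustrative assumptions, and only the field names come from the table above.

```python
from typing import Any, Dict, Optional


def build_chat_payload(
    prompt: str,
    model: str = "gemini-2.5-pro-preview-05-06",
    group: Optional[str] = "default",
    stream: bool = False,
    include_usage: bool = False,
    return_reasoning: bool = False,
) -> Dict[str, Any]:
    """Build a request body from the documented parameters.

    Hypothetical helper: the optional fields are only added when they are
    requested, so the body stays as small as possible.
    """
    payload: Dict[str, Any] = {
        "messages": [{"role": "user", "content": prompt}],
        "model": model,
        "stream": stream,
    }
    if group is not None:
        payload["group"] = group
    if include_usage:
        payload["stream_options"] = {"include_usage": True}
    if return_reasoning:
        payload["return_reasoning"] = True
    return payload
```

Calling `build_chat_payload("Hello", return_reasoning=True)` yields a dictionary that can be serialized with `json.dumps` and sent as the request body.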

Example Request

curl --location 'https://ai.burncloud.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
  "messages": [
    {
      "role": "user",
      "content": "Some say pigs can climb, others say they cannot. What philosophical question does this reflect? Please analyze it in detail."
    }
  ],
  "model": "gemini-2.5-pro-preview-05-06",
  "group": "default",
  "stream": false,
  "stream_options": {"include_usage": true},
  "return_reasoning": true
}'
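
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl command: the endpoint and headers come from this guide, while the function names `make_request` and `send` are illustrative assumptions.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://ai.burncloud.com/v1/chat/completions"


def make_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with the documented headers."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# payload = {
#     "messages": [{"role": "user", "content": "Hello"}],
#     "model": "gemini-2.5-pro-preview-05-06",
#     "stream": False,
# }
# result = send(make_request("YOUR_API_KEY", payload))
```

Separating request construction from sending makes the request easy to inspect or log before it goes over the network.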

Response Format

The API returns a JSON response with the following structure:

{
  "id": "chat-xxxxxxxx",
  "object": "chat.completion",
  "created": 1717000000,
  "model": "gemini-2.5-pro-preview-05-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The model's response text..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  },
  "reasoning": "The model's reasoning process..."
}

The reasoning field is included only when return_reasoning is true.
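
Once the response has been decoded into a dictionary, the reply text, the optional reasoning, and the token count can be read out as follows. This is a minimal sketch based on the structure shown above; the helper name `extract_reply` is an assumption, and the reasoning is treated as optional because it is returned only on request.

```python
from typing import Any, Dict, Optional, Tuple


def extract_reply(response: Dict[str, Any]) -> Tuple[str, Optional[str], int]:
    """Return (content, reasoning, total_tokens) from a decoded response.

    Hypothetical helper: reasoning is None when the field is absent, and
    total_tokens falls back to 0 when no usage block is present.
    """
    content = response["choices"][0]["message"]["content"]
    reasoning = response.get("reasoning")
    usage = response.get("usage") or {}
    return content, reasoning, usage.get("total_tokens", 0)
```

Logging the returned token count per request is one simple way to keep track of usage over time.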

Advanced Features

Streaming Responses

For applications that need real-time interaction, set the stream parameter to true to receive the response in chunks as it is generated:

{
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
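
This guide does not specify the wire format of the streamed chunks. The sketch below assumes the server-sent-events convention common to OpenAI-compatible chat endpoints: each line starts with `data: `, carries a JSON chunk with the text under `choices[0].delta.content`, and the stream ends with `data: [DONE]`. Treat all of these, and the helper name `iter_stream_text`, as assumptions to verify against real output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an assumed SSE-style chat stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        # A usage-only chunk (when include_usage is set) may have no choices.
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response object or from a recorded transcript when testing.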

Reasoning Traces

The return_reasoning parameter provides insight into the model's thought process:

{
  "return_reasoning": true
}

This is particularly useful for:

Debugging model responses
Educational applications
Transparency in decision-making processes
Fine-tuning prompts based on model reasoning

Model Selection Guide

For complex reasoning tasks: Use the latest Gemini 2.5 Pro models
For quick responses with good quality: Choose Gemini 2.5 Flash models
For cost-efficient operations: Consider Gemini 2.0 Flash Lite variants
For explicit reasoning traces: Use models with the "-thinking" suffix
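
The selection guide above can be captured as a lookup table. In this sketch the model identifiers are taken from the lists earlier in this guide, while the category keys and the choice of one specific model per category are illustrative assumptions.

```python
# Hypothetical category keys; model names come from the lists in this guide.
MODEL_BY_NEED = {
    "complex_reasoning": "gemini-2.5-pro-preview-06-05",
    "fast": "gemini-2.5-flash-preview-05-20",
    "low_cost": "gemini-2.0-flash-lite",
    "reasoning_trace": "gemini-2.5-flash-preview-05-20-thinking",
}


def pick_model(need: str) -> str:
    """Map a coarse requirement to a model name, failing loudly on typos."""
    try:
        return MODEL_BY_NEED[need]
    except KeyError:
        options = ", ".join(sorted(MODEL_BY_NEED))
        raise ValueError(f"unknown need {need!r}; expected one of: {options}") from None
```

Keeping the mapping in one place makes it easy to swap in a newer model version without touching call sites.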

Best Practices

Start with clear instructions in your prompts to guide the model's responses
Test different models to find the best fit for your specific use case
Use streaming for interactive applications to improve perceived responsiveness
Monitor token usage to optimize costs and performance
Implement retry logic for handling rate limits and temporary errors
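
The retry recommendation can be sketched as exponential backoff with jitter. This guide does not list the API's error codes, so the set of retryable HTTP statuses below (429 and common 5xx codes) is an assumption, as are the names `is_retryable` and `call_with_retry`.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed retryable statuses; confirm against the API's actual behavior.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def is_retryable(exc: Exception) -> bool:
    """Decide whether an error is worth retrying."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code in RETRYABLE_STATUS
    # Connection-level failures and timeouts are treated as temporary.
    return isinstance(exc, (urllib.error.URLError, TimeoutError))


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying temporary failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            delay = base_delay * 2 ** (attempt - 1)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting; in production the default `time.sleep` applies.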

Conclusion

The Gemini API Text Chat Models provide flexible, powerful natural language processing capabilities for a wide range of applications. By understanding the different model options and how to effectively structure your requests, you can leverage these models to create sophisticated AI-powered experiences.

For more information, refer to the official documentation or contact support for specific implementation questions.