Service APIs
API for AI-generated answers backed by real-time web search and verifiable sources
Overview
Brave AI Grounding API provides state-of-the-art AI-generated answers backed by verifiable sources from the web. This technology improves the accuracy, relevance, and trustworthiness of AI responses by grounding them in real-time search results. Under the hood, this same service powers Brave’s Ask Brave feature, which serves millions of answers every day.
Brave’s grounded answers demonstrate strong performance across a wide range of queries, from simple trivia questions to complex research inquiries. Notably, Brave achieves state-of-the-art (SOTA) performance on the SimpleQA benchmark without being specifically optimized for it—the performance emerges naturally from the system’s design.
Access to AI Grounding is available through the AI Grounding plan. Subscribe to AI Grounding to unlock these capabilities.
Key Features
Web-Grounded Answers
AI responses backed by real-time web search with verifiable citations
OpenAI SDK Compatible
Use the familiar OpenAI SDK for seamless integration
SOTA Performance
State-of-the-art results on SimpleQA benchmark
Streaming Support
Stream answers in real-time with progressive citations
Research Mode
Enable multi-search for thorough, research-grade answers
Rich Response Data
Get entities, citations, and structured data with answers
API Reference
AI Grounding API Documentation
View the complete API reference, including parameters and response schemas
Use Cases
AI Grounding is perfect for:
- AI Assistants & Chatbots: Build intelligent conversational interfaces with factual, cited responses
- Research Applications: Conduct thorough research with multi-search capabilities
- Question Answering Systems: Provide accurate answers with source attribution
- Knowledge Applications: Create tools that need up-to-date, verifiable information
- Content Generation: Generate well-researched content with citations
Endpoint
AI Grounding uses a single, OpenAI-compatible endpoint:
https://api.search.brave.com/res/v1/chat/completions
Quick Start
Basic Example with OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

completions = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What are the best things to do in Paris with kids?",
        }
    ],
    model="brave",
    stream=False,
)

print(completions.choices[0].message.content)
Streaming Example
For real-time responses, enable streaming with AsyncOpenAI:
from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

async def stream_answer():
    async for chunk in await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Explain quantum computing",
            }
        ],
        model="brave",
        stream=True,
    ):
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_answer())
Using cURL
While the OpenAI SDK is recommended, you can also use cURL:
curl -X POST "https://api.search.brave.com/res/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"stream": false, "messages": [{"role": "user", "content": "What is the second highest mountain?"}]}' \
-H "x-subscription-token: <YOUR_BRAVE_SEARCH_API_KEY>"Single vs Multiple Searches
The decision between single-search and multi-search significantly influences both cost efficiency and response time.
Single Search (Default)
- Speed: Answers stream in under 4.5 seconds on average
- Cost: Lower cost with minimal computational overhead
- Use Case: Ideal for real-time applications and most queries
- Performance: The median SimpleQA benchmark question is answered with a single search
Multiple Searches (Research Mode)
- Thoroughness: The model iteratively refines its search strategy across sequential searches
- Cost: Higher due to multiple API calls and larger context processing
- Time: Response times can extend to minutes
- Use Case: Best for background tasks prioritizing thoroughness over speed
Enable research mode by adding enable_research: true:
completions = client.chat.completions.create(
    messages=[{"role": "user", "content": "History of quantum mechanics"}],
    model="brave",
    stream=True,
    extra_body={
        "enable_research": True,
    },
)
Performance note: On the SimpleQA benchmark, p99 questions required 53 queries analyzing 1000 pages over ~300 seconds. However, reasonable limits are in place based on real-world use cases.
Advanced Parameters
When using the OpenAI SDK, pass additional parameters via extra_body:
completions = client.chat.completions.create(
    messages=[{"role": "user", "content": "History of Rome"}],
    model="brave",
    stream=True,
    extra_body={
        "country": "IT",
        "language": "it",
        "enable_entities": True,
        "enable_citations": True,
        "enable_research": False,
    },
)
All advanced parameters (entities, citations, and research mode) require streaming mode to be true.
Available Parameters
- country (string): Target country for search results (default: us)
- language (string): Response language (default: en)
- enable_entities (bool): Include entity information in responses (default: false)
- enable_citations (bool): Include inline citations (default: false)
- enable_research (bool): Enable multi-search research mode (default: false)
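If you are not using the OpenAI SDK, the same parameters can be sent directly in the request body. Below is a minimal sketch using the requests library, under the assumption that the extra_body fields are merged into the top level of the JSON body (as the OpenAI SDK does):
# Minimal sketch: sending advanced parameters in the raw JSON body without the
# OpenAI SDK. Field placement mirrors the extra_body usage shown above and is
# an assumption to verify against the API reference.
import requests

response = requests.post(
    "https://api.search.brave.com/res/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "x-subscription-token": "YOUR_BRAVE_SEARCH_API_KEY",
    },
    json={
        "model": "brave",
        "stream": False,
        "messages": [{"role": "user", "content": "What is the second highest mountain?"}],
        "country": "us",
        "language": "en",
        # enable_entities / enable_citations / enable_research require "stream": true
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])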
Response Format
Because AI Grounding uses custom messages with richer data than standard OpenAI responses, messages are stringified with special tags. When streaming, you’ll receive:
Standard Text
Regular answer content streamed as text.
Citations
<citation>{"start_index": 0, "end_index": 10, "number": 1, "url": "https://...", "favicon": "...", "snippet": "..."}</citation>
Entity Items
<enum_item>{"uuid": "...", "name": "...", "href": "...", "original_tokens": "...", "citations": [...]}</enum_item>
Usage Metadata
<usage>{ "X-Request-Requests": 1, "X-Request-Queries": 2, "X-Request-Tokens-In": 1234, "X-Request-Tokens-Out": 300, "X-Request-Requests-Cost": 0, "X-Request-Queries-Cost": 0.008, "X-Request-Tokens-In-Cost": 0.00617, "X-Request-Tokens-Out-Cost": 0.0015, "X-Request-Total-Cost": 0.01567 }</usage>
Complete Streaming Example
Here’s a full example that handles all message types:
#!/usr/bin/env python
import asyncio
import json

from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

async def main():
    async for data in await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "albums from lady gaga",
            }
        ],
        model="brave",
        stream=True,
        extra_body={
            "country": "us",
            "language": "en",
            "enable_citations": True,
            "enable_research": False,
        },
    ):
        if choices := data.choices:
            if delta := choices[0].delta.content:
                if delta.startswith("<citation>") and delta.endswith("</citation>"):
                    # Parse citation
                    citation = json.loads(
                        delta.removeprefix("<citation>").removesuffix("</citation>")
                    )
                    print(f"[{citation['number']}]({citation['url']})", end="", flush=True)
                elif delta.startswith("<enum_item>") and delta.endswith("</enum_item>"):
                    # Parse entity item
                    item = json.loads(
                        delta.removeprefix("<enum_item>").removesuffix("</enum_item>")
                    )
                    print("*", item["original_tokens"], end="", flush=True)
                elif delta.startswith("<usage>") and delta.endswith("</usage>"):
                    # Parse usage metadata
                    usage = json.loads(
                        delta.removeprefix("<usage>").removesuffix("</usage>")
                    )
                    print("\n\nUsage:", usage)
                else:
                    # Regular text content
                    print(delta, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
Pricing & Spending Limits
AI Grounding uses a usage-based pricing model:
Cost Calculation:
cost = (searches × $4/1000) + (input_tokens × $5/1000000) + (output_tokens × $5/1000000)
Example:
- 2 searches
- 1,234 input tokens
- 300 output tokens
Cost = 2 × (4/1000) + (5/1000000) × 1234 + (5/1000000) × 300
     = $0.01567
Usage Metadata
With each answer, you’ll receive metadata on resource usage:
{
"X-Request-Requests": 1,
"X-Request-Queries": 2,
"X-Request-Tokens-In": 1234,
"X-Request-Tokens-Out": 300,
"X-Request-Requests-Cost": 0,
"X-Request-Queries-Cost": 0.008,
"X-Request-Tokens-In-Cost": 0.00617,
"X-Request-Tokens-Out-Cost": 0.0015,
"X-Request-Total-Cost": 0.01567
}
When streaming, this metadata comes as the last message. For synchronous requests, the keys above are included in response headers.
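For synchronous requests, here is a minimal sketch of reading those headers through the OpenAI SDK's with_raw_response helper and recomputing the cost with the formula from the previous section (the header names are taken from the usage metadata above):
# Minimal sketch: read usage headers from a non-streaming call and
# recompute the cost. Header names are assumed to match the keys above.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

raw = client.chat.completions.with_raw_response.create(
    messages=[{"role": "user", "content": "What is the second highest mountain?"}],
    model="brave",
    stream=False,
)

completion = raw.parse()  # the usual ChatCompletion object
headers = raw.headers

queries = int(headers.get("X-Request-Queries", 0))
tokens_in = int(headers.get("X-Request-Tokens-In", 0))
tokens_out = int(headers.get("X-Request-Tokens-Out", 0))

# cost = (searches × $4/1000) + (input_tokens × $5/1000000) + (output_tokens × $5/1000000)
cost = queries * 4 / 1000 + tokens_in * 5 / 1_000_000 + tokens_out * 5 / 1_000_000

print(completion.choices[0].message.content)
print(f"queries={queries} tokens_in={tokens_in} tokens_out={tokens_out} est_cost=${cost:.5f}")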
Setting Limits
Control your spending by setting monthly credit limits in your account.
Limit behavior: Limits are checked before answering. If limits aren’t exceeded when a question starts, it will be answered in full even if it exceeds limits during processing. You’ll only be charged up to your imposed limit.
Rate Limits
- Default: 2 requests per second
- Need more? Contact searchapi-support@brave.com
AI Grounding vs Summarizer Search
Brave offers two complementary approaches for AI-powered search:
AI Grounding
Direct AI answers using OpenAI-compatible endpoint. Best for building chat interfaces and applications that need instant, grounded AI responses.
Summarizer Search
Two-step workflow that first retrieves search results, then generates summaries. Best when you need control over search results or want to use specialized summarizer endpoints.
When to use AI Grounding:
- Building conversational AI applications
- Need OpenAI SDK compatibility
- Want simple, single-endpoint integration
- Require research mode for exhaustive answers
When to use Summarizer Search:
- Need access to underlying search results
- Want to use specialized endpoints (title, enrichments, followups, etc.)
- Building applications with custom search result processing
- Prefer the traditional web search + summarization flow
Learn more about Summarizer Search.
Best Practices
Message Handling
- Always handle special message tags (<citation>, <enum_item>, <usage>)
- Parse JSON content within tags to extract structured data
- Display citations inline for better user trust
Streaming
- Use AsyncOpenAI for streaming responses
- Display content progressively for better UX
- Handle usage metadata at the end of the stream
Research Mode
- Enable only when thoroughness is more important than speed
- Best for background processing or complex research queries
- Monitor usage as it can incur higher costs
Error Handling
- Implement retry logic for transient failures
- Check spending limits before critical operations
- Handle rate limit errors gracefully
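As a minimal sketch of the retry advice above, the following backs off exponentially on rate-limit (429) and transient server errors; the attempt count and wait times are illustrative assumptions, not values prescribed by the API:
# Retry sketch: exponential backoff on rate-limit and transient errors.
# max_attempts and sleep times are illustrative assumptions.
import time

from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError

client = OpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

def ask_with_retries(question: str, max_attempts: int = 4) -> str:
    for attempt in range(max_attempts):
        try:
            completions = client.chat.completions.create(
                messages=[{"role": "user", "content": question}],
                model="brave",
                stream=False,
            )
            return completions.choices[0].message.content
        except RateLimitError:
            # 429: back off to stay within the default 2 requests/second limit
            time.sleep(2 ** attempt)
        except (APIConnectionError, APIStatusError) as err:
            # Retry transient network/5xx errors; re-raise anything persistent
            if isinstance(err, APIStatusError) and err.status_code < 500:
                raise
            time.sleep(2 ** attempt)
    raise RuntimeError("AI Grounding request failed after retries")

print(ask_with_retries("What is the second highest mountain?"))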
Performance
- Use single-search mode (default) for most queries
- Cache responses when appropriate to minimize API calls
- Monitor usage metadata to optimize costs
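As a sketch of the caching suggestion above, a simple in-memory cache keyed by the question text avoids repeat charges for identical queries; the cache store and lack of expiry are simplifying assumptions to adapt to your application:
# Caching sketch: reuse answers for repeated questions.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

_answer_cache: dict[str, str] = {}

def cached_answer(question: str) -> str:
    if question not in _answer_cache:
        completions = client.chat.completions.create(
            messages=[{"role": "user", "content": question}],
            model="brave",
            stream=False,
        )
        _answer_cache[question] = completions.choices[0].message.content
    return _answer_cache[question]

# The second call reuses the cached answer instead of spending another search.
print(cached_answer("What is the second highest mountain?"))
print(cached_answer("What is the second highest mountain?"))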
Changelog
This changelog outlines all significant changes to the Brave AI Grounding API in chronological order.
2025-08-05
- Launch Brave AI Grounding API resource
- OpenAI SDK compatibility
- Support for single and multi-search modes
- SOTA performance on SimpleQA benchmark