Service APIs
API for AI-generated answers backed by real-time web search and verifiable sources
Overview
Brave AI Grounding API provides state-of-the-art AI-generated answers backed by verifiable sources from the web. This technology improves the accuracy, relevance, and trustworthiness of AI responses by grounding them in real-time search results. Under the hood, this same service powers Brave’s Ask Brave feature, which serves millions of answers every day.
Brave’s grounded answers demonstrate strong performance across a wide range of queries, from simple trivia questions to complex research inquiries. Notably, Brave achieves state-of-the-art (SOTA) performance on the SimpleQA benchmark without being specifically optimized for it—the performance emerges naturally from the system’s design.
Access to AI Grounding is available through the AI Grounding plan. Subscribe to AI Grounding to unlock these capabilities.
Key Features
Web-Grounded Answers
AI responses backed by real-time web search with verifiable citations
OpenAI SDK Compatible
Use the familiar OpenAI SDK for seamless integration
SOTA Performance
State-of-the-art results on SimpleQA benchmark
Streaming Support
Stream answers in real-time with progressive citations
Research Mode
Enable multi-search for thorough, research-grade answers
Rich Response Data
Get entities, citations, and structured data with answers
API Reference
AI Grounding API Documentation
View the complete API reference, including parameters and response schemas
Use Cases
AI Grounding is perfect for:
- AI Assistants & Chatbots: Build intelligent conversational interfaces with factual, cited responses
- Research Applications: Conduct thorough research with multi-search capabilities
- Question Answering Systems: Provide accurate answers with source attribution
- Knowledge Applications: Create tools that need up-to-date, verifiable information
- Content Generation: Generate well-researched content with citations
Endpoint
AI Grounding uses a single, OpenAI-compatible endpoint:
https://api.search.brave.com/res/v1/chat/completions
Quick Start
Basic Example with OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

completions = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What are the best things to do in Paris with kids?",
        }
    ],
    model="brave",
    stream=False,
)

print(completions.choices[0].message.content)
Streaming Example
For real-time responses, enable streaming with AsyncOpenAI:
from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

async def stream_answer():
    async for chunk in await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Explain quantum computing",
            }
        ],
        model="brave",
        stream=True,
    ):
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_answer())
Using cURL
While the OpenAI SDK is recommended, you can also use cURL:
curl -X POST "https://api.search.brave.com/res/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"stream": false, "messages": [{"role": "user", "content": "What is the second highest mountain?"}]}' \
-H "x-subscription-token: <YOUR_BRAVE_SEARCH_API_KEY>"Single vs Multiple Searches
The decision between single-search and multi-search significantly influences both cost efficiency and response time.
Single Search (Default)
- Speed: Answers stream in under 4.5 seconds on average
- Cost: Lower cost with minimal computational overhead
- Use Case: Ideal for real-time applications and most queries
- Performance: The median SimpleQA benchmark question is answered with a single search
Multiple Searches (Research Mode)
- Thoroughness: The model iteratively refines its search strategy across sequential searches
- Cost: Higher due to multiple API calls and larger context processing
- Time: Response times can extend to minutes
- Use Case: Best for background tasks prioritizing thoroughness over speed
Enable research mode by adding enable_research: true:
completions = client.chat.completions.create(
    messages=[{"role": "user", "content": "History of quantum mechanics"}],
    model="brave",
    stream=True,
    extra_body={
        "enable_research": True,
    },
)
Performance note: On the SimpleQA benchmark, p99 questions required 53 queries analyzing 1000 pages over ~300 seconds. However, reasonable limits are in place based on real-world use cases.
Advanced Parameters
When using the OpenAI SDK, pass additional parameters via extra_body:
completions = client.chat.completions.create(
    messages=[{"role": "user", "content": "History of Rome"}],
    model="brave",
    stream=True,
    extra_body={
        "country": "IT",
        "language": "it",
        "enable_entities": True,
        "enable_citations": True,
        "enable_research": False,
    },
)
All advanced parameters (entities, citations, and research mode) require streaming mode to be true.
Available Parameters
- country (string): Target country for search results (default: us)
- language (string): Response language (default: en)
- enable_entities (bool): Include entity information in responses (default: false)
- enable_citations (bool): Include inline citations (default: false)
- enable_research (bool): Enable multi-search research mode (default: false)
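If you are not using the OpenAI SDK, the same parameters can be sent directly in the request body. Below is a minimal sketch using the requests library, under the assumption that the extra_body fields are merged into the top level of the JSON body (as the OpenAI SDK does):
# Minimal sketch: sending advanced parameters in the raw JSON body without the
# OpenAI SDK. Field placement mirrors the extra_body usage shown above and is
# an assumption to verify against the API reference.
import requests

response = requests.post(
    "https://api.search.brave.com/res/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "x-subscription-token": "YOUR_BRAVE_SEARCH_API_KEY",
    },
    json={
        "model": "brave",
        "stream": False,
        "messages": [{"role": "user", "content": "What is the second highest mountain?"}],
        "country": "us",
        "language": "en",
        # enable_entities / enable_citations / enable_research require "stream": true
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])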
Response Format
Because AI Grounding uses custom messages with richer data than standard OpenAI responses, messages are stringified with special tags. When streaming, you’ll receive:
Standard Text
Regular answer content streamed as text.
Citations
<citation>{"start_index": 0, "end_index": 10, "number": 1, "url": "https://...", "favicon": "...", "snippet": "..."}</citation>
Entity Items
<enum_item>{"uuid": "...", "name": "...", "href": "...", "original_tokens": "...", "citations": [...]}</enum_item>
Usage Metadata
<usage>{ "X-Request-Requests": 1, "X-Request-Queries": 2, "X-Request-Tokens-In": 1234, "X-Request-Tokens-Out": 300, "X-Request-Requests-Cost": 0, "X-Request-Queries-Cost": 0.008, "X-Request-Tokens-In-Cost": 0.00617, "X-Request-Tokens-Out-Cost": 0.0015, "X-Request-Total-Cost": 0.01567 }</usage>
Complete Streaming Example
Here’s a full example that handles all message types:
#!/usr/bin/env python
import asyncio
import json

from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

async def main():
    async for data in await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "albums from lady gaga",
            }
        ],
        model="brave",
        stream=True,
        extra_body={
            "country": "us",
            "language": "en",
            "enable_citations": True,
            "enable_research": False,
        },
    ):
        if choices := data.choices:
            if delta := choices[0].delta.content:
                if delta.startswith("<citation>") and delta.endswith("</citation>"):
                    # Parse citation
                    citation = json.loads(
                        delta.removeprefix("<citation>").removesuffix("</citation>")
                    )
                    print(f"[{citation['number']}]({citation['url']})", end="", flush=True)
                elif delta.startswith("<enum_item>") and delta.endswith("</enum_item>"):
                    # Parse entity item
                    item = json.loads(
                        delta.removeprefix("<enum_item>").removesuffix("</enum_item>")
                    )
                    print("*", item["original_tokens"], end="", flush=True)
                elif delta.startswith("<usage>") and delta.endswith("</usage>"):
                    # Parse usage metadata
                    usage = json.loads(
                        delta.removeprefix("<usage>").removesuffix("</usage>")
                    )
                    print("\n\nUsage:", usage)
                else:
                    # Regular text content
                    print(delta, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
Pricing & Spending Limits
AI Grounding uses a usage-based pricing model:
Cost Calculation:
cost = (searches × $4/1000) + (input_tokens × $5/1000000) + (output_tokens × $5/1000000)
Example:
- 2 searches
- 1,234 input tokens
- 300 output tokens
Cost = 2 × (4/1000) + (5/1000000) × 1234 + (5/1000000) × 300
     = $0.01567
Usage Metadata
With each answer, you’ll receive metadata on resource usage:
{
"X-Request-Requests": 1,
"X-Request-Queries": 2,
"X-Request-Tokens-In": 1234,
"X-Request-Tokens-Out": 300,
"X-Request-Requests-Cost": 0,
"X-Request-Queries-Cost": 0.008,
"X-Request-Tokens-In-Cost": 0.00617,
"X-Request-Tokens-Out-Cost": 0.0015,
"X-Request-Total-Cost": 0.01567
}
When streaming, this metadata comes as the last message. For synchronous requests, the keys above are included in response headers.
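For synchronous requests, here is a minimal sketch of reading those headers through the OpenAI SDK's with_raw_response helper and recomputing the cost with the formula from the previous section (the header names are taken from the usage metadata above):
# Minimal sketch: read usage headers from a non-streaming call and
# recompute the cost. Header names are assumed to match the keys above.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

raw = client.chat.completions.with_raw_response.create(
    messages=[{"role": "user", "content": "What is the second highest mountain?"}],
    model="brave",
    stream=False,
)

completion = raw.parse()  # the usual ChatCompletion object
headers = raw.headers

queries = int(headers.get("X-Request-Queries", 0))
tokens_in = int(headers.get("X-Request-Tokens-In", 0))
tokens_out = int(headers.get("X-Request-Tokens-Out", 0))

# cost = (searches × $4/1000) + (input_tokens × $5/1000000) + (output_tokens × $5/1000000)
cost = queries * 4 / 1000 + tokens_in * 5 / 1_000_000 + tokens_out * 5 / 1_000_000

print(completion.choices[0].message.content)
print(f"queries={queries} tokens_in={tokens_in} tokens_out={tokens_out} est_cost=${cost:.5f}")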
Setting Limits
Control your spending by setting monthly credit limits in your account.
Limit behavior: Limits are checked before answering. If limits aren’t exceeded when a question starts, it will be answered in full even if it exceeds limits during processing. You’ll only be charged up to your imposed limit.
Rate Limits
- Default: 2 requests per second
- Need more? Contact searchapi-support@brave.com
AI Grounding vs Summarizer Search
Brave offers two complementary approaches for AI-powered search:
AI Grounding
Direct AI answers using OpenAI-compatible endpoint. Best for building chat interfaces and applications that need instant, grounded AI responses.
Summarizer Search
Two-step workflow that first retrieves search results, then generates summaries. Best when you need control over search results or want to use specialized summarizer endpoints.
When to use AI Grounding:
- Building conversational AI applications
- Need OpenAI SDK compatibility
- Want simple, single-endpoint integration
- Require research mode for exhaustive answers
When to use Summarizer Search:
- Need access to underlying search results
- Want to use specialized endpoints (title, enrichments, followups, etc.)
- Building applications with custom search result processing
- Prefer the traditional web search + summarization flow
Learn more about Summarizer Search.
Best Practices
Message Handling
- Always handle special message tags (<citation>, <enum_item>, <usage>)
- Parse JSON content within tags to extract structured data
- Display citations inline for better user trust
Streaming
- Use AsyncOpenAI for streaming responses
- Display content progressively for better UX
- Handle usage metadata at the end of the stream
Research Mode
- Enable only when thoroughness is more important than speed
- Best for background processing or complex research queries
- Monitor usage as it can incur higher costs
Error Handling
- Implement retry logic for transient failures
- Check spending limits before critical operations
- Handle rate limit errors gracefully
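As a minimal sketch of the retry advice above, the following backs off exponentially on rate-limit (429) and transient server errors; the attempt count and wait times are illustrative assumptions, not values prescribed by the API:
# Retry sketch: exponential backoff on rate-limit and transient errors.
# max_attempts and sleep times are illustrative assumptions.
import time

from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError

client = OpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

def ask_with_retries(question: str, max_attempts: int = 4) -> str:
    for attempt in range(max_attempts):
        try:
            completions = client.chat.completions.create(
                messages=[{"role": "user", "content": question}],
                model="brave",
                stream=False,
            )
            return completions.choices[0].message.content
        except RateLimitError:
            # 429: back off to stay within the default 2 requests/second limit
            time.sleep(2 ** attempt)
        except (APIConnectionError, APIStatusError) as err:
            # Retry transient network/5xx errors; re-raise anything persistent
            if isinstance(err, APIStatusError) and err.status_code < 500:
                raise
            time.sleep(2 ** attempt)
    raise RuntimeError("AI Grounding request failed after retries")

print(ask_with_retries("What is the second highest mountain?"))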
Performance
- Use single-search mode (default) for most queries
- Cache responses when appropriate to minimize API calls
- Monitor usage metadata to optimize costs
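As a sketch of the caching suggestion above, a simple in-memory cache keyed by the question text avoids repeat charges for identical queries; the cache store and lack of expiry are simplifying assumptions to adapt to your application:
# Caching sketch: reuse answers for repeated questions.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BRAVE_SEARCH_API_KEY",
    base_url="https://api.search.brave.com/res/v1",
)

_answer_cache: dict[str, str] = {}

def cached_answer(question: str) -> str:
    if question not in _answer_cache:
        completions = client.chat.completions.create(
            messages=[{"role": "user", "content": question}],
            model="brave",
            stream=False,
        )
        _answer_cache[question] = completions.choices[0].message.content
    return _answer_cache[question]

# The second call reuses the cached answer instead of spending another search.
print(cached_answer("What is the second highest mountain?"))
print(cached_answer("What is the second highest mountain?"))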
Changelog
This changelog outlines all significant changes to the Brave AI Grounding API in chronological order.
2025-08-05
- Launch Brave AI Grounding API resource
- OpenAI SDK compatibility
- Support for single and multi-search modes
- SOTA performance on SimpleQA benchmark