With Model Router, you can access the most popular models through a single endpoint and a single bill. Experiment with new models and scale your app without worrying about the underlying infrastructure.

Setup

Getting started with Model Router is simple. Generate an API key and drop it into your favorite framework.

Generate API key

API keys for Model Router are generated within your workspace. Generate a key by logging into the console and navigating to the Model Router page.

Connect via framework

Model Router integrates easily into the most popular frameworks.

Model Router is a drop-in replacement for OpenAI’s API.

import openai

# Configure with your Hypermode API key and the Hypermode Model Router base URL
client = openai.OpenAI(
    api_key="<HYPERMODE_API_KEY>",
    base_url="https://models.hypermode.host/v1",
)

# Set up the request
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Dgraph?"},
    ],
    max_tokens=150,
    temperature=0.7,
)

# Print the response
print(response.choices[0].message.content)
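Other OpenAI-compatible frameworks can point at the same base URL. As a minimal sketch (assuming the langchain-openai package is installed; the slug and parameters mirror the example above), the router can be wired into LangChain like this:

from langchain_openai import ChatOpenAI

# Point LangChain's OpenAI-compatible chat client at Model Router
llm = ChatOpenAI(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    api_key="<HYPERMODE_API_KEY>",
    base_url="https://models.hypermode.host/v1",
    max_tokens=150,
    temperature=0.7,
)

# Invoke with a system and user message, then print the reply
response = llm.invoke(
    [
        ("system", "You are a helpful assistant."),
        ("human", "What is Dgraph?"),
    ]
)
print(response.content)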

Connect directly via API

You can also access the API directly.

import requests
import json

# Your Hypermode API key
api_key = "<HYPERMODE_API_KEY>"

# Use the Hypermode Model Router base URL
base_url = "https://models.hypermode.host"

# API endpoint
endpoint = f"{base_url}/v1/chat/completions"

# Headers
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}

# Request payload
payload = {
    "model": "meta-llama/llama-4-scout-17b-16e-instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Dgraph?"},
    ],
    "max_tokens": 150,
    "temperature": 0.7,
}

# Make the API request
response = requests.post(endpoint, headers=headers, data=json.dumps(payload))

# Check if the request was successful
if response.status_code == 200:
    # Parse and print the response
    response_data = response.json()
    print(response_data["choices"][0]["message"]["content"])
else:
    # Print error information
    print(f"Error: {response.status_code}")
    print(response.text)

Available models

Hypermode provides a variety of the most popular open source and commercial models.

We’re constantly evaluating model usage to determine which new models to add to our catalog. Interested in using a model not listed here? Let us know at help@hypermode.com.
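Because every model in the catalog is served through the same endpoint, switching providers only requires changing the model slug. A minimal sketch, reusing the client configured in the setup section with an Anthropic slug from the table below:

# Same client as above; only the model slug changes
response = client.chat.completions.create(
    model="claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "What is Dgraph?"}],
    max_tokens=150,
)
print(response.choices[0].message.content)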

Generation

Large language models provide text generation and reasoning capabilities.

Provider | Model | Slug | Model Unit Multiplier
Meta | Llama 4 Scout | meta-llama/llama-4-scout-17b-16e-instruct | 1
Meta | Llama 3.2 | meta-llama/llama-3.2-3b-instruct | 1
DeepSeek | DeepSeek-R1-Distill-Llama | deepseek-ai/deepseek-r1-distill-llama-8b | 1
Anthropic | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | 1
Anthropic | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | 1
Anthropic | Claude 3.5 Haiku | claude-3-5-haiku-20241022 | 1
Anthropic | Claude 3.5 Sonnet | claude-3-5-sonnet-20240620 | 1
Anthropic | Claude 3 Haiku | claude-3-haiku-20240307 | 1
Anthropic | Claude 3 Opus | claude-3-opus-20240229 | 1
Anthropic | Claude 3 Sonnet | claude-3-sonnet-20240229 | 1
Anthropic | Claude 2.1 | claude-2.1 | 1
Anthropic | Claude 2.0 | claude-2.0 | 1
OpenAI | o1 | o1 | 1
OpenAI | o1 | o1-2024-12-17 | 1
OpenAI | o1-mini | o1-mini | 1
OpenAI | o1-mini-2024-09-12 | o1-mini-2024-09-12 | 1
OpenAI | o1-preview | o1-preview | 1
OpenAI | o1-preview-2024-09-12 | o1-preview-2024-09-12 | 1
OpenAI | o3-mini | o3-mini | 1
OpenAI | o3-mini-2025-01-31 | o3-mini-2025-01-31 | 1
OpenAI | gpt-4.1 | gpt-4.1 | 1
OpenAI | gpt-4.1-2025-04-14 | gpt-4.1-2025-04-14 | 1
OpenAI | gpt-4.1-mini | gpt-4.1-mini | 1
OpenAI | gpt-4.1-mini-2025-04-14 | gpt-4.1-mini-2025-04-14 | 1
OpenAI | gpt-4.1-nano | gpt-4.1-nano | 1
OpenAI | gpt-4.1-nano-2025-04-14 | gpt-4.1-nano-2025-04-14 | 1
OpenAI | gpt-4o | gpt-4o | 1
OpenAI | gpt-4o-2024-05-13 | gpt-4o-2024-05-13 | 1
OpenAI | gpt-4o-2024-08-06 | gpt-4o-2024-08-06 | 1
OpenAI | gpt-4o-2024-11-20 | gpt-4o-2024-11-20 | 1
OpenAI | gpt-4o-audio-preview | gpt-4o-audio-preview | 1
OpenAI | gpt-4o-audio-preview-2024-10-01 | gpt-4o-audio-preview-2024-10-01 | 1
OpenAI | gpt-4o-audio-preview-2024-12-17 | gpt-4o-audio-preview-2024-12-17 | 1
OpenAI | gpt-4o-search-preview | gpt-4o-search-preview | 1
OpenAI | gpt-4o-search-preview-2025-03-11 | gpt-4o-search-preview-2025-03-11 | 1
OpenAI | gpt-4o-mini | gpt-4o-mini | 1
OpenAI | gpt-4o-mini-2024-07-18 | gpt-4o-mini-2024-07-18 | 1
OpenAI | gpt-4-turbo | gpt-4-turbo | 1
OpenAI | gpt-4-turbo-2024-04-09 | gpt-4-turbo-2024-04-09 | 1
OpenAI | gpt-4-turbo-preview | gpt-4-turbo-preview | 1
OpenAI | gpt-4-0125-preview | gpt-4-0125-preview | 1
OpenAI | gpt-4-1106-preview | gpt-4-1106-preview | 1
OpenAI | gpt-4 | gpt-4 | 1
OpenAI | gpt-4-0613 | gpt-4-0613 | 1
OpenAI | gpt-4.5-preview | gpt-4.5-preview | 1
OpenAI | gpt-4.5-preview-2025-02-27 | gpt-4.5-preview-2025-02-27 | 1
OpenAI | gpt-3.5-turbo-0125 | gpt-3.5-turbo-0125 | 1
OpenAI | gpt-3.5-turbo | gpt-3.5-turbo | 1
OpenAI | gpt-3.5-turbo-1106 | gpt-3.5-turbo-1106 | 1
OpenAI | chatgpt-4o-latest | chatgpt-4o-latest | 1
Google | gemini-pro-vision | gemini-pro-vision | 1
Google | gemini-1.5-pro-latest | gemini-1.5-pro-latest | 1
Google | gemini-1.5-pro-001 | gemini-1.5-pro-001 | 1
Google | gemini-1.5-pro-002 | gemini-1.5-pro-002 | 1
Google | gemini-1.5-pro | gemini-1.5-pro | 1
Google | gemini-1.5-flash-latest | gemini-1.5-flash-latest | 1
Google | gemini-1.5-flash-001 | gemini-1.5-flash-001 | 1
Google | gemini-1.5-flash-001-tuning | gemini-1.5-flash-001-tuning | 1
Google | gemini-1.5-flash | gemini-1.5-flash | 1
Google | gemini-1.5-flash-002 | gemini-1.5-flash-002 | 1
Google | gemini-1.5-flash-8b | gemini-1.5-flash-8b | 1
Google | gemini-1.5-flash-8b-001 | gemini-1.5-flash-8b-001 | 1
Google | gemini-1.5-flash-8b-latest | gemini-1.5-flash-8b-latest | 1
Google | gemini-1.5-flash-8b-exp-0827 | gemini-1.5-flash-8b-exp-0827 | 1
Google | gemini-1.5-flash-8b-exp-0924 | gemini-1.5-flash-8b-exp-0924 | 1
Google | gemini-2.5-pro-exp-03-25 | gemini-2.5-pro-exp-03-25 | 1
Google | gemini-2.5-pro-preview-03-25 | gemini-2.5-pro-preview-03-25 | 1
Google | gemini-2.5-flash-preview-04-17 | gemini-2.5-flash-preview-04-17 | 1
Google | gemini-2.0-flash-exp | gemini-2.0-flash-exp | 1
Google | gemini-2.0-flash | gemini-2.0-flash | 1
Google | gemini-2.0-flash-001 | gemini-2.0-flash-001 | 1
Google | gemini-2.0-flash-exp-image-generation | gemini-2.0-flash-exp-image-generation | 1
Google | gemini-2.0-flash-lite-001 | gemini-2.0-flash-lite-001 | 1
Google | gemini-2.0-flash-lite | gemini-2.0-flash-lite | 1
Google | gemini-2.0-flash-lite-preview-02-05 | gemini-2.0-flash-lite-preview-02-05 | 1
Google | gemini-2.0-flash-lite-preview | gemini-2.0-flash-lite-preview | 1
Google | gemini-2.0-pro-exp | gemini-2.0-pro-exp | 1
Google | gemini-2.0-pro-exp-02-05 | gemini-2.0-pro-exp-02-05 | 1
Google | gemini-2.0-flash-thinking-exp-01-21 | gemini-2.0-flash-thinking-exp-01-21 | 1
Google | gemini-2.0-flash-thinking-exp | gemini-2.0-flash-thinking-exp | 1
Google | gemini-2.0-flash-thinking-exp-1219 | gemini-2.0-flash-thinking-exp-1219 | 1
Google | gemini-2.0-flash-live-001 | gemini-2.0-flash-live-001 | 1

Embedding

Embedding models provide vector representations of text for similarity matching and other applications.

Provider | Model | Slug | Model Unit Multiplier
Hugging Face | MiniLM-L6-v2 | sentence-transformers/all-MiniLM-L6-v2 | 0.1
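A minimal sketch of requesting embeddings, assuming Model Router exposes the OpenAI-compatible /v1/embeddings endpoint at the same base URL (reusing the client from the setup section):

# Assumption: embeddings are served at the OpenAI-compatible /v1/embeddings endpoint
embedding_response = client.embeddings.create(
    model="sentence-transformers/all-MiniLM-L6-v2",
    input=["What is Dgraph?", "Dgraph is a distributed graph database."],
)

# One vector per input string
for item in embedding_response.data:
    print(len(item.embedding))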

Logging

By default, all model invocations are logged for future display in the console. If you’d like to opt out of model logging, please contact us.

Billing

Hypermode Model Router simplifies how you pay for model consumption by billing on model compute time and type. Model Units are included with all paid plans; consumption is calculated as the model's Model Unit Multiplier times the seconds of model runtime, summed across all requests.
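As an illustrative calculation (the multipliers come from the tables above; the runtimes are hypothetical), 4 seconds of runtime on a multiplier-1 generation model consumes 4 Model Units, while 4 seconds on the 0.1-multiplier embedding model consumes 0.4 Model Units:

# Illustrative helper: Model Units = Model Unit Multiplier x seconds of model runtime
def model_units(multiplier: float, runtime_seconds: float) -> float:
    return multiplier * runtime_seconds

print(model_units(1, 4))    # 4.0 Model Units
print(model_units(0.1, 4))  # 0.4 Model Units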