Models
Compare Models
Master the art of AI detection. Discover what makes each model unique.
Powered by Firecrawl and OpenAI
Here is a list of all the models available on the AI Gateway that can be used in the game. Each entry shows the model's display name, its gateway model ID, and a short description.
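To make the IDs concrete, below is a minimal sketch of calling one of these models through the gateway. It assumes the AI SDK (the `ai` npm package), which can resolve plain `creator/model` ID strings via its default gateway provider when an `AI_GATEWAY_API_KEY` is set in the environment; the exact wiring is an assumption, so verify it against the gateway's own docs.

```ts
// Minimal sketch: calling a gateway model by its "creator/model" ID.
// Assumes the AI SDK ("ai" package) resolves plain model ID strings
// through the AI Gateway when AI_GATEWAY_API_KEY is set. Verify this
// setup against the gateway documentation.
import { generateText } from 'ai';

const { text } = await generateText({
  model: 'openai/gpt-4o-mini', // any model ID from the list below
  prompt: 'Describe yourself in one sentence.',
});

console.log(text);
```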
- Qwen3-14B alibaba/qwen-3-14b Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
- Qwen3 235B A22B Instruct 2507 alibaba/qwen-3-235b Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
- Qwen3-30B-A3B alibaba/qwen-3-30b Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
- Qwen3-32B alibaba/qwen-3-32b Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
- Qwen3 Coder 480B A35B Instruct alibaba/qwen3-coder Qwen3-Coder-480B-A35B-Instruct is Qwen's most agentic code model, featuring significant performance on Agentic Coding, Agentic Browser-Use and other foundational coding tasks, achieving results comparable to Claude Sonnet.
- Qwen 3 Coder 30B A3B Instruct alibaba/qwen3-coder-30b-a3b Efficient coding specialist balancing performance with cost-effectiveness for daily development tasks while maintaining strong tool integration capabilities.
- Qwen3 Coder Plus alibaba/qwen3-coder-plus Powered by Qwen3, this is a powerful Coding Agent that excels in tool calling and environment interaction to achieve autonomous programming. It combines outstanding coding proficiency with versatile general-purpose abilities.
- Qwen3 Max alibaba/qwen3-max The Qwen 3 series Max model has undergone specialized upgrades in agent programming and tool invocation compared to the preview version. The officially released model this time has achieved state-of-the-art (SOTA) performance in its field and is better suited to meet the demands of agents operating in more complex scenarios.
- Qwen3 Max Preview alibaba/qwen3-max-preview Qwen3-Max-Preview shows substantial gains over the 2.5 series in overall capability, with significant enhancements in Chinese-English text understanding, complex instruction following, handling of subjective open-ended tasks, multilingual ability, and tool invocation; model knowledge hallucinations are reduced.
- Qwen3 Next 80B A3B Instruct alibaba/qwen3-next-80b-a3b-instruct Qwen3-Next uses a highly sparse MoE design: 80B total parameters, but only ~3B activated per inference step. Experiments show that, with global load balancing, increasing total expert parameters while keeping activated experts fixed steadily reduces training loss. Compared to Qwen3's MoE (128 total experts, 8 routed), Qwen3-Next expands to 512 total experts, combining 10 routed experts + 1 shared expert, maximizing resource usage without hurting performance. Qwen3-Next-80B-A3B-Instruct performs comparably to our flagship model Qwen3-235B-A22B-Instruct-2507, and shows clear advantages in tasks requiring ultra-long context (up to 256K tokens).
- Qwen3 Next 80B A3B Thinking alibaba/qwen3-next-80b-a3b-thinking Over the past few months, we have observed increasingly clear trends toward scaling both total parameters and context lengths in the pursuit of more powerful and agentic artificial intelligence (AI). We are excited to share our latest advancements in addressing these demands, centered on improving scaling efficiency through innovative model architecture. We call this next generation of foundation models Qwen3-Next.
- Qwen3 VL 235B A22B Instruct alibaba/qwen3-vl-instruct The Qwen3 series VL models have been comprehensively upgraded in areas such as visual coding and spatial perception. Visual perception and recognition capabilities have significantly improved, supporting the understanding of ultra-long videos, and OCR functionality has undergone a major enhancement.
- Qwen3 VL 235B A22B Thinking alibaba/qwen3-vl-thinking Qwen3 series VL models feature significantly enhanced multimodal reasoning capabilities, with a particular focus on optimizing the model for STEM and mathematical reasoning. Visual perception and recognition abilities have been comprehensively improved, and OCR capabilities have undergone a major upgrade.
- Nova Lite amazon/nova-lite A very low cost multimodal model that is lightning fast for processing image, video, and text inputs.
- Nova Micro amazon/nova-micro A text-only model that delivers the lowest latency responses at very low cost.
- Nova Pro amazon/nova-pro A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.
- Titan Text Embeddings V2 amazon/titan-embed-text-v2 Amazon Titan Text Embeddings V2 is a lightweight, efficient multilingual embedding model supporting 1024, 512, and 256 dimensions.
- Claude 3 Haiku anthropic/claude-3-haiku Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads that often involve longer prompts. Use Haiku to quickly analyze large volumes of documents, such as quarterly filings, contracts, or legal cases, for half the cost of other models in its performance tier.
- Claude 3 Opus anthropic/claude-3-opus Claude 3 Opus is Anthropic's most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what's possible with generative AI.
- Claude 3.5 Haiku anthropic/claude-3.5-haiku Claude 3.5 Haiku is the next generation of our fastest model. At a speed similar to Claude 3 Haiku's, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks.
- Claude 3.5 Sonnet anthropic/claude-3.5-sonnet Claude 3.5 Sonnet strikes the ideal balance between intelligence and speed—particularly for enterprise workloads. It delivers strong performance at a lower cost compared to its peers, and is engineered for high endurance in large-scale AI deployments.
- Claude 3.7 Sonnet anthropic/claude-3.7-sonnet Claude 3.7 Sonnet is the first hybrid reasoning model and Anthropic's most intelligent model to date. It delivers state-of-the-art performance for coding, content generation, data analysis, and planning tasks, building upon its predecessor Claude 3.5 Sonnet's capabilities in software engineering and computer use.
- Claude Opus 4 anthropic/claude-opus-4 Claude Opus 4 is Anthropic's most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours—dramatically outperforming all Sonnet models and significantly expanding what AI agents can accomplish.
- Claude Opus 4.1 anthropic/claude-opus-4.1 Claude Opus 4.1 is a drop-in replacement for Opus 4 that delivers superior performance and precision for real-world coding and agentic tasks. Opus 4.1 advances state-of-the-art coding performance to 74.5% on SWE-bench Verified, and handles complex, multi-step problems with more rigor and attention to detail.
- Claude Sonnet 4 anthropic/claude-sonnet-4 Claude Sonnet 4 significantly improves on Sonnet 3.7's industry-leading capabilities, excelling in coding with a state-of-the-art 72.7% on SWE-bench. The model balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.
- Claude Sonnet 4.5 anthropic/claude-sonnet-4.5 Claude Sonnet 4.5 is the newest model in the Sonnet series, offering improvements and updates over Sonnet 4.
- Command A cohere/command-a Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.
- Command R cohere/command-r Command R is a large language model optimized for conversational interaction and long context tasks. It targets the "scalable" category of models that balance high performance with strong accuracy, enabling companies to move beyond proof of concept and into production.
- Command R+ cohere/command-r-plus Command R+ is Cohere's newest large language model, optimized for conversational interaction and long-context tasks. It aims to be extremely performant, enabling companies to move beyond proof of concept and into production.
- Embed v4.0 cohere/embed-v4.0 A model that allows for text, images, or mixed content to be classified or turned into embeddings.
- DeepSeek-R1 deepseek/deepseek-r1 The latest revision of DeepSeek's first-generation reasoning model.
- DeepSeek R1 Distill Llama 70B deepseek/deepseek-r1-distill-llama-70b DeepSeek-R1-Distill-Llama-70B distills DeepSeek-R1's reasoning capabilities into the more efficient Llama 70B architecture. It preserves strong performance across text-generation tasks while reducing computational overhead for easier deployment and research. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
- DeepSeek V3 0324 deepseek/deepseek-v3 Fast general-purpose LLM with enhanced reasoning capabilities.
- DeepSeek V3.1 deepseek/deepseek-v3.1 DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. DeepSeek has expanded their dataset by collecting additional long documents and substantially extending both training phases.
- DeepSeek V3.1 Base deepseek/deepseek-v3.1-base DeepSeek V3.1 Base is an improved version of the DeepSeek V3 model.
- DeepSeek V3.1 Terminus deepseek/deepseek-v3.1-terminus DeepSeek-V3.1-Terminus delivers more stable and reliable outputs across benchmarks compared to the previous version and addresses user feedback (e.g., language consistency and agent upgrades).
- DeepSeek V3.2 Exp deepseek/deepseek-v3.2-exp DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality.
- DeepSeek V3.2 Exp Thinking deepseek/deepseek-v3.2-exp-thinking DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality.
- Gemini 2.0 Flash google/gemini-2.0-flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
- Gemini 2.0 Flash Lite google/gemini-2.0-flash-lite Gemini 2.0 Flash Lite is a variant of Gemini 2.0 Flash optimized for cost efficiency and low latency, retaining multimodal input support and a 1M token context window.
- Gemini 2.5 Flash google/gemini-2.5-flash Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
- Gemini 2.5 Flash Image Preview (Code name: Nano Banana) google/gemini-2.5-flash-image-preview Gemini 2.5 Flash Image Preview is our first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
- Gemini 2.5 Flash Lite google/gemini-2.5-flash-lite Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
- Gemini 2.5 Flash Lite Preview 09-2025 google/gemini-2.5-flash-lite-preview-09-2025 Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
- Gemini 2.5 Flash Preview 09-2025 google/gemini-2.5-flash-preview-09-2025 Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
- Gemini 2.5 Pro google/gemini-2.5-pro Gemini 2.5 Pro is our most advanced reasoning Gemini model, capable of solving complex problems. It features a 2M token context window and supports multimodal inputs including text, images, audio, video, and PDF documents.
- Gemini Embedding 001 google/gemini-embedding-001 State-of-the-art embedding model with excellent performance across English, multilingual and code tasks.
- Gemma 2 9B IT google/gemma-2-9b 9 billion parameter open source model by Google fine-tuned for chat purposes. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
- Text Embedding 005 google/text-embedding-005 English-focused text embedding model optimized for code and English language tasks.
- Text Multilingual Embedding 002 google/text-multilingual-embedding-002 Multilingual text embedding model optimized for cross-lingual tasks across many languages.
- Mercury Coder Small Beta inception/mercury-coder-small Mercury Coder Small is ideal for code generation, debugging, and refactoring tasks with minimal latency.
- LongCat Flash Chat meituan/longcat-flash-chat LongCat-Flash-Chat is a high-throughput MoE chat model (128k context) optimized for agentic tasks.
- LongCat Flash Thinking meituan/longcat-flash-thinking LongCat-Flash-Thinking is a high-throughput MoE reasoning model (128k context) optimized for agentic tasks.
- Llama 3 70B Instruct meta/llama-3-70b Llama 3 70B Instruct is a 70 billion parameter open source model by Meta, fine-tuned for instruction following. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
- Llama 3 8B Instruct meta/llama-3-8b Llama 3 8B Instruct is an 8 billion parameter open source model by Meta, fine-tuned for instruction following. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
- Llama 3.1 70B Instruct meta/llama-3.1-70b An update to Meta Llama 3 70B Instruct that includes an expanded 128K context length, multilinguality and improved reasoning capabilities.
- Llama 3.1 8B Instruct meta/llama-3.1-8b Llama 3.1 8B with 128K context window support, making it ideal for real-time conversational interfaces and data analysis while offering significant cost savings compared to larger models. Served by Groq with their custom Language Processing Units (LPUs) hardware to provide fast and efficient inference.
- Llama 3.2 11B Vision Instruct meta/llama-3.2-11b Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
- Llama 3.2 1B Instruct meta/llama-3.2-1b Text-only model, supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
- Llama 3.2 3B Instruct meta/llama-3.2-3b Text-only model, fine-tuned for supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
- Llama 3.2 90B Vision Instruct meta/llama-3.2-90b Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
- Llama 3.3 70B Instruct meta/llama-3.3-70b Where performance meets efficiency. This model supports high-performance conversational AI designed for content creation, enterprise applications, and research, offering advanced language understanding capabilities, including text summarization, classification, sentiment analysis, and code generation.
- Llama 4 Maverick 17B 128E Instruct meta/llama-4-maverick The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17 billion parameter model with 128 experts. Served by DeepInfra.
- Llama 4 Scout 17B 16E Instruct meta/llama-4-scout The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17 billion parameter model with 16 experts. Served by DeepInfra.
- Mistral Codestral 25.01 mistral/codestral Mistral Codestral 25.01 is a state-of-the-art coding model optimized for low-latency, high-frequency use cases. Proficient in over 80 programming languages, it excels at tasks like fill-in-the-middle (FIM), code correction, and test generation.
- Codestral Embed mistral/codestral-embed Code embedding model that can embed code databases and repositories to power coding assistants.
- Devstral Small mistral/devstral-small Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files and power software engineering agents.
- Magistral Medium 2509 mistral/magistral-medium Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
- Magistral Medium 2506 mistral/magistral-medium-2506 Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
- Magistral Small 2509 mistral/magistral-small Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
- Magistral Small 2506 mistral/magistral-small-2506 Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
- Ministral 3B mistral/ministral-3b A compact, efficient model for on-device tasks like smart assistants and local analytics, offering low-latency performance.
- Ministral 8B mistral/ministral-8b A more powerful model with faster, memory-efficient inference, ideal for complex workflows and demanding edge applications.
- Mistral Embed mistral/mistral-embed General-purpose text embedding model for semantic search, similarity, clustering, and RAG workflows.
- Mistral Large mistral/mistral-large Mistral Large is ideal for complex tasks that require large reasoning capabilities or are highly specialized - like Synthetic Text Generation, Code Generation, RAG, or Agents.
- Mistral Medium 3.1 mistral/mistral-medium Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost.
- Mistral Small mistral/mistral-small Mistral Small is the ideal choice for simple tasks that one can do in bulk - like Classification, Customer Support, or Text Generation. It offers excellent performance at an affordable price point.
- Mixtral MoE 8x22B Instruct mistral/mixtral-8x22b-instruct Mixtral 8x22B Instruct is an open source mixture-of-experts model by Mistral, served by Fireworks.
- Pixtral 12B 2409 mistral/pixtral-12b A 12B model with image understanding capabilities in addition to text.
- Pixtral Large mistral/pixtral-large Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.
- Kimi K2 moonshotai/kimi-k2 Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.
- Kimi K2 0905 moonshotai/kimi-k2-0905 Kimi K2 0905 has shown strong performance on agentic tasks thanks to its tool calling, reasoning abilities, and long context handling. But as a large parameter model (1T parameters), it’s also resource-intensive. Running it in production requires a highly optimized inference stack to avoid excessive latency.
- Kimi K2 Turbo moonshotai/kimi-k2-turbo Kimi K2 Turbo is the high-speed version of Kimi K2. It has the same model parameters as Kimi K2, but output speed is increased to 60 tokens per second (up to a maximum of 100 tokens per second), and the context length is 256K.
- Morph V3 Fast morph/morph-v3-fast Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast, at 4,500+ tokens per second. It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
- Morph V3 Large morph/morph-v3-large Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast, at 2,500+ tokens per second. It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
- GPT-3.5 Turbo openai/gpt-3.5-turbo OpenAI's most capable and cost effective model in the GPT-3.5 family optimized for chat purposes, but also works well for traditional completions tasks.
- GPT-3.5 Turbo Instruct openai/gpt-3.5-turbo-instruct Similar capabilities to GPT-3 era models. Compatible with the legacy Completions endpoint, not Chat Completions.
- GPT-4 Turbo openai/gpt-4-turbo gpt-4-turbo from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It has a knowledge cutoff of April 2023 and a 128,000 token context window.
- GPT-4.1 openai/gpt-4.1 GPT-4.1 is OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains.
- GPT-4.1 mini openai/gpt-4.1-mini GPT-4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases.
- GPT-4.1 nano openai/gpt-4.1-nano GPT-4.1 nano is the fastest, most cost-effective GPT-4.1 model.
- GPT-4o openai/gpt-4o GPT-4o from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It matches GPT-4 Turbo performance with a faster and cheaper API.
- GPT-4o mini openai/gpt-4o-mini GPT-4o mini from OpenAI is their most advanced and cost-efficient small model. It is multi-modal (accepting text or image inputs and outputting text) and has higher intelligence than gpt-3.5-turbo but is just as fast.
- GPT-5 openai/gpt-5 GPT-5 is OpenAI's flagship language model that excels at complex reasoning, broad real-world knowledge, code-intensive, and multi-step agentic tasks.
- GPT-5-Codex openai/gpt-5-codex GPT-5-Codex is a version of GPT-5 optimized for agentic coding tasks in Codex or similar environments.
- GPT-5 mini openai/gpt-5-mini GPT-5 mini is a cost optimized model that excels at reasoning/chat tasks. It offers an optimal balance between speed, cost, and capability.
- GPT-5 nano openai/gpt-5-nano GPT-5 nano is a high throughput model that excels at simple instruction or classification tasks.
- gpt-oss-120b openai/gpt-oss-120b Extremely capable general-purpose LLM with strong, controllable reasoning capabilities.
- gpt-oss-20b openai/gpt-oss-20b A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments.
- o1 openai/o1 o1 is OpenAI's flagship reasoning model, designed for complex problems that require deep thinking. It provides strong reasoning capabilities with improved accuracy for complex multi-step tasks.
- o3 openai/o3 OpenAI's o3 is their most powerful reasoning model, setting new state-of-the-art benchmarks in coding, math, science, and visual perception. It excels at complex queries requiring multi-faceted analysis, with particular strength in analyzing images, charts, and graphics.
- o3-mini openai/o3-mini o3-mini is OpenAI's most recent small reasoning model, providing high intelligence at the same cost and latency targets of o1-mini.
- o4-mini openai/o4-mini OpenAI's o4-mini delivers fast, cost-efficient reasoning with exceptional performance for its size, particularly excelling in math (best-performing on AIME benchmarks), coding, and visual tasks.
- text-embedding-3-large openai/text-embedding-3-large OpenAI's most capable embedding model for both English and non-English tasks (see the embedding usage sketch after this list).
- text-embedding-3-small openai/text-embedding-3-small OpenAI's improved, more performant version of their ada embedding model.
- text-embedding-ada-002 openai/text-embedding-ada-002 OpenAI's legacy text embedding model.
- Sonar perplexity/sonar Perplexity's lightweight offering with search grounding, quicker and cheaper than Sonar Pro.
- Sonar Pro perplexity/sonar-pro Perplexity's premier offering with search grounding, supporting advanced queries and follow-ups.
- Sonar Reasoning perplexity/sonar-reasoning A reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing detailed explanations with search grounding.
- Sonar Reasoning Pro perplexity/sonar-reasoning-pro A premium reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing comprehensive explanations with enhanced search capabilities and multiple search queries per request.
- Sonoma Dusk Alpha stealth/sonoma-dusk-alpha This model is no longer in stealth and gets responses from Grok 4 Fast Non-Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
- Sonoma Sky Alpha stealth/sonoma-sky-alpha This model is no longer in stealth and gets responses from Grok 4 Fast Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
- v0-1.0-md vercel/v0-1.0-md Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
- v0-1.5-md vercel/v0-1.5-md Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
- voyage-3-large voyage/voyage-3-large Voyage AI's embedding model with the best general-purpose and multilingual retrieval quality.
- voyage-3.5 voyage/voyage-3.5 Voyage AI's embedding model optimized for general-purpose and multilingual retrieval quality.
- voyage-3.5-lite voyage/voyage-3.5-lite Voyage AI's embedding model optimized for latency and cost.
- voyage-code-2 voyage/voyage-code-2 Voyage AI's embedding model optimized for code retrieval (17% better than alternatives). This is the previous generation of code embeddings models.
- voyage-code-3 voyage/voyage-code-3 Voyage AI's embedding model optimized for code retrieval.
- voyage-finance-2 voyage/voyage-finance-2 Voyage AI's embedding model optimized for finance retrieval and RAG.
- voyage-law-2 voyage/voyage-law-2 Voyage AI's embedding model optimized for legal retrieval and RAG.
- Grok 2 xai/grok-2 Grok 2 is a frontier language model with state-of-the-art reasoning capabilities. It features advanced capabilities in chat, coding, and reasoning, outperforming both Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard.
- Grok 2 Vision xai/grok-2-vision Grok 2 vision model excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA). It can process a wide variety of visual information including documents, diagrams, charts, screenshots, and photographs.
- Grok 3 Beta xai/grok-3 xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.
- Grok 3 Fast Beta xai/grok-3-fast xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
- Grok 3 Mini Beta xai/grok-3-mini xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible.
- Grok 3 Mini Fast Beta xai/grok-3-mini-fast xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
- Grok 4 xai/grok-4 xAI's latest and greatest flagship model, offering unparalleled performance in natural language, math and reasoning - the perfect jack of all trades.
- Grok 4 Fast Non-Reasoning xai/grok-4-fast-non-reasoning Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
- Grok 4 Fast Reasoning xai/grok-4-fast-reasoning Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
- Grok Code Fast 1 xai/grok-code-fast-1 xAI's latest coding model that offers fast agentic coding with a 256K context window.
- GLM-4.5 zai/glm-4.5 GLM-4.5 Series Models are foundation models specifically engineered for intelligent agents. The flagship GLM-4.5 integrates 355 billion total parameters (32 billion active), unifying reasoning, coding, and agent capabilities to address complex application demands. As a hybrid reasoning system, it offers dual operational modes.
- GLM 4.5 Air zai/glm-4.5-air GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
- GLM 4.5V zai/glm-4.5v Built on the GLM-4.5-Air base model, GLM-4.5V inherits proven techniques from GLM-4.1V-Thinking while achieving effective scaling through a powerful 106B-parameter MoE architecture.
- GLM 4.6 zai/glm-4.6 As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
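Several entries above (Titan Text Embeddings V2, Embed v4.0, Gemini Embedding 001, the OpenAI text-embedding models, Codestral Embed, Mistral Embed, and the Voyage family) are embedding models rather than chat models: they turn text into vectors for search, similarity, and RAG instead of generating replies. Here is a minimal sketch of using one through the gateway, assuming the AI SDK's `embed` and `cosineSimilarity` helpers and that the `@ai-sdk/gateway` provider exposes a `textEmbeddingModel` method for gateway embedding IDs; treat these names as assumptions and confirm them in the gateway docs.

```ts
// Minimal sketch: embedding two strings and comparing them.
// Assumes the AI SDK's embed()/cosineSimilarity() helpers and that
// the @ai-sdk/gateway provider exposes textEmbeddingModel() for
// gateway embedding IDs. Confirm both against current docs.
import { embed, cosineSimilarity } from 'ai';
import { gateway } from '@ai-sdk/gateway';

const model = gateway.textEmbeddingModel('openai/text-embedding-3-small');

const { embedding: a } = await embed({ model, value: 'fast, low-cost chat model' });
const { embedding: b } = await embed({ model, value: 'cheap low-latency conversational LLM' });

// Cosine similarity near 1 means the two texts are semantically close.
console.log(cosineSimilarity(a, b));
```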