Ollama has made running powerful language models locally dramatically easier, and it now supports a large and ever-growing catalog of models, which can be divided into four main categories:
- Source Models (Base Models): The Foundation of Ollama AI
- Fine-Tuned Models: Specialized AI Solutions
- Embedding Models: Powering Smart Search and Recommendations
- Multimodal Models: Integrating Text, Images, and More
Each category plays a distinct role in the ecosystem, and together they cover a wide range of problems, from code generation and document search to visual reasoning and interactive dialogue. Additionally, the latest Ollama engine offers advanced memory management, streaming tool calls, and support for quantized deployments, enabling more performance-conscious model selection than ever before.
1. Source Models (Base Models)
Source models are foundational LLMs trained on large-scale datasets without task-specific fine-tuning. They form the basis for most other models and are capable of understanding and generating natural language in a wide range of general-purpose scenarios.
- Predicting text continuations
- Answering open-ended questions
- Summarizing content
- Generating structured or unstructured text
Many source models now use Mixture of Experts (MoE) architectures, which offer higher efficiency and accuracy, and models such as LLaMA 3.3 and DeepSeek-R1 support long context windows (up to 256k tokens).
- LLaMA 3.3 / LLaMA 4 Scout: High-parameter models with long context and fine-grained reasoning; LLaMA 4 Scout adds an MoE architecture and multimodal support.
- Phi-4: A compact, high-efficiency language model designed for edge and CPU-only devices.
- Mistral-7B-Instruct: Versatile and lightweight, ideal for laptops and real-time interaction.
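As a minimal sketch of how a source model is queried locally, the snippet below uses the official `ollama` Python client. The `llama3.3` tag is an assumption; any base model already pulled to the machine (e.g. via `ollama pull`) works the same way.

```python
# Minimal sketch: prompting a locally pulled source model with the official
# `ollama` Python client. The model tag is an assumption; substitute your own.
import ollama

response = ollama.generate(
    model="llama3.3",  # any locally available base model tag
    prompt="Summarize the benefits of Mixture of Experts architectures in two sentences.",
)
print(response["response"])
```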
2. Fine-Tuned Models
Fine-tuned models are specialized derivatives of base models that have been further trained on task-specific or domain-specific data. They deliver the strongest performance on focused applications such as instruction following, code generation, or conversational AI.
- Instruction tuning
- Code completion
- Chatbot optimization
- Domain-specific reasoning
The latest Ollama engine adds support for streaming tool responses, allowing real-time output from fine-tuned agents, and integrates a "thinking mode" that provides step-by-step explanations during complex reasoning tasks.
- WizardLM-2 8B: A refined conversational agent with advanced instruction-following and logic reasoning.
- DeepSeek-Coder 33B: Fine-tuned for multi-language code generation and debugging with chain-of-thought capabilities.
- StableCode-Completion-Alpha-3B: Specialized for completing partial code, suitable for IDE integration.
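The streaming behavior mentioned above can be consumed token by token from the same client. The sketch below assumes a code-tuned model tagged `deepseek-coder` has already been pulled locally; the tag is illustrative.

```python
# Minimal sketch: streaming partial output from a fine-tuned code model.
# Assumes a code-tuned model such as `deepseek-coder` is available locally.
import ollama

stream = ollama.chat(
    model="deepseek-coder",  # assumed local tag; substitute your own
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
    stream=True,  # yield partial responses instead of waiting for the full reply
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```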
3. Embedding Models
Embedding models transform text into vector representations (embeddings) that reflect semantic relationships between ideas. Such vectors are crucial for search, classification, clustering, and recommendation systems.
- Semantic search
- Document similarity analysis
- Text clustering
- Question-document matching
Ollama now includes optimized embedding pipelines with faster encoding and better memory estimates. Embedding models support low-resource inference, enabling semantic processing on local devices.
- Ollama-e-7B: A general-purpose embedding model supporting large-scale text similarity and search applications.
- all-MiniLM-L6-v2 (Sentence Transformers): A lightweight, efficient model producing sentence-level embeddings.
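To show how embeddings feed semantic search, the sketch below requests vectors from a locally pulled embedding model and scores two texts with cosine similarity; the `all-minilm` tag is an assumption.

```python
# Minimal sketch: semantic similarity with a local embedding model.
# Assumes an embedding model such as `all-minilm` has been pulled locally.
import math
import ollama

def embed(text: str) -> list[float]:
    # Return the embedding vector for a single piece of text.
    return ollama.embeddings(model="all-minilm", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = embed("How do I reset my password?")
doc = embed("Steps for recovering account credentials")
print(f"similarity: {cosine(query, doc):.3f}")  # higher means more semantically related
```

In a real search pipeline the document vectors would be precomputed and stored in a vector index, so only the query needs to be embedded at request time.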
4. Multimodal Models
Multimodal models process and integrate inputs from multiple data types, such as text and images, within a single architecture. These models enable sophisticated reasoning over visual and linguistic information.
- Visual question answering (VQA)
- Document understanding (OCR + context)
- Image captioning and interpretation
- Cross-modal retrieval
Ollama now supports native multimodal inference, including models capable of image-to-text reasoning, combined text-and-image understanding, and multi-image comparison. Many models support streamed outputs, improving usability in real-time applications.
- LLaVA 1.5 / 1.6: General-purpose multimodal models for visual understanding and VQA.
- Qwen-VL 2.5: Capable of document OCR, layout analysis, translation, and visual reasoning.
- Gemma 3 (Multimodal): Accepts multiple images as input and performs visual comparisons and context linking.
- Moondream 2 / 1.8B: Lightweight visual models suitable for CPU-only and mobile environments.
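As a rough illustration of multimodal inference, the sketch below attaches an image to a chat message through the Python client. The `llava` tag and the `chart.png` path are hypothetical placeholders for this example.

```python
# Minimal sketch: visual question answering with a local multimodal model.
# The model tag and image path are placeholders; any vision-capable model
# pulled locally will accept images alongside the text prompt.
import ollama

response = ollama.chat(
    model="llava",  # assumed local tag for a vision-capable model
    messages=[{
        "role": "user",
        "content": "What does this chart show?",
        "images": ["chart.png"],  # file paths (or raw bytes) of the input image(s)
    }],
)
print(response["message"]["content"])
```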