
How to Build AI Agents from Scratch Using Ollama and Agno

Pravin Prajapati · 07 Aug 2025

As artificial intelligence continues to advance, AI agents have emerged as powerful tools capable of reasoning, planning, and interacting with digital environments. Unlike traditional chatbots, agents can execute multi-step tasks, use external tools, and maintain memory — making them ideal for research, automation, and personal assistance.

With the rise of open-source language models and lightweight frameworks, it’s now possible to build these agents entirely on your local machine. This approach offers significant advantages: complete data privacy, zero API costs, offline availability, and complete control over customization.

You'll learn how to build a local AI agent using Ollama and Agno. Ollama simplifies running large language models, such as LLaMA 3 and Mistral, with a single command, providing a local API for inference. Agno is a minimal Python framework designed for building AI agents that can reason, use tools, and interact in loops, all without unnecessary complexity. By combining these two tools, you'll be able to create a functional AI agent that operates privately and efficiently on your device. Whether you're a developer, researcher, or enthusiast, this guide will help you get started with building intelligent systems from the ground up.

Prerequisites

Before you begin building AI agents with Ollama and Agno, ensure you meet the following prerequisites. These will help you set up your environment correctly and follow along with the implementation.

1. Basic Python Knowledge

You should have a working knowledge of Python, including:

  • Installing and importing packages
  • Writing functions and using classes
  • Understanding dictionaries, lists, and basic control flow

This guide will involve writing Python code to define agent behavior, integrate tools, and interact with the LLM via API.

2. System Requirements

You do not need a powerful machine to get started, but the following specs are recommended for smooth performance:

  • RAM: Minimum 8 GB (16 GB recommended)
  • CPU: Modern multi-core processor
  • GPU: Optional (useful for faster inference, but Ollama supports CPU mode)
  • Disk Space: At least 10–15 GB free (models like LLaMA 3 or Mistral can be 3–8 GB each)

Note: GPU acceleration is supported but not required. If you're using a GPU, ensure proper drivers are installed.

3. Required Tools & Installation

You will need the following software installed:

  • Python
    Version 3.8 or higher
    Install via python.org or using a package manager like pyenv or Homebrew.
  • Ollama
    Used to run local LLMs via CLI and REST API.
    Install instructions:
    macOS: brew install ollama
    Windows/Linux: Visit ollama.com/download
    After installation, pull a model (e.g., Mistral) using:
    ollama pull mistral
  • Agno
    A minimal Python framework for building agents.
    Install via pip:
    pip install agno

Once these prerequisites are in place, you're ready to start building and running your own AI agent locally.

What is Agno?

Agno is a minimal agent framework that simplifies the process of building intelligent agents powered by language models. It abstracts away much of the complexity found in heavier frameworks (like LangChain or AutoGPT), making it easier to get started and customize your workflow.

Agno allows your AI agent to:

  • Route tasks intelligently: It uses the LLM to decide what action to take next, whether it's answering a question, calling a tool, or saving information.
  • Use tools: Agents can interact with external functions (called "tools") like APIs, file operations, or even custom Python functions.
  • Manage memory: Agents can maintain short- or long-term memory across conversations, enabling context retention and dynamic learning.
  • Run in a loop: Agno supports iterative agent behavior, where the LLM thinks, plans, acts, and reflects in multiple steps — ideal for solving complex problems.

Agno acts as the logic and coordination layer for your AI agent. While Ollama provides the language model, Agno handles everything around it: how the model thinks, when it uses tools, and how it manages context.

Step 1: Set Up Ollama

The first step in building your local AI agent is setting up Ollama, a powerful tool that lets you run open-source large language models (LLMs) on your own machine with minimal configuration. Ollama handles model downloads, inference, and exposes a local API that can be accessed from your applications, making it ideal for agent-based workflows.

1. Install Ollama

Ollama provides native support for macOS, Windows, and Linux. Choose the appropriate method for your operating system:

  • macOS (via Homebrew):
    brew install ollama
  • Windows and Linux:
    Visit the official installation page: https://ollama.com/download
    Follow the step-by-step instructions for your platform.

After installation, verify it with:
ollama --version

2. Pull a Model

Once Ollama is installed, you need to download an LLM to run. Start with a smaller, general-purpose model like Mistral or LLaMA 3:

ollama pull mistral

or

ollama pull llama3

This command downloads the model to your local machine. The process may take a few minutes depending on your internet speed and the model size.

3. Run the Model

To test that everything is working, run the model directly from the command line:

ollama run mistral

You’ll enter an interactive prompt where you can chat with the model. Type any question and press Enter to receive a response.
This also launches a local API server at http://localhost:11434, which your agent will use in later steps.
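To see the local API in action before wiring up Agno, you can call Ollama's documented /api/generate endpoint directly. Here's a small sketch using only the standard library; with "stream": False, Ollama returns a single JSON object whose "response" field holds the completed text (the helper names here are our own):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one complete JSON object
    # instead of a stream of partial tokens.
    return {"model": model, "prompt": prompt, "stream": False}

def extract_reply(response_body: str) -> str:
    # With stream=False, the generated text is in the "response" field.
    return json.loads(response_body)["response"]

def ask_ollama(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(resp.read().decode("utf-8"))
```

With the server running, `ask_ollama("mistral", "Why is the sky blue?")` returns the model's answer as a string.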

Step 2: Install and Understand Agno

Agno is a minimalistic Python framework designed specifically for building AI agents. It provides simple abstractions that allow your agent to reason, act, and interact with tools in a loop — without the overhead of complex dependencies or architecture.

1. Install Agno

To install Agno, use pip — Python’s package manager:

pip install agno

This command installs the latest version of Agno along with any necessary dependencies.

2. Key Features of Agno

  • Looping Agent Logic: Agents can think, decide, and act in cycles.
  • Tool Use: Easily integrate external tools like web search, calculators, or file readers.
  • Memory: Maintain and access memory across agent steps (optional).
  • LLM-Agnostic: Connects to any language model with a compatible API — including local models via Ollama.

3. How It Works

Agno lets you define an agent using a simple Python class or function. You configure it with:

  • A reasoning strategy (e.g., "plan and execute")
  • A list of tools the agent can access
  • An LLM interface (like the Ollama API)

Each time the agent receives an input, it decides what to do next: calling a tool, reasoning further with the model, or responding directly. This loop continues until the task is complete.
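That loop can be sketched in plain Python. This is an illustrative skeleton, not Agno's actual internals; the "TOOL:name:argument" reply convention and the call_llm callable are assumptions made for the example:

```python
def run_agent_loop(call_llm, tools, task, max_steps=5):
    """Illustrative think-decide-act loop (not Agno's implementation).

    call_llm: function taking a prompt string and returning the model's reply.
    tools: dict mapping tool names to Python functions.
    """
    context = f"Task: {task}"
    for _ in range(max_steps):
        reply = call_llm(context)
        if reply.startswith("TOOL:"):
            # Assumed convention: the model requests a tool as "TOOL:<name>:<argument>"
            _, name, arg = reply.split(":", 2)
            result = tools[name](arg)
            # Feed the tool's result back so the model can reflect on it
            context += f"\nTool {name} returned: {result}"
        else:
            return reply  # the model answered directly
    return "Stopped after max_steps without a final answer."
```

A real framework adds structured prompting, error handling, and memory around this skeleton, but the think-act-reflect cycle is the same.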

With Agno installed and understood, you're now ready to integrate it with Ollama to begin building your first functional AI agent.

Step 3: Build Your First Agent

Now that you have Ollama running a local LLM and Agno installed, it’s time to create your first working AI agent. In this step, we’ll define the core components of the agent — the language model, memory, and tools — and run a simple interaction loop.

This agent will be able to respond to questions and use tools like a basic calculator, all powered locally.

1. Import Required Modules

Create a new Python script (e.g., agent.py) and start by importing the necessary components:

from agno import Agent, OllamaLLM, Tool

2. Define a Simple Tool

Let’s define a basic calculator tool that the agent can use when needed:

def add_numbers(a: int, b: int) -> int:
    return a + b

add_tool = Tool(
    name="add_numbers",
    description="Adds two integers together.",
    func=add_numbers
)

3. Initialize the LLM (Ollama)

Here, we connect Agno to your locally running LLM via Ollama's API:

llm = OllamaLLM(model="mistral")  # Or 'llama3', depending on the model you pulled

4. Create the Agent

Now combine the LLM and the tool into an agent instance:

agent = Agent(
    llm=llm,
    tools=[add_tool],
    memory=True  # Enables context persistence
)

5. Run an Interaction Loop

You can now interact with the agent in a simple loop:

print("Ask me anything. Type 'exit' to quit.")

while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    response = agent.run(user_input)
    print("Agent:", response)

Example Interaction

You: What's 4 + 5?
Agent: The result of 4 + 5 is 9.

Here, the agent interprets your prompt, decides whether it needs to use the add_numbers tool, and then uses it to generate a response — all locally.
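One common way a framework can let the model choose among tools is to render each tool's name and description into the prompt. The exact format Agno uses may differ; this sketch (with tools represented as plain dicts) just shows the idea:

```python
def describe_tools(tools):
    # Render each tool's name and description into a prompt section
    # so the model knows what it can call. The format is illustrative.
    lines = ["You can use these tools:"]
    for t in tools:
        lines.append(f"- {t['name']}: {t['description']}")
    return "\n".join(lines)
```

For the calculator example, this would add a line like "- add_numbers: Adds two integers together." to the system prompt, which is what lets the model decide to call it for arithmetic questions.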

Step 4: Extend with Tools

Building a functional AI agent is just the beginning. To make your agent truly powerful and practical, you can extend it by integrating custom tools that allow it to interact with external resources and perform a wider range of tasks.

1. Why Add Tools?

Tools enable your agent to:

  • Access real-world information beyond the LLM’s training data (e.g., web search)
  • Manipulate files and data
  • Perform calculations or API requests
  • Automate repetitive or complex workflows

By providing your agent with access to these capabilities, you increase its usefulness and allow it to solve more complex problems.

2. Example: Adding a Web Search Tool

Suppose you want your agent to look up current information on the internet. You can integrate a simple web search tool by wrapping a search API or a Python library.

Here’s a hypothetical example using a dummy search function:

import requests

def web_search(query: str) -> str:
    # Example placeholder for actual search API integration
    # In practice, connect to Bing, Google Custom Search, or another API
    response = requests.get(
        "https://api.example.com/search",
        params={"q": query},  # lets requests URL-encode the query safely
        timeout=10,
    )
    response.raise_for_status()
    results = response.json()
    return results["top_result"]["snippet"]

search_tool = Tool(
    name="web_search",
    description="Searches the web and returns a brief summary.",
    func=web_search
)

agent.tools.append(search_tool)

3. Example: Adding a File Writing Tool

Allow your agent to write data to a file — useful for note-taking or saving reports:

def write_to_file(filename: str, content: str) -> str:
    with open(filename, 'w') as file:
        file.write(content)
    return f"Content written to {filename}"

file_write_tool = Tool(
    name="write_to_file",
    description="Writes content to a specified file.",
    func=write_to_file
)

agent.tools.append(file_write_tool)
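If you give the agent write access, it's worth constraining where it can write. Here's a sketch of a slightly hardened variant; the sandbox directory and filename stripping are additions of ours, not something Agno requires:

```python
from pathlib import Path

SAFE_DIR = Path("agent_output")  # assumed sandbox directory for agent writes

def safe_write_to_file(filename: str, content: str) -> str:
    SAFE_DIR.mkdir(exist_ok=True)
    # Keep only the final component so the agent can't escape the sandbox
    # with paths like "../../etc/passwd".
    target = SAFE_DIR / Path(filename).name
    try:
        target.write_text(content, encoding="utf-8")
    except OSError as exc:
        return f"Failed to write {target}: {exc}"
    return f"Content written to {target}"
```

Returning an error message instead of raising is deliberate: the string goes back to the model, which can then report the failure or retry, rather than crashing the agent loop.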

4. Using Tools in Conversation

With these tools, your agent can interpret user requests and decide when to invoke them. For example:

You: Find me the latest news about AI.
Agent: [calls web_search tool] Here is a summary of the latest AI news: ...

You: Save my meeting notes to notes.txt.
Agent: [calls write_to_file tool] Notes have been saved to notes.txt.

Extending your AI agent with custom tools drastically improves its capabilities and applicability. You can connect to any API, run scripts, manipulate files, or automate workflows — all controlled through natural language prompts.

Troubleshooting & Tips

Building AI agents locally is rewarding, but you might encounter some common challenges along the way. This section highlights typical issues and offers tips to help you overcome them.

1. Ollama Model Issues

  • Model Fails to Load or Pull: Ensure your internet connection is stable when pulling models. Some large models can take several minutes to download. If the download stalls, try restarting the process or checking Ollama’s logs for errors.
  • Model Not Responding or Crashing: Make sure your system meets the minimum RAM and disk requirements. Running out of memory can cause the model to crash or freeze. Close unnecessary applications or try a smaller model.

2. Model Performance and Speed

  • Use Smaller Models: Ollama supports various model sizes. For faster responses on limited hardware, try lightweight models like TinyLlama or other distilled variants that require fewer resources.
  • CPU vs GPU: If you have a compatible GPU, enable GPU acceleration in Ollama to speed up inference. Otherwise, expect slower performance on CPU-only setups.

3. Debugging Agent Prompts and Behavior

  • Unexpected or Irrelevant Responses: The agent’s output depends heavily on prompt design. Refine your system prompts and instructions to guide the model’s behavior more precisely.
  • Tool Invocation Issues: Confirm that tools are correctly registered with the agent and their function signatures match expected inputs. Logging inputs and outputs during tool calls can help identify issues.
  • Memory Not Persisting: If context or memory seems lost between interactions, check that memory is enabled in your agent configuration and properly implemented.
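The tool-call logging suggested above can be done with a small wrapper around each tool function. This is an illustrative decorator, not an Agno feature:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)

def log_tool_calls(func):
    """Wrap a tool function so every call logs its inputs and output."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.info("Tool %s called with args=%r kwargs=%r",
                     func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logging.info("Tool %s returned %r", func.__name__, result)
        return result
    return wrapper

@log_tool_calls
def add_numbers(a: int, b: int) -> int:
    return a + b
```

Wrapping a tool this way leaves its behavior unchanged while surfacing exactly what the agent passed in and got back, which makes mismatched function signatures easy to spot.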

Essence

You’ve learned how to build your own AI agent from scratch using Ollama and Agno. This lets you run powerful language models locally while handling reasoning, tools, and memory. You now have a good foundation for setting up Ollama to run open-source LLMs on your machine and using Agno to create flexible AI agents that can work with different tools. You also saw how to add custom tools, fix common issues, and improve performance.

This is just the start of what you can do. You can take your AI agents further by building systems where multiple agents work together, adding local vector databases for better long-term memory, or using Retrieval-Augmented Generation (RAG) to combine external knowledge with your models. It’s a great way to begin creating smart assistants and get the most out of local AI. If you want professional help with Python development, check out Python Development Services.


Pravin Prajapati
Full Stack Developer

Expert in frontend and backend development, combining creativity with sharp technical knowledge. Passionate about keeping up with industry trends, he implements cutting-edge technologies, showcasing strong problem-solving skills and attention to detail in crafting innovative solutions.
