How to Build Your Custom Chatbot with Llama 3.1 Using Ollama and OpenWebUI: A Step-by-Step Guide

Muhammad Usama Khan
7 min read · Sep 22, 2024

Introduction

In today’s tech era, chatbots are becoming increasingly popular as a way to automate customer service, provide personalized support, and streamline business processes. With the rise of conversational AI, building your own custom chatbot has never been easier or more accessible. In this article, we’ll explore how to build a custom chatbot using LLaMA 3.1, Ollama, and OpenWebUI.

What are Chatbots?

Chatbots are computer programs that use natural language processing (NLP) to simulate conversation with humans. They can be integrated into websites, messaging apps, or even physical devices like smart speakers. Chatbots have various applications, from simple customer support chatbots to more complex ones that can perform tasks like booking appointments or answering technical questions.

What are LLaMA and Ollama?

LLaMA (Large Language Model Meta AI) is a series of advanced language models developed by Meta (formerly Facebook). These models handle a wide range of text tasks, from answering questions and generating text to summarizing documents, translating languages, and writing code. LLaMA models come in various sizes, from smaller, less resource-intensive versions to massive ones; the Llama 3.1 family tops out at 405 billion parameters. They are well suited to both research and real-world applications, such as chatbots and content creation.

Ollama, on the other hand, is an open-source tool that simplifies running models like LLaMA on your own machine. It downloads, manages, and serves models locally and exposes them through a simple command-line interface and a REST API, making it easier for developers to integrate AI capabilities into their applications.

In this article, we’ll focus on deploying LLaMA 3.1 locally with Ollama and putting a chat interface on top of it with OpenWebUI to build a custom chatbot.

Why Choose LLaMA 3.1 and OpenWebUI?

LLaMA 3.1 is an openly available model family from Meta AI, and its instruction-tuned variants are fine-tuned specifically for dialogue. That makes it a strong foundation for chatbots: it understands natural-language input well and produces fluent, relevant responses.

OpenWebUI (Open WebUI) is an open-source, self-hosted web interface for LLMs. It connects to Ollama and gives you a ChatGPT-style chat experience, including chat history, user accounts, and file uploads, without requiring you to write any front-end code.

Step-by-Step Guide: Building Your Custom Chatbot

Step 1: Prerequisites: Minimum and Recommended Hardware and Software Requirements

For LLaMA 3.1 (8B) Model:

  • GPU: NVIDIA or similar with at least 16GB of VRAM (minimum); 24GB+ is recommended for smoother operation
  • RAM: 64 GB of system RAM or more; 32 GB can work, but it will significantly reduce performance
  • Disk Space: At least 50GB of free disk space (for storing model weights and cache)
  • CUDA Version: 11.6 or later
  • OS: Linux or Windows (Linux is preferred for better compatibility with AI/ML frameworks)

For LLaMA 3.1 (70B) Model:

  • GPU: NVIDIA or similar with at least 40GB of VRAM (64GB VRAM is ideal)
  • RAM: Minimum of 128GB of system RAM for smooth operation (256GB is ideal for handling large models)
  • Disk Space: 150GB+ free disk space for model weights, checkpoints, and cached data
  • CUDA Version: 11.6 or later
  • OS: Linux recommended (Windows compatibility may vary)

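Note: The larger the model, the more GPU memory and system RAM you’ll need, and a high-performance GPU gives the smoothest inference. The figures above are on the conservative side for Ollama’s defaults: full-precision (FP16) weights take about 2 bytes per parameter (roughly 16GB for the 8B model), whereas the default llama3.1 tags are 4-bit quantized, which is why the 8B download is only 4.7 GB. Ollama can also run models on the CPU when no suitable GPU is available, although much more slowly.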
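The memory figures above follow from simple arithmetic on the parameter count. As a rough sketch (the function name is hypothetical, and it covers the weights only, ignoring the KV cache and runtime overhead), the size of a model’s weights in GB is the parameter count in billions times the bits per weight, divided by 8:

```shell
#!/usr/bin/env bash
# Rough size of a model's weights in GB:
#   parameters (billions) * bits per weight / 8
# Ignores the KV cache, activations and runtime overhead, so treat the
# result as a lower bound, not a sizing guarantee.
estimate_weights_gb() {
  local params_billion="$1"    # e.g. 8, 70 or 405
  local bits_per_weight="$2"   # 16 for FP16, about 4 for 4-bit quantization
  awk -v p="$params_billion" -v b="$bits_per_weight" \
    'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

For example, estimate_weights_gb 8 16 prints 16.0, matching the 16GB VRAM minimum listed for the 8B model, while estimate_weights_gb 8 4 prints 4.0, close to the 4.7 GB that Ollama lists for its default 8B download (quantized formats also store some per-block scaling data, which adds a little on top).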

Step 2: Install Ollama

Ollama’s source code is available on GitHub. You can clone the repository and install it manually by following the instructions in its README, but for an easy, one-click installation, use the options below.

For Linux

Run this command to install Ollama on your system:

curl -fsSL https://ollama.com/install.sh | sh
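To confirm the installation worked before moving on, you can check that the ollama binary is on your PATH and print its version. The function below is a hypothetical helper for this guide, not part of Ollama itself. On systemd-based distributions, the install script usually also sets up a background service, which you can inspect with systemctl status ollama.

```shell
#!/usr/bin/env bash
# Report whether the ollama CLI is installed and, if so, print its version.
# Returns 1 if the binary is missing; otherwise returns the exit status
# of "ollama --version".
check_ollama_install() {
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama was not found on PATH; re-run the install script" >&2
    return 1
  fi
  echo "ollama found at $(command -v ollama)"
  ollama --version
}
```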

For macOS

Follow this link to install Ollama on macOS.

For Windows (in preview at the time of writing)

You can install Ollama on Windows by visiting this link

Docker

The official Ollama Docker image is available on Docker Hub.

Here are the instructions for manual installation.
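If you prefer Docker, a minimal sketch for starting the official ollama/ollama image, based on the CPU-only command in Ollama’s documentation, looks like this. The run_ollama_container wrapper and its DRY_RUN switch are hypothetical conveniences for this guide, not part of Ollama.

```shell
#!/usr/bin/env bash
# Start Ollama from the official ollama/ollama image (CPU-only variant).
#   -v ollama:/root/.ollama  keeps downloaded models in a named volume
#   -p 11434:11434           publishes the Ollama API on the host
# Set DRY_RUN=1 to print the command instead of running it.
run_ollama_container() {
  local cmd=(
    docker run -d
    -v ollama:/root/.ollama
    -p 11434:11434
    --name ollama
    ollama/ollama
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Once the container is running, you can open a chat inside it with docker exec -it ollama ollama run llama3.1. For NVIDIA GPUs, Ollama’s documentation adds --gpus=all to the docker run command, which requires the NVIDIA Container Toolkit on the host.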

Step 3: Download the Llama 3.1 Model

The following command is the same on all platforms. Open a terminal on your system and run:

# Llama 3.1 8B (4.7 GB download)
ollama run llama3.1

This downloads the Llama 3.1 8B model (the default llama3.1 tag) and then opens an interactive chat session with it in the terminal.

The first run can take a while, because Ollama has to download the model weights before it can run the model.

Llama 3.1 is a model family from Meta, available in three sizes: 8B, 70B, and 405B parameters.

Download sizes of the Llama 3.1 models in Ollama:

  • 8B (4.7 GB)
  • 70B (40 GB)
  • 405B (231 GB)

Once the download finishes, a chat prompt opens in the terminal, where you can ask anything and Llama 3.1 will respond. Type /bye to leave the session.

Step 4: Running locally

To start the Ollama server, run the following command in the terminal:

ollama serve

This will start the Ollama server on 127.0.0.1:11434.
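If Ollama is already running in the background (for example, as the service set up by the Linux installer), ollama serve typically fails with an "address already in use" error, and you can skip this step. The listening address can be changed with the OLLAMA_HOST environment variable. When scripting these steps, it also helps to wait until the server actually answers before sending requests. The sketch below polls the server’s root URL, which on a running Ollama instance returns a short "Ollama is running" message. The function name and the 30-second default timeout are assumptions, and it requires curl.

```shell
#!/usr/bin/env bash
# Poll an Ollama server until it responds or roughly timeout_s seconds pass.
# A running server answers GET / with a short plain-text status message.
wait_for_ollama() {
  local url="${1:-http://127.0.0.1:11434}"
  local timeout_s="${2:-30}"
  local waited=0
  until curl -sf --max-time 2 "$url/" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "Ollama did not respond at $url within ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Ollama is reachable at $url"
}
```

For example, a setup script could start ollama serve in the background and call wait_for_ollama before sending its first API request.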

Step 5: Run Llama 3.1 with Ollama

In a separate terminal, run Llama 3.1 with the following command:

ollama run llama3.1

This runs the default llama3.1 model (8B), which you can interact with directly. To run a different variant, specify its full tag, which includes the parameter size and, optionally, the precision.

For example, to run the full-precision (FP16) version of the instruction-tuned 8B model, run this command in the terminal:

ollama run llama3.1:8b-instruct-fp16

This downloads the llama3.1:8b-instruct-fp16 model (if it isn’t downloaded already) and runs it. You can now interact with your locally deployed Llama model by sending a prompt and receiving the generated output.

Step 6: Ollama REST API

To test the model through the API, send requests to http://localhost:11434 with any HTTP client, such as curl. For example:

Generate a response

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt":"What is LLM?"
}'

Chat with a model

curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{ "role": "user", "content": "What is LLM?" }
]
}'

For more information on the Ollama API and its endpoints, see the API documentation.
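By default, both endpoints stream the answer as a series of JSON objects, one per line, each carrying a small piece of the text. For a chatbot it is often easier to set "stream": false and receive a single JSON object. The sketch below does that for /api/chat and adds a system message to steer the assistant’s behavior. The function names and the OLLAMA_URL variable are hypothetical, and the escaping helper only covers the most common special characters.

```shell
#!/usr/bin/env bash
# Escape a string for use inside a JSON string literal. Handles backslashes,
# double quotes, newlines, carriage returns and tabs; other control
# characters are not handled in this sketch.
json_escape() {
  local s="$1" bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}    # backslash -> \\
  s=${s//"$q"/"$bs$q"}      # double quote -> \"
  s=${s//$'\n'/"${bs}n"}    # newline -> \n
  s=${s//$'\r'/"${bs}r"}    # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}    # tab -> \t
  printf '%s' "$s"
}

# Build a /api/chat request body with streaming disabled:
#   $1 model, $2 system prompt, $3 user message
build_chat_payload() {
  local model="$1" system_prompt="$2" user_msg="$3"
  printf '{"model":"%s","stream":false,"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}]}' \
    "$(json_escape "$model")" \
    "$(json_escape "$system_prompt")" \
    "$(json_escape "$user_msg")"
}

# Send one chat turn to Ollama and print the raw JSON reply.
# OLLAMA_URL is an assumption of this sketch (defaults to localhost:11434).
ollama_chat_once() {
  local url="${OLLAMA_URL:-http://localhost:11434}"
  curl -sf "$url/api/chat" -d "$(build_chat_payload "$@")"
}
```

For example, ollama_chat_once llama3.1 'You are a concise assistant.' 'What is LLM?' | jq -r '.message.content' prints only the reply text, assuming jq is installed. The /api/chat endpoint does not remember earlier turns, so a multi-turn chatbot has to send the whole conversation, including previous assistant replies, in the messages array of every request.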

But what if you want it to work like ChatGPT, where you open your browser, start chats, send prompts, and receive outputs? That may sound like a big project to build, but it is actually very simple.

An open-source tool called OpenWebUI makes this possible in just a few steps.

Step 7: Install Docker

This guide assumes Docker is already installed on your system. If it isn’t, follow this link to install Docker for your operating system.

Step 8: Pull and run the Open WebUI Container

Once Docker is installed, run this command in your terminal to pull the OpenWebUI image and start it as a container:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
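In this command, -p 3000:8080 publishes the container’s port 8080 on host port 3000, --add-host=host.docker.internal:host-gateway lets the container reach services on the host (such as Ollama) under the name host.docker.internal, -v open-webui:/app/backend/data keeps accounts and chats in a named volume, and --restart always restarts the container automatically. If port 3000 is already taken, or Ollama runs somewhere else, the hypothetical wrapper below exposes both values as parameters. Setting OLLAMA_BASE_URL explicitly is an addition to the original command, made here so the target Ollama URL is visible.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Open WebUI "docker run" command from Step 8.
#   $1: host port for the web UI (default 3000; the container listens on 8080)
#   $2: Ollama URL as seen from inside the container
# Set DRY_RUN=1 to print the command instead of running it.
run_open_webui() {
  local host_port="${1:-3000}"
  local ollama_url="${2:-http://host.docker.internal:11434}"
  local cmd=(
    docker run -d
    -p "${host_port}:8080"
    --add-host=host.docker.internal:host-gateway
    -e "OLLAMA_BASE_URL=${ollama_url}"
    -v open-webui:/app/backend/data
    --name open-webui
    --restart always
    ghcr.io/open-webui/open-webui:main
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, run_open_webui 3001 serves the UI at http://localhost:3001. If a container named open-webui already exists, remove it first with docker rm -f open-webui; your accounts and chats stay in the open-webui volume.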

Step 9: Open the WebUI

Open a web browser and visit http://localhost:3000. You should now see the OpenWebUI GUI. The first time, you’ll be asked to create an account, and this first account becomes the administrator; on later visits, simply log in with it.

Step 10: Select Your Model

Click “Select a model” and choose llama3.1:latest (the default 8B model) or llama3.1:8b-instruct-fp16 from the list of available models. OpenWebUI lists the models that have already been downloaded through Ollama, so a model only appears here after you have pulled it (for example, with the ollama run commands from Steps 3 and 5).

Step 11: Chat with Llama 3.1

You’re now ready to begin chatting with the Llama 3.1 model using Open WebUI’s interface. You can ask questions, provide input, upload files or just explore the abilities of this powerful language model.

Here is a quick example!

By following these steps, you’ll be able to install and use Open WebUI with Ollama and the Llama 3.1 model, giving you a solid base for your own AI projects.

Troubleshooting

On Linux systems, if the Llama models don’t appear in the model list in the OpenWebUI GUI, or you can’t select them, the container may not be able to reach the Ollama server on the host. In that case, remove the existing container with docker rm -f open-webui and re-run it with host networking:

docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Because the container now shares the host’s network, OpenWebUI listens directly on port 8080 instead of 3000. Open it in your browser at:

http://<your-static-ip>:8080/ OR http://localhost:8080/
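A likely root cause of the missing models is that Ollama listens only on 127.0.0.1 by default, so a container on Docker’s default bridge network cannot reach it through host.docker.internal. An alternative to host networking, assuming Ollama runs as the systemd service created by the Linux installer, is to make the service listen on all interfaces via OLLAMA_HOST and keep the original OpenWebUI command from Step 8. The helper below (a hypothetical name) prints the systemd drop-in for that.

```shell
#!/usr/bin/env bash
# Print a systemd drop-in that makes the Ollama service listen on the given
# address (default: all interfaces on port 11434) instead of 127.0.0.1.
ollama_host_override() {
  local listen_addr="${1:-0.0.0.0:11434}"
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "$listen_addr"
}

# Usage (assumes the "ollama" systemd service created by the Linux installer):
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   ollama_host_override | sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
```

After the restart, the bridge-network container on port 3000 should be able to reach Ollama through host.docker.internal. Because 0.0.0.0 makes the Ollama API reachable from other machines on your network, restrict access with a firewall if the host is not on a private network.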

Conclusion

Building a custom chatbot using LLaMA 3.1, Ollama, and OpenWebUI is a straightforward process that requires minimal coding knowledge. By following these steps, you can run your own ChatGPT-style conversational interface entirely on your own hardware.

Happy Prompting!

👍 Don’t forget to like, share, and save this for reference later! Spread the knowledge.

🔔 Follow Muhammad Usama Khan here and on LinkedIn for more insightful content on Cloud, DevOps, AI/Data, SRE and Cloud Native.


Muhammad Usama Khan

LinkedIn Top Voice | DevOps/SRE Expert 🚀 | Certified Cloud Consultant ☁️ | AWS, Azure, GCP, OTC | AI & Data | 🔔 https://www.linkedin.com/in/usama-khan-791b0