Implementing Local RAG Service: Integrating Open WebUI, Ollama, and Qwen2.5

Introduction#

When building information retrieval and generative AI applications, the Retrieval-Augmented Generation (RAG) model is increasingly favored by developers for its powerful ability to retrieve relevant information from knowledge bases and generate accurate answers. However, implementing an end-to-end local RAG service requires not only a suitable model but also the integration of a robust user interface and an efficient inference framework.

Utilizing an easily deployable Docker approach can greatly simplify model management and service integration when constructing a local RAG service. Here, we rely on Open WebUI for the user interface and document-handling pipeline and on Ollama for model inference, introducing the bge-m3 embedding model to vectorize documents for retrieval and thereby help Qwen2.5 generate more precise answers.

In this article, we will discuss how to quickly launch Open WebUI via Docker, connect it to Ollama, and implement an efficient document retrieval and generation system in conjunction with the Qwen2.5 model.

Project Overview#

This project will use the following key tools:

  1. Open WebUI: Provides a web interface for user interaction with the model.
  2. Ollama: Used for managing embedding and large language model inference tasks. The bge-m3 model in Ollama will be used for document retrieval, while Qwen2.5 will be responsible for answer generation.
  3. Qwen2.5: The Qwen2.5 series of large language models launched by Alibaba, which provides the natural language generation component of the retrieval-augmented generation service.

To implement the RAG service, we need the following steps:

  1. Deploy Open WebUI as the user interaction interface.
  2. Configure Ollama to efficiently schedule the Qwen2.5 series models.
  3. Configure the bge-m3 embedding model in Ollama to vectorize documents for retrieval.

Deploying Open WebUI#

Open WebUI provides a streamlined Docker solution, allowing users to start the web interface without manually configuring numerous dependencies.

First, ensure that Docker is installed on the server. If not installed, you can quickly install it using the following command:
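
A common option is Docker's official convenience script (see the Docker documentation for a method suited to your distribution):

```bash
# Download and run Docker's official convenience install script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
```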

Then create a directory to save Open WebUI's data, so that the data will not be lost after project updates:
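
For example, this article uses /DATA/open-webui as the data directory:

```bash
# Create a persistent data directory for Open WebUI
mkdir -p /DATA/open-webui
```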

Next, we can start Open WebUI with the following command:
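
Following the official Open WebUI Docker instructions, and using the data directory created above, the command looks roughly like this:

```bash
# Expose the web UI on host port 3000, persist data in /DATA/open-webui,
# and allow the container to reach the host's Ollama service via host.docker.internal
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v /DATA/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```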

If you want to run Open WebUI with Nvidia GPU support, you can use the following command:
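
Assuming the NVIDIA Container Toolkit is installed on the host, the CUDA-enabled image can be started in much the same way:

```bash
# Same as above, but with GPU access and the CUDA image
docker run -d \
  -p 3000:8080 \
  --gpus all \
  --add-host=host.docker.internal:host-gateway \
  -v /DATA/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:cuda
```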

Here, we expose the Open WebUI service on port 3000 of the machine, which can be accessed via a browser at http://localhost:3000 (for remote access, use the public IP and open port 3000). /DATA/open-webui is the data storage directory, and you can adjust this path as needed.

Of course, in addition to the Docker installation method, you can also install Open WebUI via pip, source compilation, Podman, etc. For more installation methods, please refer to the Open WebUI official documentation.

Basic Setup#

  1. Register an account and set a strong password.

Important

The first registered user will be automatically set as the system administrator, so please ensure you are the first registered user.

  2. Click on the avatar in the lower left corner and select the admin panel.
  3. Click on Settings in the panel.
  4. Disable new user registration (optional).
  5. Click Save in the lower right corner.

(Screenshot: disabling new user registration in the admin panel settings)

Configuring Ollama and Qwen2.5#

Deploying Ollama#

Install Ollama on the local server. Ollama currently offers a variety of installation methods; please refer to Ollama's official documentation to download and install the latest version 0.3.11 (Qwen2.5 is only supported starting from this version). Installation details can be found in a previous article I wrote: Ollama: From Beginner to Advanced.

Start the Ollama service (if started via Docker, this is not necessary, but port 11434 must be exposed):
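
```bash
# Start the Ollama server (listens on port 11434 by default)
ollama serve
```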

Once the Ollama service is started, you can connect to it by accessing http://localhost:11434.
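
For example, you can confirm the service is up and list the locally installed models via the REST API:

```bash
# Query the Ollama API for locally available models
curl http://localhost:11434/api/tags
```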

The Ollama Library provides semantic vector models (bge-m3) as well as various text generation models (including Qwen2.5). Next, we will configure Ollama to meet the project's needs for document retrieval and question-answer generation.

Downloading the Qwen2.5 Model#

To install Qwen2.5 via Ollama, you can directly run the ollama pull command in the command line to download the Qwen2.5 model. For example, to download the 72B model of Qwen2.5, you can use the following command:
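
```bash
# Pull the Qwen2.5 72B model from the Ollama library
ollama pull qwen2.5:72b
```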

This command will fetch the Qwen2.5 model from Ollama's model repository and prepare the runtime environment.

Qwen2.5 offers various model sizes, including 72B, 32B, 14B, 7B, 3B, 1.5B, and 0.5B. You can choose an appropriate model based on your needs and GPU memory. I am using a server with 4x V100, so I can directly choose the 72B model. If you need fast output and can accept a minor loss in quality, you can use the quantized version qwen2.5:72b-instruct-q4_0; if slower output is acceptable, qwen2.5:72b-instruct-q5_K_M preserves more quality. On the 4x V100 server, token generation with the q5_K_M model is noticeably sluggish, but I still chose it to test Qwen2.5's performance.

For personal computers with less memory, it is recommended to use the 14B or 7B models, which can be downloaded using the following commands:
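
```bash
# 14B model
ollama pull qwen2.5:14b
```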

or
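
```bash
# 7B model
ollama pull qwen2.5:7b
```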

If you have both Open WebUI and Ollama services running, you can also download the model from the admin panel.

(Screenshot: downloading models from the Open WebUI admin panel)

Downloading the bge-m3 Model#

Download the bge-m3 model in Ollama, which is used for document vectorization. Run the following command in the command line to download the model (or download it in the Open WebUI interface):
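
```bash
# Pull the bge-m3 embedding model used for document vectorization
ollama pull bge-m3
```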

At this point, we have completed the configuration of Ollama, and next we will configure the RAG service in Open WebUI.

RAG Integration and Configuration#

Configuring Ollama's RAG Interface in Open WebUI#

Accessing the Open WebUI Admin Interface#

After starting Open WebUI, you can directly access the service address via a web browser, log in to your admin account, and then enter the admin panel.

Setting the Ollama Interface#

In the admin panel of Open WebUI, click on Settings, and you will see options for external connections. Ensure that the Ollama API address is set to http://host.docker.internal:11434 (this hostname lets the Open WebUI container reach the Ollama service running on the host), then click the verify connection button on the right to confirm that the Ollama service is reachable.

(Screenshot: verifying the Ollama API connection in the admin panel)

Setting the Semantic Vector Model#

In the admin panel of Open WebUI, click on Settings, then click on Documents, and complete the following steps:

  1. Set the semantic vector model engine to Ollama.
  2. Set the semantic vector model to bge-m3:latest.
  3. The remaining settings can be kept at their defaults; here I set the maximum file upload size to 10 MB, the maximum number of uploads to 3, Top K to 5, chunk size and chunk overlap to 1500 and 100 respectively, and enabled PDF image processing.
  4. Click save in the lower right corner.

(Screenshot: semantic vector model settings in the Documents panel)

Testing the RAG Service#

Now you have a complete local RAG system. You can enter any natural language question in the main interface of Open WebUI and upload the corresponding document. The system will call the semantic vector model to vectorize the document, retrieve the most relevant chunks, and then use the Qwen2.5 model to generate an answer and return it to the user.

In the user chat interface of Open WebUI, upload the document you want to retrieve, then enter your question and click send. Open WebUI will call Ollama's bge-m3 model for document vectorization, and then call the Qwen2.5 model for question-answer generation.

Here, I uploaded a simple txt file (text generated by GPT) with the following content:

Then I asked three questions:

  1. What strange creature did Evan encounter in the forest?
  2. What was inscribed on the ancient stone tablet that Evan found in the cave?
  3. What treasure did Evan discover at the center of the altar?

The following image shows the answers:

(Screenshot: answers generated for the three questions)

Summary#

With the help of Open WebUI and Ollama, we can easily build an efficient and intuitive local RAG system. Using the bge-m3 semantic vector model for text vectorization, combined with the Qwen2.5 generation model, users can perform document retrieval and retrieval-augmented generation in a unified web interface. This not only protects data privacy but also significantly enhances the localization capabilities of generative AI applications.
