Introduction ¶
This post outlines how I built a self-hosted document analyzer: Ollama serves the Llama2 large language model (LLM) from my NAS, while Cloudflare Workers streamline API requests and improve scalability.
By integrating Cloudflare Tunnel to securely expose the Ollama API and Cloudflare Workers to handle requests, I created a robust solution where users can paste text or upload documents for real-time analysis.
You can try it here: My AI Analyzer
Key Features ¶
- Self-Hosted LLM: Powered by Ollama, hosted locally on a NAS.
- Cloudflare Workers: Serverless, lightweight, and highly scalable API request handling.
- Secure Internet Access: Cloudflare Tunnel exposes the API without compromising security.
- Dynamic Input Handling: Users can paste text or upload PDF/TXT files for analysis.
- Custom Prompts: Offers prompts like summarization, sentiment analysis, and key point extraction.
- NDJSON Streaming Support: Handles real-time streamed responses for faster user feedback.
Architecture Overview ¶
1. Self-Hosted LLM on NAS ¶
- The Ollama platform runs on my NAS, providing local and efficient access to the Llama2 model.
- Hosting the LLM on a NAS reduces dependency on cloud-based services and keeps latency low for requests on the local network.
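To make this concrete, here is roughly what a request to Ollama’s generate endpoint looks like (a minimal sketch: 11434 is Ollama’s default port, and the prompt is illustrative):

```javascript
// Minimal call to the Ollama API running on the NAS.
// Port 11434 is Ollama's default; "llama2" is the model used in this project.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama2",
    prompt: "Summarize this document: ...",
    stream: false, // request a single JSON object instead of an NDJSON stream
  }),
});
const { response } = await res.json();
console.log(response); // the model's full answer
```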
2. Cloudflare Tunnel ¶
- Secures public access to the Ollama API, removing the need for port forwarding.
- All traffic passes through Cloudflare’s secure network, protecting my internal infrastructure.
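For reference, the tunnel’s routing is defined in cloudflared’s config file. This is a sketch with placeholder values (the hostname and tunnel ID below are not my real ones):

```yaml
# ~/.cloudflared/config.yml (placeholder values)
tunnel: <TUNNEL_ID>
credentials-file: /root/.cloudflared/<TUNNEL_ID>.json

ingress:
  - hostname: ollama.example.com      # public hostname for the API
    service: http://localhost:11434   # forwards to the local Ollama instance
  - service: http_status:404          # required catch-all rule
```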
3. Hugo Static Website ¶
- The front-end is built using Hugo, a static site generator.
- Users can paste text or upload PDF/TXT files for analysis via a simple, responsive interface.
4. Cloudflare Worker as the Backend ¶
- Instead of using a traditional Node.js server, I use Cloudflare Workers for API routing and request handling.
- Cloudflare Workers process requests asynchronously, handle large streaming responses (NDJSON), and format the output for display.
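Below is a minimal sketch of the Worker’s structure. The tunnel hostname and request shape are placeholders, and for clarity it requests a non-streamed response; the NDJSON streaming variant appears under Challenges and Solutions:

```javascript
// Cloudflare Worker (module syntax): receives a prompt from the front-end,
// forwards it to the Ollama API behind the tunnel, and returns the result.
const CORS = {
  "Access-Control-Allow-Origin": "*", // tighten to the site's origin in production
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};

export default {
  async fetch(request) {
    if (request.method === "OPTIONS") {
      return new Response(null, { headers: CORS }); // CORS preflight
    }
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405, headers: CORS });
    }

    const { prompt } = await request.json();
    const upstream = await fetch("https://ollama.example.com/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: "llama2", prompt, stream: false }),
    });

    const data = await upstream.json();
    return new Response(JSON.stringify({ result: data.response }), {
      headers: { "Content-Type": "application/json", ...CORS },
    });
  },
};
```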
Implementation Details ¶
Front-End Design ¶
The website features:
- Tabbed Interface: Users can switch between pasting text or uploading files.
- Upload Support: Handles PDFs and text files, parsing them client-side.
- Loading Indicators: Prevents multiple submissions and shows a clear processing state.
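The paste-text flow boils down to a fetch call plus some UI state management. This sketch uses placeholder element IDs and a placeholder Worker URL, not the site’s real markup:

```javascript
// Submit pasted text to the Worker, disabling the button while a request runs.
const button = document.getElementById("analyze");

button.addEventListener("click", async () => {
  const text = document.getElementById("input-text").value;
  button.disabled = true;                            // prevent double submissions
  document.getElementById("spinner").hidden = false; // show processing state
  try {
    const res = await fetch("https://analyzer.example.workers.dev/", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt: `Summarize this document:\n\n${text}` }),
    });
    const { result } = await res.json();
    document.getElementById("output").textContent = result;
  } finally {
    button.disabled = false;
    document.getElementById("spinner").hidden = true;
  }
});
```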
Cloudflare Worker ¶
The Cloudflare Worker:
- Receives POST requests from the Hugo front-end.
- Sends the request to the Ollama API using secure HTTPS.
- Handles NDJSON streaming from the LLM, concatenating the `response` fields into a full result.
- Sends the formatted analysis back to the front-end.
Prompts and Functionality ¶
- Summarize this document: Returns a concise summary.
- Extract key points: Lists the main ideas.
- Evaluate sentiment: Analyzes the overall tone.
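Each option simply prepends an instruction to the document text before it is sent to the model. Roughly (the exact wording on the site may differ):

```javascript
// Map each analysis option to an instruction and prepend it to the document.
const TASKS = {
  summarize: "Summarize this document:",
  keyPoints: "Extract key points from this document:",
  sentiment: "Evaluate the sentiment of this document:",
};

function buildPrompt(task, documentText) {
  return `${TASKS[task]}\n\n${documentText}`;
}
```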
Challenges and Solutions ¶
- Handling NDJSON responses: The Ollama API streams its output in NDJSON format (one JSON object per line). To handle this:
- The Worker processes each chunk and combines the `response` fields into a full result.
- By using TextDecoder and ReadableStream, I ensured the streamed data decodes cleanly even when a chunk splits a line (see the first sketch after this list).
- File upload parsing: Different file types are handled before submission:
- TXT files: Read directly using client-side JavaScript.
- PDF files: Parsed using PDF.js for text extraction before sending to the Worker (see the second sketch after this list).
- Security and CORS:
- Cloudflare Tunnel keeps the API secure: only traffic routed through Cloudflare’s network ever reaches it.
- CORS (Cross-Origin Resource Sharing) headers were added so the Worker accepts requests from the front-end’s origin.
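Here is a sketch of the NDJSON handling described in the first point, buffering partial lines between chunks:

```javascript
// Collect an NDJSON stream (one JSON object per line) into a single string.
async function collectNdjson(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let result = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true }); // handles split multi-byte chars
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);               // one JSON object per line
      if (chunk.response) result += chunk.response; // append the generated text
    }
  }
  if (buffer.trim()) { // flush a final unterminated line, if any
    const chunk = JSON.parse(buffer);
    if (chunk.response) result += chunk.response;
  }
  return result;
}
```

In the Worker sketch above, this function replaces the `stream: false` shortcut: request `stream: true` and pass `upstream.body` to `collectNdjson`.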
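And a sketch of the client-side PDF extraction from the second point, assuming PDF.js is loaded globally as `pdfjsLib` (e.g. via a script tag):

```javascript
// Extract plain text from an uploaded PDF with PDF.js before sending it on.
async function extractPdfText(file) {
  const data = await file.arrayBuffer();
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  let text = "";
  for (let pageNum = 1; pageNum <= pdf.numPages; pageNum++) {
    const page = await pdf.getPage(pageNum);
    const content = await page.getTextContent();
    text += content.items.map((item) => item.str).join(" ") + "\n";
  }
  return text;
}
```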
Conclusion ¶
This project demonstrates how combining Ollama, Llama2, and Cloudflare Workers enables an efficient, scalable, self-hosted AI solution. Using a Cloudflare Worker in place of a traditional Node.js server reduced complexity and improved request handling through serverless technology.
If you’re interested in building your own self-hosted AI application, Cloudflare Workers offer a cost-effective and highly scalable alternative to traditional backends. Let me know your thoughts or any features you’d like to see in future updates!