01 - Motivation
I wanted to build something AI-related for learning and exam prep in ham radio, and landed on a ham radio exam prep tool. The ASOC (Amateur Station Operator's Certificate) examination is notoriously dry to study for: a dense textbook, lots of regulations, Q codes, and circuit theory. The constraint I set for myself (I have no money): 100% local and free. No OpenAI API, no paid services. Just Ollama running Llama3 on my machine, a PDF textbook, and some Python.
02 - The Problem with Vanilla LLMs
If you just ask Llama3 "explain antenna impedance for the ASOC exam", you get a decent answer — but it's not grounded in anything. It might hallucinate regulations, quote wrong frequency bands, or explain things in a way that doesn't match the actual syllabus.
The fix is RAG — Retrieval Augmented Generation. Instead of relying on the model's training data, you:
- Feed it the actual textbook
- At query time, retrieve the most relevant chunks
- Give those chunks to the LLM as context
- The LLM answers from the textbook, not from memory
The answers are now grounded, accurate, and exam-relevant.
03 - Data: The NIAR Study Manual
The textbook is a 158-page PDF from the National Institute of Amateur Radio, Hyderabad. It covers:
| Section | Pages | Content |
|---|---|---|
| Radio Theory (Restricted) | 13–86 | Electricity, circuits, semiconductors, propagation, antennas |
| Radio Theory (General) | 87–114 | Advanced modulation, satellites, ionosphere |
| Radio Regulations | 115–134 | ITU rules, Q codes, operating procedures |
| Morse Code | 135–140 | Timing, sending, receiving |
Rather than dumping the whole PDF into the LLM context (too large, too noisy), I split it into 25 modules — each mapped to a page range and a list of topics. This is defined in config.py:
```python
MODULES = [
    {
        "id": 1,
        "title": "Atomic Structure & Basic Electricity",
        "level": "Restricted",
        "pages": [13, 18],
        "topics": ["Atomic structure", "Conductors", "Insulators", "Ohm's Law", ...]
    },
    # ... 24 more
]
```

This structure is the backbone of the whole system. Every query, every exam question, every study session is scoped to one module.
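To show how that scoping works in code, here is a minimal lookup helper. The two-entry MODULES list below is a hypothetical stand-in for the real 25-entry list in config.py, and get_module is an illustrative name, not necessarily the project's:

```python
# Hypothetical two-entry stand-in for the 25-entry MODULES list in config.py.
MODULES = [
    {"id": 1, "title": "Atomic Structure & Basic Electricity",
     "level": "Restricted", "pages": [13, 18],
     "topics": ["Atomic structure", "Ohm's Law"]},
    {"id": 2, "title": "DC Circuits",
     "level": "Restricted", "pages": [19, 24],
     "topics": ["Series circuits", "Parallel circuits"]},
]

def get_module(module_id):
    """Return the module dict for a given id, or raise KeyError."""
    for module in MODULES:
        if module["id"] == module_id:
            return module
    raise KeyError(f"No module with id {module_id}")
```

Every endpoint that takes a module id can resolve it to a page range and topic list this way before doing any retrieval.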
04 - The RAG Pipeline
Step 1 - PDF Extraction
PyMuPDF extracts text page by page. Each module's pages are extracted independently:
```python
def extract_text(pdf_path, start_page, end_page):
    doc = pymupdf.open(pdf_path)
    out = ""
    # Page ranges in MODULES are 1-indexed and inclusive, so iterate
    # through end_page (PyMuPDF pages are 0-indexed).
    for i in range(start_page - 1, end_page):
        page = doc[i]
        out += page.get_text()
    return out
```

Step 2 - Sliding Window Chunking
Instead of hard splits every 1000 characters, I use a sliding window with overlap so concepts that span chunk boundaries don't get lost:
```python
def chunk_text(text, chunk_size=1000, overlap=100):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

With a 100-character overlap, each chunk shares context with its neighbors. This dramatically improves retrieval quality for multi-paragraph concepts.
Step 3 - Vector Embeddings + ChromaDB
Each chunk is converted to a 384-dimensional vector using sentence-transformers (all-MiniLM-L6-v2) and stored in ChromaDB with metadata:
```python
embedding_fn = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
collection = client.get_or_create_collection("hamradio", embedding_function=embedding_fn)
collection.add(
    documents=[chunk],
    ids=[f"module_{module['id']}_chunk_{i}"],
    metadatas=[{"module_id": module["id"], "title": module["title"]}]
)
```

The module_id metadata is critical: it lets us filter queries to search only within the active module.
Step 4 - Retrieval
When a user asks a question, ChromaDB converts it to a vector and finds the top 3 most semantically similar chunks — from that module only:
```python
results = collection.query(
    query_texts=[question],
    n_results=3,
    where={"module_id": module_id}
)
return " ".join(results["documents"][0])
```

Step 5 - Generation
The retrieved chunks are injected into the system prompt as context, and Llama3 answers from them:
```python
system = f"""You are an expert ham radio teacher preparing a student for the ASOC exam.
Use the following textbook content as your primary source and explain clearly.

Textbook content:
{context}"""

response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": question}
    ]
)
```

The LLM never sees the full textbook — only the 3 most relevant chunks for the question asked. This keeps the context tight and the answers focused.
05 - Exam Mode
Exam mode uses the same RAG pipeline but with a different prompt — instead of explaining, it generates MCQ questions:
```python
system = f"""You are a ham radio exam question generator.
Generate one multiple choice question based on the textbook content below.
Respond ONLY in valid JSON format with keys: question, options (A B C D), correct, explanation.
Use ONLY double quotes. No single quotes anywhere.

Textbook content:
{context}"""
```

The JSON output is parsed and rendered as an interactive quiz — pick an answer, get instant feedback with explanation.
One challenge: Llama3 occasionally returns malformed JSON, with single quotes or extra text around the JSON block. The fix (adapted from a snippet I found online) is a combination of regex cleanup and a fallback extractor: the regexes rewrite single-quoted keys and values to double quotes, and if parsing still fails, we take everything between the first { and the last } and parse that instead.
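Seen end to end on a made-up malformed reply (parse_mcq and the sample string are mine, not the project's), the repair behaves like this:

```python
import json
import re

def parse_mcq(raw):
    # Rewrite single-quoted keys, then single-quoted values, to double quotes.
    raw = re.sub(r"'([^']*)'(\s*:)", r'"\1"\2', raw)
    raw = re.sub(r":\s*'([^']*)'", r': "\1"', raw)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: keep only the text between the first { and the last }.
        start = raw.find("{")
        end = raw.rfind("}") + 1
        return json.loads(raw[start:end])

broken = "Here is your question: {'question': 'What does QRM mean?', 'correct': 'A'}"
parsed = parse_mcq(broken)
# → {'question': 'What does QRM mean?', 'correct': 'A'}
```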
```python
raw = re.sub(r"'([^']*)'(\s*:)", r'"\1"\2', raw)
raw = re.sub(r":\s*'([^']*)'", r': "\1"', raw)
try:
    return json.loads(raw)
except json.JSONDecodeError:
    start = raw.find("{")
    end = raw.rfind("}") + 1
    return json.loads(raw[start:end])
```

06 - Backend: FastAPI
Four endpoints power the whole app:
| Endpoint | Method | What it does |
|---|---|---|
| /modules | GET | Returns all 25 modules |
| /study | POST | RAG query → teaching answer |
| /exam/question | POST | RAG query → MCQ question |
| /exam/check | POST | Checks answer, returns feedback |
FastAPI's Pydantic models handle request validation automatically: if the frontend sends the wrong shape of data, it returns a clear error instantly.
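The /exam/check endpoint is simple enough to sketch as plain Python. Here is a hypothetical version of its core logic (field names follow the JSON keys from the exam prompt; check_answer is an illustrative name, and in the real app this sits behind a Pydantic request model):

```python
def check_answer(question, selected):
    """Compare the user's picked option letter against the stored correct one."""
    is_correct = selected.strip().upper() == question["correct"].upper()
    return {
        "correct": is_correct,
        "answer": question["correct"],
        "explanation": question["explanation"],
    }

mcq = {
    "question": "What does the Q code QRM mean?",
    "options": {"A": "Man-made interference", "B": "Fading",
                "C": "Stand by", "D": "Low power"},
    "correct": "A",
    "explanation": "QRM is interference from other stations or man-made sources.",
}
feedback = check_answer(mcq, "a")  # lowercase input still matches
```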
07 - Frontend: Next.js
Three pages:
- / - Module grid, grouped by grade level (Restricted / General / Both)
- /study/[id] - Auto-generates a module summary on load, then a chat interface for follow-up questions
- /exam/[id] - Generates MCQ questions one at a time, tracks score, shows explanation after each answer
The UI is black + cyan on JetBrains Mono — intentionally minimal.
08 - What RAG Type Did We Use?
We used Basic RAG - the simplest and most common variant:
Query → semantic search → top-k chunks → stuff into prompt → generate
It works well for this use case because the textbook is well-structured and the queries are focused (scoped to one module at a time).
Other RAG variants worth knowing:
| Type | What it adds | When to use |
|---|---|---|
| Hybrid RAG | Combines semantic + keyword (BM25) search | When queries contain exact terms like Q codes or regulation numbers |
| Parent-Child RAG | Small chunks for search, large chunks for context | When answers need more surrounding context |
| Reranking RAG | Second model reranks retrieved chunks by relevance | When retrieval quality needs to be higher |
| HyDE | Generates a hypothetical answer first, then searches | When queries are vague or indirect |
| Agentic RAG | LLM decides how many times to search and what to search for | Production systems requiring high accuracy |
| Graph RAG | Builds a knowledge graph of the document | Complex documents with rich entity relationships |
For a production version of this tool, Hybrid RAG + Reranking would be the biggest improvements. Ham radio content has lots of specific terms (QRM, QSL, AFSK, ITU Article 25) that semantic search alone sometimes misses - keyword search catches those exactly.
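To make the Hybrid RAG idea concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a semantic ranking with a BM25 keyword ranking. This is illustrative only, not code from the project:

```python
def rrf_merge(semantic_ids, keyword_ids, k=60):
    """Fuse two ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, so documents
    ranked well by either search float to the top of the fused list.
    """
    scores = {}
    for ranking in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "c2" appears in both rankings, so it beats chunks found by only one search:
fused = rrf_merge(["c1", "c2", "c3"], ["c4", "c2", "c5"])
# → fused[0] == "c2"
```

A query like "What does QRM mean?" would then surface the chunk that literally contains "QRM" even when embedding similarity alone ranks it lower.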
09 - What's Next
More textbooks - The system is not tied to this one PDF. Any ham radio study material can be ingested by updating config.py with new modules and page ranges.
Hybrid RAG - Adding BM25 keyword search alongside semantic search for better retrieval of specific regulations and Q codes.
User progress tracking - Track which modules a user has studied and which exam questions they got wrong, and weight future questions toward weak areas.
Deployment - Currently runs fully locally. The next step is deploying the FastAPI backend to Railway and the Next.js frontend to Vercel, with a hosted Ollama instance or switching to Gemini's free tier for the LLM (sponsor me :D ).
Author: Himanshu Suri
Date: March 2026