01 - Motivation
I wanted to build something AI-related for learning and exam prep in ham radio, and landed on a ham radio exam prep tool. The ASOC (Amateur Station Operator's Certificate) examination is notoriously dry to study for: a dense textbook, lots of regulations, Q codes, and circuit theory. The constraint I set for myself (I have no money): 100% local and free. No OpenAI API, no paid services. Just Ollama running Llama3 on my machine, a PDF textbook, and some Python.
02 - The Problem with Vanilla LLMs
If you just ask Llama3 "explain antenna impedance for the ASOC exam", you get a decent answer — but it's not grounded in anything. It might hallucinate regulations, quote wrong frequency bands, or explain things in a way that doesn't match the actual syllabus.
The fix is RAG — Retrieval Augmented Generation. Instead of relying on the model's training data, you:
- Feed it the actual textbook
- At query time, retrieve the most relevant chunks
- Give those chunks to the LLM as context
- The LLM answers from the textbook, not from memory
The answers are now grounded, accurate, and exam-relevant.
03 - Data: The NIAR Study Manual
The textbook is a 158-page PDF from the National Institute of Amateur Radio, Hyderabad. It covers:
| Section | Pages | Content |
|---|---|---|
| Radio Theory (Restricted) | 13–86 | Electricity, circuits, semiconductors, propagation, antennas |
| Radio Theory (General) | 87–114 | Advanced modulation, satellites, ionosphere |
| Radio Regulations | 115–134 | ITU rules, Q codes, operating procedures |
| Morse Code | 135–140 | Timing, sending, receiving |
Rather than dumping the whole PDF into the LLM context (too large, too noisy), I split it into 25 modules — each mapped to a page range and a list of topics. This is defined in config.py:
```python
MODULES = [
    {
        "id": 1,
        "title": "Atomic Structure & Basic Electricity",
        "level": "Restricted",
        "pages": [13, 18],
        "topics": ["Atomic structure", "Conductors", "Insulators", "Ohm's Law", ...]
    },
    # ... 24 more
]
```

This structure is the backbone of the whole system. Every query, every exam question, every study session is scoped to one module.
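To show how that scoping works in code, here is a minimal lookup helper. The two-entry MODULES list below is a hypothetical stand-in for the real 25-entry list in config.py, and get_module is an illustrative name, not necessarily the project's:

```python
# Hypothetical two-entry stand-in for the 25-entry MODULES list in config.py.
MODULES = [
    {"id": 1, "title": "Atomic Structure & Basic Electricity",
     "level": "Restricted", "pages": [13, 18],
     "topics": ["Atomic structure", "Ohm's Law"]},
    {"id": 2, "title": "DC Circuits",
     "level": "Restricted", "pages": [19, 24],
     "topics": ["Series circuits", "Parallel circuits"]},
]

def get_module(module_id):
    """Return the module dict for a given id, or raise KeyError."""
    for module in MODULES:
        if module["id"] == module_id:
            return module
    raise KeyError(f"No module with id {module_id}")
```

Every endpoint that takes a module id can resolve it to a page range and topic list this way before doing any retrieval.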
04 - The RAG Pipeline
Step 1 - PDF Extraction
PyMuPDF extracts text page by page. Each module's pages are extracted independently:
```python
def extract_text(pdf_path, start_page, end_page):
    doc = pymupdf.open(pdf_path)
    out = ""
    # Page ranges in MODULES are 1-indexed and inclusive, so iterate
    # through end_page (PyMuPDF pages are 0-indexed).
    for i in range(start_page - 1, end_page):
        page = doc[i]
        out += page.get_text()
    return out
```

Step 2 - Sliding Window Chunking
Instead of hard splits every 1000 characters, I use a sliding window with overlap so concepts that span chunk boundaries don't get lost:
```python
def chunk_text(text, chunk_size=1000, overlap=100):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

With a 100-character overlap, each chunk shares context with its neighbors. This dramatically improves retrieval quality for multi-paragraph concepts.
Step 3 - Vector Embeddings + ChromaDB
Each chunk is converted to a 384-dimensional vector using sentence-transformers (all-MiniLM-L6-v2) and stored in ChromaDB with metadata:
```python
embedding_fn = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
collection = client.get_or_create_collection("hamradio", embedding_function=embedding_fn)
collection.add(
    documents=[chunk],
    ids=[f"module_{module['id']}_chunk_{i}"],
    metadatas=[{"module_id": module["id"], "title": module["title"]}]
)
```

The module_id metadata is critical: it lets us filter queries to search only within the active module.
Step 4 - Retrieval
When a user asks a question, ChromaDB converts it to a vector and finds the top 3 most semantically similar chunks — from that module only:
```python
results = collection.query(
    query_texts=[question],
    n_results=3,
    where={"module_id": module_id}
)
return " ".join(results["documents"][0])
```

Step 5 - Generation
The retrieved chunks are injected into the system prompt as context, and Llama3 answers from them:
```python
system = f"""You are an expert ham radio teacher preparing a student for the ASOC exam.
Use the following textbook content as your primary source and explain clearly.

Textbook content:
{context}"""

response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": question}
    ]
)
```

The LLM never sees the full textbook — only the 3 most relevant chunks for the question asked. This keeps the context tight and the answers focused.
05 - Exam Mode
Exam mode uses the same RAG pipeline but with a different prompt — instead of explaining, it generates MCQ questions:
```python
system = f"""You are a ham radio exam question generator.
Generate one multiple choice question based on the textbook content below.
Respond ONLY in valid JSON format with keys: question, options (A B C D), correct, explanation.
Use ONLY double quotes. No single quotes anywhere.

Textbook content:
{context}"""
```

The JSON output is parsed and rendered as an interactive quiz — pick an answer, get instant feedback with explanation.
One challenge: Llama3 occasionally returns malformed JSON, with single quotes or extra text around the JSON block. The fix (adapted from a snippet I found online) is a combination of regex cleanup and a fallback extractor: the regexes rewrite single-quoted keys and values to double quotes, and if parsing still fails, we take everything between the first { and the last } and parse that instead.
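Seen end to end on a made-up malformed reply (parse_mcq and the sample string are mine, not the project's), the repair behaves like this:

```python
import json
import re

def parse_mcq(raw):
    # Rewrite single-quoted keys, then single-quoted values, to double quotes.
    raw = re.sub(r"'([^']*)'(\s*:)", r'"\1"\2', raw)
    raw = re.sub(r":\s*'([^']*)'", r': "\1"', raw)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: keep only the text between the first { and the last }.
        start = raw.find("{")
        end = raw.rfind("}") + 1
        return json.loads(raw[start:end])

broken = "Here is your question: {'question': 'What does QRM mean?', 'correct': 'A'}"
parsed = parse_mcq(broken)
# → {'question': 'What does QRM mean?', 'correct': 'A'}
```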
```python
raw = re.sub(r"'([^']*)'(\s*:)", r'"\1"\2', raw)
raw = re.sub(r":\s*'([^']*)'", r': "\1"', raw)
try:
    return json.loads(raw)
except json.JSONDecodeError:
    start = raw.find("{")
    end = raw.rfind("}") + 1
    return json.loads(raw[start:end])
```

06 - Backend: FastAPI
Four endpoints power the whole app:
| Endpoint | Method | What it does |
|---|---|---|
| /modules | GET | Returns all 25 modules |
| /study | POST | RAG query → teaching answer |
| /exam/question | POST | RAG query → MCQ question |
| /exam/check | POST | Checks answer, returns feedback |
FastAPI's Pydantic models handle request validation automatically: if the frontend sends the wrong shape of data, it returns a clear error instantly.
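The /exam/check endpoint is simple enough to sketch as plain Python. Here is a hypothetical version of its core logic (field names follow the JSON keys from the exam prompt; check_answer is an illustrative name, and in the real app this sits behind a Pydantic request model):

```python
def check_answer(question, selected):
    """Compare the user's picked option letter against the stored correct one."""
    is_correct = selected.strip().upper() == question["correct"].upper()
    return {
        "correct": is_correct,
        "answer": question["correct"],
        "explanation": question["explanation"],
    }

mcq = {
    "question": "What does the Q code QRM mean?",
    "options": {"A": "Man-made interference", "B": "Fading",
                "C": "Stand by", "D": "Low power"},
    "correct": "A",
    "explanation": "QRM is interference from other stations or man-made sources.",
}
feedback = check_answer(mcq, "a")  # lowercase input still matches
```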
07 - Frontend: Next.js
Three pages:
- / - Module grid, grouped by grade level (Restricted / General / Both)
- /study/[id] - Auto-generates a module summary on load, then a chat interface for follow-up questions
- /exam/[id] - Generates MCQ questions one at a time, tracks score, shows explanation after each answer
The UI is black + cyan on JetBrains Mono — intentionally minimal.
08 - What RAG Type Did We Use?
We used Basic RAG - the simplest and most common variant:
Query → semantic search → top-k chunks → stuff into prompt → generate
It works well for this use case because the textbook is well-structured and the queries are focused (scoped to one module at a time).
Other RAG variants worth knowing:
| Type | What it adds | When to use |
|---|---|---|
| Hybrid RAG | Combines semantic + keyword (BM25) search | When queries contain exact terms like Q codes or regulation numbers |
| Parent-Child RAG | Small chunks for search, large chunks for context | When answers need more surrounding context |
| Reranking RAG | Second model reranks retrieved chunks by relevance | When retrieval quality needs to be higher |
| HyDE | Generates a hypothetical answer first, then searches | When queries are vague or indirect |
| Agentic RAG | LLM decides how many times to search and what to search for | Production systems requiring high accuracy |
| Graph RAG | Builds a knowledge graph of the document | Complex documents with rich entity relationships |
For a production version of this tool, Hybrid RAG + Reranking would be the biggest improvements. Ham radio content has lots of specific terms (QRM, QSL, AFSK, ITU Article 25) that semantic search alone sometimes misses - keyword search catches those exactly.
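To make the Hybrid RAG idea concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a semantic ranking with a BM25 keyword ranking. This is illustrative only, not code from the project:

```python
def rrf_merge(semantic_ids, keyword_ids, k=60):
    """Fuse two ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, so documents
    ranked well by either search float to the top of the fused list.
    """
    scores = {}
    for ranking in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "c2" appears in both rankings, so it beats chunks found by only one search:
fused = rrf_merge(["c1", "c2", "c3"], ["c4", "c2", "c5"])
# → fused[0] == "c2"
```

A query like "What does QRM mean?" would then surface the chunk that literally contains "QRM" even when embedding similarity alone ranks it lower.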
09 - What's Next
More textbooks - The system is not tied to this one PDF. Any ham radio study material can be ingested by updating config.py with new modules and page ranges.
Hybrid RAG - Adding BM25 keyword search alongside semantic search for better retrieval of specific regulations and Q codes.
User progress tracking - Track which modules a user has studied and which exam questions they got wrong, and weight future questions toward weak areas.
Deployment - Currently runs fully locally. The next step is deploying the FastAPI backend to Railway and the Next.js frontend to Vercel, with a hosted Ollama instance or switching to Gemini's free tier for the LLM (sponsor me :D ).
Author: Himanshu Suri
Date: March 2026