Retrieval-Augmented Chat in Emacs

NAME

rag-chat – an Emacs-based workflow for indexing a codebase, querying local embeddings, and displaying AI answers in a *RAG Chat* buffer.

SYNOPSIS

This setup uses three main pieces:

  • index_codebase.py – Recursively indexes your project’s files and stores embeddings in a local Chroma DB.
  • query_codebase.py – Retrieves relevant code snippets (chunks) from that DB based on a textual query.
  • rag-chat.el – An Emacs extension that ties it all together. You run M-x rag-chat-index-codebase to index, then M-x rag-chat-ask-question to query and get an AI completion. The results appear in the *RAG Chat* buffer.

DESCRIPTION

This Retrieval-Augmented Generation (RAG) workflow addresses the challenge of referencing large codebases when asking an AI model for help. Instead of sending your entire project to the LLM, you index the code offline (embedding each file or chunk) into a local vector store (Chroma). Then, at query time:

  1. Emacs spawns a Python script (query_codebase.py) to find the top-matching chunks for your query.
  2. Those chunks and your question are combined and sent to OpenAI (or a similar provider) for a final answer.
  3. The output is displayed in a dedicated Emacs buffer (*RAG Chat*), so you can read or copy the answer easily.

Below, we provide the full code for each component and highlight the major parts of each script or Emacs function.
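
Before diving into the individual files, here is a rough, Emacs-free sketch of the same retrieve-then-ask loop in plain Python. It assumes the pre-1.0 openai library used throughout this post and a chroma_db that has already been built; the project path and question are placeholders.

# Minimal sketch of the RAG loop outside Emacs (illustrative paths/values).
import json
import os
import subprocess

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

project_root = "/path/to/project"            # hypothetical
question = "How do I call get_secrets()?"

# 1. Retrieval: query_codebase.py prints a JSON object with a "documents" field.
raw = subprocess.run(
    ["python3", "query_codebase.py", project_root, question],
    capture_output=True, text=True, check=True
).stdout
snippets = [doc for group in json.loads(raw)["documents"] for doc in group]

# 2. Generation: merge snippets + question into a single chat prompt.
prompt = ("Relevant Code Snippets:\n"
          + "\n\n---\n\n".join(snippets)
          + "\n\nUser Question:\n" + question)
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "system", "content": "You are a coding assistant."},
              {"role": "user", "content": prompt}],
    temperature=0.2,
)
print(resp["choices"][0]["message"]["content"])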

USAGE

1. M-x rag-chat-mode
   Opens the *RAG Chat* buffer (optional as a first step, but it displays usage instructions).

2. M-x rag-chat-index-codebase
   - Prompts for project root (the folder you want indexed).
   - Optional subdirectory if you only want to index part of it.
   - Spawns index_codebase.py to generate embeddings into project_root/chroma_db (a quick index sanity check is sketched after this list).

3. M-x rag-chat-ask-question
   - Prompts for the same project root.
   - Asks for your question (e.g. "How do I call get_secrets()?").
   - Runs query_codebase.py to grab top code snippets, merges them into a prompt, and calls the LLM.
   - Shows the Q&A in *RAG Chat*.

4. Reading & Closing *RAG Chat*
   - The final Q&A appears in a new window or in a buffer named *RAG Chat*.
   - Press q in that window to close it (quit-window), or use M-x kill-buffer if you prefer.
        
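After step 2 finishes, you can sanity-check the index from a shell before asking any questions. The snippet below is illustrative and assumes the same pre-0.4 Chroma duckdb+parquet setup the scripts use; the path is a placeholder.

# Quick sanity check: open the local Chroma DB and count stored chunks.
import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="/path/to/project/chroma_db"  # hypothetical path
))
collection = client.get_or_create_collection(name="my_codebase")
print("Indexed chunks:", collection.count())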

CODE: index_codebase.py

Major Components:

  • chunk_text – Splits a file’s content into ~1000-word chunks for manageable embedding sizes.
  • embed_text – Calls OpenAI’s text-embedding-ada-002 model to produce vector embeddings.
  • index_directory – Recursively scans files, skipping or chunking as needed, then stores them in Chroma DB.
  • main – The entry point that determines the project root, optional subdir, and calls index_directory.
#!/usr/bin/env python3
import os
import sys
import openai
import chromadb
from chromadb.config import Settings

openai.api_key = os.getenv("OPENAI_API_KEY")  # or set your key here

def chunk_text(text, max_chunk_size=1000):
    """
    Break text into chunks of roughly max_chunk_size words.
    """
    lines = text.split("\n")
    chunks = []
    current_chunk = []
    current_size = 0

    for line in lines:
        line_len = len(line.split())
        if current_size + line_len > max_chunk_size and current_chunk:
            chunks.append("\n".join(current_chunk))
            current_chunk = [line]
            current_size = line_len
        else:
            current_chunk.append(line)
            current_size += line_len
    if current_chunk:
        chunks.append("\n".join(current_chunk))
    return chunks

def embed_text(text):
    """
    Use OpenAI's embeddings endpoint (text-embedding-ada-002).
    """
    response = openai.Embedding.create(
        input=[text],
        model="text-embedding-ada-002"
    )
    embedding = response["data"][0]["embedding"]
    return embedding

def index_directory(code_dir, db_dir):
    """
    Recursively index files in code_dir: chunk, embed, store in Chroma DB at db_dir.
    """
    # Initialize Chroma client
    client = chromadb.Client(Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=db_dir
    ))
    collection = client.get_or_create_collection(name="my_codebase")

    for root, dirs, files in os.walk(code_dir):
        # Skip the DB directory itself so we don't index our own embeddings
        if os.path.abspath(root).startswith(os.path.abspath(db_dir)):
            continue

        for f in files:
            # Example filter: index everything, or selectively with f.endswith(".py")
            file_path = os.path.join(root, f)
            try:
                with open(file_path, "r", encoding="utf-8") as fp:
                    content = fp.read()
            except Exception as e:
                print(f"Error reading {file_path}: {e}")
                continue

            # Chunk the file content
            chunks = chunk_text(content)
            for i, chunk in enumerate(chunks):
                try:
                    emb = embed_text(chunk)
                except Exception as e_emb:
                    print(f"Embedding error for {file_path}, chunk {i}: {e_emb}")
                    continue

                doc_id = f"{file_path}-chunk{i}"
                metadata = {"path": file_path, "chunk_index": i}
                collection.add(
                    documents=[chunk],
                    embeddings=[emb],
                    metadatas=[metadata],
                    ids=[doc_id]
                )

    # Flush everything to disk (older Chroma versions also persist on exit).
    client.persist()

def main():
    """Usage:
    python index_codebase.py <project_root> [subdir]
    The DB is stored in <project_root>/chroma_db
    """
    if len(sys.argv) < 2:
        print("Usage: python index_codebase.py <project_root> [subdir]")
        sys.exit(1)

    project_root = os.path.abspath(sys.argv[1])
    code_dir = project_root
    if len(sys.argv) > 2:
        code_dir = os.path.join(project_root, sys.argv[2])

    db_dir = os.path.join(project_root, "chroma_db")

    print(f"Project root: {project_root}")
    print(f"Indexing code in: {code_dir}")
    print(f"DB location: {db_dir}")

    index_directory(code_dir, db_dir)
    print("Indexing complete.")

if __name__ == "__main__":
    main()
        
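chunk_text above approximates size by word count, which is simple but only loosely tracks real token limits. If you want chunks measured in actual tokens, a variant using the tiktoken package could look like the following sketch (not part of the script above; chunk_text_by_tokens is a hypothetical helper).

# Alternative chunker that counts real tokens instead of words (illustrative).
import tiktoken

def chunk_text_by_tokens(text, max_tokens=1000):
    """Split text into pieces of at most max_tokens tokens (cl100k_base)."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    for start in range(0, len(tokens), max_tokens):
        chunks.append(enc.decode(tokens[start:start + max_tokens]))
    return chunks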

CODE: query_codebase.py

Major Components:

  • retrieve_relevant_chunks – Connects to the same local DB, embeds the question with the same OpenAI embedding model, and queries for the top K nearest neighbors.
  • main – Takes project_root and question from the command line, prints a JSON object with fields like "documents" and "metadatas".
#!/usr/bin/env python3
import os
import sys
import json
import openai
import chromadb
from chromadb.config import Settings

openai.api_key = os.getenv("OPENAI_API_KEY")

def retrieve_relevant_chunks(db_dir, query, top_k=20):
    """
    Query the local Chroma DB at db_dir for the top_k relevant code chunks.
    """
    client = chromadb.Client(Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=db_dir
    ))
    collection = client.get_collection("my_codebase")
    # Embed the query with the same model used at indexing time so the
    # query vector is comparable to the stored ada-002 embeddings.
    response = openai.Embedding.create(
        input=[query],
        model="text-embedding-ada-002"
    )
    query_emb = response["data"][0]["embedding"]
    results = collection.query(query_embeddings=[query_emb], n_results=top_k)
    return results

def main():
    """Usage:
      python query_codebase.py <project_root> <question>
    """
    if len(sys.argv) < 3:
        print("Usage: python query_codebase.py <project_root> <question>")
        sys.exit(1)

    project_root = os.path.abspath(sys.argv[1])
    question = " ".join(sys.argv[2:])
    db_dir = os.path.join(project_root, "chroma_db")

    results = retrieve_relevant_chunks(db_dir, question, top_k=20)
    print(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()
        

CODE: rag-chat.el

Major Components:

  • rag-chat-index-codebase – Prompts for root/subdir, spawns index_codebase.py asynchronously.
  • rag-chat--query-codebase – Synchronously runs query_codebase.py, captures JSON output via shell-command-to-string, then calls a callback with parsed data.
  • rag-chat--openai-chat – Sends “system” + “user” messages to https://api.openai.com/v1/chat/completions. Uses request-deferred with the user’s model/key settings.
  • rag-chat-ask-question – The main Q&A command. It merges code snippets from the DB with your question, calls rag-chat--openai-chat, and pops up the *RAG Chat* window with the final answer.
  • rag-chat-mode – Provides a small major mode with instructions and a q key to quit-window.
;;; rag-chat.el --- Retrieval-Augmented Chat for a codebase, using local Chroma DB -*- lexical-binding: t; -*-
;;
;; Usage:
;;   1. Place this file somewhere in your Emacs load-path, then:
;;        (require 'rag-chat)
;;   2. Configure:
;;        (setq rag-chat--index-script "/path/to/index_codebase.py")
;;        (setq rag-chat--query-script "/path/to/query_codebase.py")
;;        (setq rag-chat--openai-api-key "sk-XXXX")
;;   3. M-x rag-chat-mode  -> opens the *RAG Chat* buffer.
;;   4. M-x rag-chat-index-codebase -> index codebase.
;;   5. M-x rag-chat-ask-question -> query local DB + ask AI.

(require 'json)
(require 'request-deferred)
(require 'cl-lib)  ;; for cl-function
(require 'subr-x)  ;; for string-empty-p

(defgroup rag-chat nil
  "Retrieval-Augmented Chat in Emacs for local codebases."
  :group 'tools)

(defcustom rag-chat--openai-api-key ""
  "Your OpenAI API key for chat completions."
  :type 'string
  :group 'rag-chat)

(defcustom rag-chat--openai-model "gpt-4"
  "Which OpenAI Chat Completion model to use."
  :type 'string
  :group 'rag-chat)

(defcustom rag-chat--index-script "/path/to/index_codebase.py"
  "Path to the Python script that indexes the codebase."
  :type 'string
  :group 'rag-chat)

(defcustom rag-chat--query-script "/path/to/query_codebase.py"
  "Path to the Python script that queries the local vector DB."
  :type 'string
  :group 'rag-chat)

(defcustom rag-chat--num-results 20
  "Number of relevant chunks to retrieve from the codebase for each query."
  :type 'integer
  :group 'rag-chat)

(defvar rag-chat--buffer-name "*RAG Chat*"
  "Name of the buffer where RAG chat Q&A is displayed.")


;;; --- Indexing the Codebase ---

(defun rag-chat-index-codebase (&optional project-root subdir)
  "Index the code in PROJECT-ROOT or optional SUBDIR using index_codebase.py.
The embeddings are stored in PROJECT-ROOT/chroma_db."
  (interactive
   (list (read-directory-name "Project root: " default-directory)
         (read-string "Subdir to index (optional): ")))
  (let* ((bufname "*RAG Index Output*")
         (script rag-chat--index-script)
         (project-arg (expand-file-name project-root))
         (subdir-arg subdir))
    (with-current-buffer (get-buffer-create bufname)
      (erase-buffer)
      (insert (format "Indexing project root: %s\n" project-arg))
      (when (and subdir-arg (not (string-empty-p subdir-arg)))
        (insert (format "Indexing subdir: %s\n" subdir-arg)))
      (insert (format "Using script: %s\n\n" script))
      ;; Start an async process to run index_codebase.py
      (if (and subdir-arg (not (string-empty-p subdir-arg)))
          (start-process "rag-index-process" bufname
                         "python3" script project-arg subdir-arg)
        (start-process "rag-index-process" bufname
                       "python3" script project-arg)))
    (pop-to-buffer bufname)))


;;; --- Querying the Codebase & Building Chat Prompts ---

(defun rag-chat--query-codebase (project-root question callback)
  "Run query_codebase.py for PROJECT-ROOT and QUESTION. Pass JSON results to CALLBACK."
  (let* ((script rag-chat--query-script)
         (project-arg (expand-file-name project-root))
         (cmd (mapconcat #'shell-quote-argument
                         (list "python3" script project-arg question)
                         " "))
         (raw-output (shell-command-to-string cmd)))
    (condition-case err
        (let ((json-data (json-read-from-string raw-output)))
          (funcall callback json-data))
      (error
       (message "RAG Chat Query Error: %S" err)
       nil))))

(defun rag-chat--openai-chat (messages callback)
  "Send MESSAGES (list of role/content objects) to the OpenAI Chat endpoint.
Call CALLBACK with the final answer string."
  (request-deferred
   "https://api.openai.com/v1/chat/completions"
   :type "POST"
   :headers `(("Content-Type" . "application/json")
              ("Authorization" . ,(concat "Bearer " rag-chat--openai-api-key)))
   :data (json-encode
          `(("model" . ,rag-chat--openai-model)
            ("messages" . ,messages)
            ("max_tokens" . 10000)
            ("temperature" . 0.2)))
   :parser 'json-read
   :success (cl-function
             (lambda (&key data &allow-other-keys)
               (let* ((choices (assoc-default 'choices data))
                      (first-choice (and choices (aref choices 0)))
                      (msg (assoc-default 'message first-choice))
                      (content (assoc-default 'content msg)))
                 (funcall callback content))))
   :error (cl-function
           (lambda (&rest args &key error-thrown &allow-other-keys)
             (message "OpenAI Chat Error: %S" error-thrown)))))

;;; --- High-Level Q&A Function ---

(defun rag-chat-ask-question (&optional project-root question)
  "Prompt for PROJECT-ROOT (directory) and QUESTION (string).
1. Query codebase for relevant snippets (top rag-chat--num-results).
2. Send them + QUESTION to OpenAI Chat Completion API.
3. Display answer in `rag-chat--buffer-name' buffer, then pop that buffer."
  (interactive
   (list (read-directory-name "Project root: " default-directory)
         (read-string "Question: ")))
  (unless (and question (not (string-empty-p question)))
    (user-error "No question provided"))
  (let* ((root (expand-file-name project-root))
         (q question))
    (rag-chat--query-codebase
     root q
     (lambda (retrieval-result)
       (let* ((docs (assoc-default 'documents retrieval-result))
              ;; convert vectors to lists:
              (docs-list (append docs nil))
              (docs-lol (mapcar (lambda (subv) (append subv nil)) docs-list))
              (snippets (apply #'append docs-lol))

              (system-message
               "You are a coding assistant. Use the following code snippets if relevant to answer the user's question.")
              (user-message
               (concat
                "Relevant Code Snippets:\n"
                (mapconcat #'identity snippets "\n\n---\n\n")
                "\n\nUser Question:\n"
                q))

              (messages (list
                         (list (cons "role" "system")
                               (cons "content" system-message))
                         (list (cons "role" "user")
                               (cons "content" user-message)))))
         (rag-chat--openai-chat
          messages
          (lambda (answer)
            (with-current-buffer (get-buffer-create rag-chat--buffer-name)
              (goto-char (point-max))
              (insert (format "Q: %s\nA: %s\n\n" q answer)))
            (pop-to-buffer (get-buffer-create rag-chat--buffer-name)
                           '((display-buffer-pop-up-window)
                             (inhibit-same-window . t)))))))))

;;; --- A Simple RAG Chat Mode ---

(defun rag-chat-mode ()
  "Major mode for retrieval-augmented chat with a local codebase.
Use:
  M-x rag-chat-index-codebase -> to index into <project-root>/chroma_db
  M-x rag-chat-ask-question   -> to query codebase + LLM for an answer.
All Q&A is displayed in the *RAG Chat* buffer."
  (interactive)
  (switch-to-buffer (get-buffer-create rag-chat--buffer-name))
  (rag-chat-mode-init))

(defun rag-chat-mode-init ()
  "Set up the *RAG Chat* buffer: keymap, mode variables, and intro text."
  (kill-all-local-variables)
  (use-local-map (make-sparse-keymap))
  (setq major-mode 'rag-chat-mode
        mode-name "RAG-Chat")
  (read-only-mode -1)
  (local-set-key (kbd "q") #'quit-window)
  (insert "Welcome to RAG Chat Mode.\n\n")
  (insert "Commands:\n")
  (insert "  M-x rag-chat-index-codebase -> Index project code into <project-root>/chroma_db\n")
  (insert "  M-x rag-chat-ask-question -> Query the codebase with an OpenAI-augmented Q&A.\n\n"))

(provide 'rag-chat)
;;; rag-chat.el ends here
        

EXPLANATION & HIGHLIGHTS

  • index_codebase.py:
    • Initializes a Chroma DB in duckdb+parquet mode. No external server needed.
    • Walks your code directory. For each file:
      • chunk_text slices content into ~1,000-word pieces.
      • embed_text sends each chunk to OpenAI text-embedding-ada-002.
      • Stores them in the local DB with collection.add.
  • query_codebase.py:
    • Accepts project_root and question.
    • Points to the same chroma_db in that root, queries with collection.query to find top matches.
    • Prints JSON with "documents", "metadatas", etc. Emacs parses that to get the relevant code snippets (the JSON shape is sketched after this list).
  • rag-chat.el (the Emacs extension):
    • rag-chat--query-codebase calls query_codebase.py synchronously, capturing JSON with shell-command-to-string.
    • rag-chat--openai-chat uses request-deferred to call the /v1/chat/completions API, passing your rag-chat--openai-model and rag-chat--openai-api-key.
    • rag-chat-ask-question merges the retrieved code snippets + your question, forms “system” + “user” messages, and displays the final AI answer in the *RAG Chat* buffer.
    • rag-chat-index-codebase spawns index_codebase.py asynchronously, letting you watch the logs in *RAG Index Output*.
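
For orientation, the JSON printed by query_codebase.py follows Chroma's query-result layout: every top-level field holds one list per query, so a single question yields a list containing one inner list. The snippet below shows that shape with made-up values and flattens it the same way rag-chat.el does.

# Shape of query_codebase.py output (illustrative values) and how to flatten it.
import json

raw = """
{
  "ids": [["src/secrets.py-chunk0", "src/app.py-chunk3"]],
  "documents": [["def get_secrets(): ...", "secrets = get_secrets() ..."]],
  "metadatas": [[{"path": "src/secrets.py", "chunk_index": 0},
                 {"path": "src/app.py", "chunk_index": 3}]]
}
"""
results = json.loads(raw)

# One inner list per query; we only ever send a single query, so flatten it.
snippets = [doc for group in results["documents"] for doc in group]
paths = [meta["path"] for group in results["metadatas"] for meta in group]
print(len(snippets), "snippets from", paths)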

CLOSING THE *RAG CHAT* BUFFER

By default, rag-chat-mode binds q to quit-window, so if *RAG Chat* has focus, pressing q closes that window. You can also do:

  • C-x k – Kills the *RAG Chat* buffer entirely.
  • C-x 1 – If you just want to hide the extra window, run this in another window to delete all but the current window.

SEE ALSO

Explore or adapt these scripts for more advanced flows (e.g., local embeddings, multi-turn Q&A, or agent-based approaches). The code here is enough to get started with an Emacs-based RAG solution referencing your codebase in real-time.
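
As a concrete example of the "local embeddings" direction, a drop-in replacement for embed_text using the sentence-transformers package might look like the sketch below (hypothetical; if you switch, re-index the codebase and embed queries with the same model so stored and query vectors match).

# Hypothetical local-embedding variant of embed_text (no OpenAI calls).
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")  # small local model

def embed_text(text):
    """Return a plain Python list so Chroma can store it as before."""
    return _model.encode(text).tolist()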