rag-chat – an Emacs-based workflow for indexing a codebase, querying local embeddings, and displaying AI answers in a *RAG Chat* buffer.
This setup uses three main pieces:
index_codebase.py – Recursively indexes your project’s files and stores embeddings in a local Chroma DB.
query_codebase.py – Retrieves relevant code snippets (chunks) from that DB based on a textual query.
rag-chat.el – An Emacs extension that ties it all together. You run M-x rag-chat-index-codebase to index, then M-x rag-chat-ask-question to query and get an AI completion. The results appear in the *RAG Chat* buffer.

This Retrieval-Augmented Generation (RAG) workflow addresses the challenge of referencing large codebases when asking an AI model for help. Instead of sending your entire project to the LLM, you index the code offline (embedding each file or chunk) into a local vector store (Chroma). Then, at query time, your question is run against that store (via query_codebase.py) to find the top-matching chunks, which are merged with your question into a single prompt for the chat model.

Below, we provide the full code for each component and highlight the major parts of each script or Emacs function.
In day-to-day use, the workflow looks like this:

1. M-x rag-chat-mode – Opens the *RAG Chat* buffer (optional to start, but you can see instructions there).
2. M-x rag-chat-index-codebase
   - Prompts for the project root (the folder you want indexed).
   - Optionally takes a subdirectory if you only want to index part of it.
   - Spawns index_codebase.py to generate embeddings into project_root/chroma_db.
3. M-x rag-chat-ask-question
   - Prompts for the same project root.
   - Asks for your question (e.g. "How do I call get_secrets()?").
   - Runs query_codebase.py to grab the top code snippets, merges them into a prompt, and calls the LLM.
   - Shows the Q&A in *RAG Chat*.
4. Reading & closing *RAG Chat*
   - The final Q&A appears in a new window or in a buffer named *RAG Chat*.
   - Press q in that window to close it (quit-window), or use M-x kill-buffer if you prefer.
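If you would rather sanity-check the two Python scripts before wiring them into Emacs, you can drive them directly. The sketch below mirrors the calls the Emacs commands make; the project path, subdirectory, and question are placeholders, not values from this post.

# Exercise the two scripts outside Emacs (a sketch; paths and question are placeholders).
import subprocess

project = "/path/to/project"

# Same invocation rag-chat-index-codebase makes asynchronously.
subprocess.run(["python3", "index_codebase.py", project, "src"], check=True)

# Same invocation rag-chat--query-codebase makes via shell-command-to-string.
result = subprocess.run(
    ["python3", "query_codebase.py", project, "How do I call get_secrets()?"],
    check=True, capture_output=True, text=True
)
print(result.stdout)  # the JSON blob that rag-chat.el parses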
index_codebase.py
Major Components:
chunk_text – Splits a file’s content into ~1000-word chunks for manageable embedding sizes.
embed_text – Calls OpenAI’s text-embedding-ada-002 model to produce vector embeddings.
index_directory – Recursively scans files, skipping or chunking as needed, then stores them in Chroma DB.
main – The entry point that determines the project root and optional subdir, then calls index_directory.
#!/usr/bin/env python3
import os
import sys
import openai
import chromadb
from chromadb.config import Settings

openai.api_key = os.getenv("OPENAI_API_KEY")  # or set your key here

def chunk_text(text, max_chunk_size=1000):
    """
    Break text into ~max_chunk_size tokens (approx. by word count).
    """
    lines = text.split("\n")
    chunks = []
    current_chunk = []
    current_size = 0
    for line in lines:
        line_len = len(line.split())
        if current_size + line_len > max_chunk_size:
            chunks.append("\n".join(current_chunk))
            current_chunk = [line]
            current_size = line_len
        else:
            current_chunk.append(line)
            current_size += line_len
    if current_chunk:
        chunks.append("\n".join(current_chunk))
    return chunks

def embed_text(text):
    """
    Use OpenAI embeddings endpoint (text-embedding-ada-002).
    """
    response = openai.Embedding.create(
        input=[text],
        model="text-embedding-ada-002"
    )
    embedding = response["data"][0]["embedding"]
    return embedding

def index_directory(code_dir, db_dir):
    """
    Recursively index files in code_dir: chunk, embed, store in Chroma DB at db_dir.
    """
    # Initialize Chroma client
    client = chromadb.Client(Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=db_dir
    ))
    collection = client.get_or_create_collection(name="my_codebase")

    for root, dirs, files in os.walk(code_dir):
        # Skip indexing if we're inside the DB dir itself
        if db_dir in root:
            continue
        for f in files:
            # Example filter: index everything, or be selective, e.g. if f.endswith(".py")
            file_path = os.path.join(root, f)
            try:
                with open(file_path, "r", encoding="utf-8") as fp:
                    content = fp.read()
            except Exception as e:
                print(f"Error reading {file_path}: {e}")
                continue

            # Chunk the file content
            chunks = chunk_text(content)
            for i, chunk in enumerate(chunks):
                try:
                    emb = embed_text(chunk)
                except Exception as e_emb:
                    print(f"Embedding error for {file_path}, chunk {i}: {e_emb}")
                    continue
                doc_id = f"{file_path}-chunk{i}"
                metadata = {"path": file_path, "chunk_index": i}
                collection.add(
                    documents=[chunk],
                    embeddings=[emb],
                    metadatas=[metadata],
                    ids=[doc_id]
                )

def main():
    """
    Usage: python index_codebase.py <project_root> [<subdir>]
    We'll store the DB in <project_root>/chroma_db
    """
    if len(sys.argv) < 2:
        print("Usage: python index_codebase.py <project_root> [<subdir>]")
        sys.exit(1)

    project_root = os.path.abspath(sys.argv[1])
    code_dir = project_root
    if len(sys.argv) > 2:
        code_dir = os.path.join(project_root, sys.argv[2])

    db_dir = os.path.join(project_root, "chroma_db")

    print(f"Project root: {project_root}")
    print(f"Indexing code in: {code_dir}")
    print(f"DB location: {db_dir}")

    index_directory(code_dir, db_dir)
    print("Indexing complete.")

if __name__ == "__main__":
    main()
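After indexing, it can be reassuring to confirm that chunks actually landed in the store before moving on to the query side. Here is a minimal sketch, assuming the same legacy chromadb client API used above; the helper script name is just an illustration, not part of the setup.

#!/usr/bin/env python3
# check_index.py (hypothetical helper): count the chunks stored by index_codebase.py
# in <project_root>/chroma_db.
import os
import sys
import chromadb
from chromadb.config import Settings

project_root = os.path.abspath(sys.argv[1])
db_dir = os.path.join(project_root, "chroma_db")

client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory=db_dir
))
collection = client.get_collection("my_codebase")
print(f"Indexed chunks in {db_dir}: {collection.count()}")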
query_codebase.py
Major Components:
retrieve_relevant_chunks – Connects to the same local DB and queries it for the top K nearest neighbors of the user’s question.
main – Takes project_root and question from the command line and prints a JSON object with fields like "documents" and "metadatas".
#!/usr/bin/env python3
import os
import sys
import json
import openai
import chromadb
from chromadb.config import Settings

openai.api_key = os.getenv("OPENAI_API_KEY")

def embed_text(text):
    """
    Embed the query with the same model used at index time (text-embedding-ada-002),
    so the query vector lives in the same embedding space as the indexed chunks.
    """
    response = openai.Embedding.create(
        input=[text],
        model="text-embedding-ada-002"
    )
    return response["data"][0]["embedding"]

def retrieve_relevant_chunks(db_dir, query, top_k=20):
    """
    Query local Chroma DB at db_dir for the top_k relevant code chunks.
    """
    client = chromadb.Client(Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=db_dir
    ))
    collection = client.get_collection("my_codebase")
    query_emb = embed_text(query)
    results = collection.query(query_embeddings=[query_emb], n_results=top_k)
    return results

def main():
    """Usage: python query_codebase.py <project_root> <question>"""
    if len(sys.argv) < 3:
        print("Usage: python query_codebase.py <project_root> <question>")
        sys.exit(1)
    project_root = os.path.abspath(sys.argv[1])
    question = " ".join(sys.argv[2:])
    db_dir = os.path.join(project_root, "chroma_db")

    results = retrieve_relevant_chunks(db_dir, question, top_k=20)
    print(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()
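One detail worth noting before reading the Emacs side: Chroma returns nested lists (one inner list per query), which is why rag-chat.el later flattens the documents field. A rough illustration of that shape, assuming query_codebase.py is importable from the current directory, OPENAI_API_KEY is set, and the path and question are placeholders:

# Inspect the nested result shape that query_codebase.py serializes to JSON.
from query_codebase import retrieve_relevant_chunks

results = retrieve_relevant_chunks("/path/to/project/chroma_db",
                                   "How do I call get_secrets()?", top_k=3)
# results["documents"] and results["metadatas"] are lists of lists:
# one inner list per query text.
for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
    print(f"{meta['path']} (chunk {meta['chunk_index']})")
    print(doc[:200])
    print("---")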
rag-chat.el
Major Components:
rag-chat-index-codebase – Prompts for root/subdir, spawns index_codebase.py asynchronously.
rag-chat--query-codebase – Synchronously runs query_codebase.py, captures JSON output via shell-command-to-string, then calls a callback with the parsed data.
rag-chat--openai-chat – Sends “system” + “user” messages to https://api.openai.com/v1/chat/completions. Uses request-deferred with the user’s model/key settings.
rag-chat-ask-question – The main Q&A command. It merges code snippets from the DB with your question, calls rag-chat--openai-chat, and pops up the *RAG Chat* window with the final answer.
rag-chat-mode – Provides a small major mode with instructions and a q key bound to quit-window.

;;; rag-chat.el --- Retrieval-Augmented Chat for a codebase, using local Chroma DB -*- lexical-binding: t; -*-
;;
;; Usage:
;;   1. Place this file somewhere in your Emacs load-path, then:
;;      (require 'rag-chat)
;;   2. Configure:
;;      (setq rag-chat--index-script "/path/to/index_codebase.py")
;;      (setq rag-chat--query-script "/path/to/query_codebase.py")
;;      (setq rag-chat--openai-api-key "sk-XXXX")
;;   3. M-x rag-chat-mode           -> opens the *RAG Chat* buffer.
;;   4. M-x rag-chat-index-codebase -> index codebase.
;;   5. M-x rag-chat-ask-question   -> query local DB + ask AI.

(require 'json)
(require 'cl-lib)           ;; for cl-function
(require 'request-deferred)
(require 'subr-x)           ;; for string-empty-p

(defgroup rag-chat nil
  "Retrieval-Augmented Chat in Emacs for local codebases."
  :group 'tools)

(defcustom rag-chat--openai-api-key ""
  "Your OpenAI API key for chat completions."
  :type 'string
  :group 'rag-chat)

(defcustom rag-chat--openai-model "gpt-4"
  "Which OpenAI Chat Completion model to use."
  :type 'string
  :group 'rag-chat)

(defcustom rag-chat--index-script "/path/to/index_codebase.py"
  "Path to the Python script that indexes the codebase."
  :type 'string
  :group 'rag-chat)

(defcustom rag-chat--query-script "/path/to/query_codebase.py"
  "Path to the Python script that queries the local vector DB."
  :type 'string
  :group 'rag-chat)

(defcustom rag-chat--num-results 20
  "Number of relevant chunks to retrieve from the codebase for each query."
  :type 'integer
  :group 'rag-chat)

(defvar rag-chat--buffer-name "*RAG Chat*"
  "Name of the buffer where RAG chat Q&A is displayed.")

;;; --- Indexing the Codebase ---

(defun rag-chat-index-codebase (&optional project-root subdir)
  "Index the code in PROJECT-ROOT or optional SUBDIR using index_codebase.py.
The embeddings are stored in PROJECT-ROOT/chroma_db."
  (interactive
   (list (read-directory-name "Project root: " default-directory)
         (read-string "Subdir to index (optional): ")))
  (let* ((bufname "*RAG Index Output*")
         (script rag-chat--index-script)
         (project-arg (expand-file-name project-root))
         (subdir-arg subdir))
    (with-current-buffer (get-buffer-create bufname)
      (erase-buffer)
      (insert (format "Indexing project root: %s\n" project-arg))
      (when (and subdir-arg (not (string-empty-p subdir-arg)))
        (insert (format "Indexing subdir: %s\n" subdir-arg)))
      (insert (format "Using script: %s\n\n" script))
      ;; Start an async process to run index_codebase.py
      (if (and subdir-arg (not (string-empty-p subdir-arg)))
          (start-process "rag-index-process" bufname
                         "python3" script project-arg subdir-arg)
        (start-process "rag-index-process" bufname
                       "python3" script project-arg)))
    (pop-to-buffer bufname)))

;;; --- Querying the Codebase & Building Chat Prompts ---

(defun rag-chat--query-codebase (project-root question callback)
  "Run query_codebase.py for PROJECT-ROOT and QUESTION.
Pass JSON results to CALLBACK."
  (let* ((script rag-chat--query-script)
         (project-arg (expand-file-name project-root))
         (cmd (mapconcat #'shell-quote-argument
                         (list "python3" script project-arg question)
                         " "))
         (raw-output (shell-command-to-string cmd)))
    (condition-case err
        (let ((json-data (json-read-from-string raw-output)))
          (funcall callback json-data))
      (error
       (message "RAG Chat Query Error: %S" err)
       nil))))

(defun rag-chat--openai-chat (messages callback)
  "Send MESSAGES (list of role/content objects) to the OpenAI Chat endpoint.
Call CALLBACK with the final answer string."
  (request-deferred
   "https://api.openai.com/v1/chat/completions"
   :type "POST"
   :headers `(("Content-Type" . "application/json")
              ("Authorization" . ,(concat "Bearer " rag-chat--openai-api-key)))
   :data (json-encode
          `(("model" . ,rag-chat--openai-model)
            ("messages" . ,messages)
            ("max_tokens" . 10000)
            ("temperature" . 0.2)))
   :parser 'json-read
   :success (cl-function
             (lambda (&key data &allow-other-keys)
               (let* ((choices (assoc-default 'choices data))
                      (first-choice (and choices (aref choices 0)))
                      (msg (assoc-default 'message first-choice))
                      (content (assoc-default 'content msg)))
                 (funcall callback content))))
   :error (cl-function
           (lambda (&rest args &key error-thrown &allow-other-keys)
             (message "OpenAI Chat Error: %S" error-thrown)))))

;;; --- High-Level Q&A Function ---

(defun rag-chat-ask-question (&optional project-root question)
  "Prompt for PROJECT-ROOT (directory) and QUESTION (string).
1. Query codebase for relevant snippets (top rag-chat--num-results).
2. Send them + QUESTION to OpenAI Chat Completion API.
3. Display answer in `rag-chat--buffer-name' buffer, then pop that buffer."
  (interactive
   (list (read-directory-name "Project root: " default-directory)
         (read-string "Question: ")))
  (unless (and question (not (string-empty-p question)))
    (user-error "No question provided."))
  (let* ((root (expand-file-name project-root))
         (q question))
    (rag-chat--query-codebase
     root q
     (lambda (retrieval-result)
       ;; json-read-from-string returns an alist with symbol keys.
       (let* ((docs (cdr (assoc 'documents retrieval-result)))
              ;; convert vectors to lists:
              (docs-list (append docs nil))
              (docs-lol (mapcar (lambda (subv) (append subv nil)) docs-list))
              (snippets (apply #'append docs-lol))
              (system-message
               "You are a coding assistant. Use the following code snippets if relevant to answer the user's question.")
              (user-message
               (concat "Relevant Code Snippets:\n"
                       (mapconcat #'identity snippets "\n\n---\n\n")
                       "\n\nUser Question:\n" q))
              (messages
               (list (list (cons "role" "system")
                           (cons "content" system-message))
                     (list (cons "role" "user")
                           (cons "content" user-message)))))
         (rag-chat--openai-chat
          messages
          (lambda (answer)
            (with-current-buffer (get-buffer-create rag-chat--buffer-name)
              (goto-char (point-max))
              (insert (format "Q: %s\nA: %s\n\n" q answer)))
            (pop-to-buffer (get-buffer-create rag-chat--buffer-name)
                           '((display-buffer-pop-up-window)
                             (inhibit-same-window . t))))))))))

;;; --- A Simple RAG Chat Mode ---

(defun rag-chat-mode ()
  "Major mode for retrieval-augmented chat with a local codebase.
Use:
  M-x rag-chat-index-codebase -> to index the project into <project-root>/chroma_db
  M-x rag-chat-ask-question   -> to query codebase + LLM for an answer.
All Q&A is displayed in the *RAG Chat* buffer."
  (interactive)
  (switch-to-buffer (get-buffer-create rag-chat--buffer-name))
  (rag-chat-mode-init))

(defun rag-chat-mode-init ()
  (kill-all-local-variables)
  (use-local-map (make-sparse-keymap))
  (setq major-mode 'rag-chat-mode
        mode-name "RAG-Chat")
  (read-only-mode -1)
  (local-set-key (kbd "q") #'quit-window)
  (insert "Welcome to RAG Chat Mode.\n\n")
  (insert "Commands:\n")
  (insert "  M-x rag-chat-index-codebase -> Index project code into <project-root>/chroma_db\n")
  (insert "  M-x rag-chat-ask-question   -> Query the codebase with an OpenAI-augmented Q&A.\n\n"))

(provide 'rag-chat)
;;; rag-chat.el ends here
To recap how the pieces fit together:

index_codebase.py – chunk_text slices content into ~1,000-word pieces, embed_text sends each chunk to OpenAI text-embedding-ada-002, and every chunk plus its embedding is stored in the Chroma collection via collection.add.
query_codebase.py – Takes project_root and question, opens chroma_db in that root, and queries with collection.query to find the top matches. It prints JSON containing "documents", "metadatas", etc.; Emacs parses that to get relevant code snippets.
rag-chat--query-codebase – Calls query_codebase.py synchronously, capturing the JSON with shell-command-to-string.
rag-chat--openai-chat – Uses request-deferred to call the /v1/chat/completions API, passing your rag-chat--openai-model and rag-chat--openai-api-key.
rag-chat-ask-question – Merges the retrieved code snippets with your question, forms “system” + “user” messages, and displays the final AI answer in the *RAG Chat* buffer.
rag-chat-index-codebase – Spawns index_codebase.py asynchronously, letting you watch the logs in *RAG Index Output*.
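If you would rather drive the whole loop from Python instead of Emacs, the prompt assembly that rag-chat-ask-question performs can be sketched as follows, using the same legacy openai.ChatCompletion API style as the embedding calls above. This is a sketch, not part of rag-chat.el; the DB path and question are placeholders, and the model name is just the same default the Emacs code uses.

# Sketch of the retrieve-then-ask loop in plain Python, mirroring rag-chat-ask-question.
import os
import openai
from query_codebase import retrieve_relevant_chunks

openai.api_key = os.getenv("OPENAI_API_KEY")

def ask(db_dir, question, model="gpt-4"):
    results = retrieve_relevant_chunks(db_dir, question, top_k=20)
    # Flatten the nested documents lists, exactly as the Emacs code does.
    snippets = [doc for docs in results["documents"] for doc in docs]
    user_message = ("Relevant Code Snippets:\n"
                    + "\n\n---\n\n".join(snippets)
                    + "\n\nUser Question:\n" + question)
    response = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a coding assistant. Use the following code snippets "
                        "if relevant to answer the user's question."},
            {"role": "user", "content": user_message},
        ],
        temperature=0.2,
    )
    return response["choices"][0]["message"]["content"]

print(ask("/path/to/project/chroma_db", "How do I call get_secrets()?"))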
By default, rag-chat-mode binds q to quit-window, so if *RAG Chat* has focus, pressing q closes that window. You can also kill the buffer outright with M-x kill-buffer if you prefer.
Explore or adapt these scripts for more advanced flows (e.g., local embeddings, multi-turn Q&A, or agent-based approaches). The code here is enough to get started with an Emacs-based RAG solution that references your codebase in real time.