Project Proposal: VitAI - RAG-based assistant for AQAvit
Goal: To develop a Retrieval-Augmented Generation (RAG) system that lets users ask natural language questions about the AQAvit repositories and receive accurate, context-aware, up-to-date answers grounded in the codebase.
Details:
- Identify model: a free, open-source embedding model (e.g., all-MiniLM-L6-v2)
- Identify data (e.g., code and documentation)
  - repo:
    - https://github.com/adoptium/aqa-tests
    - https://github.com/adoptium/TKG
    - and others
  - Phase 1 - README, wiki, blogs, etc. in the two repos above
  - Phase 2 - code
- Data Processing & Indexing
  - Generate embeddings
  - Store in vector DB (e.g., FAISS or Chroma)
- Build Retriever Layer
  - Implement CLI semantic search: query + context chunks => LLM
  - Test queries like:
    - How do I run sanity.openjdk locally on xLinux?
    - What does the sanity.openjdk test target run?
    - Where can I configure JVM options for running tests in Grinder?
    - How do I contribute a new test case to the functional suite?
- Validate output quality
- Serve as an API (optional)
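The indexing and retrieval steps above can be sketched end to end with the Python standard library alone. This is a minimal illustration, not the proposed implementation: `vectorize` is a bag-of-words stand-in for a real embedding model such as all-MiniLM-L6-v2, the in-memory list stands in for FAISS or Chroma, and the chunk texts are hypothetical placeholders rather than real AQAvit documentation. All function names are made up for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def vectorize(text: str) -> Counter:
    """ASSUMPTION: toy bag-of-words vector standing in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (linear scan, no vector DB)."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the query and retrieved chunks into the text sent to the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical placeholder chunks, not real AQAvit documentation.
CHUNKS = [
    "Placeholder chunk about running the sanity.openjdk target locally",
    "Placeholder chunk about configuring JVM options in Grinder",
    "Placeholder chunk about contributing a new functional test",
]

if __name__ == "__main__":
    question = "Where can I configure JVM options for running tests in Grinder?"
    print(build_prompt(question, retrieve(question, CHUNKS, k=1)))
```

Swapping `vectorize` for a sentence-transformers encoder and the list for a vector DB would keep the same "query + context chunks => LLM" shape.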
| Component | Possible Open-Source Tool |
|---|---|
| LLM | Mistral 7B / LLaMA 2 via llama.cpp |
| Embeddings | sentence-transformers (e.g., all-MiniLM) |
| Vector DB | FAISS or Chroma |
| Pipeline | LangChain or LlamaIndex |
| Document Input | .txt, .md (Phase 1) |
See the related Microservices for Test presentation, which lists questions we ask around the different test activities and the data sources we could consult when asking them.
Each service equates to an AI agent.
Repos for documentation:
- https://github.com/adoptium/aqa-tests - the central project for AQAvit
- https://github.com/adoptium/TKG - a lightweight test harness for running a diverse set of tests or commands
- https://github.com/adoptium/aqa-systemtest - system verification tests
- https://github.com/adoptium/aqa-test-tools - various test tools that improve workflow
- https://github.com/adoptium/STF - System Test Framework
- https://github.com/adoptium/bumblebench - microbenchmarking test framework
- https://github.com/adoptium/run-aqa - run-aqa GitHub action
- https://github.com/adoptium/openj9-systemtest - system verification tests for OpenJ9
- https://github.com/eclipse-openj9/openj9/
For now, only include the README files and the docs folder.
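One possible way to express this selection rule is a small path predicate. The sketch below is an assumption about how the rule might be applied, not a settled design: the function name `is_phase1_doc` is hypothetical, "README files" is read as any file whose name starts with "readme" (case-insensitive), and "the docs folder" is read as a top-level `docs/` directory.

```python
from pathlib import PurePosixPath


def is_phase1_doc(repo_path: str) -> bool:
    """Return True for README files anywhere in the repo, or files under top-level docs/."""
    path = PurePosixPath(repo_path)
    # ASSUMPTION: any file whose name begins with "readme" counts as a README.
    if path.name.lower().startswith("readme"):
        return True
    # ASSUMPTION: "the docs folder" means a directory named docs at the repo root.
    return len(path.parts) > 1 and path.parts[0].lower() == "docs"


if __name__ == "__main__":
    # Hypothetical repo-relative paths used only to exercise the predicate.
    for candidate in ["README.md", "functional/README.md", "docs/guide.md", "src/Main.java"]:
        print(candidate, is_phase1_doc(candidate))
```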
Some related experience: https://bill.burkecentral.com/2025/09/21/my-crud-ai-llm-chat-app-experiences/
I've been experimenting with the GitHub REST API to perform agentic RAG using lexical search (similar to what Claude Code does). This would eliminate the need to create, host, and manage vector stores altogether. I have temporarily hosted it for testing. Please visit https://vitai-demo.netlify.app/ and ask deep technical questions that require code understanding.
You can find the code at anirudhsengar/vitai-demo. The goal of this temporary deployment with a simple UI is to evaluate whether this approach can effectively gather and reason over the necessary context in an agentic manner.
If successful, the agent can be hosted on a local or remote MCP server and integrated with other tools. Otherwise, I can begin experimenting with a vector store approach and provide another prototype. A hybrid approach, combining a vector store with lexical search, could also be adopted.
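For readers unfamiliar with the lexical approach, here is a rough sketch of how an agent tool might build a query against GitHub's code search endpoint (`GET /search/code`). It is not taken from vitai-demo: the function name `build_code_search_request` is hypothetical, the token is a placeholder, and the request is only constructed, never sent. As far as I know, this endpoint expects an authenticated request.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/code"


def build_code_search_request(terms: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a code search request scoped to one repository."""
    # The repo: qualifier restricts the lexical search to a single repository.
    query = urllib.parse.urlencode({"q": f"{terms} repo:{repo}", "per_page": 5})
    return urllib.request.Request(
        f"{SEARCH_URL}?{query}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # placeholder token, supplied by the caller
        },
    )


if __name__ == "__main__":
    request = build_code_search_request("sanity.openjdk", "adoptium/aqa-tests", "YOUR_TOKEN")
    print(request.full_url)
```

An agent would call a tool like this repeatedly, refining the search terms based on what earlier results returned, then fetch the matching files as context for the LLM.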
Additional repos:
- https://github.com/eclipse-omr/omr
- https://github.com/ibmruntimes/openj9-openjdk-jdk (exclude the https://github.com/ibmruntimes/openj9-openjdk-jdk/tree/openj9/src/hotspot folder)
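The exclusion could be kept as plain configuration next to the repo list. A minimal sketch, assuming repo-relative POSIX paths; the names `EXCLUDED_PREFIXES` and `is_excluded` are hypothetical:

```python
# Per-repo path prefixes to skip when gathering context (hypothetical structure).
EXCLUDED_PREFIXES = {
    "ibmruntimes/openj9-openjdk-jdk": ("src/hotspot/",),
}


def is_excluded(repo: str, path: str) -> bool:
    """Return True if the repo-relative path falls under an excluded folder."""
    return any(path.startswith(prefix) for prefix in EXCLUDED_PREFIXES.get(repo, ()))
```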