aqa-tests Project Proposal: VitAI - RAG-based assistant for AQAvit

Goal: To develop a Retrieval-Augmented Generation (RAG) system that allows users to ask natural language questions about the repo and receive accurate, context-aware, and up-to-date answers based on the codebase.

Details:

Identify model (free open source embedding model (i.e, all-MiniLM-L6-v2)
Identify data (i.e., code, etc)
- repo:
  - https://github.com/adoptium/aqa-tests
  - https://github.com/adoptium/TKG
  - and others
- Phase 1 - README, wiki, blogs, etc in the above 2 repos
- Phase 2 - code
Data Processing & Indexing
- Generate embeddings
- Store in vector DB (e.g. FAISS or Chroma).
Build Retriever Layer
- Implement CLI semantic search: query + context chunks => LLM
- Test queries like:
  - How do I run sanity.openjdk locally on xLinux?
  - What does the sanity.openjdk test target run?
  - Where can I configure JVM options for running tests in Grinder?
  - How do I contribute a new test case to the functional suite?
Validate output quality
Serve as an API (optional)

Component	Possible Open-Source Tool
LLM	Mistral 7B / LLaMA 2 via llama.cpp
Embeddings	sentence-transformers (e.g., all-MiniLM)
Vector DB	FAISS or Chroma
Pipeline	LangChain or LlamaIndex
Document Input	.txt, .md (Phrase 1)

Aug 27 '25 14:08 llxia

Refer to this presentation that relates to this project (from Microservices for Test presentation), which offers up some questions that we ask around the different test activities, and what data sources we could refer to when asking these questions.

each service equates to an AI agent

Sep 19 '25 15:09 smlambert

Repos for documentation:

https://github.com/adoptium/aqa-tests - the central project for AQAvit
https://github.com/adoptium/TKG - a lightweight test harness for running a diverse set of tests or commands
https://github.com/adoptium/aqa-systemtest - system verification tests
https://github.com/adoptium/aqa-test-tools - various test tools that improve workflow
https://github.com/adoptium/STF - system Test Framework
https://github.com/adoptium/bumblebench - microbenchmarking test framework
https://github.com/adoptium/run-aqa - run-aqa GitHub action
https://github.com/adoptium/openj9-systemtest - system verification tests for OpenJ9
https://github.com/eclipse-openj9/openj9/

For now, only include the README files and the docs folder.

Sep 19 '25 18:09 llxia

Some experience https://bill.burkecentral.com/2025/09/21/my-crud-ai-llm-chat-app-experiences/

Sep 22 '25 14:09 sophia-guo

I've been experimenting with the GitHub REST API to perform agentic RAG using lexical search (similar to what Claude Code does). This would eliminate the need for creating, hosting, and managing vector stores all together. I have temporarily hosted it for testing. It would be great if you could ask deep technical questions that require code understanding. Please visit https://vitai-demo.netlify.app/

You can find the code at anirudhsengar/vitai-demo. The goal of this temporary deployment with a simple UI is to evaluate whether this approach can effectively gather and reason over the necessary context in an agentic manner.

If successful, the agent can be hosted to a local/remote MCP server and integrated with other tools. Else, I can begin experimenting with vector store approach and provide another prototype. A hybrid approach could also be adopted i.e. vector store + lexical search.

Oct 17 '25 07:10 anirudhsengar

Additional repos: https://github.com/eclipse-omr/omr https://github.com/ibmruntimes/openj9-openjdk-jdk - exclude https://github.com/ibmruntimes/openj9-openjdk-jdk/tree/openj9/src/hotspot folder

Oct 28 '25 12:10 llxia