Sebastian Lobentanzer
Allow the conftest setup to work with separate test data files for different test cases.
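A minimal sketch of how this could look (the file layout, fixture name, and helper are assumptions, not the current conftest): each test module names its own data file, and a shared fixture loads it via indirect parametrisation.

```python
# Hypothetical conftest.py sketch: per-case test data files loaded by
# one shared fixture through indirect parametrisation.
import json
from pathlib import Path

import pytest

# Assumed layout: tests/data/<case>.json -- adjust to the real repository.
DATA_DIR = Path("tests/data")


def load_case_data(case_name: str, data_dir: Path = DATA_DIR) -> dict:
    """Load the JSON test data for one named case."""
    return json.loads((data_dir / f"{case_name}.json").read_text())


@pytest.fixture
def case_data(request):
    # request.param carries the case name supplied by the test module
    return load_case_data(request.param)


# Usage in a test module:
# @pytest.mark.parametrize("case_data", ["kg_schema", "rag_docs"], indirect=True)
# def test_with_own_data(case_data):
#     assert "expected" in case_data
```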
We could explore the utility of libraries such as [DSPy](https://github.com/stanfordnlp/dspy) for automating the prompt generation and optimisation process. An alternative would be [Microsoft Guidance](https://github.com/guidance-ai/guidance).
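As a rough illustration of the kind of loop such libraries automate (all names here are hypothetical; this is not the DSPy or Guidance API), prompt optimisation can be framed as scoring candidate instructions against a small labelled dev set:

```python
# Hypothetical sketch of instruction search over a labelled dev set;
# DSPy-style optimisers automate and generalise this loop.
from typing import Callable


def score_prompt(prompt: str,
                 model: Callable[[str], str],
                 dev_set: list[tuple[str, str]]) -> float:
    """Fraction of dev-set questions the model answers correctly under this prompt."""
    hits = sum(model(f"{prompt}\n\n{q}").strip() == a for q, a in dev_set)
    return hits / len(dev_set)


def best_prompt(candidates: list[str], model, dev_set) -> str:
    return max(candidates, key=lambda p: score_prompt(p, model, dev_set))


# Stub "model" for demonstration: answers correctly only for one instruction.
def stub_model(full_prompt: str) -> str:
    return "42" if full_prompt.startswith("Answer concisely.") else "unsure"


dev = [("What is 6*7?", "42")]
print(best_prompt(["Answer concisely.", "Please elaborate at length."],
                  stub_model, dev))  # -> Answer concisely.
```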
There remain open questions about the right prompt for the behaviour of the different models; Llama-series models seem to handle prompts differently than GPT models. As an initial experiment, DSPy...
Create module `extract.py` for dedicated extraction of information from text, as a precursor to the `biogather` package.
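A minimal starting point for such a module (the function names and the regex-based placeholder are assumptions; the real module would presumably delegate to an LLM) could look like:

```python
# Hypothetical extract.py sketch: structured extraction from free text.
# A simple regex stands in for the LLM call the real module would make.
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    entity: str
    start: int
    end: int


# Placeholder pattern for HGNC-style gene symbols (illustrative only).
GENE_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\b")


def extract_entities(text: str) -> list[Extraction]:
    """Return candidate entities with character offsets."""
    return [Extraction(m.group(), m.start(), m.end())
            for m in GENE_PATTERN.finditer(text)]


hits = extract_entities("TP53 and EGFR are mutated.")
print([h.entity for h in hits])  # -> ['TP53', 'EGFR']
```

Keeping character offsets in the return type makes downstream grounding (linking an extraction back to its source span) straightforward.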
For cases of poor performance in particular, it would be good to have an automated way of getting a rough idea of failure modes: were the instructions not understood, system...
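A first pass at this could be simple heuristic triage (the categories and keyword rules below are assumptions; a fuller version might use an LLM judge to label failure modes):

```python
# Hypothetical heuristic triage of failed benchmark responses into rough
# failure modes; keyword rules stand in for a more robust LLM-based judge.
def classify_failure(response: str, expected_format: str = "json") -> str:
    text = response.strip().lower()
    if not text:
        return "empty_response"
    if any(kw in text for kw in ("i cannot", "i'm sorry", "as an ai")):
        return "refusal"
    if expected_format == "json" and not text.startswith(("{", "[")):
        return "format_violation"
    return "wrong_content"


print(classify_failure("I'm sorry, I cannot help with that."))  # -> refusal
print(classify_failure("The answer is 42."))  # -> format_violation
```

Aggregating these labels over a benchmark run would give the rough distribution of failure modes per model and task.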
Basically a RAG-driven assistant along the lines of https://www.cell.com/cell/fulltext/S0092-8674(24)00304-0
To allow crowdsourcing of additional benchmark materials, it could be good to have a GUI where benchmark Q&A pairs could be entered. Entries would be constrained by the existing basic unit tests...
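The acceptance step behind such a GUI could be sketched as a validation function (the required fields and limits are hypothetical; in practice they would mirror the existing unit tests):

```python
# Hypothetical validation for crowdsourced benchmark Q&A entries; the checks
# mimic the constraints the existing basic unit tests would impose.
def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    for field in ("question", "answer", "category"):
        if not entry.get(field, "").strip():
            problems.append(f"missing field: {field}")
    if len(entry.get("question", "")) > 500:
        problems.append("question too long")
    return problems


entry = {"question": "What does BioChatter do?",
         "answer": "LLM framework for biomedicine",
         "category": "general"}
print(validate_entry(entry))  # -> []
```

The GUI would only forward entries with an empty problem list into the benchmark dataset.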
Extend RAG agent to be able to use arbitrary APIs to generate response supplements. LLaMA3 example: https://nbsanity.com/static/d06085f1dacae8c9de9402f2d7428de2/demo.html
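One way to structure this (registry name, decorator, and the stubbed tool are all hypothetical) is a tool registry the agent dispatches into, with each arbitrary API wrapped as a named callable:

```python
# Hypothetical tool-registry sketch: the RAG agent selects a registered API
# by name and uses its result to supplement the response.
from typing import Callable

API_REGISTRY: dict[str, Callable[[str], str]] = {}


def register_api(name: str):
    """Decorator that registers a callable under a tool name."""
    def deco(fn: Callable[[str], str]) -> Callable[[str], str]:
        API_REGISTRY[name] = fn
        return fn
    return deco


@register_api("uniprot_lookup")  # stubbed; a real tool would call the web API
def uniprot_lookup(query: str) -> str:
    return f"[stub] UniProt entry for {query}"


def supplement_response(tool_name: str, query: str) -> str:
    tool = API_REGISTRY.get(tool_name)
    return tool(query) if tool else "no such tool"


print(supplement_response("uniprot_lookup", "TP53"))
```

New APIs then only require a wrapper function and a registry entry, rather than changes to the agent itself.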