Sebastian Lobentanzer

129 issues by Sebastian Lobentanzer

Allow the conftest setup to work with separate test data files for different test cases.
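One way this could look, as a minimal sketch rather than the project's actual setup: a `conftest.py` that uses pytest's `pytest_generate_tests` hook to parametrize each test module from its own data file. The `data/` directory, the one-JSON-file-per-module layout, the `case` argument name, and the `load_cases` helper are all assumptions made for illustration.

```python
# conftest.py (illustrative sketch; file layout and names are assumptions)
import json
from pathlib import Path

# Hypothetical layout: one JSON file per test module, e.g. data/test_extract.json,
# each holding a list of test cases.
DATA_DIR = Path("data")


def load_cases(module_name, data_dir=DATA_DIR):
    """Return the list of cases stored for a test module, or [] if no file exists."""
    path = Path(data_dir) / f"{module_name}.json"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def pytest_generate_tests(metafunc):
    """Parametrize any test that requests a `case` argument with its module's data."""
    if "case" in metafunc.fixturenames:
        # "tests.test_extract" -> "test_extract"
        module_name = metafunc.module.__name__.rsplit(".", 1)[-1]
        metafunc.parametrize("case", load_cases(module_name))
```

A test in `test_extract.py` declared as `def test_something(case): ...` would then run once per entry in `data/test_extract.json`, while tests in other modules read their own files.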

We could explore libraries such as [DSPy](https://github.com/stanfordnlp/dspy) for automating prompt generation and optimisation. A possible alternative is Microsoft's [Guidance](https://github.com/guidance-ai/guidance).

Open questions remain about which prompts suit the behaviour of the different models; Llama-series models seem to handle prompts differently from GPT models. As an initial experiment, DSPy...

Create module `extract.py` for dedicated extraction of information from text. Precursor to the `biogather` package.

For cases of bad performance in particular, it would be good to have an automated way of getting a rough idea of failure modes: were the instructions not understood, system...

Basically a RAG-driven assistant along the lines of https://www.cell.com/cell/fulltext/S0092-8674(24)00304-0

To enable crowdsourcing of additional benchmark material, it would be good to have a GUI for entering benchmark Q&A. This would be constrained to the existing basic unit tests....

Extend the RAG agent so that it can call arbitrary APIs to generate response supplements. Llama 3 example: https://nbsanity.com/static/d06085f1dacae8c9de9402f2d7428de2/demo.html
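As a rough sketch of the idea, assuming each API is wrapped as a plain callable that takes the user question and returns text: the `ApiTool` alias and the `collect_supplements` and `build_prompt` functions below are hypothetical and do not reflect an existing agent interface. Real tools would wrap HTTP requests; here they are ordinary functions so the flow stays visible.

```python
from typing import Callable, Dict, List

# Hypothetical tool type: takes the user question, returns supplementary text.
ApiTool = Callable[[str], str]


def collect_supplements(
    question: str, tools: Dict[str, ApiTool], selected: List[str]
) -> List[str]:
    """Call each selected tool and return labelled snippets.

    Unknown tool names are ignored; a failing call is recorded as a note
    so that one broken API does not abort the whole response.
    """
    supplements = []
    for name in selected:
        tool = tools.get(name)
        if tool is None:
            continue
        try:
            result = tool(question)
        except Exception as error:
            result = f"(call failed: {error})"
        supplements.append(f"[{name}] {result}")
    return supplements


def build_prompt(question: str, supplements: List[str]) -> str:
    """Place the collected snippets in front of the question as context."""
    context = "\n".join(supplements) if supplements else "(no supplementary information)"
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Which tools end up in `selected` is left open here; it could come from the model itself (tool calling) or from a simple keyword router.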