Sebastian Lobentanzer
Allow the conftest setup to work with separate test data files for different test cases.
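A minimal sketch of how this could look (the file layout, fixture name, and helper are assumptions, not the current conftest): each test module names its own data file, and a shared fixture loads it via indirect parametrisation.

```python
# Hypothetical conftest.py sketch: per-case test data files loaded by
# one shared fixture through indirect parametrisation.
import json
from pathlib import Path

import pytest

# Assumed layout: tests/data/<case>.json -- adjust to the real repository.
DATA_DIR = Path("tests/data")


def load_case_data(case_name: str, data_dir: Path = DATA_DIR) -> dict:
    """Load the JSON test data for one named case."""
    return json.loads((data_dir / f"{case_name}.json").read_text())


@pytest.fixture
def case_data(request):
    # request.param carries the case name supplied by the test module
    return load_case_data(request.param)


# Usage in a test module:
# @pytest.mark.parametrize("case_data", ["kg_schema", "rag_docs"], indirect=True)
# def test_with_own_data(case_data):
#     assert "expected" in case_data
```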
We could explore the utility of libraries such as [DSPy](https://github.com/stanfordnlp/dspy) for automating the prompt generation and optimisation process. An alternative would be [Microsoft Guidance](https://github.com/guidance-ai/guidance).
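As a rough illustration of the kind of loop such libraries automate (all names here are hypothetical; this is not the DSPy or Guidance API), prompt optimisation can be framed as scoring candidate instructions against a small labelled dev set:

```python
# Hypothetical sketch of instruction search over a labelled dev set;
# DSPy-style optimisers automate and generalise this loop.
from typing import Callable


def score_prompt(prompt: str,
                 model: Callable[[str], str],
                 dev_set: list[tuple[str, str]]) -> float:
    """Fraction of dev-set questions the model answers correctly under this prompt."""
    hits = sum(model(f"{prompt}\n\n{q}").strip() == a for q, a in dev_set)
    return hits / len(dev_set)


def best_prompt(candidates: list[str], model, dev_set) -> str:
    return max(candidates, key=lambda p: score_prompt(p, model, dev_set))


# Stub "model" for demonstration: answers correctly only for one instruction.
def stub_model(full_prompt: str) -> str:
    return "42" if full_prompt.startswith("Answer concisely.") else "unsure"


dev = [("What is 6*7?", "42")]
print(best_prompt(["Answer concisely.", "Please elaborate at length."],
                  stub_model, dev))  # -> Answer concisely.
```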
There remain open questions about the right prompt for the behaviour of the different models; Llama-series models seem to handle prompts differently than GPT models. As an initial experiment, DSPy...
Create module `extract.py` for dedicated extraction of information from text, as a precursor to the `biogather` package.
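A minimal starting point for such a module (the function names and the regex-based placeholder are assumptions; the real module would presumably delegate to an LLM) could look like:

```python
# Hypothetical extract.py sketch: structured extraction from free text.
# A simple regex stands in for the LLM call the real module would make.
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    entity: str
    start: int
    end: int


# Placeholder pattern for HGNC-style gene symbols (illustrative only).
GENE_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\b")


def extract_entities(text: str) -> list[Extraction]:
    """Return candidate entities with character offsets."""
    return [Extraction(m.group(), m.start(), m.end())
            for m in GENE_PATTERN.finditer(text)]


hits = extract_entities("TP53 and EGFR are mutated.")
print([h.entity for h in hits])  # -> ['TP53', 'EGFR']
```

Keeping character offsets in the return type makes downstream grounding (linking an extraction back to its source span) straightforward.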
For cases of poor performance in particular, it would be good to have an automated way of getting a rough idea of failure modes: were the instructions not understood, system...
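A first pass at this could be simple heuristic triage (the categories and keyword rules below are assumptions; a fuller version might use an LLM judge to label failure modes):

```python
# Hypothetical heuristic triage of failed benchmark responses into rough
# failure modes; keyword rules stand in for a more robust LLM-based judge.
def classify_failure(response: str, expected_format: str = "json") -> str:
    text = response.strip().lower()
    if not text:
        return "empty_response"
    if any(kw in text for kw in ("i cannot", "i'm sorry", "as an ai")):
        return "refusal"
    if expected_format == "json" and not text.startswith(("{", "[")):
        return "format_violation"
    return "wrong_content"


print(classify_failure("I'm sorry, I cannot help with that."))  # -> refusal
print(classify_failure("The answer is 42."))  # -> format_violation
```

Aggregating these labels over a benchmark run would give the rough distribution of failure modes per model and task.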
Basically a RAG-driven assistant along the lines of https://www.cell.com/cell/fulltext/S0092-8674(24)00304-0
To allow crowdsourcing of additional benchmark materials, it could be good to have a GUI where benchmark Q&A pairs could be entered. Entries would be constrained by the existing basic unit tests...
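The acceptance step behind such a GUI could be sketched as a validation function (the required fields and limits are hypothetical; in practice they would mirror the existing unit tests):

```python
# Hypothetical validation for crowdsourced benchmark Q&A entries; the checks
# mimic the constraints the existing basic unit tests would impose.
def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    for field in ("question", "answer", "category"):
        if not entry.get(field, "").strip():
            problems.append(f"missing field: {field}")
    if len(entry.get("question", "")) > 500:
        problems.append("question too long")
    return problems


entry = {"question": "What does BioChatter do?",
         "answer": "LLM framework for biomedicine",
         "category": "general"}
print(validate_entry(entry))  # -> []
```

The GUI would only forward entries with an empty problem list into the benchmark dataset.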
Extend RAG agent to be able to use arbitrary APIs to generate response supplements. LLaMA3 example: https://nbsanity.com/static/d06085f1dacae8c9de9402f2d7428de2/demo.html
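One way to structure this (registry name, decorator, and the stubbed tool are all hypothetical) is a tool registry the agent dispatches into, with each arbitrary API wrapped as a named callable:

```python
# Hypothetical tool-registry sketch: the RAG agent selects a registered API
# by name and uses its result to supplement the response.
from typing import Callable

API_REGISTRY: dict[str, Callable[[str], str]] = {}


def register_api(name: str):
    """Decorator that registers a callable under a tool name."""
    def deco(fn: Callable[[str], str]) -> Callable[[str], str]:
        API_REGISTRY[name] = fn
        return fn
    return deco


@register_api("uniprot_lookup")  # stubbed; a real tool would call the web API
def uniprot_lookup(query: str) -> str:
    return f"[stub] UniProt entry for {query}"


def supplement_response(tool_name: str, query: str) -> str:
    tool = API_REGISTRY.get(tool_name)
    return tool(query) if tool else "no such tool"


print(supplement_response("uniprot_lookup", "TP53"))
```

New APIs then only require a wrapper function and a registry entry, rather than changes to the agent itself.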