Erik Bjäreholt comments

Results 677 comments of


                                            Erik Bjäreholt

Benchmarks/evals

Improved the eval harness quite a bit in #90, among other changes (incl a lot of Docker stuff). I'm now 80% happy with the harness and am trying to think...

Make it better at writing small patches

@gptme Optimize the prompt in gptme/tools/patch.py to make it comply better. We want to make sure patches are concise and reliable. We need to be careful not to over-optimize the...

Make it better at writing small patches

I partially improved this by stopping generation on the ending `>>>>>> UPDATED` in a patch block. Works a lot better now (before it kept going, filling in the rest of...

Make it better at writing small patches

I've found that Claude 3.5 Sonnet doesn't really suffer from this anymore. Feels like a thing I cannot control anyway, it's all in the hands of the LLM. Trying to...

Save and upload logs from bot run

@gptme patch .github/workflows/bot.yml to implement this. Read the log from the `.jsonl` file. Each line has role and content keys. When the log is written to the issue inside the...

Save and upload logs from bot run

@gptme now check if there's something left to do, apply patches if so

shortcuts for shell, python, and other commands

I didn't go all the way to single-letter shortcuts, but we now have `/py` and `/sh`, or any other valid file extension for a supported executable codeblock (not many others,...

Add streaming responses to web UI

It is a lower priority :) It's not that much of a distraction, the changes required would improve the code overall anyway. The web UI is nice for browsing past...

option to suppress stdout to save tokens on User: prompt - enhancement request

I think these are good ideas, and could easily be added (and would get merged). But right now I'm personally focusing on better general ways to handle context size, see...

Multi-Machine support for Analytics (Research) and Dashboard

@casaout I will follow this with interest (due to my involvement in @ActivityWatch). Do you know when you will start building it?