[FEATURE]: Prompt + file edit logging for benchmarking
Feature hasn't been suggested before.
- [x] I have verified this feature I'm about to request hasn't been suggested before.
Describe the enhancement you want to request
Hi! Would it be possible to add logging for:
- the prompts sent to the model, and
- the file edits/patches produced during execution
This would make it easier to replay the exact same prompt(s) and compare the resulting outputs when benchmarking different models.
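To make the idea concrete, here is a rough sketch of what a JSONL session log could look like. The type names, fields, and `logEvent` helper are purely illustrative, not part of opencode's current API:

```ts
import { appendFileSync } from "node:fs";

// Hypothetical shape for a prompt event -- one entry per model request.
type PromptLogEntry = {
  type: "prompt";
  sessionId: string;
  timestamp: string; // ISO 8601
  model: string;     // provider/model id used for the request
  messages: { role: "system" | "user" | "assistant"; content: string }[];
};

// Hypothetical shape for a file-edit event -- one entry per applied change.
type EditLogEntry = {
  type: "edit";
  sessionId: string;
  timestamp: string;
  file: string;  // path of the edited file
  patch: string; // unified diff of the change
};

type LogEntry = PromptLogEntry | EditLogEntry;

// Append one JSON object per line so a session can be replayed or diffed later.
function logEvent(logPath: string, entry: LogEntry): void {
  appendFileSync(logPath, JSON.stringify(entry) + "\n");
}
```

Appending one JSON object per line keeps the log easy to stream, grep, and replay incrementally.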
Optionally, we could also support an evaluation mode where an LLM acts as a judge to score/compare results for the same session.
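As a sketch of what that evaluation mode might involve (again just an illustration, with a caller-supplied `complete` function standing in for whatever chat-completion call the judge model would use):

```ts
// Hypothetical LLM-as-judge comparison over two replayed sessions.
type JudgeVerdict = { winner: "A" | "B" | "tie"; reasoning: string };

async function judgeSessions(
  taskDescription: string,
  patchesA: string[], // file edits logged for model A
  patchesB: string[], // file edits logged for model B
  complete: (prompt: string) => Promise<string>,
): Promise<JudgeVerdict> {
  // Build a single judging prompt from the task and both sets of patches.
  const prompt = [
    "You are judging two coding agents that worked on the same task.",
    `Task: ${taskDescription}`,
    "Patches from agent A:",
    ...patchesA,
    "Patches from agent B:",
    ...patchesB,
    'Reply with JSON only: {"winner": "A" | "B" | "tie", "reasoning": "..."}',
  ].join("\n\n");

  // Parse the judge model's JSON reply into a structured verdict.
  return JSON.parse(await complete(prompt)) as JudgeVerdict;
}
```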