Improve benchmark integration with langfuse/langsmith

Open MagdalenaKotynia opened this issue 9 months ago • 3 comments

Is your feature request related to a problem? Please describe.

Related to #455. The current integration with langfuse and langsmith could be improved.

Describe the solution you'd like

Proposed improvements:

Tracking the platform (CPU/GPU, etc.) and the compute resource the model is currently using

  • Purpose: To reliably compare latency between models
  • Effort: Low

Adding task ID

  • Purpose: Sometimes I wanted to check the tracking data for a specific task and it was hard to find. A task ID would also be useful for comparing models' performance on a specific task
  • Effort: Low

Tracking commit hash

  • Purpose: To compare different versions of RAI, check performance after code changes, and make sure the runs being compared come from known code versions
  • Effort: Low

Tracking session ID

  • Purpose: To easily filter one benchmark run from all runs.
  • Effort: Low
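
The four tracking items above could be collected once per task into a single metadata dictionary. Below is a minimal sketch using only the Python standard library. The names `RunMetadata`, `build_run_metadata`, `new_session_id`, `get_commit_hash`, and the `compute_device` field are hypothetical and are not part of rai_bench; how the resulting dictionary is attached to a langfuse or langsmith trace is left out on purpose, since that depends on the tracing client in use.

```python
import platform
import subprocess
import uuid
from dataclasses import asdict, dataclass
from typing import Dict, Optional


def get_commit_hash() -> Optional[str]:
    """Return the current git commit hash, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git repository, no commits yet, or git is not installed.
        return None
    return result.stdout.strip()


def new_session_id() -> str:
    """Generate one ID per benchmark run; reuse it for every task in that run."""
    return uuid.uuid4().hex


@dataclass(frozen=True)
class RunMetadata:
    # Hypothetical field names; adjust to whatever the tracing backend expects.
    session_id: str
    task_id: str
    commit_hash: Optional[str]
    system: str
    machine: str
    compute_device: str


def build_run_metadata(
    task_id: str, session_id: str, compute_device: str = "unknown"
) -> Dict[str, Optional[str]]:
    """Collect the metadata for one task as a plain dictionary."""
    metadata = RunMetadata(
        session_id=session_id,
        task_id=task_id,
        commit_hash=get_commit_hash(),
        system=platform.system(),      # e.g. "Linux"
        machine=platform.machine(),    # e.g. "x86_64"
        # Assumption: the caller knows where the model runs (e.g. "cpu", "cuda:0").
        compute_device=compute_device,
    )
    return asdict(metadata)
```

`compute_device` is passed in by the caller because the standard library cannot tell which device a model is actually running on; for a remotely hosted model the local platform fields describe only the benchmark host.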

Introduce Error Codes

  • Purpose: To easily filter errors and pass them to a fine-tuning workflow
  • Effort: Low
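
One possible shape for the error codes is an enum whose value is prefixed to the comment text, so that errors can be filtered by a simple string match. This is only a sketch: the categories and codes below (`E001` and so on) are made up for illustration and do not reflect the errors rai_bench actually reports.

```python
import re
from enum import Enum
from typing import Optional


class BenchmarkErrorCode(str, Enum):
    # Hypothetical categories; the real set would come from the benchmark validators.
    WRONG_TOOL = "E001"
    WRONG_ARGUMENT = "E002"
    MISSING_TOOL_CALL = "E003"


# Matches a leading "[E123]" style prefix in a comment.
_CODE_PREFIX = re.compile(r"^\[(E\d{3})\]")


def format_comment(code: BenchmarkErrorCode, detail: str) -> str:
    """Prefix the free-text error detail with its category code."""
    return f"[{code.value}] {detail}"


def parse_error_code(comment: str) -> Optional[BenchmarkErrorCode]:
    """Recover the category from a comment, or None if it has no known code."""
    match = _CODE_PREFIX.match(comment)
    if match is None:
        return None
    try:
        return BenchmarkErrorCode(match.group(1))
    except ValueError:
        # The prefix looks like a code but is not a known category.
        return None
```

Keeping the code as a fixed-width prefix means existing free-text comments stay readable, while a downstream fine-tuning workflow can group failures with `parse_error_code` instead of matching on the message text.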

Additional context

Errors included in the Comment column could carry error category IDs (screenshot from Langfuse): Image

The Session column in langfuse could be used for the session ID (langsmith would need a different mechanism; I didn't notice an explicit session ID there):

Image

MagdalenaKotynia avatar Mar 21 '25 14:03 MagdalenaKotynia

@MagdalenaKotynia what's the current timeline for this task? Anyone working on this?

maciejmajek avatar Apr 17 '25 15:04 maciejmajek

@maciejmajek I haven't started working on this task. This proposal was created as a future enhancement, to be picked up once higher-priority work on the tool calling benchmark is done. I suggest starting it after the refactor of rai_bench in #517 is finished and merged. I edited the issue and added some other proposed improvements related to tracing.

MagdalenaKotynia avatar Apr 18 '25 08:04 MagdalenaKotynia

I'm removing this issue from RAI 2.0 due to time constraints and the low priority of the task.

maciejmajek avatar May 03 '25 22:05 maciejmajek

Applied in https://github.com/RobotecAI/rai/pull/606

jmatejcz avatar Jun 09 '25 16:06 jmatejcz