Mahmoud Mabrouk issues

Results: 116 issues of Mahmoud Mabrouk

[AGE-154] [Fea] Show the inputs for spans

Currently only the outputs for spans are shown. From [SyncLinear.com](https://synclinear.com) | [AGE-154](https://linear.app/agenta/issue/AGE-154/[fea]-show-the-inputs-for-spans)

enhancement

Task

[AGE-144] [Bug] Warning Default value gpt-3.5-turbo provided but choices are empty.

When creating an app from code and using the line `model_1=ag.GroupedMultipleChoiceParam(default="gpt-3.5-turbo", choices=get_all_supported_llm_models())`, we get the warning when running the app from the shell (or in the lambda/docker logs). From [SyncLinear.com](https://synclinear.com) |...

bug

Low priority

2 points

[sub-issue] Evaluation fails when correct_answer is not set in the test set

To reproduce: - Create a test set without a correct_answer - Run an evaluation. The evaluation will fail and the result will not be seen. Expected behavior: The evaluation will...

bug

stale

Fix code scanning alert -

I think we should disable this for cloud and keep it only for OSS. @aakrem Tracking issue for: - [x] https://github.com/Agenta-AI/agenta/security/code-scanning/26

stale

[AGE-120] When adding a data point to a test set, there is never a default set selected

In the playground, each time we add a data point to a test set, the option selected by default in the multi-select below is "+Add new". For users who are...

enhancement

good first issue

low complexity

Task

Low priority

Subtask

2 points

Improve the workflow for classification in entity recognition

In entity recognition tasks, users often need to evaluate multiple outputs. For example, if the task is to extract the author and date from a PDF, the user might create...

enhancement

dev experience

[AGE-277] Enable streaming mode for the SDK

**Is your feature request related to a problem? Please describe.** Using LLMs without streaming is slow. **Describe the solution you'd like** Add a feature to stream outputs to the SDK...

dev experience

SDK

linear

[Roadmap] Improve experimentation and evaluation workflow from CLI

We would like to enable users to run the evaluation workflow from the CLI (without going through the UI). The UI in this case would be mostly used to...

Roadmap

December 2023

[AGE-271] View the configuration used in each LLM app in the evaluation comparison view

We would like the table in the evaluation comparison view to show, at the header of each output, the configuration that was used for that evaluation. Tasks: - [...

Frontend

incomplete description

evaluation

design needed

linear

[Roadmap] Update templates and cookbook

- [ ] #1045 (requires #1044) - [x] #1066 - [ ] #941 - [x] #940 - [ ] Create template with images - [ ] Create link to code...

Roadmap