[AGE-2919] Support more libraries in custom evaluators in Python

Open · mmabrouk opened this issue 5 months ago · 1 comment

Discussed in https://github.com/Agenta-AI/agenta/discussions/2691

Originally posted by mmabrouk July 21, 2025

Problem

You can currently create custom evaluators in Python, but the libraries they can import are heavily limited because the code runs under RestrictedPython. It would be great if custom evaluator code could use any library.

From @badcandy on Slack:

I really like that Agenta allows defining custom evaluators in Python and registering them so they can be used within the UI during evaluation. However, I noticed that due to the restrictions of RestrictedPython, the libraries available for use are limited. I have two questions regarding this: Is there a way to use external libraries like scikit-learn in a custom evaluator, or is it currently not possible?

_AGE-2919

Jul 21 '25 08:07 mmabrouk

Right now, custom evaluators in Agenta are limited to a small set of whitelisted libraries: math, random, datetime, json, requests, and typing. This is enforced by RestrictedPython, which runs evaluator code in a sandbox and only allows imports from that hardcoded list. External libraries like scikit-learn can't be used in custom evaluators because they aren't among the allowed modules, and the current design has no way to extend the list or configure additional imports. All evaluator code goes through these sandboxing controls for security reasons, with no workarounds or exceptions. The implementation details are in the sandbox code.
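To make the restriction concrete, here is a minimal sketch of an import allowlist of the kind described above. It is not Agenta's sandbox code and it does not use RestrictedPython: it only illustrates the idea with the standard library by swapping in a guarded `__import__`. The names `ALLOWED_MODULES`, `guarded_import`, and `run_evaluator` are hypothetical, and the reduced builtins table is an assumption made for the example. Plain `exec` with trimmed builtins is not a real security boundary.

```python
import builtins

# Allowlist as listed in the reply above; everything else is illustrative.
ALLOWED_MODULES = {"math", "random", "datetime", "json", "requests", "typing"}


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Import `name` only if its top-level package is on the allowlist."""
    top_level = name.split(".")[0]
    if top_level not in ALLOWED_MODULES:
        raise ImportError(f"Import of '{name}' is not allowed in this sandbox")
    return builtins.__import__(name, globals, locals, fromlist, level)


def run_evaluator(code: str) -> dict:
    """Execute evaluator source with the guarded import; return its namespace."""
    # Hypothetical, deliberately small builtins table for the sketch.
    limited_builtins = {
        "__import__": guarded_import,
        "abs": abs,
        "float": float,
        "len": len,
    }
    namespace = {"__builtins__": limited_builtins}
    exec(code, namespace)
    return namespace


# An allowed module imports normally.
allowed = run_evaluator("import math\nscore = math.sqrt(16.0)")

# A module outside the allowlist is rejected at import time.
try:
    run_evaluator("import sklearn")
    blocked = False
except ImportError:
    blocked = True
```

Under this design, supporting scikit-learn would mean either adding it to the allowlist (and installing it in the environment that runs evaluators) or running evaluator code in a different kind of isolation, which is what this issue asks for.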


Jul 21 '25 08:07 dosubot[bot]