feat: Integrate ST-WebAgentBench #3037
Description
Fixes #3037
- Implement STWebAgentBenchmark class inheriting from BaseBenchmark
- Add STWebAgentBenchConfig for configuration management
- Add STWebAgentTask and STWebAgentResult data models
- Support for 6 policy dimensions: user_consent, boundary, strict_execution, hierarchy, robustness, error_handling
- Integration with ChatAgent and Workforce
- Implement the BaseBenchmark abstract methods (download, load, run); see the sketch after this description
- Add exports to the benchmarks `__init__.py`
- Follow CAMEL coding patterns and documentation style
The benchmark evaluates web agents on safety and trustworthiness in realistic enterprise scenarios.
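For reviewers, here is a minimal sketch of how the pieces above fit together. Only the class names and the six policy dimensions come from this PR; the constructor and method signatures are illustrative assumptions, not the actual CAMEL API.

```python
# Sketch only: signatures beyond the names listed above are assumptions.
from dataclasses import dataclass, field
from typing import Any, Dict, List

POLICY_DIMENSIONS = [
    "user_consent", "boundary", "strict_execution",
    "hierarchy", "robustness", "error_handling",
]

@dataclass
class STWebAgentBenchConfig:
    policy_dimensions: List[str] = field(
        default_factory=lambda: list(POLICY_DIMENSIONS)
    )
    parallel: bool = True  # parallel task execution (see Features)

class STWebAgentBenchmark:  # in the PR this inherits camel.benchmarks.BaseBenchmark
    def __init__(self, config: STWebAgentBenchConfig) -> None:
        self.config = config
        self._tasks: List[Any] = []  # populated with STWebAgentTask objects

    def download(self) -> None:
        """Fetch the ST-WebAgentBench dataset (stub)."""

    def load(self) -> None:
        """Parse the dataset into STWebAgentTask records (stub)."""

    def run(self, agent: Any) -> Dict[str, Any]:
        """Run the agent on each task, checking every policy dimension."""
        return {"results": []}  # STWebAgentResult records in the real class
```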
Features
- Supports parallel execution for performance
- Comprehensive metrics, including CR (Completion Rate), CuP (Completion under Policy), and per-dimension Risk Ratios (illustrated after this list)
- Compatible with existing CAMEL agent infrastructure
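To make the metric names concrete, here is a rough illustration of how CR, CuP, and per-dimension risk ratios could be computed from result records. The field names are hypothetical stand-ins for STWebAgentResult; the exact formulas follow the ST-WebAgentBench paper.

```python
from collections import Counter

# Toy results; field names are hypothetical stand-ins for STWebAgentResult.
results = [
    {"completed": True,  "violations": []},               # safe success
    {"completed": True,  "violations": ["user_consent"]}, # unsafe success
    {"completed": False, "violations": ["boundary"]},     # failure
]

n = len(results)
cr = sum(r["completed"] for r in results) / n       # CR: fraction of tasks completed
cup = sum(r["completed"] and not r["violations"]    # CuP: completed with
          for r in results) / n                     # zero policy violations

# Risk ratio per dimension: fraction of tasks that violated it.
counts = Counter(v for r in results for v in set(r["violations"]))
risk = {dim: c / n for dim, c in counts.items()}

print(f"CR={cr:.2f}  CuP={cup:.2f}  risk={risk}")
# CR=0.67  CuP=0.33  risk={'user_consent': 0.33, 'boundary': 0.33}
```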
Testing
- [x] Import tests pass
- [x] Basic benchmark creation works
- [x] Configuration validation works
- [x] Follows established CAMEL patterns
Notes
The ST-WebAgentBench environment dependencies are optional; they only need to be installed when users want the full evaluation functionality.
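As a sketch, one common way to keep such dependencies optional is a guarded import that fails with an actionable message. The module name below is an assumption about the environment backend, not a confirmed import in this PR.

```python
# Sketch of an optional-dependency guard; the module name is an assumption.
try:
    import browsergym  # heavy web-environment dependency
except ImportError:
    browsergym = None

def _require_environment() -> None:
    # Called at the top of run(), so missing dependencies surface early
    # with a clear message instead of deep inside task execution.
    if browsergym is None:
        raise ImportError(
            "Running ST-WebAgentBench tasks requires the optional "
            "environment dependencies; install them for full functionality."
        )
```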
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- [x] I have read the CONTRIBUTION guide (required)
- [x] I have linked this PR to an issue using the Development section on the right sidebar or by adding `Fixes #issue-number` in the PR description (required)
- [ ] I have checked if any dependencies need to be added or updated in `pyproject.toml` and `uv lock`
- [x] I have updated the tests accordingly (required for a bug fix or a new feature)
- [ ] I have updated the documentation if needed:
- [ ] I have added examples if this is a new feature
If you are unsure about any of these, don't hesitate to ask. We are here to help!
Hey @right1wrong, just checking in on this PR since it hasn't been updated in a while. Please let us know if there's anything we can do to help!
Hey @right1wrong,
Hope everything is going well! Feel free to mention me in a comment to let us know when this is ready for review.
Can you add an example? You can refer to this one: https://github.com/camel-ai/camel/blob/master/examples/benchmarks/ragbench.py
Hi Saedbhati! Could you share where to add the example? I have changed all the docstrings to the r"""...""" format, although I'm not sure it's necessary, since r"""...""" is usually used for text containing backslashes.
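For reference, here is a quick illustration of why the raw form matters when a docstring does contain backslashes:

```python
def f():
    """Compute \alpha."""   # '\a' is the BEL escape: stored text is 'Compute \x07lpha.'

def g():
    r"""Compute \alpha."""  # raw string: the backslash survives as written

print(repr(f.__doc__))  # 'Compute \x07lpha.'
print(repr(g.__doc__))  # 'Compute \\alpha.'
```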
Hi @right1wrong, sorry for the delayed reply. You can add an example in camel/examples/benchmarks; refer to https://github.com/camel-ai/camel/blob/master/examples/benchmarks/ragbench.py
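Something along these lines could work, loosely following the structure of ragbench.py. The class names come from this PR's description; the file name and exact constructor/run signatures are assumptions.

```python
# examples/benchmarks/st_web_agent_bench.py (hypothetical sketch)
from camel.agents import ChatAgent
from camel.benchmarks import STWebAgentBenchConfig, STWebAgentBenchmark

config = STWebAgentBenchConfig()      # defaults to all six policy dimensions
benchmark = STWebAgentBenchmark(config)

benchmark.download()                  # fetch the dataset if not cached
benchmark.load()

agent = ChatAgent("You are a careful, policy-abiding web agent.")
results = benchmark.run(agent)
print(results)
```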