Monil Patel
**Describe the bug** The application does not handle failed API calls to Coinbase gracefully, which leaves users with uninformative error messages. **To Reproduce** 1. Attempt to...
**Describe the bug** The `trimTokens` function is applied inconsistently across the underlying LLM calls, leading to errors when the context window is exceeded. **To Reproduce** 1. Call the LLM function...
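One way to keep trimming consistent is to route every model call through a single wrapper that trims first, so no call site can skip it. This is a minimal sketch, not the ElizaOS `trimTokens` implementation: `ModelCall`, `approxTokenCount`, `trimToBudget`, and `withContextLimit` are hypothetical names, and the whitespace word count is a rough stand-in for the model's real tokenizer.

```typescript
// Hypothetical signature for a single model call; the real ElizaOS runtime API may differ.
type ModelCall = (prompt: string) => Promise<string>;

// Rough token estimate: one token per whitespace-separated word.
// Illustrative stand-in for the model's actual tokenizer.
function approxTokenCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Keep only the most recent maxTokens words so the prompt fits the budget.
// Trimming from the front assumes the newest context matters most.
function trimToBudget(prompt: string, maxTokens: number): string {
  if (approxTokenCount(prompt) <= maxTokens) return prompt;
  const words = prompt.split(/\s+/).filter((w) => w.length > 0);
  return words.slice(words.length - maxTokens).join(" ");
}

// Wrap a model call so every prompt is trimmed at one choke point,
// instead of relying on each call site to remember to trim.
function withContextLimit(call: ModelCall, maxTokens: number): ModelCall {
  return (prompt: string) => call(trimToBudget(prompt, maxTokens));
}
```

Usage would look like `const safeCall = withContextLimit(rawCall, 4000);`, with every LLM request then going through `safeCall` (`rawCall` and the budget are placeholders).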
**Is your feature request related to a problem? Please describe.** The `plugin-node` gives you an S3-compatible API. However, it assumes you're using AWS S3; it doesn't let you use S3-compatible...
# feat(scenarios): Add Step Count Evaluator

Links: [Issue #5726](https://github.com/elizaOS/eliza/issues/5726)

## Summary

Add an evaluator that asserts on the number of agent/tool/action steps taken to complete a scenario step. This encourages...
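The step-count assertion can be sketched as a filter over a recorded trace followed by a bounds check. This assumes the scenario runner exposes each step with a `kind` of agent, tool, or action; `TraceStep`, `StepCountBounds`, and `evaluateStepCount` are hypothetical names, not the existing scenario API.

```typescript
// Assumed shape of one recorded step; the actual scenario trace format may differ.
type StepKind = "agent" | "tool" | "action";

interface TraceStep {
  kind: StepKind;
  name: string;
}

interface StepCountBounds {
  kind?: StepKind; // count only this kind of step; omit to count every step
  min?: number;    // inclusive lower bound, defaults to 0
  max?: number;    // inclusive upper bound, defaults to no limit
}

interface StepCountResult {
  success: boolean;
  count: number;
  message: string;
}

function evaluateStepCount(trace: TraceStep[], bounds: StepCountBounds): StepCountResult {
  // Count matching steps, then check the count against the inclusive bounds.
  const count = trace.filter((s) => bounds.kind === undefined || s.kind === bounds.kind).length;
  const min = bounds.min ?? 0;
  const max = bounds.max ?? Number.POSITIVE_INFINITY;
  const label = bounds.kind ?? "total";
  return {
    success: count >= min && count <= max,
    count,
    message: `${label} steps: ${count}, expected ${min} to ${max}`,
  };
}
```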
# feat(scenarios): Add Consistency Evaluator

Links: [Issue #5726](https://github.com/elizaOS/eliza/issues/5726)

## Summary

Add an evaluator that runs the same step multiple times and asserts consistency over a chosen metric (response content, length,...
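A consistency check reduces the repeated runs to a score in [0, 1] and compares it against a threshold. The sketch below assumes the runner has already executed the step N times and collected the responses; the two metrics (exact-match agreement after trimming, and length spread) are illustrative choices, and `consistencyScore` / `evaluateConsistency` are hypothetical names.

```typescript
type ConsistencyMetric = "content" | "length";

interface ConsistencyResult {
  success: boolean;
  score: number;
}

// content: share of runs matching the most frequent (trimmed) response.
// length:  1 - (max - min) / max over response lengths; 1 means identical lengths.
function consistencyScore(responses: string[], metric: ConsistencyMetric): number {
  if (responses.length === 0) throw new Error("need at least one run");
  if (metric === "content") {
    const counts: Record<string, number> = Object.create(null);
    let top = 0;
    for (const r of responses) {
      const key = r.trim();
      counts[key] = (counts[key] ?? 0) + 1;
      top = Math.max(top, counts[key]);
    }
    return top / responses.length;
  }
  const lengths = responses.map((r) => r.length);
  const max = Math.max(...lengths);
  const min = Math.min(...lengths);
  return max === 0 ? 1 : 1 - (max - min) / max;
}

function evaluateConsistency(
  responses: string[],
  metric: ConsistencyMetric,
  threshold: number,
): ConsistencyResult {
  const score = consistencyScore(responses, metric);
  return { success: score >= threshold, score };
}
```

Exact match is strict for free-form LLM output; a semantic-similarity metric would be more forgiving but needs an embedding model, which this sketch leaves out.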
# feat(scenarios): Add Cost Evaluator

Links: [Issue #5726](https://github.com/elizaOS/eliza/issues/5726)

## Summary

Introduce an evaluator that asserts on the estimated dollar cost of LLM usage per step. Cost is derived from token counts...
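Since the cost is derived from token counts, a plausible estimate multiplies input and output tokens by per-million-token prices and sums over the step's LLM calls. The pricing structure and numbers here are placeholders, not actual model prices, and `estimateCostUsd` / `evaluateCost` are hypothetical names.

```typescript
// Per-million-token prices in USD. Placeholder values, not real model pricing.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// cost = sum over calls of (input / 1e6) * inputPrice + (output / 1e6) * outputPrice
function estimateCostUsd(usage: TokenUsage[], pricing: ModelPricing): number {
  let total = 0;
  for (const u of usage) {
    total += (u.inputTokens / 1e6) * pricing.inputPerMillion;
    total += (u.outputTokens / 1e6) * pricing.outputPerMillion;
  }
  return total;
}

function evaluateCost(usage: TokenUsage[], pricing: ModelPricing, maxUsd: number) {
  const cost = estimateCostUsd(usage, pricing);
  return {
    success: cost <= maxUsd,
    cost,
    message: `estimated cost $${cost.toFixed(6)} (limit $${maxUsd})`,
  };
}
```

For example, 1,000 input tokens at $3 per million plus 500 output tokens at $15 per million comes to 0.003 + 0.0075 = $0.0105.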
# feat(scenarios): Add Token Count Evaluator

Links: [Issue #5726](https://github.com/elizaOS/eliza/issues/5726)

## Summary

Add an evaluator to assert on input/output/total token counts for LLM calls used during a scenario step. This establishes...
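The token-count assertion amounts to summing the chosen field across a step's LLM calls and checking it against bounds. The per-call usage record and the names `sumTokens` and `evaluateTokenCount` are assumptions for illustration.

```typescript
// Assumed per-call usage record; field names are illustrative.
interface LlmCallTokens {
  inputTokens: number;
  outputTokens: number;
}

type TokenField = "input" | "output" | "total";

// Sum the selected token field over every LLM call made during the step.
function sumTokens(calls: LlmCallTokens[], field: TokenField): number {
  let sum = 0;
  for (const c of calls) {
    if (field === "input") sum += c.inputTokens;
    else if (field === "output") sum += c.outputTokens;
    else sum += c.inputTokens + c.outputTokens;
  }
  return sum;
}

interface TokenCountResult {
  success: boolean;
  value: number;
}

function evaluateTokenCount(
  calls: LlmCallTokens[],
  field: TokenField,
  bounds: { min?: number; max?: number },
): TokenCountResult {
  const value = sumTokens(calls, field);
  const min = bounds.min ?? 0;
  const max = bounds.max ?? Number.POSITIVE_INFINITY;
  return { success: value >= min && value <= max, value };
}
```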
### Problem Statement

Currently, ElizaOS scenario testing lacks the ability to mock internal agent runtime calls (particularly LLM interactions) when testing via the API client. This makes it difficult to...
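A common shape for such a mock is a stand-in model handler that returns canned responses matched against the prompt and records every prompt it receives, so a test can assert on both the agent's output and what was sent to the model. This sketch does not use the ElizaOS runtime's model-registration API; the `ModelHandler` signature and the `createMockModel` helper are hypothetical.

```typescript
// Hypothetical model-call signature; the real ElizaOS runtime interface may differ.
type ModelHandler = (params: { prompt: string }) => Promise<string>;

interface MockRule {
  match: RegExp;    // first rule whose pattern matches the prompt wins (avoid the g flag)
  response: string; // canned reply returned for that prompt
}

function createMockModel(rules: MockRule[], fallback = "MOCK_DEFAULT") {
  const calls: string[] = [];

  // Synchronous lookup that also records the prompt for later assertions.
  const respond = (prompt: string): string => {
    calls.push(prompt);
    for (const rule of rules) {
      if (rule.match.test(prompt)) return rule.response;
    }
    return fallback;
  };

  // Async wrapper matching the assumed handler signature.
  const handler: ModelHandler = async ({ prompt }) => respond(prompt);

  return { handler, respond, calls };
}
```

Deterministic canned replies make a scenario reproducible and free to run; the trade-off is that the test no longer checks real model behavior.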
#### **Description**

Currently, the `llm_judge` evaluator provides a binary `PASS`/`FAIL` outcome. This is effective for clear-cut cases but doesn't capture the nuance of Large Language Model (LLM) responses, which can...
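A graded judge could ask the model for a numeric score instead of a verdict, then derive pass/fail from a configurable threshold. This sketch assumes the judge replies with JSON such as `{"score": 0.8, "reasoning": "..."}`. That reply format, `parseJudgeReply`, and `gradedJudge` are hypothetical and are not the current `llm_judge` contract.

```typescript
interface JudgeVerdict {
  score: number;     // clamped to [0, 1]
  reasoning: string;
}

// Parse an assumed JSON reply of the form {"score": number, "reasoning": string}.
function parseJudgeReply(raw: string): JudgeVerdict {
  const data = JSON.parse(raw) as { score?: unknown; reasoning?: unknown };
  const score = typeof data.score === "number" ? data.score : NaN;
  if (!Number.isFinite(score)) throw new Error("judge reply missing numeric score");
  return {
    score: Math.min(1, Math.max(0, score)),
    reasoning: typeof data.reasoning === "string" ? data.reasoning : "",
  };
}

// Keep a pass/fail signal for compatibility, but expose the underlying score too.
function gradedJudge(raw: string, threshold: number) {
  const verdict = parseJudgeReply(raw);
  return { success: verdict.score >= threshold, ...verdict };
}
```

Retaining `success` alongside the score lets existing pass/fail scenarios keep working, while reports can show how close a response came to the threshold.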