prompt-testing topics

LLM-RGB

122 Stars, 9 Forks

LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.

babelcloud

benchmark

llm

prompt

prompt-engineering

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command...

promptfoo

llm

llmops

prompt-engineering

prompt-testing

agentic_security

1.7k Stars, 229 Forks, 1.7k Watchers

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

msoedov

llm-fuzzer

llm-fuzzer-aggregator

llm-fuzzing

llm-guardrails