deception

Published 1 year ago


Metadata

Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation met...

About

Topics: machine-learning, nlp, language-model, gemini, ai-safety, model-evaluation, llm, ai-benchmarks, mistral, disinformation, ai-security, ai-evaluation, llama, claude, llm-benchmarking, gpt4o

Stars: 31
Forks: 2
Watchers: 31
Owner: lechmazur
