thu-coai
CharacterGLM-6B
CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
COLDataset
The official repository of the paper: COLD: A Benchmark for Chinese Offensive Language Detection
DiaSafety
This repo is for the paper: On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark
SafetyBench
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety.
ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
TaiLr
ICLR 2023 - Tailoring Language Generation Models under Total Variation Distance
Targeted-Data-Extraction
Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation"