A reading list on the safety, security, and privacy of large models.

Awesome-LM-SSP

Introduction

This repository collects resources on the trustworthiness of large models (LMs) across multiple dimensions (e.g., safety, security, and privacy), with a special focus on multi-modal LMs (e.g., vision-language models and diffusion models).

  • This repo is a work in progress :seedling: (resources are collected manually).

  • Badges:

    • Model:

      • LLM
      • VLM
      • SLM
      • Diffusion
    • Comment: Benchmark, New_dataset, Agent, CodeGen, Defense, RAG, Chinese, ...

    • Venue: conference, blog, OpenAI, Meta AI, ...

  • :sunflower: You are welcome to recommend resources by opening a pull request or an issue in the following format:

| Title | Link | Code | Venue | Classification | Model | Comment |
|-------|------|------|-------|----------------|-------|---------|
| aa | arxiv | github | bb'23 | A1. Jailbreak | LLM | Agent |

News

  • [2024.08.17] We collected 34 related papers from ACL'24!
  • [2024.05.13] We collected 7 related papers from S&P'24!
  • [2024.04.27] We adjusted the categories.
  • [2024.01.20] We collected 3 related papers from NDSS'24!
  • [2024.01.17] We collected 108 related papers from ICLR'24!
  • [2024.01.09] 🚀 LM-SSP is released!

Collections

  • Book (2)
  • Competition (5)
  • Leaderboard (3)
  • Toolkit (10)
  • Survey (33)
  • Paper (1319)
    • A. Safety (728)
      • A0. General (18)
      • A1. Jailbreak (295)
      • A2. Alignment (75)
      • A3. Deepfake (58)
      • A4. Ethics (5)
      • A5. Fairness (54)
      • A6. Hallucination (109)
      • A7. Prompt Injection (44)
      • A8. Toxicity (70)
    • B. Security (203)
      • B0. General (8)
      • B1. Adversarial Examples (85)
      • B2. Poison & Backdoor (96)
      • B3. System (14)
    • C. Privacy (388)
      • C0. General (28)
      • C1. Contamination (13)
      • C2. Copyright (135)
      • C3. Data Reconstruction (44)
      • C4. Membership Inference Attacks (35)
      • C5. Model Extraction (10)
      • C6. Privacy-Preserving Computation (75)
      • C7. Property Inference Attacks (3)
      • C8. Unlearning (45)

Acknowledgement