
[Ongoing] Knowledge base additions

Open jphall663 opened this issue 2 years ago • 7 comments

  • ~executive order on AI~
  • ~NIST 800-30 rev1 https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-30r1.pdf~
  • ~IEEE 1012 (199X or 2016) https://people.eecs.ku.edu/~hossein/Teaching/Stds/1012.pdf~
  • ~https://www.uspto.gov/sites/default/files/documents/USPTO_AI-Report_2020-10-07.pdf~
  • ~https://www.commerce.gov/issues/intellectual-property (see if you think it misses the mark, it might b/c I don't see an AI focus)~
  • ~https://standards.ieee.org/ieee/3119/10729/~

jphall663 avatar Oct 27 '23 19:10 jphall663

~https://www.frontiermodelforum.org/uploads/2023/10/FMF-AI-Red-Teaming.pdf~

~https://github.com/openai/openai-cookbook/tree/main~

jphall663 avatar Oct 30 '23 12:10 jphall663

~https://resources.oreilly.com/examples/0636920415947/-/blob/master/Attack_Cheat_Sheet.png <- community resources~

jphall663 avatar Oct 30 '23 12:10 jphall663

All added. Waiting on EO. Decided to go ahead and add the "Intellectual property" page because I could still imagine it being a useful resource/portal (especially considering the USPTO falls under it, and that page contains a specific resource we link to).

datherton09 avatar Oct 30 '23 17:10 datherton09

~https://www.imda.gov.sg/resources/press-releases-factsheets-and-speeches/press-releases/2023/generative-ai-evaluation-sandbox <- GAI resources~

jphall663 avatar Oct 31 '23 15:10 jphall663

[ALL ADDED, 2/21/2024]

benchmarks:

  • https://wavesbench.github.io/
  • https://github.com/huggingface/evaluate
  • https://github.com/AI-secure/DecodingTrust
  • https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vQObeTxvXtOs--zd98qG2xBHHuTTJOyNISBJPthZFr3at2LCrs3rcv73d4of1A78JV2eLuxECFXJY43/pubhtml
  • https://safetyprompts.com/

python software:

  • https://github.com/lilacai/lilac

official guidance:

  • https://www.ohchr.org/sites/default/files/documents/issues/business/b-tech/taxonomy-GenAI-Human-Rights-Harms.pdf

community resources:

  • https://www.hackerone.com/vulnerability-and-security-testing-blog
  • https://www.synack.com/wp-content/uploads/2022/09/Crowdsourced-Security-Landscape-Government.pdf
  • CSET stuff (just double check we reference somehow):
    -- https://cset.georgetown.edu/article/translating-ai-risk-management-into-practice/
    -- https://cset.georgetown.edu/publication/repurposing-the-wheel/
    -- https://cset.georgetown.edu/publication/adding-structure-to-ai-harm/
    -- https://cset.georgetown.edu/article/understanding-ai-harms-an-overview/
    -- https://cset.georgetown.edu/publication/ai-incident-collection-an-observational-study-of-the-great-ai-experiment/
  • https://www.scsp.ai/wp-content/uploads/2023/11/SCSP_JHU-HCAI-Framework-Nov-6.pdf
  • https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation
  • https://c2pa.org/
  • https://aiverifyfoundation.sg/downloads/Cataloguing_LLM_Evaluations.pdf
  • https://partnershiponai.org/modeldeployment/
  • https://cdn.openai.com/openai-preparedness-framework-beta.pdf

https://dominiquesheltonleipzig.com/country-legislation-frameworks/

red-teaming section:

  • https://www.hackerone.com/thought-leadership/ai-safety-red-teaming
  • https://cset.georgetown.edu/article/what-does-ai-red-teaming-actually-mean/

datherton09 avatar Feb 21 '24 17:02 datherton09

Red teaming -- but do we want to start hosting papers?

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (2024). Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks. https://arxiv.org/pdf/2402.04249.pdf

Red-Teaming for Generative AI: Silver Bullet or Security Theater? Michael Feffer, Anusha Sinha, Zachary C. Lipton, Hoda Heidari. https://arxiv.org/pdf/2401.15897.pdf

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models. Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang. https://arxiv.org/pdf/2310.00322.pdf

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment (2023). https://arxiv.org/pdf/2308.09662.pdf

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases. Rishabh Bhardwaj, Soujanya Poria. https://arxiv.org/pdf/2310.14303.pdf

jphall663 avatar Mar 15 '24 01:03 jphall663

GAI Critiques:

  • reasoning gap: https://arxiv.org/pdf/2402.19450.pdf
  • stealing language models: https://arxiv.org/pdf/2403.06634.pdf
  • dialect prejudice: https://arxiv.org/pdf/2403.00742.pdf

jphall663 avatar Mar 15 '24 01:03 jphall663