
[Ongoing] Knowledge base additions

Open jphall663 opened this issue 2 years ago • 7 comments

  • ~executive order on AI~
  • ~NIST 800-30 rev1 https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-30r1.pdf~
  • ~IEEE 1012 (199X or 2016) https://people.eecs.ku.edu/~hossein/Teaching/Stds/1012.pdf~
  • ~https://www.uspto.gov/sites/default/files/documents/USPTO_AI-Report_2020-10-07.pdf~
  • ~https://www.commerce.gov/issues/intellectual-property (see if you think it misses the mark, it might b/c I don't see an AI focus)~
  • ~https://standards.ieee.org/ieee/3119/10729/~

jphall663 avatar Oct 27 '23 19:10 jphall663

~https://www.frontiermodelforum.org/uploads/2023/10/FMF-AI-Red-Teaming.pdf~

~https://github.com/openai/openai-cookbook/tree/main~

jphall663 avatar Oct 30 '23 12:10 jphall663

~https://resources.oreilly.com/examples/0636920415947/-/blob/master/Attack_Cheat_Sheet.png <- community resources~

jphall663 avatar Oct 30 '23 12:10 jphall663

All added. Waiting on EO. Decided to go ahead and add the "Intellectual property" page because I could still imagine it being a useful resource/portal (especially considering the USPTO falls under it, and that page contains a specific resource we link to).

datherton09 avatar Oct 30 '23 17:10 datherton09

~https://www.imda.gov.sg/resources/press-releases-factsheets-and-speeches/press-releases/2023/generative-ai-evaluation-sandbox <- GAI resources~

jphall663 avatar Oct 31 '23 15:10 jphall663

[ALL ADDED, 2/21/2024]

benchmarks:

  • https://wavesbench.github.io/
  • https://github.com/huggingface/evaluate
  • https://github.com/AI-secure/DecodingTrust
  • https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vQObeTxvXtOs--zd98qG2xBHHuTTJOyNISBJPthZFr3at2LCrs3rcv73d4of1A78JV2eLuxECFXJY43/pubhtml
  • https://safetyprompts.com/

python software:

  • https://github.com/lilacai/lilac

official guidance:

  • https://www.ohchr.org/sites/default/files/documents/issues/business/b-tech/taxonomy-GenAI-Human-Rights-Harms.pdf

community resources:

  • https://www.hackerone.com/vulnerability-and-security-testing-blog
  • https://www.synack.com/wp-content/uploads/2022/09/Crowdsourced-Security-Landscape-Government.pdf
  • CSET stuff (just double check we reference somehow):
    -- https://cset.georgetown.edu/article/translating-ai-risk-management-into-practice/
    -- https://cset.georgetown.edu/publication/repurposing-the-wheel/
    -- https://cset.georgetown.edu/publication/adding-structure-to-ai-harm/
    -- https://cset.georgetown.edu/article/understanding-ai-harms-an-overview/
    -- https://cset.georgetown.edu/publication/ai-incident-collection-an-observational-study-of-the-great-ai-experiment/
  • https://www.scsp.ai/wp-content/uploads/2023/11/SCSP_JHU-HCAI-Framework-Nov-6.pdf
  • https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation
  • https://c2pa.org/
  • https://aiverifyfoundation.sg/downloads/Cataloguing_LLM_Evaluations.pdf
  • https://partnershiponai.org/modeldeployment/
  • https://cdn.openai.com/openai-preparedness-framework-beta.pdf

https://dominiquesheltonleipzig.com/country-legislation-frameworks/

red-teaming section:

  • https://www.hackerone.com/thought-leadership/ai-safety-red-teaming
  • https://cset.georgetown.edu/article/what-does-ai-red-teaming-actually-mean/

datherton09 avatar Feb 21 '24 17:02 datherton09

Red teaming -- but do we want to start hosting papers?

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (2024). Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks. https://arxiv.org/pdf/2402.04249.pdf

Red-Teaming for Generative AI: Silver Bullet or Security Theater? Michael Feffer, Anusha Sinha, Zachary C. Lipton, Hoda Heidari. https://arxiv.org/pdf/2401.15897.pdf

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models. Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang. https://arxiv.org/pdf/2310.00322.pdf

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment (2023). https://arxiv.org/pdf/2308.09662.pdf

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases. Rishabh Bhardwaj, Soujanya Poria. https://arxiv.org/pdf/2310.14303.pdf

jphall663 avatar Mar 15 '24 01:03 jphall663

GAI Critiques:

  • reasoning gap: https://arxiv.org/pdf/2402.19450.pdf
  • stealing language models: https://arxiv.org/pdf/2403.06634.pdf
  • dialect prejudice: https://arxiv.org/pdf/2403.00742.pdf

jphall663 avatar Mar 15 '24 01:03 jphall663