News

Conferences
- CVPR 2023
  - Dates/venue: June 18-22, Vancouver Convention Centre
  - Main conference and Expo: June 20-22; Workshops and Tutorials: June 18-19
  - Korean booths: LG, Hyundai Motor, and others (Naver has 8 papers in the poster sessions, so please stop by!)
- EMNLP 2023
  - Deadlines: abstract June 16, full paper June 23 (AoE)
- The EU AI Act and CRFM's assessment of how well LLMs satisfy its requirements

ArXiv

InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
- A method from the University of Maryland that combines an open-source small LLM (sLLM) with a Big Tech black-box LLM
- The sLLM converts a soft prompt (typically a vector) together with the user's input into an instruction to feed to the black-box LLM
- The generated instruction plus the user's examples go into the black-box LLM; based on the outputs and their scores, Bayesian optimization updates the soft prompt, and the loop repeats
- Code: https://github.com/Lichang-Chen/InstructZero
- Project page: https://lichang-chen.github.io/InstructZero/
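The soft prompt → instruction → black-box score → Bayesian optimization loop above can be sketched with a plain Gaussian-process (GP) surrogate and an upper-confidence-bound (UCB) acquisition. This is a toy sketch, not InstructZero's code: `black_box_score` is a hypothetical stand-in for "sLLM writes an instruction, black-box LLM is scored on the user's examples", the 4-dimensional soft prompt is a hypothetical size, and the plain RBF kernel and UCB rule are our choices, not necessarily the kernel or acquisition the paper uses.

```python
import numpy as np

DIM = 4  # hypothetical soft-prompt dimensionality (kept tiny for the demo)
rng = np.random.default_rng(0)


def black_box_score(soft_prompt: np.ndarray) -> float:
    """Hypothetical stand-in for the expensive step: the sLLM turns the soft prompt
    into an instruction, the black-box LLM runs it on the user's examples, and we
    return the score. Here it is a smooth toy function peaking at 1.0 at 0.5 per dim."""
    return float(np.exp(-np.sum((soft_prompt - 0.5) ** 2)))


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 0.5) -> np.ndarray:
    sq_dist = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and std of a zero-mean GP with an RBF kernel."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    var = np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12)  # prior variance is 1
    return mean, np.sqrt(var)


def optimize_soft_prompt(n_init=5, n_iter=25, n_candidates=2000, beta=2.0):
    # Score a few random soft prompts with the black box to seed the GP.
    x = rng.uniform(-1.0, 1.0, size=(n_init, DIM))
    y = np.array([black_box_score(p) for p in x])
    for _ in range(n_iter):
        # Fit the GP to (soft prompt, score) pairs and try the candidate with the
        # highest upper confidence bound next.
        candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, DIM))
        mean, std = gp_posterior(x, y - y.mean(), candidates)
        nxt = candidates[np.argmax(mean + beta * std)]
        x = np.vstack([x, nxt])
        y = np.append(y, black_box_score(nxt))
    best = int(np.argmax(y))
    return x[best], float(y[best]), y


best_prompt, best_score, history = optimize_soft_prompt()
print(f"best score {best_score:.3f} after {len(history)} black-box calls")
```

The point of the GP is that each black-box call is expensive (API cost, latency), so the surrogate decides which soft prompt is worth scoring next instead of searching blindly.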
Knowledge Distillation of Large Language Models
- Research from the CoAI group at Tsinghua University and MSR
- White-box LLM → sLLM distillation using reverse KLD; more stable than conventional forward-KLD-based KD
- If forward-KLD KD feels like the student memorizing whatever the teacher emits, reverse-KLD KD feels like the teacher guiding the student to generate better on its own.
- Code: https://github.com/microsoft/LMOps/tree/main/minillm
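A small numeric example helps show why the direction of the KL divergence matters. The sketch below compares forward KL(teacher || student) and reverse KL(student || teacher) for a hypothetical bimodal teacher next-token distribution and two hypothetical students; it is a token-level illustration only, not MiniLLM's sequence-level training objective.

```python
import numpy as np


def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for discrete distributions with strictly positive entries."""
    return float(np.sum(p * np.log(p / q)))


# Hypothetical teacher next-token distribution with two dominant tokens.
teacher = np.array([0.45, 0.45, 0.05, 0.05])
# Two hypothetical low-capacity students.
spread_out = np.array([0.25, 0.25, 0.25, 0.25])  # spreads mass over every token
one_mode = np.array([0.85, 0.05, 0.05, 0.05])    # commits to one dominant token

for name, student in [("spread_out", spread_out), ("one_mode", one_mode)]:
    print(f"{name}: forward KL(p||q)={kl(teacher, student):.3f}, "
          f"reverse KL(q||p)={kl(student, teacher):.3f}")
```

Forward KL scores the spread-out student as better, even though it puts a lot of mass on tokens the teacher considers unlikely; reverse KL prefers the student that commits to a mode the teacher actually supports. This mode-seeking behavior is the intuition behind using reverse KLD for generative distillation.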

Jun 17 '23 23:06 jungwoo-ha

News

AMD has recently been stepping up its efforts to enter the AI accelerator GPU market.

Announcement: https://www.amd.com/en/newsroom/press-releases/2023-6-13-amd-expands-leadership-data-center-portfolio-with-.html
HuggingFace Blogpost: https://huggingface.co/blog/huggingface-and-amd

NVIDIA currently dominates through CUDA. AMD is in the most comparable market position, but barriers such as the CUDA ecosystem have kept it from gaining real influence. However, PyTorch already supports AMD's ROCm backend, and HuggingFace is now actively working to support AMD as well, so things are moving toward a point where most users working through a Python frontend will soon be able to run their existing code unchanged on AMD hardware.

Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

ArXiv: https://arxiv.org/abs/2306.07899
GitHub: https://github.com/epfl-dlab/GPTurk

Jun 18 '23 03:06 veritas9872

Research

Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust

ArXiv: https://arxiv.org/abs/2305.20030
GitHub: https://github.com/YuxinWenRick/tree-ring-watermark

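This paper proposes a way to embed a robust watermark in the outputs of DDIM-based diffusion models without degrading them. It writes a tree-ring pattern of concentric rings into the Fourier transform of the initial latent noise. Thanks to properties of the Fourier transform, the watermark is robust to a range of image-space perturbations such as rotation, noise injection, and cropping, while leaving the realism of the final output intact. Because verification requires running the diffusion process (inverting the image back to its initial noise), only the owner of the model can check for the watermark, but it should still be useful for the many people running services built on diffusion models.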
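A minimal numpy sketch of the core idea, loosely modeled on a ring-shaped key: write one real value per radius into the 2-D FFT of a Gaussian noise latent, invert the FFT, and later detect the key by measuring how far the latent's ring bins are from it. Assumptions: the latent size, ring radii, and key scale are hypothetical; detection runs directly on the latent, i.e. it assumes perfect DDIM inversion and skips the diffusion model entirely; and it does not demonstrate the paper's rotation or cropping robustness.

```python
import numpy as np

N = 64                # hypothetical latent side length
R_MIN, R_MAX = 2, 10  # hypothetical ring radii (frequency-index units) carrying the key


def ring_index(n: int = N) -> np.ndarray:
    """Integer radius of every FFT bin, measured from the zero-frequency bin."""
    freqs = np.fft.fftfreq(n) * n  # 0, 1, ..., -1: symmetric under k -> -k
    radius = np.sqrt(freqs[:, None] ** 2 + freqs[None, :] ** 2)
    return np.floor(radius).astype(int)


RINGS = ring_index()
MASK = (RINGS >= R_MIN) & (RINGS < R_MAX)


def make_key(rng: np.random.Generator) -> np.ndarray:
    # One real value per radius, so every ring is constant: a "tree-ring" key.
    return rng.normal(0.0, N, size=R_MAX)


def embed(latent: np.ndarray, key: np.ndarray) -> np.ndarray:
    spec = np.fft.fft2(latent)
    spec[MASK] = key[RINGS[MASK]]
    # Real, radially symmetric values keep the spectrum Hermitian, so the
    # inverse FFT is real up to floating-point error.
    return np.fft.ifft2(spec).real


def detect_distance(latent: np.ndarray, key: np.ndarray) -> float:
    """Mean distance between the latent's ring bins and the key (small = watermarked)."""
    spec = np.fft.fft2(latent)
    return float(np.mean(np.abs(spec[MASK] - key[RINGS[MASK]])))


rng = np.random.default_rng(0)
key = make_key(rng)
watermarked = embed(rng.standard_normal((N, N)), key)
clean = rng.standard_normal((N, N))
noisy = watermarked + 0.1 * rng.standard_normal((N, N))
for name, z in [("watermarked", watermarked), ("noisy", noisy), ("clean", clean)]:
    print(name, round(detect_distance(z, key), 3))
```

In a real pipeline the watermarked latent would be the starting noise for DDIM sampling, and detection would first invert the (possibly attacked) image back to a latent before computing the distance against a threshold.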

Jun 18 '23 06:06 veritas9872

News

Evidence for the use of quantum computing before fault tolerance
- Demonstrated for the first time that quantum computers with 100+ qubits can produce accurate results beyond the reach of leading classical approaches
- New error mitigation techniques!

Papers

Simplicity Bias Leads to Amplified Performance Disparities (FAccT 2023)
- Background: SGD has a simplicity bias (a bias toward simple solutions, i.e., it prioritizes learning the majority class)
- How to quantify this? Difficulty disparity and the difficulty amplification factor
- This is model-specific! (even with a balanced dataset)
- Conclusion: we should use a model-specific fairness audit (post-training audit)
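To make "difficulty disparity" and "amplification" concrete, here is a sketch under ASSUMED definitions that may not match the paper exactly: disparity is taken as the ratio of the harder group's error rate to the easier group's, and the amplification factor as the disparity of one jointly trained model divided by the disparity observed when each group is learned in isolation. All error rates below are hypothetical, not results from the paper.

```python
def difficulty_disparity(err_group_a: float, err_group_b: float) -> float:
    """ASSUMED definition: harder group's error rate / easier group's error rate."""
    return max(err_group_a, err_group_b) / min(err_group_a, err_group_b)


def amplification_factor(isolated_errors, joint_errors) -> float:
    """ASSUMED definition: how much the disparity grows when one model is trained on
    both groups, relative to the disparity when each group is learned in isolation."""
    return difficulty_disparity(*joint_errors) / difficulty_disparity(*isolated_errors)


# Hypothetical per-group test error rates (not from the paper).
isolated = (0.10, 0.05)  # each group learned by its own model
joint = (0.20, 0.04)     # one model trained on the combined (possibly balanced) data
print(difficulty_disparity(*isolated), amplification_factor(isolated, joint))
```

A factor above 1 would mean training amplified the gap beyond the groups' inherent difficulty difference. Since the joint numbers only exist after a specific model is trained, this kind of check is naturally a post-training audit.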
"Controversial" paper: Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models (MIT, Boston University)
- Can GPT-4 really score 100% on the MIT EECS curriculum......?
- https://flower-nutria-41d.notion.site/No-GPT4-can-t-ace-MIT-b27e6796ab5a48368127a98216c76864
- TL;DR:
  - Evaluation: The authors' evaluation uses GPT-4 to grade its own answers and re-prompts repeatedly until the correct answer is reached.
  - Dataset leakage: Significant leakage and duplication in the prompts.
- Looks like we still have a way to go... haha
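Why "re-prompt until correct" inflates scores: if a single attempt succeeds with probability p, the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. The sketch assumes independent attempts and a hypothetical single-shot accuracy of 0.3; in the critiqued setup "correct" was also judged by GPT-4 itself, which only makes the inflation harder to bound.

```python
def success_with_retries(p_single: float, k: int) -> float:
    """P(at least one correct answer in k attempts), assuming independent attempts."""
    return 1.0 - (1.0 - p_single) ** k


for k in (1, 3, 5, 10):
    print(k, round(success_with_retries(0.3, k), 3))
```

With these assumptions a model that is right 30% of the time looks about 83% accurate after five tries and about 97% after ten.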
Recent papers in optimization theory:
- New optimizers
  - Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training (arXiv 2023)
- Parameter-free SGD! (no need to tune the initial learning rate)
  - Making SGD Parameter-Free (COLT 2022)
  - Prodigy: An Expeditiously Adaptive Parameter-Free Learner (arXiv 2023)
- ...and more
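A sketch of a Sophia-style update on an ill-conditioned quadratic, roughly following the paper's rule θ ← θ − η·clip(m / max(γ·h, ε), 1): an EMA of gradients m, an EMA of diagonal Hessian estimates h refreshed every k steps, and per-coordinate clipping. Assumptions: the exact diagonal Hessian stands in for Sophia's stochastic estimators, the Hessian EMA is initialized with the first estimate for simplicity, weight decay is omitted, and all hyperparameters are illustrative.

```python
import numpy as np


def sophia_on_quadratic(x0, curvature, lr=0.1, beta1=0.9, beta2=0.99,
                        gamma=1.0, eps=1e-12, hessian_every=10, steps=500):
    """Sophia-style updates on f(x) = 0.5 * sum(curvature * x**2)."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # EMA of gradients (momentum)
    h = None              # EMA of diagonal Hessian estimates
    losses = []
    for t in range(steps):
        grad = curvature * x
        m = beta1 * m + (1 - beta1) * grad
        if t % hessian_every == 0:
            # Stand-in: exact diagonal Hessian. Sophia itself uses a cheap stochastic
            # estimator (Hutchinson or Gauss-Newton-Bartlett) at this step.
            h_hat = curvature
            h = h_hat.copy() if h is None else beta2 * h + (1 - beta2) * h_hat
        # Precondition by curvature, then clip each coordinate to [-1, 1].
        step = np.clip(m / np.maximum(gamma * h, eps), -1.0, 1.0)
        x -= lr * step
        losses.append(0.5 * float(np.sum(curvature * x**2)))
    return x, losses


curvature = np.array([1.0, 100.0])  # 100x curvature gap between coordinates
x_final, losses = sophia_on_quadratic([5.0, -3.0], curvature)
print(losses[0], losses[-1])
```

Dividing by the curvature makes the step size comparable across coordinates, while the clip caps the update at `lr` per coordinate when the Hessian estimate is small or unreliable, which is what keeps second-order steps safe early in training.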
Explosion in NLP theory!! (Anyone interested in the math, let's give it a go together ~~starting next year~~ haha)
- From an optimization perspective!
  - A Kernel-Based View of Language Model Fine-Tuning (ICML 2023)
- How ICL (in-context learning) works
  - e.g., Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection (arXiv 2023)
- How prompt tuning works
  - e.g., On the Role of Attention in Prompt-tuning (ICML 2023)
- How Transformers work
  - e.g., Transformers learn through gradual rank increase (arXiv 2023)
  - e.g., Faith and Fate: Limits of Transformers on Compositionality
- ...and more

Jun 18 '23 11:06 nick-jhlee

Health system-scale language models are all-purpose prediction engines

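An AI assistant that follows a patient's care alongside the physician and offers predictions and advice is surely one of the futures doctors hope AI will bring.
The authors trained an LLM, NYUTron, on the entire EHR of a large health system, producing a model that reads physicians' clinical notes and makes several of these predictions across a wide range of clinical and operational tasks.
They deployed NYUTron in a live clinical environment, confirmed that it integrates smoothly into clinical workflows, and showed its effectiveness at predicting 30-day readmission.
Using unstructured clinical notes, NYUTron performed (1) 30-day readmission prediction, (2) in-hospital mortality prediction, (3) comorbidity index prediction, (4) length of stay (LOS) prediction, and (5) insurance denial prediction, reaching AUCs of 78.7–94.9% (a 5.36–14.7% AUC improvement over existing models).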

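a, Two types of datasets were pulled from the NYU Langone EHR.
The pretraining dataset, NYU Notes, contains 10 years of inpatient clinical notes (387,144 patients, 4.1 billion words).
There are five fine-tuning datasets, each containing 1-10 years of inpatient clinical notes (55,791-413,845 patients, 51-87 million words) with task-specific labels (2-4 classes).
b, To build a model pretrained on the medical language in the EHR, they pretrained NYUTron, a 109-million-parameter BERT-like LLM, on the entire EHR using masked language modeling (MLM).
c, The pretrained model was then fine-tuned and validated on specific tasks (e.g., 30-day all-cause readmission prediction).
d, Finally, the fine-tuned model was compressed into an accelerated format and loaded into an inference engine that interfaces with the NYU Langone EHR and reads discharge notes.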
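Step b relies on masked language modeling. The sketch below shows the standard BERT masking recipe (select about 15% of positions as targets; replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged). That this exact recipe and these rates were used for NYUTron is an assumption, and `MASK_ID` is a hypothetical token id; -100 is the label value PyTorch's `CrossEntropyLoss` ignores by default.

```python
import random

MASK_ID = 4          # hypothetical [MASK] token id
IGNORE_INDEX = -100  # label for positions excluded from the loss


def mask_for_mlm(token_ids, vocab_size, mask_prob=0.15, seed=0):
    """BERT-style masking: pick ~mask_prob of positions as prediction targets;
    of those, 80% -> [MASK], 10% -> random token, 10% -> left unchanged."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # not a target; the loss ignores this position
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # else: keep the original token
    return inputs, labels


inputs, labels = mask_for_mlm(list(range(10, 30)), vocab_size=1000)
print(inputs)
print(labels)
```

Pretraining then minimizes cross-entropy only at the labeled positions, which is how a model learns the vocabulary and style of clinical notes before any task labels are involved.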

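Retrospective study -> prospective study (30-day readmission)
To evaluate NYUTron outside the development environment, they selected a model based on the retrospective results and ran a prospective trial from January to April 2022.
The engine interfacing with the EHR reads discharge notes once the treating physician has signed them.
During this period there were 29,286 discharged patients, and 3,271 of them (11.17%) returned within 30 days.
NYUTron predicted 2,692 of the 3,271 readmissions (recall 82.30%) at a precision of 20.58%, with an AUC of 78.70% (https://www.nature.com/articles/s41586-023-06160-y#Fig4).
A panel of six physicians reviewed NYUTron's results for potential clinical impact: of 100 readmissions NYUTron correctly identified, 61% were unplanned, and 27% of those were preventable at the time of discharge.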
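The prospective-trial numbers can be cross-checked with a few lines of arithmetic. The implied count of flagged patients is derived from the reported precision, which is itself rounded, so that figure is approximate; the helper name is ours.

```python
def readmission_summary(discharges, readmissions, true_positives, precision):
    prevalence = readmissions / discharges
    recall = true_positives / readmissions  # share of actual readmissions caught
    flagged = true_positives / precision    # implied number of positive predictions
    false_alarms = flagged - true_positives
    return prevalence, recall, flagged, false_alarms


prevalence, recall, flagged, false_alarms = readmission_summary(29_286, 3_271, 2_692, 0.2058)
print(f"prevalence {prevalence:.2%}, recall {recall:.2%}, "
      f"~{flagged:,.0f} flagged, ~{false_alarms:,.0f} false alarms")
```

This reproduces the 11.17% readmission rate and 82.30% recall, and implies that roughly 13,000 discharges were flagged. A precision near 20% sounds low, but against an 11% base rate it means a flagged patient is nearly twice as likely as average to return.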

Jun 18 '23 11:06 sujungleeml

WeeklyArxivTalk

[20230618] Weekly AI ArXiv Talk, Season 2, Episode 20
