I. Deep Generative Modeling

II. Multimodal NLP

III. Music & Cinematic Technologies

I. Deep Generative Modeling

CTM

[arXiv] [demo]

Unified framework enabling diverse samplers and state-of-the-art 1-step generation
(ICLR2024)

SAN

[arXiv] [code] [demo]

Enhancing GAN with metrizable discriminators
(ICLR2024)

Applications:
[Vocoder]

MPGD

[arXiv] [demo]

Fast, efficient, training-free, and controllable diffusion-based generation method
(ICLR2024)

HQ-VAE

[OpenReview] [arXiv]

Generalizing hierarchical VQ-VAEs with a Bayesian framework
(TMLR2024)

FP-Diffusion

[PMLR] [code]

Improving density estimation of diffusion models
(ICML2023)

GibbsDDRM

[PMLR] [code]

Achieving blind inversion using DDPM
(ICML2023 Oral)

Applications:
[DeReverb] [SpeechEnhance]

Consistency-type Models

[arXiv]

Theoretically unified framework for "consistency" in diffusion models
(ICML2023 SPIGM workshop)

SQ-VAE

[PMLR] [arXiv] [code]

Improving codebook utilization and training stability
(ICML2022)

AR-ELBO

[Elsevier] [arXiv]

Mitigating oversmoothness in VAEs
(Neurocomputing2022)

II. Multimodal NLP

DiffuCOMET

[ACL] [arXiv] [code]

DiffuCOMET: Contextual Commonsense Knowledge Diffusion
(ACL2024)

CyCLIPs/CyCLAPs

[ACL] [arXiv]

On the Language Encoder of Contrastive Cross-modal Models
(ACL2024 Findings)

DIIR

[ACL] [arXiv] [code]

Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning
(ACL2024 Findings)

CPD Challenge 2023

[CPD Challenge 2023]

Commonsense Persona-grounded Dialogue Challenge

PeaCoK

[ACL] [arXiv] [code]

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
(ACL2023, Outstanding Paper Award)

ComFact

[EMNLP] [arXiv] [code]

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
(EMNLP2022 Findings)

III. Music & Cinematic Technologies

Mixing Graph Estimation

[arXiv] [code] [demo]

Searching For Music Mixing Graphs: A Pruning Approach
(DAFx24)

Guitar Amp. Modeling

[arXiv]

Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data
(DAFx24)

Text-to-Music Editing

[arXiv] [code] [demo]

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
(IJCAI2024)

STARSS23

[arXiv] [Dataset]

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
(NeurIPS2023)

BigVSAN Vocoder

[arXiv] [code] [demo]

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
(ICASSP2024)

Instr.-Agnostic Trans.

[arXiv]

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription
(ICASSP2024)

Vocal Restoration

[arXiv]

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance
(ICASSP2024)

Zero-/Few-shot SELD

[arXiv]

Zero- and Few-shot Sound Event Localization and Detection
(ICASSP2024)

CLIPSep

[OpenReview] [arXiv] [code] [demo]

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
(ICLR2023)

hFT-Transformer

[arXiv] [code]

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
(ISMIR2023)

Audio Restoration: ViT-AE

[IEEE] [arXiv] [demo]

Extending Audio Masked Autoencoders Toward Audio Restoration
(WASPAA2023)

Diffiner

[ISCA] [arXiv] [code]

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
(INTERSPEECH2023)

Automatic Music Tagging

[arXiv]

An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification
(ICASSP2023)

Vocal Dereverberation

[arXiv] [demo]

Unsupervised Vocal Dereverberation with Diffusion-based Generative Models
(ICASSP2023)

Mixing Style Transfer

[arXiv] [code] [demo]

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
(ICASSP2023)

Music Transcription

[arXiv] [code] [demo]

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
(ICASSP2023)

Singing Voice Vocoder

[arXiv] [demo]

Hierarchical Diffusion Models for Singing Voice Neural Vocoder
(ICASSP2023)

Distortion Effect Removal

[poster] [arXiv] [demo]

Distortion Audio Effects: Learning How to Recover the Clean Signal
(ISMIR2022)

Automatic Music Mixing

[poster] [arXiv] [code] [demo]

Automatic Music Mixing with Deep Learning and Out-of-Domain Data
(ISMIR2022)

Sound Separation

[IEEE]

Music Source Separation with Deep Equilibrium Models
(ICASSP2022)

Automatic DJ Transition

[arXiv] [code] [demo]

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks
(ICASSP2022)

Sound Event Localization and Detection

[IEEE] [arXiv]

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
(ICASSP2022)

Singing Voice Conversion

[arXiv] [demo]

Robust One-Shot Singing Voice Conversion

Sound Separation

[video] [site]

Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years

MDX21

[site] [frontiers]

Music Demixing Challenge 2021

Sound Demixing Challenge 2023

[site] [paper (music)] [paper (cinematic)]

Sound Demixing Challenge 2023

DCASE Challenge

[DCASE Challenge 2023]

Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes

Contact

Yuki Mitsufuji ([email protected])
