
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Results 43 ltu issues

Hello, thank you for sharing this research idea on audio question answering. While testing, I found that there was no evaluation script for the open-set problem in...

question

Hello, I am trying to set up the LTU-AS system for local inference. I got an error because I only have one GPU. Is there a reason why whisper-at is moved...

bug
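Errors like the one described above often come from a hard-coded device index that does not exist on a single-GPU machine; whether that is the cause in this issue is an assumption, since the report is truncated. A minimal sketch of a fallback, using a hypothetical helper `resolve_device` that is not part of the ltu codebase:

```python
def resolve_device(requested_index: int, gpu_count: int) -> str:
    """Map a requested GPU index onto a device that actually exists.

    Hypothetical helper for illustration only. `gpu_count` would typically
    come from torch.cuda.device_count().
    """
    if requested_index < 0:
        raise ValueError("requested_index must be non-negative")
    if gpu_count == 0:
        # No GPU at all: fall back to the CPU.
        return "cpu"
    if requested_index >= gpu_count:
        # The requested GPU does not exist: share the first GPU instead.
        return "cuda:0"
    return f"cuda:{requested_index}"


if __name__ == "__main__":
    # On a single-GPU machine, a component configured for GPU 1 lands on GPU 0.
    print(resolve_device(1, 1))
```

Placing two models on one GPU this way only works if both fit in its memory, so this is a workaround rather than a general fix.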

Hi, thanks for open-sourcing this amazing work. Is there any parameter to parallelize the model so that it runs on smaller GPUs? I was not able to find one in the config. As...

enhancement

Hi, thank you for your wonderful work! I've tried to run `finetune_toy.sh` following this:

    # prepare toy data and pretrained models
    ./prep_train.sh
    # run finetuning on the data
    ./finetune_toy.sh

But...

bug

Are the models downloaded by `inference.sh` the 7B (Default) or the 13B (Beta) version? I found the latter quite error-prone and unstable, which is similar to what I'm observing now locally. I...

question

Hi, @YuanGongND, thanks for the excellent work. I have carefully read through your paper and I am intrigued by the methodology you employed in generating simulation data. The approach of...

question

Hello, I've been reading the LTU-AS paper recently, and I'm a bit confused about the ablation experiments mentioned in the paper. It states that using only spoken text as input...

question

Hello, thank you for your excellent work. I have a few questions about data construction: 1. How is the proportion of generated QA pairs allocated across the different datasets? For example,...

good first issue
question

Why does `pad_or_trim` use 1000 rather than 3000 in `transcribe_audio`? `mel = pad_or_trim(mel, 1000).to(model.device).to(dtype)`

question
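For context on the two numbers in the question above: Whisper computes log-mel frames at 16 kHz with a hop of 160 samples, i.e. 100 frames per second, so 3000 frames correspond to 30 seconds and 1000 frames to 10 seconds. That a 10-second window is intentional in LTU-AS is an assumption here, not something the issue confirms. The sketch below shows the arithmetic and a pure-Python stand-in for the padding step; `pad_or_trim_frames` is a hypothetical name, not Whisper's own `pad_or_trim`.

```python
SAMPLE_RATE = 16000  # Whisper input sample rate in Hz
HOP_LENGTH = 160     # samples between successive mel frames
FRAMES_PER_SECOND = SAMPLE_RATE // HOP_LENGTH  # 100 frames per second


def frames_to_seconds(n_frames: int) -> float:
    """Convert a number of mel frames to an audio duration in seconds."""
    return n_frames / FRAMES_PER_SECOND


def pad_or_trim_frames(mel, length):
    """Zero-pad or truncate each mel row (time axis) to exactly `length` frames.

    `mel` is a list of rows, one per mel bin. Illustrative stand-in only.
    """
    return [row[:length] + [0.0] * max(0, length - len(row)) for row in mel]


if __name__ == "__main__":
    print(frames_to_seconds(1000), frames_to_seconds(3000))
    short_clip = [[1.0] * 700 for _ in range(80)]   # 7 s of frames, 80 mel bins
    long_clip = [[1.0] * 1500 for _ in range(80)]   # 15 s of frames
    print(len(pad_or_trim_frames(short_clip, 1000)[0]))
    print(len(pad_or_trim_frames(long_clip, 1000)[0]))
```

Under this reading, trimming to 1000 frames keeps only the first 10 seconds of audio and pads shorter clips with zeros up to that length.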

Hi, sir: A 'Whisper Decoder' is mentioned in Fig. 1 of the LTU-AS paper, but I don't see the Whisper decoder being used anywhere. Could you please explain why? Thank you!