ltu
Code, dataset, and pretrained models for the audio and speech large language model "Listen, Think, and Understand".
Hello, thank you for sharing this research on audio question answering. While testing, I found that there is no evaluation script for the open-set problem in...
Hello, I am trying to set up the LTU-AS system for local inference. I got an error because I only have one GPU; is there a reason why whisper-at is moved...
Hi, thanks for open-sourcing this amazing work. Is there a parameter to parallelize the model so it runs on smaller GPUs? I was not able to find one in the config. As...
Hi, thank you for your wonderful work! I've tried to run "finetune_toy.sh" following this: # prepare toy data and pretrained models ./prep_train.sh # run finetuning on the data ./finetune_toy.sh But...
Are the models downloaded by `inference.sh` 7B (Default) or 13B (Beta)? I found the latter quite error-prone and unstable, which is similar to what I'm observing now locally. I...
Hi, @YuanGongND, thanks for the excellent work. I have carefully read through your paper and I am intrigued by the methodology you employed in generating simulation data. The approach of...
Hello, I've been reading the LTU-AS paper recently, and I'm a bit confused about the ablation experiments mentioned in the paper. It states that using only spoken text as input...
Hello, thank you for your excellent work. I have a few questions about data construction: 1. How are the proportions of QA pairs allocated across the different datasets? For example,...
Why does `pad_or_trim` use 1000 rather than 3000 in `transcribe_audio`? `mel = pad_or_trim(mel, 1000).to(model.device).to(dtype)`
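For context on the numbers in this question: Whisper's log-mel spectrogram runs at 100 frames per second (hop length 160 at 16 kHz), so 3000 frames corresponds to Whisper's standard 30-second window, while 1000 frames corresponds to 10 seconds. A minimal NumPy re-sketch of a `pad_or_trim`-style helper is below; this is an illustrative approximation, not the actual implementation in `whisper.audio` (which also handles torch tensors):

```python
import numpy as np

def pad_or_trim(array, length, axis=-1):
    """Zero-pad or trim `array` along `axis` to exactly `length` entries."""
    if array.shape[axis] > length:
        # trim: keep only the first `length` entries along the axis
        sl = [slice(None)] * array.ndim
        sl[axis] = slice(0, length)
        return array[tuple(sl)]
    # pad: append zeros at the end of the axis up to `length`
    pad_widths = [(0, 0)] * array.ndim
    pad_widths[axis] = (0, length - array.shape[axis])
    return np.pad(array, pad_widths)

# an 80-bin mel spectrogram of 5 s (500 frames) padded to 10 s (1000 frames)
mel = np.zeros((80, 500))
print(pad_or_trim(mel, 1000).shape)  # (80, 1000)
```

So passing 1000 instead of 3000 fixes the input at a 10-second window rather than Whisper's default 30 seconds; whether that matches the clip length used in this repo is a question for the authors.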
Hi: a 'Whisper Decoder' is mentioned in Fig. 1 of the LTU-AS paper, but I don't see the Whisper decoder being used anywhere. Could you please explain why? Thank you!