ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Results 43 ltu issues

Hi, Thank you for the great work and the detailed documentation you have provided. It's been very helpful. I'm trying to use the 13B model instead of the default 7B...

bug

Hi, sir: I find that the prompts for training and testing for audio event classification are different in the code. In the training task "cla_label", one example of the question is "Identify...

question

Hi~ Can you provide a download script or download links for OpenAQA's audio data? This would help us save a lot of time so we can pay more attention to other...

question

Hi, first, thanks for this awesome work. I'm trying to rewrite the training code for ltu-as, and I find that the `cutoff_len` for stage 1 and 2 is 108 which...

question

When I use the eval code [eval_esc.py](https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/eval/eval_esc.py), the following error occurs:

```
from stats import calculate_stats
ImportError: cannot import name 'calculate_stats' from 'stats' (/home/aipf/work/miniconda3/envs/venv_ltu_as/lib/python3.10/site-packages/stats.py)
```

When I use eval code...

bug
reproduction
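
The path in the traceback above shows that `stats` was imported from `site-packages/stats.py`, which suggests an installed module of the same name is being picked up instead of the one the eval script expects. A minimal sketch for checking which file Python would import for a given module name, using only the standard library (`resolve_module_path` is a hypothetical helper name, not part of the ltu code):

```python
import importlib.util


def resolve_module_path(name):
    """Return the file path Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a normal source module or package this is the path of the .py file.
    return spec.origin
```

Calling `resolve_module_path("stats")` from the directory where the eval script is run would show whether the import resolves to a local file or to a copy under `site-packages`.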

Hi Yuan, Could you please let me know the LICENSE of your trained models and the created AQA datasets? That would be very helpful! Thanks!

File "/transformers/tokenization_utils_base.py", line 708, in as_tensor
    return torch.tensor(value)
ValueError: too many dimensions 'str'
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have...

bug
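
The error message above points at padding and truncation. The underlying idea is that a batch can only become a rectangular tensor if every sequence has the same length and holds numbers rather than strings. A minimal pure-Python sketch of that idea follows; `pad_and_truncate` and its parameters are hypothetical names for illustration, not the transformers implementation and not a confirmed fix for this issue:

```python
def pad_and_truncate(batch, max_len, pad_id=0):
    """Return a copy of `batch` in which every token-id list has length `max_len`."""
    out = []
    for seq in batch:
        # A string (or a list of strings) here would trigger the kind of
        # "too many dimensions 'str'" failure seen when building a tensor.
        if not all(isinstance(tok, int) for tok in seq):
            raise TypeError("expected token ids (ints), got: %r" % (seq,))
        seq = list(seq[:max_len])               # truncate long sequences
        seq += [pad_id] * (max_len - len(seq))  # pad short sequences
        out.append(seq)
    return out
```

For example, `pad_and_truncate([[1, 2, 3], [4]], max_len=2)` yields two lists of length 2, while passing raw text instead of token ids raises a `TypeError` early.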

Hi, I encountered an error when running [stage1_proj_cla.sh](https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/train_scripts/stage1_proj_cla.sh). Both the `base_model` and `data_path` are kept the same, and I also changed the script to finetune_low_resource.py with smaller...

bug

Hi, may I ask what the maximum allowable length is for audio input? Would a 1-minute WAV file be within the acceptable range? Thank you!

question

I'm encountering a problem with the local inference of LTU/LTU_AS. I've modified the script for local inference to allow checking its output on any 16k WAV file, but I'm facing...

bug
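
When debugging inference on arbitrary WAV files, it can help to confirm the sample rate and duration of the input first. A minimal sketch using Python's standard `wave` module, which only reads PCM WAV files; `wav_info` is a hypothetical helper name and is not part of the LTU/LTU_AS scripts:

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()   # samples per second, e.g. 16000
        frames = wf.getnframes()   # number of audio frames in the file
    return rate, frames / float(rate)
```

A file whose reported rate is not 16000 would need resampling before being treated as a 16k WAV input.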