ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Hi, thank you for the great work and the detailed documentation you have provided. It's been very helpful. I'm trying to use the 13B model instead of the default 7B...
Hi, I find that the prompts used for training and testing audio event classification differ in the code. In the training task "cla_label", one example of the question is "Identify...
Hi, can you provide a download script or download links for OpenAQA's audio data? This would save us a lot of time so we can pay more attention to other...
Hi, first, thanks for this awesome work. I'm trying to rewrite the training code for ltu-as, and I find that the `cutoff_len` for stages 1 and 2 is 108, which...
When I use the eval code 'eval_esc.py' (https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/eval/eval_esc.py), the following error occurs:
```
from stats import calculate_stats
ImportError: cannot import name 'calculate_stats' from 'stats' (/home/aipf/work/miniconda3/envs/venv_ltu_as/lib/python3.10/site-packages/stats.py)
```
when I use eval code...
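The traceback above shows `stats` resolving to a file under `site-packages`, which suggests an installed module of the same name is being picked up. Whether the eval script expects a different `stats.py` (for example one shipped with the repository) is an assumption here, not something the report confirms. A minimal diagnostic sketch using only the standard library, with `locate_module` as a hypothetical helper name, shows which file Python would load for a given module name:

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file path Python would load for `import name`, or None.

    Hypothetical helper for diagnosis only; it does not import the module.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Prints the resolved path for `stats`, or None if no such module is found.
    print(locate_module("stats"))
```

If the printed path is not the file the script is meant to use, the import order in `sys.path` or the installed package is a likely place to look.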
Hi Yuan, could you please let me know the license of your trained models and the created AQA datasets? That would be very helpful! Thanks!
File "/transformers/tokenization_utils_base.py", line 708, in as_tensor return torch.tensor(value) ValueError: too many dimensions 'str' ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have...
Hi, I encountered an error when running [stage1_proj_cla.sh](https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/train_scripts/stage1_proj_cla.sh). Both the `base_model` and `data_path` are kept the same, and I also changed the script to finetune_low_resource.py with smaller...
Hi, may I ask what the maximum allowable length is for audio input? Would a 1-minute WAV file be within the acceptable range? Thank you!
I'm encountering a problem with the local inference of LTU/LTU_AS. I've modified the script for local inference to allow checking its output on any 16k WAV file, but I'm facing...