
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Results 43 ltu issues

Hi, I have another question about the model-related configuration settings for batch inference after fine-tuning. In the inference_batch.py script for LTU-AS provided below: ``` def main( load_8bit:...

question

Hello, thank you so much for sharing the code. Great work on the repo! I am trying to run the code for LTU openaqa, I've completed the first 3 stage...

bug

Hi, I have a question about LTU-AS fine-tuning. I saw that the model used in [finetune.py](https://github.com/YuanGongND/ltu/blob/6869e4780d332b5758662091bad1c69daa572ca9/src/ltu_as/finetune.py) is trained only on `LlamaForCausalLM`. However, since there are many downstream classification tasks...

question

Hi, I have a question about LTU-AS multi-GPU training. May I ask whether this repo supports multi-GPU training? I didn't see any related configuration (e.g. accelerate, deepspeed). Thank...

question

Hi, I have a question about the base model for fine-tuning and training stage 1. I saw that the base model for FT is `ltuas_long_noqa_a6.bin`, which is only 187MB, and...

question

Hello, thank you for sharing such a good research idea on audio question answering. I have some questions about LTU_AS: 1. For the ASR task. During inference (refer to...

question

In the LTU paper you say you will distribute the dataset after the peer review process. I noticed that you have been accepted to ASRU 2023 for your LTU-AS paper...

enhancement

Hello, I would like to ask the following two questions: 1. Is there a shell script to run extract_whisper_feature.py? I don't know what the parameters of...

question

The step that tokenizes the audio (from 'input_ids') seems to be missing in both finetune.py and finetune_low_resource.py of the LTU repo. Where is the audio tokenization implemented in detail? I saw the 'load_audio()'...

question

Hello, I would like to ask how you test audio in the LibriSpeech dataset that exceeds 10 seconds in duration. I'm encountering an issue while using the LibriSpeech dataset...

question