ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Hi, I have another question about the model-related configuration settings during batch inference after model fine-tuning. In the inference_batch.py script for LTU-AS provided below: ``` def main( load_8bit:...
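As context for the `load_8bit` flag, here is a minimal, generic sketch of 8-bit model loading with Hugging Face Transformers and bitsandbytes; this is not the repo's inference_batch.py, and the checkpoint path is a placeholder:
```
# Generic sketch of 8-bit loading with bitsandbytes; not the repo's inference_batch.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/llama-checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    load_in_8bit=True,          # quantize weights to 8-bit (requires bitsandbytes)
    torch_dtype=torch.float16,  # keep non-quantized tensors in fp16
    device_map="auto",          # place layers across available GPUs automatically
)
model.eval()
```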
Hello, thank you so much for sharing the code. Great work on the repo!! I am trying to run the code for LTU OpenAQA, and I've completed the first 3 stages...
Hi, I have a question about LTU-AS fine-tuning. I saw that the model used in [finetune.py](https://github.com/YuanGongND/ltu/blob/6869e4780d332b5758662091bad1c69daa572ca9/src/ltu_as/finetune.py) is trained only with `LlamaForCausalLM`. However, since there are many classification downstream tasks...
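One common workaround, sketched below under my own assumptions (the label set and the `generate_answer` call are hypothetical, not the repo's evaluation code), is to keep the `LlamaForCausalLM` head and map its free-form answer onto a closed label set:
```
# Hypothetical sketch: map a causal LM's free-form answer onto a fixed label set.
# `generate_answer` stands in for the repo's inference call and is assumed here.
from difflib import SequenceMatcher

LABELS = ["dog barking", "speech", "music", "siren"]  # example closed label set

def classify_from_text(answer, labels=LABELS):
    """Pick the label whose text overlaps most with the generated answer."""
    answer = answer.lower()
    return max(labels, key=lambda lab: SequenceMatcher(None, answer, lab).ratio())

# Usage (the answer would come from the fine-tuned LlamaForCausalLM):
# answer = generate_answer(audio_path, prompt="What is the sound?")
print(classify_from_text("The audio contains a dog barking loudly."))  # -> "dog barking"
```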
Hi, I have a question about LTU-AS multi-GPU training: may I kindly ask whether this repo supports multi-GPU training? I didn't see any related configurations (e.g., accelerate, deepspeed). Thank...
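For reference, here is a minimal generic sketch of data-parallel training with PyTorch DistributedDataParallel, typically launched as `torchrun --nproc_per_node=<num_gpus> train_ddp.py`; the model, data, and script name are placeholders, not the repo's finetune.py:
```
# Generic DDP sketch (placeholder model/data, not the repo's finetune.py).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group("nccl")                  # one process per GPU via torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(16, 2).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])

    data = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
    sampler = DistributedSampler(data)               # shards data across ranks
    loader = DataLoader(data, batch_size=32, sampler=sampler)

    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()          # DDP all-reduces gradients
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```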
Hi, I have a question about the base model for FT and training stage 1. I saw that the base model for FT is `ltuas_long_noqa_a6.bin`, which is only 187 MB, and...
Hello, thank you for providing such a good research idea on audio question answering. I have some questions about LTU-AS: 1. For the ASR task, during inference (refer to...
In the LTU paper you say you will distribute the dataset after the peer review process. I noticed that you have been accepted to ASRU 2023 for your LTU-AS paper...
Hello, I would like to ask the following 2 questions: 1. Is there any shell script to run extract_whisper_feature.py? I don't know what the parameters of...
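As a rough reference only, here is a generic sketch of extracting Whisper encoder features with Hugging Face Transformers; the repo's extract_whisper_feature.py may use the original openai-whisper package and different parameters, and the audio path below is a placeholder:
```
# Generic sketch of Whisper encoder feature extraction; not the repo's script.
import torch
import torchaudio
from transformers import WhisperFeatureExtractor, WhisperModel

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
model = WhisperModel.from_pretrained("openai/whisper-base").eval()

wav, sr = torchaudio.load("example.wav")                      # placeholder file
wav = torchaudio.functional.resample(wav, sr, 16000).mean(0)  # mono, 16 kHz

inputs = feature_extractor(wav.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    enc_out = model.encoder(inputs.input_features).last_hidden_state
print(enc_out.shape)  # (1, 1500, hidden_size) for a 30 s padded window
```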
The audio tokenization step (analogous to the text `input_ids`) seems to be missing in both finetune.py and finetune_low_resource.py of the LTU repo. Where is the detailed code for audio tokenization? I saw the 'load_audio()'...
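For context, the audio branch is usually handled as continuous spectrogram features rather than discrete text-style tokens; below is my guess at what a `load_audio()`-style helper computes (a kaldi-style log-mel filterbank), not the repo's exact code:
```
# Sketch of kaldi-style log-mel filterbank extraction; an assumption about what a
# load_audio()-style helper does, not the repo's exact implementation.
import torch
import torchaudio

def load_audio_fbank(path, target_frames=1024):
    wav, sr = torchaudio.load(path)
    wav = torchaudio.functional.resample(wav, sr, 16000)
    fbank = torchaudio.compliance.kaldi.fbank(
        wav, htk_compat=True, sample_frequency=16000, use_energy=False,
        window_type="hanning", num_mel_bins=128, dither=0.0, frame_shift=10)
    # Pad or truncate to a fixed number of frames so batches stack cleanly.
    n = fbank.shape[0]
    if n < target_frames:
        fbank = torch.nn.functional.pad(fbank, (0, 0, 0, target_frames - n))
    else:
        fbank = fbank[:target_frames]
    return fbank  # shape: (target_frames, 128)
```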
Hello, I would like to ask how you test audio in the LibriSpeech dataset that exceeds 10 seconds in duration. I'm encountering an issue while using the LibriSpeech dataset...
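One possible way to handle clips longer than the fixed 10-second input, sketched here as an assumption rather than the authors' evaluation setup, is to split the waveform into 10-second chunks and run each chunk separately:
```
# Hypothetical sketch: split a long LibriSpeech clip into 10-second chunks so each
# chunk fits a fixed-length audio input; not necessarily the authors' setup.
import torch
import torchaudio

def split_into_chunks(path, chunk_sec=10.0, sr_target=16000):
    wav, sr = torchaudio.load(path)
    wav = torchaudio.functional.resample(wav, sr, sr_target).mean(0)
    chunk_len = int(chunk_sec * sr_target)
    chunks = [wav[i:i + chunk_len] for i in range(0, wav.numel(), chunk_len)]
    # Zero-pad the last chunk so every segment has the same length.
    chunks[-1] = torch.nn.functional.pad(chunks[-1], (0, chunk_len - chunks[-1].numel()))
    return chunks

# Each chunk can then be transcribed separately and the ASR outputs concatenated.
```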