What we did is input the first 10-second audio feature (from the Whisper encoder) and the **full** Whisper transcription to the LLM. The transcription can be as long as possible (not limited to...
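To make the pairing concrete, here is a minimal sketch of the idea, not the repo's actual code; the prompt template and function name below are assumptions for illustration only:

```python
# Minimal sketch (not the actual LTU-AS code) of the input pairing described above:
# continuous audio features come only from the first 10-second window, while the
# full Whisper transcription goes into the text prompt as plain text.

def build_ltu_as_input(audio_feature_10s, full_transcription: str, question: str):
    """audio_feature_10s: Whisper-encoder output for the first 10 s of audio.
    full_transcription: Whisper ASR text for the whole recording (no 10 s limit)."""
    text_prompt = (
        "Spoken text: " + full_transcription + "\n"
        "Question: " + question
    )
    # The audio feature is passed alongside the text and injected into the LLM
    # as continuous tokens by the model's audio projection layer.
    return audio_feature_10s, text_prompt

# Example with dummy values:
feat, prompt = build_ltu_as_input(
    [0.0] * 512,
    "a transcription that can be much longer than ten seconds of speech ...",
    "What is the speaker's emotion?",
)
```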
> What is this audio segment saying? Does LTU-AS output the entire text, and not just the text contained within those 10 seconds of speech?

No, LTU-AS almost never cut...
hi there,

> How do different datasets allocate the proportion to generate QA pairs? For example, how does AudioSet data determine which audio segments are used to generate Classification...
hi there, Thanks for the question.

The batch inference script does the inference for open-set questions: https://github.com/YuanGongND/ltu/blob/0fa0923f9c9d04346486a28477ba69b7d957130c/src/ltu/inference_batch.py#L150-L153

The actual open-ended test set can be downloaded from https://www.dropbox.com/scl/fo/juh1dk9ltvhghuj0l1sag/h?rlkey=0n2cd5kebzh8slwanjzrfn7q6&dl=0; check `open_as_gpt4.json` and...
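As a rough sketch of how you might iterate over that test set (the JSON field names below are assumptions; check the actual keys in `open_as_gpt4.json` and the prompts used in `inference_batch.py`):

```python
import json

# Rough sketch only: load the open-ended test set and loop over it.
# "audio_id" and "instruction" are assumed field names, not confirmed keys.
with open("open_as_gpt4.json") as f:
    samples = json.load(f)

for sample in samples:
    audio_path = sample.get("audio_id")    # path/ID of the audio clip (assumed)
    question = sample.get("instruction")   # the open-ended question (assumed)
    # answer = model.predict(audio_path, question)  # hypothetical call into LTU inference
    print(audio_path, question)
```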
They are in https://github.com/YuanGongND/ltu/blob/0fa0923f9c9d04346486a28477ba69b7d957130c/src/ltu/hf-dev/transformers-main/src/transformers/data/data_collator.py#L615-L616 (similar path for LTU-AS). They cannot be in finetune.py/finetune_low_resource.py because they have to be loaded on the fly; otherwise there will be an OOM (we cannot put all...
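To illustrate the idea only (this is not the actual `data_collator.py` logic; the feature key and `.npy` format below are assumptions), an on-the-fly collator looks roughly like:

```python
import numpy as np

class OnTheFlyAudioCollator:
    """Illustrative sketch: each example carries only a path, and the large audio
    feature is loaded here at collate time, so the full dataset never has to sit
    in memory at once (which would OOM)."""

    def __call__(self, batch):
        audio_feats = []
        for example in batch:
            feat = np.load(example["audio_feat_path"])  # assumed key and .npy format
            audio_feats.append(feat)
        return {
            "audio_input": np.stack(audio_feats),
            "input_ids": [ex["input_ids"] for ex in batch],
        }
```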
Thanks for the question. The LTU-AS model is trained with two types of data: [continuous audio token, spoken text] or [continuous audio token only] (in the situation that the...
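A tiny sketch of the two input forms (the exact prompt wording here is an assumption, not the repo's actual preprocessing):

```python
from typing import Optional

def format_spoken_text(transcription: Optional[str]) -> str:
    # Case 1: [continuous audio token, spoken text]
    if transcription:
        return "Spoken text: " + transcription
    # Case 2: [continuous audio token only], e.g., when the clip contains no speech
    return "Spoken text: (empty)"
```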
hi there, I am actively working on this. Will update ASAP (expect to release in October). Though I cannot guarantee anything at this point (the actual release time is also subject...
hi there, The dataset and code are all released now. -Yuan
hi there, thanks for the question.

1/ What's your GPU setting? For model parallelization, you would need multiple GPUs in a single node (a quick way to check is sketched below).
2/ Were you able to run inference?...
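For point 1/, this is just standard PyTorch (not a repo-specific script) to see how many GPUs the node exposes:

```python
import torch

# Model parallelization in this setup expects more than one GPU visible on the node.
print("GPUs visible:", torch.cuda.device_count())
```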
Done, please see [LTU](https://github.com/YuanGongND/ltu/blob/main/src/ltu/train_script/finetune_toy_low_resource.sh) and [LTU-AS](https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/train_script/finetune_toy_low_resource.sh). Your resources should be enough to train the model; remember to tune the `micro_batch_size` to the max number that your GPUs can run. Regarding...
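For reference, a sketch of the relationship between `batch_size` and `micro_batch_size`, assuming the standard alpaca-lora-style setup (the values below are examples, not the repo's defaults):

```python
# The effective batch size stays fixed; raising micro_batch_size lowers the number
# of gradient-accumulation steps, so each update uses the GPU more fully.

batch_size = 256        # effective batch size (example value, not from the repo)
micro_batch_size = 4    # raise this until you hit your GPU memory limit

gradient_accumulation_steps = batch_size // micro_batch_size
print(gradient_accumulation_steps)  # 64 forward/backward passes per optimizer update
```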