What we did is input the first 10-second audio feature (from the Whisper encoder) and the **full** Whisper transcription to the LLM. The transcription can be as long as possible (not limited to...
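To make the pairing concrete, here is a minimal sketch of the idea, not the repo's actual code; the prompt template and function name below are assumptions for illustration only:

```python
# Minimal sketch (not the actual LTU-AS code) of the input pairing described above:
# continuous audio features come only from the first 10-second window, while the
# full Whisper transcription goes into the text prompt as plain text.

def build_ltu_as_input(audio_feature_10s, full_transcription: str, question: str):
    """audio_feature_10s: Whisper-encoder output for the first 10 s of audio.
    full_transcription: Whisper ASR text for the whole recording (no 10 s limit)."""
    text_prompt = (
        "Spoken text: " + full_transcription + "\n"
        "Question: " + question
    )
    # The audio feature is passed alongside the text and injected into the LLM
    # as continuous tokens by the model's audio projection layer.
    return audio_feature_10s, text_prompt

# Example with dummy values:
feat, prompt = build_ltu_as_input(
    [0.0] * 512,
    "a transcription that can be much longer than ten seconds of speech ...",
    "What is the speaker's emotion?",
)
```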
> What is this audio segment saying? Does LTU-AS output the entire text, and not just the text contained within those 10 seconds of speech?

No, LTU-AS almost never cut...
hi there,

> How do different datasets allocate the proportion to generate QA pairs? For example, how does AudioSet data determine which audio segments are used to generate Classification...
hi there, Thanks for the question.

The batch inference script does the inference for open-set questions: https://github.com/YuanGongND/ltu/blob/0fa0923f9c9d04346486a28477ba69b7d957130c/src/ltu/inference_batch.py#L150-L153

The actual open-ended test set can be downloaded from https://www.dropbox.com/scl/fo/juh1dk9ltvhghuj0l1sag/h?rlkey=0n2cd5kebzh8slwanjzrfn7q6&dl=0; check `open_as_gpt4.json` and...
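As a rough sketch of how you might iterate over that test set (the JSON field names below are assumptions; check the actual keys in `open_as_gpt4.json` and the prompts used in `inference_batch.py`):

```python
import json

# Rough sketch only: load the open-ended test set and loop over it.
# "audio_id" and "instruction" are assumed field names, not confirmed keys.
with open("open_as_gpt4.json") as f:
    samples = json.load(f)

for sample in samples:
    audio_path = sample.get("audio_id")    # path/ID of the audio clip (assumed)
    question = sample.get("instruction")   # the open-ended question (assumed)
    # answer = model.predict(audio_path, question)  # hypothetical call into LTU inference
    print(audio_path, question)
```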
They are in https://github.com/YuanGongND/ltu/blob/0fa0923f9c9d04346486a28477ba69b7d957130c/src/ltu/hf-dev/transformers-main/src/transformers/data/data_collator.py#L615-L616 (similar path for LTU-AS). They cannot be in finetune.py/finetune_low_resource.py because they have to be loaded on the fly; otherwise there will be an OOM (we cannot put all...
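To illustrate the idea only (this is not the actual `data_collator.py` logic; the feature key and `.npy` format below are assumptions), an on-the-fly collator looks roughly like:

```python
import numpy as np

class OnTheFlyAudioCollator:
    """Illustrative sketch: each example carries only a path, and the large audio
    feature is loaded here at collate time, so the full dataset never has to sit
    in memory at once (which would OOM)."""

    def __call__(self, batch):
        audio_feats = []
        for example in batch:
            feat = np.load(example["audio_feat_path"])  # assumed key and .npy format
            audio_feats.append(feat)
        return {
            "audio_input": np.stack(audio_feats),
            "input_ids": [ex["input_ids"] for ex in batch],
        }
```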
Thanks for the question. The LTU-AS model is trained with two types of data: [continuous audio token, spoken text] or [continuous audio token only] (in the situation that the...
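A tiny sketch of the two input forms (the exact prompt wording here is an assumption, not the repo's actual preprocessing):

```python
from typing import Optional

def format_spoken_text(transcription: Optional[str]) -> str:
    # Case 1: [continuous audio token, spoken text]
    if transcription:
        return "Spoken text: " + transcription
    # Case 2: [continuous audio token only], e.g., when the clip contains no speech
    return "Spoken text: (empty)"
```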
hi there, I am actively working on this. Will update ASAP (expect to release in October). Though I cannot guarantee anything at this point (the actual release time is also subject...
hi there, The dataset and code are all released now. -Yuan
hi there, thanks for the question.

1/ What's your GPU setting? For model parallelization, you would need multiple GPUs in a single node (a quick way to check is sketched below).
2/ Were you able to run inference?...
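For point 1/, this is just standard PyTorch (not a repo-specific script) to see how many GPUs the node exposes:

```python
import torch

# Model parallelization in this setup expects more than one GPU visible on the node.
print("GPUs visible:", torch.cuda.device_count())
```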
Done, please see [LTU](https://github.com/YuanGongND/ltu/blob/main/src/ltu/train_script/finetune_toy_low_resource.sh) and [LTU-AS](https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/train_script/finetune_toy_low_resource.sh). Your resources should be enough to train the model; remember to tune the `micro_batch_size` to the max number that your GPUs can run. Regarding...
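For reference, a sketch of the relationship between `batch_size` and `micro_batch_size`, assuming the standard alpaca-lora-style setup (the values below are examples, not the repo's defaults):

```python
# The effective batch size stays fixed; raising micro_batch_size lowers the number
# of gradient-accumulation steps, so each update uses the GPU more fully.

batch_size = 256        # effective batch size (example value, not from the repo)
micro_batch_size = 4    # raise this until you hit your GPU memory limit

gradient_accumulation_steps = batch_size // micro_batch_size
print(gradient_accumulation_steps)  # 64 forward/backward passes per optimizer update
```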