ltu
Code, Dataset, and Pretrained Models for the Audio and Speech Large Language Model "Listen, Think, and Understand".
Is there a length limit for input audio in ltu-as? And are there any plans to implement streaming audio input?
I am trying to use a similar framework to align LLaMA with other time-series data, but my fine-tuned model rarely outputs a formatted answer for multiple-choice questions, which makes it very difficult...
```python
# LLaVA
if model_args.freeze_backbone:
    model.model.requires_grad_(False)
```

In the LTU code, there is only a comment noting that the LLM is already frozen.

```python
# for audio params, lora always trainable, llama always frozen
for name,...
```
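To make the comparison concrete, here is a minimal PyTorch sketch of the two freezing styles. LLaVA freezes a whole submodule with one `requires_grad_(False)` call, while LTU appears to decide per parameter inside a `named_parameters()` loop. The LTU loop body is truncated above, so the name-matching rule here is a hypothetical illustration, not LTU's actual code. `ToyModel`, `backbone`, `audio_proj`, and `lora_A` are made-up names standing in for the LLM, the audio projection, and a LoRA adapter.

```python
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in: a 'backbone' (the LLM) plus trainable audio/LoRA parts."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)            # plays the role of the frozen LLM (72 params)
        self.audio_proj = nn.Linear(4, 8)          # audio projection, kept trainable (40 params)
        self.lora_A = nn.Linear(8, 2, bias=False)  # LoRA-style adapter, kept trainable (16 params)


def count_trainable(model: nn.Module) -> int:
    """Number of scalar parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Pattern 1 (LLaVA-style): freeze an entire submodule in one call.
m1 = ToyModel()
m1.backbone.requires_grad_(False)

# Pattern 2 (name-based loop): set requires_grad per parameter from its name.
# The matching rule below is an assumption for illustration only.
m2 = ToyModel()
for name, param in m2.named_parameters():
    param.requires_grad = ("audio_proj" in name) or ("lora" in name)

print(count_trainable(m1), count_trainable(m2))
```

Both approaches leave the same parameters trainable here. Checking a count like `count_trainable` before training is a cheap way to confirm the backbone really is frozen, whichever style the code base uses.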