
Which model is 7B (Default) and which is 13B (Beta)?

Open yl4579 opened this issue 1 year ago • 12 comments

Are the models downloaded from inference.sh 7B (Default) or 13B (Beta)? I found the latter quite error-prone and unstable, which is similar to what I'm observing locally now. I think the model is 13B (Beta); if so, how do I get the 7B (Default) model instead?

yl4579 avatar Dec 19 '23 06:12 yl4579

hi there,

All models are 7B. The errors might be due to GPU inconsistency.

*GPU Issue for LTU-AS: We find that OpenAI Whisper features differ across GPUs, which impacts the performance of LTU-AS since it takes the Whisper feature as input. In the paper, we always use features generated by older GPUs (Titan-X), but we also release a checkpoint that uses features generated by newer GPUs (A5000/A6000). Please manually switch the checkpoint depending on whether you are running on an old or new GPU (by default, this code uses the new-GPU feature). A mismatch between training and inference GPUs does not completely destroy the model, but it does cause a performance drop.
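To see this concretely, you could dump the Whisper encoder features on your GPU and diff them against features produced on another machine. A minimal sketch using the openai-whisper package (the model size and file paths are placeholder assumptions, and this is an illustration, not the exact LTU-AS feature-extraction code):

```python
# Illustrative sketch only: dump Whisper encoder features on the current GPU
# so they can be compared against features produced on a different GPU.
import torch
import whisper  # openai-whisper package

model = whisper.load_model("large-v1").cuda().eval()  # placeholder model size

audio = whisper.load_audio("sample.wav")   # placeholder path; loaded at 16 kHz
audio = whisper.pad_or_trim(audio)         # pad/trim to 30 seconds
mel = whisper.log_mel_spectrogram(audio).unsqueeze(0).cuda()

with torch.no_grad():
    feats = model.embed_audio(mel)         # encoder output

torch.save(feats.cpu(), "whisper_feats_this_gpu.pt")

# On the other machine, load both tensors and check, e.g.:
#   (feats_a - feats_b).abs().max()
# Numerical differences across GPU generations are what cause the
# train/inference mismatch described above.
```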

A good way to test is to check whether your output is consistent with our online API.

-Yuan

YuanGongND avatar Dec 19 '23 06:12 YuanGongND

The online API is not working right now. If the output does differ, though, how do I get my local setup (I'm running inference on an A40) to behave the same way as the API?

yl4579 avatar Dec 19 '23 06:12 yl4579

You can manually download the model to your path from https://github.com/YuanGongND/ltu#pretrained-models; we provide 4 checkpoints.

And then change the checkpoint path at https://github.com/YuanGongND/ltu/blob/1963db6943bc409e42287bf5b4e6977982999fe2/src/ltu_as/inference_gradio.py#L52 accordingly.
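Concretely, the switch is just repointing the path the inference script loads; a sketch with a placeholder filename (use the actual name of the checkpoint you downloaded):

```python
# In src/ltu_as/inference_gradio.py, point eval_mdl_path at the downloaded
# checkpoint. The filename below is a placeholder.
eval_mdl_path = '../../pretrained_mdls/your_downloaded_checkpoint.bin'

state_dict = torch.load(eval_mdl_path, map_location='cpu')
miss, unexpect = model.load_state_dict(state_dict, strict=False)
```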

-Yuan

YuanGongND avatar Dec 19 '23 06:12 YuanGongND

There might be some other reasons, e.g., the sampling rate needs to be 16 kHz.
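If your input is at another rate, resampling it first, for example with torchaudio, looks roughly like this (file paths are placeholders):

```python
# Sketch: convert an arbitrary wav to the 16 kHz input the model expects.
import torchaudio

waveform, sr = torchaudio.load("input.wav")  # placeholder path
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)
torchaudio.save("input_16k.wav", waveform, 16000)
```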

YuanGongND avatar Dec 19 '23 07:12 YuanGongND

I just checked the output, and I'm pretty sure the default model produces output very similar to 13B (Beta) in the Hugging Face space (though it's down now). How do I get the 7B (Default) results?

yl4579 avatar Dec 19 '23 07:12 yl4579

Please upload a sample wav and question; I will check later.

Our MIT GPUs are currently down; I will check with our IT.

YuanGongND avatar Dec 19 '23 07:12 YuanGongND

There are three LoRA checkpoints; have you tried them all? https://github.com/YuanGongND/ltu#pretrained-models

Also, I restarted the HF space. Can you check if it is consistent with your local model? I am using the same checkpoint ("Long_sequence_exclude_noqa_new_gpu (Default)") as the default checkpoint online.

YuanGongND avatar Dec 19 '23 08:12 YuanGongND

Now I have confirmed they give similar responses, but the responses are different from those I got a month ago (around early November). Did you change the model for your Hugging Face space?

yl4579 avatar Dec 19 '23 19:12 yl4579

I do not remember clearly, but we did switch the checkpoint. You can try the "Original in Paper" checkpoint under LTU-AS, https://github.com/YuanGongND/ltu#pretrained-models.

It is an easy switch: just download the checkpoint and change https://github.com/YuanGongND/ltu/blob/1963db6943bc409e42287bf5b4e6977982999fe2/src/ltu_as/inference_gradio.py#L52 to point to the new checkpoint, as sketched above.

YuanGongND avatar Dec 19 '23 19:12 YuanGongND

In your experience, which one is better? I changed it to eval_mdl_path = '../../pretrained_mdls/ltu_ori_paper.bin' but got the following error:

RuntimeError                              Traceback (most recent call last)
Cell In[3], line 50
     47 temp, top_p, top_k = 0.1, 0.95, 500
     49 state_dict = torch.load(eval_mdl_path, map_location='cpu')
---> 50 miss, unexpect = model.load_state_dict(state_dict, strict=False)
     52 model.is_parallelizable = True
     53 model.model_parallel = True

File ~/.conda/envs/venv_ltu_as/lib/python3.10/site-packages/torch/nn/modules/module.py:1671, in Module.load_state_dict(self, state_dict, strict)
   1666         error_msgs.insert(
   1667             0, 'Missing key(s) in state_dict: {}. '.format(
   1668                 ', '.join('"{}"'.format(k) for k in missing_keys)))
   1670 if len(error_msgs) > 0:
-> 1671     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   1672                        self.__class__.__name__, "\n\t".join(error_msgs)))
   1673 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	size mismatch for base_model.model.model.audio_proj.1.weight: copying a param with shape torch.Size([4096, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1280]).

yl4579 avatar Dec 19 '23 19:12 yl4579

Did you download the one under LTU or LTU-AS? The 768-vs-1280 size mismatch suggests an LTU checkpoint was loaded into the LTU-AS model, which expects 1280-dimensional Whisper features.

It is hard to say which is better; it depends on the task.
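One quick way to tell which family a downloaded checkpoint belongs to is to inspect the audio projection shape in its state dict; the key below is copied from the error message above:

```python
# Sketch: inspect the checkpoint to see which model family it came from.
import torch

sd = torch.load('../../pretrained_mdls/ltu_ori_paper.bin', map_location='cpu')
print(sd['base_model.model.model.audio_proj.1.weight'].shape)
# torch.Size([4096, 1280]) -> LTU-AS (Whisper features, what this script expects)
# torch.Size([4096, 768])  -> LTU, which will not load into the LTU-AS model
```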

YuanGongND avatar Dec 19 '23 19:12 YuanGongND

btw, you can ask the model multiple questions at once, but I guess performance will be better if you ask them one by one. You can also tune the prompt for each task; e.g., you can say "give an answer anyway" to force the model to give an answer rather than saying "I don't know".

YuanGongND avatar Dec 19 '23 19:12 YuanGongND