YWMditto


I have read the code of `Wav2Vec2FeatureExtractor` in transformers, and it says that the model `wav2vec2-base-960h` was trained without using an attention mask. I wonder why, and how the model was trained in that case.
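For context, the transformers documentation notes that Wav2Vec2 checkpoints using group normalization in the feature encoder (such as `wav2vec2-base-960h`) should simply have their inputs zero-padded, with no attention mask passed. Below is a minimal NumPy sketch of that padding behavior; the helper name is illustrative and not part of the transformers API:

```python
import numpy as np

def zero_pad_batch(waveforms):
    """Pad variable-length 1-D waveforms with zeros to a common length.

    Mimics the padding applied for models trained without an attention
    mask: shorter inputs are right-padded with 0.0, and no mask marking
    the padded positions is produced or consumed by the model.
    """
    max_len = max(len(w) for w in waveforms)
    batch = np.zeros((len(waveforms), max_len), dtype=np.float32)
    for i, w in enumerate(waveforms):
        batch[i, : len(w)] = w
    return batch

# Two utterances of different lengths batched together.
waveforms = [np.ones(16000, dtype=np.float32),
             np.ones(12000, dtype=np.float32)]
batch = zero_pad_batch(waveforms)
print(batch.shape)  # (2, 16000)
```

Because the base model never saw an attention mask during pre-training, passing one at inference time can actually hurt; the zero padding alone is what the checkpoint expects.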


For example, truncating the loss difference at 0 does not seem to be implemented. ![image](https://github.com/princeton-nlp/LLM-Shearing/assets/46778265/279a5e9b-9953-416a-941f-9fe71418ccc4) https://github.com/princeton-nlp/LLM-Shearing/blob/1386c8f69cfb3bf64896959cf3754d2bf87659c7/llmshearing/callbacks/dynamic_loading_callback.py#L34

Also, what is the purpose of this line? https://github.com/princeton-nlp/LLM-Shearing/blob/1386c8f69cfb3bf64896959cf3754d2bf87659c7/llmshearing/callbacks/dynamic_loading_callback.py#L41
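For reference, the dynamic batch loading rule in the formula above clips each domain's excess loss at zero before exponentially re-weighting. A minimal NumPy sketch of that update as I understand it from the paper (variable names are mine, not the repo's):

```python
import numpy as np

def update_domain_weights(weights, current_loss, target_loss, eta=1.0):
    """One dynamic-batch-loading step.

    Domains whose loss still exceeds the reference target are
    up-weighted exponentially; the excess is clipped at 0 so that
    domains already at or below target are not penalized further.
    """
    diff = np.maximum(current_loss - target_loss, 0.0)  # truncate at 0
    new = weights * np.exp(eta * diff)
    return new / new.sum()  # renormalize to a distribution

# Four domains; only the first is still above its target loss.
w = np.array([0.25, 0.25, 0.25, 0.25])
cur = np.array([2.5, 2.0, 1.8, 1.5])
tgt = np.array([2.0, 2.0, 2.0, 2.0])
w_new = update_domain_weights(w, cur, tgt)
```

With the clipping, the two domains that are already below target receive the same multiplier as the one exactly at target, so only the lagging domain gains relative weight; without it, domains far below target would be actively down-weighted.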