Yu-won Lee

Results 230 comments of Yu-won Lee

It's a bit weird that the scripts in the repo are set to v2.6.

Thanks for letting me know. First, please make sure that `use_liger` is set to True when you train with this repository. Second, the current version of my repo doesn’t yet...

Also, `"overlap_comm": true` could use a bit more memory. It's in `zero3_offload.json`.
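
For illustration, a minimal sketch of switching that flag off in a loaded config. It assumes the flag sits under `zero_optimization`, as in typical DeepSpeed ZeRO configs; the helper name `set_overlap_comm` and the sample dictionary are hypothetical stand-ins, not the actual contents of `zero3_offload.json`.

```python
import copy
import json


def set_overlap_comm(config: dict, enabled: bool) -> dict:
    """Return a copy of a DeepSpeed config with overlap_comm set to `enabled`."""
    patched = copy.deepcopy(config)
    # Assumption: the flag lives under "zero_optimization".
    patched.setdefault("zero_optimization", {})["overlap_comm"] = enabled
    return patched


# Hypothetical stand-in for the contents of zero3_offload.json.
sample = {"zero_optimization": {"stage": 3, "overlap_comm": True}}
patched = set_overlap_comm(sample, enabled=False)
print(json.dumps(patched, indent=2))
```

The helper works on a copy, so the original dictionary is left unchanged and the two settings can be compared side by side. How much memory this actually saves depends on the run and is not measured here.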

@baicenxiao Maybe the use of reentrant gradient checkpointing is the difference between my code and the Hugging Face code you've used. As for llama-factory, I haven't really looked into it, so I'm...

I'm not really sure what the difference is. I think llama-factory is based on the code from the official qwen-vl repo, but that is not very different from mine except for...

@baicenxiao I'm not sure why llama-factory uses less vram. I've looked into the code but it's not so different from the code I've made. I've checked the default optimizer and...

I've added an additional monkey patch to the forward function, and it runs with much less memory and much faster. The code will be updated when the test...
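
As a generic illustration of what monkey patching a forward function looks like, here is a minimal sketch. It is not the patch described above: `TinyModel`, `patch_forward`, and `lean_forward` are all hypothetical names, and the replacement body is only a placeholder.

```python
import functools


class TinyModel:
    """Hypothetical stand-in for a model class; not from the repository."""

    def forward(self, x):
        return [v * 2 for v in x]


def patch_forward(cls, new_forward):
    """Replace cls.forward and return the original so it can be restored."""
    original = cls.forward
    # Copy the original's name and docstring onto the replacement.
    cls.forward = functools.wraps(original)(new_forward)
    return original


def lean_forward(self, x):
    # Placeholder for a more memory-efficient implementation.
    return [v + v for v in x]


original_forward = patch_forward(TinyModel, lean_forward)
model = TinyModel()
print(model.forward([1, 2, 3]))
```

Keeping a reference to the original function makes the patch reversible: assigning `TinyModel.forward = original_forward` restores the earlier behavior.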

I plan to add dynamic truncation, but I’m not sure of the best way to implement it—if the limit is too short, it might cut off part of the user’s...

That's a bit odd; it should have `config.json` (the full model config) in the directory. Are you using a checkpoint for it?

Originally, it was to load the model with the same config you trained with (because you need to merge the weights). Another thing is to delete the quantization config...
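
A minimal sketch of dropping the quantization entry from a loaded config, assuming it is stored under the `quantization_config` key as in Hugging Face `config.json` files. The helper name `strip_quantization_config` and the sample values are hypothetical, and this is not necessarily how the repository does it.

```python
import copy


def strip_quantization_config(config: dict) -> dict:
    """Return a copy of a model config without its quantization settings."""
    cleaned = copy.deepcopy(config)
    # pop() with a default is a no-op when the key is absent.
    cleaned.pop("quantization_config", None)
    return cleaned


# Hypothetical stand-in for a loaded config.json.
sample = {"model_type": "example", "quantization_config": {"load_in_4bit": True}}
cleaned = strip_quantization_config(sample)
print(cleaned)
```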