Wing Lian
> I get similar error with
>
> `accelerate launch -m axolotl.cli.train llama_lora.yml --deepspeed deepspeed_configs/zero1.json`
>
> With config same in [examples](https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/examples/llama-2/lora.yml).
>
> Just added additionally ...
May want to keep track of https://github.com/huggingface/peft/issues/958 in case it is supported there.
Looking at the shift/unshift code, it seems it's not packed-sequence-length aware, so that would need some modification (or we simply don't allow packed sequences to work with this feature).
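To make that concrete, here's a minimal sketch (illustrative only, assuming a LongLoRA-style `torch.roll` shift; the function name and shapes are my own, not Axolotl's actual implementation) of why a naive shift ignores packed-sequence boundaries:

```python
import torch

def shift_heads(qkv: torch.Tensor, group_size: int) -> torch.Tensor:
    # qkv: (batch, seq_len, num_heads, head_dim). Shift half the heads by
    # half a group along the sequence dimension, LongLoRA-style.
    half = qkv.shape[2] // 2
    shifted = qkv[:, :, half:].roll(-group_size // 2, dims=1)
    return torch.cat([qkv[:, :, :half], shifted], dim=2)

# With sample packing, seq_len covers several concatenated examples whose
# boundaries live in cu_seqlens; the roll above ignores them, so tokens from
# one packed example get shifted into positions belonging to its neighbor.
qkv = torch.randn(1, 8, 4, 2)   # e.g. two packed examples of length 4 each
print(shift_heads(qkv, group_size=4).shape)
```

A packing-aware version would have to roll each segment between consecutive `cu_seqlens` offsets separately; otherwise the safer option is to disable sample packing when this feature is enabled.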
These will be helpful: https://github.com/philipturner/metal-flash-attention and https://github.com/ml-explore/mlx/issues/129
It's hard to say: with 300 rows and 10% held out for the eval split, it could be randomness in a dataset that small that could lead to train loss...
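Back-of-the-envelope on the split size (just the numbers quoted above, nothing from the actual run):

```python
# With 300 rows and a 10% eval split, the eval set is only ~30 examples, so a
# single unusually easy or hard example moves the mean eval loss by ~1/30 of
# its own loss value; plenty of room for split randomness to dominate.
n_rows = 300
eval_frac = 0.10
n_eval = round(n_rows * eval_frac)   # 30 held-out examples
per_example_share = 1 / n_eval       # each example is ~3.3% of the eval loss
print(n_eval, f"{per_example_share:.1%}")
```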
@hengjiUSTC are you able to compare with the SFT trainer with proper label masking for instruct tuning?
You have completion-only set to false with TRL. You should start there; that should probably be true for that trainer to set the labels properly.
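For reference, this is roughly what completion-only masking looks like on the TRL side; a hedged sketch where the model name, dataset file, text field, and response template are placeholders, not taken from your setup:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM, SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder data

# Masks everything before the response template with -100 so loss is only
# computed on the completion, i.e. proper label masking for instruct tuning.
collator = DataCollatorForCompletionOnlyLM("### Response:", tokenizer=tokenizer)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # assumes each row has a "text" field
    data_collator=collator,
    packing=False,               # completion-only masking is used without packing
)
trainer.train()
```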
It looks like you have a really small dataset. You should consider disabling sample packing.
Axolotl currently needs to be installed from source by cloning the GitHub repository. We have dependencies that aren't published as packages, so we can't push axolotl to PyPI yet.
Is there another part that goes with this to optionally have the tokenization step be a bit more sparse for this feature?