Carlos Mocholí
@gkroiz Why does `block_size` run into recompilations but `256` doesn't? `256` would use less memory, but it could limit learning depending on your data's length.
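For illustration only (not lit-parrot code): fixing the length at 256 means every batch has the same static shape, while anything longer than 256 tokens gets cut off, which is where the learning limitation would come from. A hypothetical sketch of that padding/truncation:

```python
import torch

FIXED_LEN = 256  # illustrative choice from the comment above


def to_fixed_length(ids: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """Truncate or right-pad a 1D token tensor to FIXED_LEN so every batch
    has the same shape, at the cost of dropping tokens beyond 256."""
    ids = ids[:FIXED_LEN]
    if ids.numel() < FIXED_LEN:
        padding = torch.full((FIXED_LEN - ids.numel(),), pad_id, dtype=ids.dtype)
        ids = torch.cat([ids, padding])
    return ids
```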
Thanks for the explanation. For pretraining, one can decrease the `micro_batch_size`. The data is packed together in a sample, so 4 batches of 10 should be approximately equal to 1...
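For context, here's a minimal sketch of what decreasing `micro_batch_size` with gradient accumulation looks like in plain PyTorch; the function and argument names are illustrative, not taken from the pretraining script:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


def train_with_accumulation(model: nn.Module, loader: DataLoader,
                            optimizer: torch.optim.Optimizer,
                            accumulation_steps: int = 4) -> None:
    """Accumulate gradients over several micro-batches before stepping,
    so e.g. 4 micro-batches of 10 behave roughly like one batch of 40."""
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        logits = model(inputs)
        loss = nn.functional.cross_entropy(logits, targets)
        # Scale so the accumulated gradients match a single large batch
        (loss / accumulation_steps).backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```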
I opened https://github.com/Lightning-AI/lit-parrot/pull/143, which does the above automatically by saving a `config.json` file with the optimal `max_seq_length` in the data directory.
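As a rough sketch of that idea (the helper names below are hypothetical, not the PR's actual code): the data-preparation step records the longest tokenized sample, and the training script reads it back instead of hard-coding a sequence length.

```python
import json
from pathlib import Path


def save_data_config(data_dir: Path, max_seq_length: int) -> None:
    """Write the optimal sequence length next to the prepared data."""
    with open(data_dir / "config.json", "w") as f:
        json.dump({"max_seq_length": max_seq_length}, f)


def load_max_seq_length(data_dir: Path, fallback: int) -> int:
    """Read it back at training time, falling back if the file is missing."""
    path = data_dir / "config.json"
    if path.is_file():
        with open(path) as f:
            return json.load(f)["max_seq_length"]
    return fallback
```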
Uh that's strange. Try pushing new commits and I'll debug it if it keeps happening
This sounds good to me too. Sorry for the confusion! @AngainorDev Would you like to update all occurrences together here?
All the finetune/ and pretrain/ scripts should be updated too
Hi! Here's the memory usage using current master (commit b29ca09) with falcon-7b, always passing `--precision 16-true`:

- **finetune/adapter.py**: 32.69 GB (`micro_batch_size=4`), 17.37 GB (`micro_batch_size=1`)
- **finetune/adapter_v2.py**: 41.75 GB (`micro_batch_size=4`),...
I just merged some improvements to reduce the peak memory usage. Please pull the latest changes. I'll also be adding a guide for dealing with OOMs with #182. Hope this...
Thanks for reporting this! I can repro and see that `devices=2` requires 39.80 GB. I'll investigate :microscope:
Oh, I just noticed what the issue is. If you don't pass a `--strategy`, it'll choose DDP. You should add `--strategy fsdp` when using more than 1 device. This is explained in...
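As an illustration of the difference, here's a minimal sketch assuming the Lightning Fabric API the scripts build on (not the actual script code): DDP replicates the full model on every GPU, while FSDP shards parameters across devices, which is what brings the per-device memory down.

```python
from lightning.fabric import Fabric

fabric = Fabric(
    devices=2,            # more than 1 device: pick a sharding strategy explicitly
    strategy="fsdp",      # equivalent to passing `--strategy fsdp` on the CLI
    precision="16-true",  # same precision setting used in the measurements above
)
fabric.launch()
```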