crumb
Uses the bitsandbytes Adam optimizer instead of the torch one, adds very simple gradient accumulation, finetunes only the bias/LayerNorm parameters (tested; works very well and is very fast), and makes it easier to switch between precisions. (sorry...
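A minimal PyTorch sketch of the three ideas described above. It is not the code from this PR. The toy `nn.Sequential` model, the MSE loss and the helper names (`make_adam`, `freeze_all_but_bias_and_layernorm`, `train`) are hypothetical. It uses `bitsandbytes.optim.Adam8bit` only when a CUDA GPU and the package are both present, and otherwise falls back to `torch.optim.Adam`. Precision is assumed to be switched via `torch.autocast`.

```python
import torch
import torch.nn as nn


def make_adam(params, lr):
    """Hypothetical helper: prefer bitsandbytes' 8-bit Adam on CUDA, else torch's Adam."""
    if torch.cuda.is_available():
        try:
            import bitsandbytes as bnb
            return bnb.optim.Adam8bit(params, lr=lr)
        except ImportError:
            pass
    return torch.optim.Adam(params, lr=lr)


def freeze_all_but_bias_and_layernorm(model):
    """Hypothetical helper: keep only bias vectors and LayerNorm parameters trainable."""
    trainable = []
    for module in model.modules():
        # recurse=False so each parameter is seen once, next to its owning module
        for name, p in module.named_parameters(recurse=False):
            keep = isinstance(module, nn.LayerNorm) or name == "bias"
            p.requires_grad_(keep)
            if keep:
                trainable.append(p)
    return trainable


def train(model, opt, batches, accum_steps, device_type, amp_dtype=None):
    """Simple gradient accumulation: one optimizer step per `accum_steps` micro-batches."""
    loss_fn = nn.MSELoss()
    model.train()
    opt.zero_grad()
    updates = 0
    for i, (x, y) in enumerate(batches, start=1):
        # Optional mixed precision (assumption), e.g. amp_dtype=torch.bfloat16.
        with torch.autocast(device_type=device_type, dtype=amp_dtype,
                            enabled=amp_dtype is not None):
            loss = loss_fn(model(x), y)
        # Scale so the accumulated gradient is the mean over the micro-batches.
        (loss / accum_steps).backward()
        if i % accum_steps == 0:
            opt.step()
            opt.zero_grad()
            updates += 1
    return updates


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(8, 16), nn.LayerNorm(16), nn.ReLU(), nn.Linear(16, 1)).to(device)

trainable = freeze_all_but_bias_and_layernorm(model)
n_trainable = sum(p.numel() for p in trainable)  # 16 + 16 + 16 + 1 = 49 of 193
opt = make_adam(trainable, lr=1e-2)

w0_before = model[0].weight.detach().clone()
b3_before = model[3].bias.detach().clone()
batches = [(torch.randn(4, 8, device=device), torch.randn(4, 1, device=device))
           for _ in range(8)]
updates = train(model, opt, batches, accum_steps=4, device_type=device.type)
print(n_trainable, updates)  # 49 2
```

Only the bias and LayerNorm tensors are handed to the optimizer, so the Adam state (8-bit or not) stays small. Micro-batches left over when the batch count is not a multiple of `accum_steps` are not stepped in this sketch.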