Locke

Results 13 comments of Locke

Hi @gdestouet, there are two methods to save the best model during training. 1. If the model is fully trained on your training data, you can use [save_to_file](https://github.com/Xtra-Computing/thundersvm/blob/6e28da802e483fc741056a1768c825737c840cca/python/thundersvm/thundersvm.py#L439) and [load_from_file](https://github.com/Xtra-Computing/thundersvm/blob/6e28da802e483fc741056a1768c825737c840cca/python/thundersvm/thundersvm.py#L442)...

@JustinLin610 Thanks for your work. I wonder how to split the data into a `validation set` and a `test set`. There are 18,691 lines in `valid.article.filter.txt`. How could I get the...

Thanks for your advice. We are working on this.

@HeyangQin I still encounter this with deepspeed version 0.10.3 when running step3 with llama2 + lora + zero3 on v100*32G: anaconda3.9/envs/dschat/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 52, in __setitem__ raise RuntimeError(f"{param.ds_summary()} already in registry") RuntimeError: {'id':...

> > Hello @iamsile @vittorio-perera. Could you provide a reproduction script for us to better investigate this issue? Thank you. > > @HeyangQin This is a full record: #4175. ~I used...

> I had a similar problem on a lambdalabs mounted storage when loading an LLM adapter. I moved the adapter out of the mounted storage (towards ~). Then all worked...

You may follow the recent object-detection repo that supports batch_size>1 or multi_gpu, and adapt it to our framework.