Locke comments

Avoid freeing pointer after copy [Python]

Hi @gdestouet, there are two methods to save the best model during training. 1. If the model is fully trained on your training data, you can use [save_to_file](https://github.com/Xtra-Computing/thundersvm/blob/6e28da802e483fc741056a1768c825737c840cca/python/thundersvm/thundersvm.py#L439) and [load_from_file](https://github.com/Xtra-Computing/thundersvm/blob/6e28da802e483fc741056a1768c825737c840cca/python/thundersvm/thundersvm.py#L442)...
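A minimal, library-agnostic sketch of the first method, assuming the model object exposes a `save_to_file(path)` method like the ThunderSVM one linked above. The helper `save_best_model` and its `fit`/`evaluate` callbacks are hypothetical names: it fits each candidate configuration, scores it on held-out data, and overwrites the saved file only when the score improves.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def save_best_model(
    candidates: Iterable[Any],
    fit: Callable[[Any], Any],
    evaluate: Callable[[Any], float],
    path: str,
) -> Tuple[Any, float]:
    """Train one model per candidate config and keep the best one on disk.

    Hypothetical helper: `fit(config)` returns a trained model and
    `evaluate(model)` returns a validation score (higher is better).
    The model is assumed to expose `save_to_file(path)`, as ThunderSVM
    models do.
    """
    best_config: Optional[Any] = None
    best_score = float("-inf")
    found = False
    for config in candidates:
        model = fit(config)
        score = evaluate(model)
        if score > best_score:
            best_config, best_score, found = config, score, True
            model.save_to_file(path)  # overwrite the file with the new best
    if not found:
        raise ValueError("no candidate was saved")
    return best_config, best_score
```

To reuse the result later, the linked `load_from_file` method can presumably be called on a freshly constructed model with the same `path`.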

Gigaword Validation Data

@JustinLin610 Thanks for your work. I wonder how to split the data into a `validation set` and a `test set`. There are 18,691 lines in `valid.article.filter.txt`. How could I get the...
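One way to carve a reproducible test set out of the validation data, assuming the summaries sit in a line-aligned companion file (for example `valid.title.filter.txt`, a name assumed here rather than taken from the comment), is to shuffle the aligned pairs with a fixed seed and slice. This is a hypothetical sketch, not the split used by the original authors; `split_parallel` and `read_lines` are illustrative names.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def split_parallel(
    articles: Sequence[str],
    titles: Sequence[str],
    test_size: int,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle aligned (article, title) pairs with a fixed seed and split them.

    Returns (validation_pairs, test_pairs). The same seed always yields the
    same split, so the held-out set is reproducible.
    """
    if len(articles) != len(titles):
        raise ValueError("article and title files must have the same number of lines")
    if not 0 <= test_size <= len(articles):
        raise ValueError("test_size out of range")
    pairs = list(zip(articles, titles))  # keep each article with its title
    random.Random(seed).shuffle(pairs)  # local RNG: global random state untouched
    return pairs[test_size:], pairs[:test_size]


def read_lines(path: str) -> List[str]:
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

Calling `split_parallel(read_lines("valid.article.filter.txt"), read_lines("valid.title.filter.txt"), test_size=...)` with `test_size` set to the desired number of test examples returns the two sets; keeping `seed` fixed means everyone running it gets the same split.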

Could you implement cross-validation with callbacks when to stop?

Thanks for your advice. We are working on this.
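For reference, a library-agnostic sketch of what k-fold cross-validation with an early-stopping callback might look like. `EarlyStopping`, `k_fold_indices`, `cross_validate`, and the `train_round` callback (which advances training by one round and returns a validation score, higher is better) are hypothetical names, not an existing API of this project.

```python
from typing import Callable, List


class EarlyStopping:
    """Signal a stop once the score has not improved for `patience` rounds."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_round = -1
        self.bad_rounds = 0

    def __call__(self, round_idx: int, score: float) -> bool:
        """Record this round's score; return True if training should stop."""
        if score > self.best + self.min_delta:
            self.best, self.best_round, self.bad_rounds = score, round_idx, 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(
    n: int,
    k: int,
    train_round: Callable[[List[int], List[int], int], float],
    max_rounds: int,
    patience: int = 3,
) -> List[float]:
    """Run k-fold CV, stopping each fold early; return the best score per fold."""
    results = []
    for val_idx in k_fold_indices(n, k):
        val_set = set(val_idx)
        train_idx = [i for i in range(n) if i not in val_set]
        stopper = EarlyStopping(patience)  # fresh callback state per fold
        for r in range(max_rounds):
            score = train_round(train_idx, val_idx, r)
            if stopper(r, score):
                break
        results.append(stopper.best)
    return results
```

Creating a new `EarlyStopping` per fold keeps one fold's plateau from cutting another fold short.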

[BUG]running step3 use bloomz + lora + zero3, raise RuntimeError(f"{param.ds_summary()} already in registry")

@HeyangQin I still encounter this with DeepSpeed version 0.10.3 when running step 3 with llama2 + LoRA + ZeRO-3 on v100*32G: `anaconda3.9/envs/dschat/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 52, in __setitem__`, `raise RuntimeError(f"{param.ds_summary()} already in registry")`, `RuntimeError: {'id':...`

[BUG] RuntimeError: still have inflight params [<bound method Init._convert_to_deepspeed_param.<locals>.ds_summary of Parameter containing:

> > Hello @iamsile @vittorio-perera. Could you provide a reproduction script for us to better investigate this issue? Thank you
>
> @HeyangQin This is a full record: #4175. ~I used...

[BUG] RuntimeError: still have inflight params [<bound method Init._convert_to_deepspeed_param.<locals>.ds_summary of Parameter containing:

@iamsile Hi, could you please tell me how to fix this? Many thanks.

[BUG] RuntimeError: still have inflight params [<bound method Init._convert_to_deepspeed_param.<locals>.ds_summary of Parameter containing:

Try the latest versions of the driver and CUDA.
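Before and after upgrading, it helps to record the exact versions involved. A minimal sketch using only the standard library (plus `torch`, if it is installed) follows; it assumes `nvidia-smi` is on the PATH for the driver version and reports `None` for anything it cannot find. The function names are illustrative. DeepSpeed also provides a `ds_report` command that prints similar information.

```python
import platform
import subprocess
from importlib import metadata
from typing import Dict, Optional


def package_version(name: str) -> Optional[str]:
    """Installed version of a pip package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def torch_cuda_version() -> Optional[str]:
    """CUDA version PyTorch was built with, or None (no torch / CPU-only build)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


def nvidia_driver_version() -> Optional[str]:
    """Driver version of the first GPU as reported by nvidia-smi, or None."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None  # nvidia-smi missing or failing
    lines = proc.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the versions most relevant to ZeRO-3 bug reports."""
    return {
        "python": platform.python_version(),
        "torch": package_version("torch"),
        "torch_cuda": torch_cuda_version(),
        "deepspeed": package_version("deepspeed"),
        "nvidia_driver": nvidia_driver_version(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value if value is not None else 'not found'}")
```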

When PPO uses zero3 to load a reward model trained with full-parameter training, the reward model fails to load.

Sudden random bug

Hi, how can I achieve batch_size>1 or use multi_gpu? Can you give me an idea?
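The title does not say which codebase is involved, so this is only a generic sketch under that caveat: running with batch_size>1 usually requires padding variable-length token sequences to a common length and passing an attention mask so the model ignores the padding. `pad_batch`, `batches`, and the default `pad_id=0` are assumptions; the real padding id comes from the tokenizer in use. Multi-GPU training is not covered here.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def pad_batch(
    seqs: Sequence[Sequence[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id sequences to the longest one in the batch.

    Returns (padded_ids, attention_mask): the mask is 1 for real tokens
    and 0 for padding positions.
    """
    if not seqs:
        return [], []
    max_len = max(len(s) for s in seqs)
    ids, mask = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        ids.append(list(s) + [pad_id] * n_pad)
        mask.append([1] * len(s) + [0] * n_pad)
    return ids, mask
```

For example, `pad_batch([[5, 6, 7], [8]])` returns `([[5, 6, 7], [8, 0, 0]], [[1, 1, 1], [1, 0, 0]])`. Decoder-only generation often uses left padding instead; which side is correct depends on the model.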