
undefined key `load_ckpt` is not allowed in strict mode or can't start the MD step

Open Joey-zhangcy opened this issue 4 years ago • 19 comments

Dear DP users, I'm new to deep modeling. While following the CH4 case on the website, I ran into some problems. When the keyword `load_ckpt` is in my param.json, dp train stops and reports "undefined key `load_ckpt` is not allowed in strict mode". When I delete `load_ckpt` and restart the training, the first dp train step finishes, but no model_devi file is produced. I guess LAMMPS can't load the potential function from the first dp train step, but I can't figure out why. Could someone help?

Any suggestions or comments will be much appreciated.

Thanks a lot.

All the best, Joey train (2).log

Joey-zhangcy avatar Apr 15 '22 12:04 Joey-zhangcy

The model_devi file should be generated from the second step.

njzjz avatar Apr 18 '22 20:04 njzjz

Dear njzjz: Many thanks for your kind reply. It was negligent of me not to describe the problem carefully. I ran dpgen run rather than dp train input.json, so the model_devi file should be generated. Here is my param.json file. Could you please give me some advice? I would much appreciate it.

Thanks a lot.

All the best, Joey param.zip

Joey-zhangcy avatar Apr 19 '22 00:04 Joey-zhangcy

"but there is no such model_devi file." This is because the definition of "sys_configs" is not correct. You should change the json file like this: "sys_configs_prefix": "/public/home/zhangchengyi/lammps-practice/tutorials/tutorials-master/EXAMPLES/dpgen_cloudserver/CH4.POSCAR.01x01x01/01.scale_pert/sys-0004-0001/scale-1.000/", "sys_configs": [ [ "00000*/POSCAR" ], [ "00001*/POSCAR" ] ],

taipinghu avatar Apr 19 '22 04:04 taipinghu

Dear taipinghu: Many thanks for your advice, but it seems that this change doesn't work. I still appreciate your help.

Thanks a lot.

All the best, Joey

Joey-zhangcy avatar Apr 19 '22 11:04 Joey-zhangcy

I think uploading all your input files would be the most efficient way to fix your error.

taipinghu avatar Apr 19 '22 15:04 taipinghu

Dear taipinghu: Many thanks for your help. Here are all my input files; all the POSCARs were obtained from the examples in dpgen. I would appreciate it if you could take the time to check them!

Thanks a lot.

All the best, Joey all.zip

Joey-zhangcy avatar Apr 20 '22 00:04 Joey-zhangcy

What is the error in dpgen run?

taipinghu avatar Apr 20 '22 01:04 taipinghu

Dear taipinghu: Here are my output files; it seems that I can't upload the file folder. I set the training steps to 2000 in param.json. Judging by lcurve.out, the first training step seems to have finished, and then it stopped. Thanks a lot.

All the best, Joey output.zip

Joey-zhangcy avatar Apr 20 '22 01:04 Joey-zhangcy

Is the error caused by dpdispatcher? You can change to the work dir and check whether it works normally.

taipinghu avatar Apr 20 '22 01:04 taipinghu

Dear taipinghu: The dpdispatcher seems to be generated by dpgen. I followed your advice and moved the files to another folder, but dpgen still doesn't work. Thanks a lot.

All the best, Joey

Joey-zhangcy avatar Apr 20 '22 06:04 Joey-zhangcy

Dear taipinghu: I found an interesting thing. I retyped dpgen run param.json machine.json in the terminal. The code reran and the model_devi appeared, but an error shows up. Could you give me some advice? err.txt Thanks a lot.

All the best, Joey

Joey-zhangcy avatar Apr 20 '22 11:04 Joey-zhangcy

I think you should first check whether the path (sys_configs_prefix and sys_configs in the parameter json file) is correct. You can go to the 01.model_devi dir to see whether dirs like task.000.00000 exist.

taipinghu avatar Apr 20 '22 14:04 taipinghu

Dear taipinghu: I went through the path (sys_configs in the parameter json file) with the cd command. Nothing went wrong. There are just four .pb files, a cur_job.json, and an empty folder confs in the 01.model_devi folder. There is nothing named task.000.00000. Thanks a lot.

All the best, Joey

Joey-zhangcy avatar Apr 21 '22 00:04 Joey-zhangcy

Please check carefully again; I still think the path of sys_configs is incorrect. You can write a simple script to read the param.json file and then print the sys_configs.
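A minimal sketch of such a script, assuming param.json is in the current directory and that each pattern is joined onto sys_configs_prefix with os.path.join. The function names are made up for illustration and are not part of dpgen.

```python
import json
import os


def resolved_sys_configs(param):
    """Return, per system, every pattern joined onto sys_configs_prefix."""
    # Assumption: a missing prefix behaves like an empty string.
    prefix = param.get("sys_configs_prefix", "")
    return [
        [os.path.join(prefix, pattern) for pattern in system]
        for system in param.get("sys_configs", [])
    ]


def print_sys_configs(param_path="param.json"):
    """Load the parameter file and print the resolved patterns per system."""
    with open(param_path) as fh:
        param = json.load(fh)
    for idx, patterns in enumerate(resolved_sys_configs(param)):
        for pattern in patterns:
            print(f"system {idx}: {pattern}")


# Only run when a param.json is actually present next to the script.
if __name__ == "__main__" and os.path.exists("param.json"):
    print_sys_configs("param.json")
```

If a printed path looks wrong (for example a doubled or missing directory separator), that is the first thing to correct in param.json.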

taipinghu avatar Apr 21 '22 01:04 taipinghu

Dear taipinghu: Thanks for your advice. There is a data.init folder generated automatically in iter.000000, and all the sys_configs files are listed inside. I thought that if the path of sys_configs were incorrect and dpgen could not find these files, they couldn't be listed there?

Thanks a lot.

All the best, Joey

Joey-zhangcy avatar Apr 21 '22 01:04 Joey-zhangcy

(1) The data.init folder comes from init_data_prefix and init_data_sys in param, rather than from sys_configs_prefix and sys_configs. (2) As mentioned above, you found an empty confs folder in 01.model_devi. This confs dir stores the LAMMPS lmp-format files, which are converted from the POSCARs located at os.path.join(sys_configs_prefix, sys_configs).
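One way to check point (2) is to expand each joined pattern with glob and count the files it matches; a pattern that matches nothing would explain an empty confs folder. This is only a sketch under the assumption that the patterns are expanded with ordinary shell-style globbing, and the helper names are hypothetical, not dpgen functions.

```python
import glob
import os


def count_matches(prefix, sys_configs):
    """Map each joined pattern to the number of paths it matches on disk."""
    counts = {}
    for system in sys_configs:
        for pattern in system:
            full = os.path.join(prefix, pattern)
            counts[full] = len(glob.glob(full))
    return counts


def empty_patterns(prefix, sys_configs):
    """Return the joined patterns that match no file at all."""
    return [p for p, n in count_matches(prefix, sys_configs).items() if n == 0]
```

Calling empty_patterns with the sys_configs_prefix and sys_configs values from param.json lists the entries that need fixing; an empty list means every pattern found at least one POSCAR.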

taipinghu avatar Apr 21 '22 01:04 taipinghu

Dear taipinghu: I really appreciate your help. No matter how I changed the path style of the original files, it didn't work. I downloaded the input sys_configs files from the Internet and changed the path, and it works. By the way, can I ask you one more question? Every time I run nohup dpgen run param.json machine.json, it stops after one step, and I need to retype the command in the terminal before the next step runs. Do you have any idea about this problem? Thanks a lot.

All the best, Joey

Joey-zhangcy avatar Apr 21 '22 23:04 Joey-zhangcy

As for your question, you should know that the dpgen workflow contains three stages, i.e. 00.train, 01.model_devi, 02.fp. Each stage also contains three steps, e.g. make_train, run_train and post_train. You can read the record.dpgen file to find the current step. It will help you fix the error.

dpgen can run the above steps automatically, unless you write an incorrect machine.json file (it depends on your scheduling system).
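A rough sketch for decoding record.dpgen, assuming each line holds two integers (the iteration index and a step index from 0 to 8) and that the model_devi and fp steps are named by analogy with the train ones. Both the file layout and the step names beyond the train stage are assumptions here, so check them against your own record.dpgen before relying on the output.

```python
# Assumed step order: three steps for each of 00.train, 01.model_devi, 02.fp.
STEP_NAMES = [
    "make_train", "run_train", "post_train",
    "make_model_devi", "run_model_devi", "post_model_devi",
    "make_fp", "run_fp", "post_fp",
]


def describe_record_line(line):
    """Turn a line such as '0 2' into 'iter.000000: post_train'."""
    iter_idx, step_idx = (int(tok) for tok in line.split())
    return f"iter.{iter_idx:06d}: {STEP_NAMES[step_idx]}"


def last_finished_step(record_path="record.dpgen"):
    """Describe the last non-empty line of the record file, or None if empty."""
    with open(record_path) as fh:
        lines = [ln for ln in fh if ln.strip()]
    return describe_record_line(lines[-1]) if lines else None
```

If the last recorded step is always the same after each launch, the step right after it is the one to look at in the logs.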

taipinghu avatar Apr 22 '22 01:04 taipinghu

Dear taipinghu: Thank you very much for your help, I will adjust the parameters carefully. Thanks a lot.

All the best, Joey

Joey-zhangcy avatar Apr 22 '22 01:04 Joey-zhangcy

It seems that this problem has been solved, so I'll close this issue. If you still have any questions, you can reopen this issue or create a new one.

AnguseZhang avatar Sep 05 '22 09:09 AnguseZhang