AOGNet

Error when training on CIFAR-10

Open Super-1123 opened this issue 5 years ago • 1 comments

Hi, I tried to rerun this code to test the model's performance using the command 'python3.6 main.py --cfg cfgs/cifar10/aognet_cifar10_ps_4_bottleneck_1M.yaml --gpus 1,2'. At first everything seemed to go smoothly; however, at epoch 280 training stopped with this error:

    Traceback (most recent call last):
      File "main.py", line 133, in <module>
        main()
      File "main.py", line 120, in main
        epoch_end_callback = checkpoint)
      File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/module/base_module.py", line 575, in fit
        callback(epoch, self.symbol, arg_params, aux_params)
      File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/callback.py", line 89, in _callback
        save_checkpoint(prefix, iter_no + 1, sym, arg, aux)
      File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/model.py", line 409, in save_checkpoint
        nd.save(param_name, save_dict)
      File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/ndarray/utils.py", line 273, in save
        keys))
      File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 252, in check_call
        raise MXNetError(py_str(LIB.MXGetLastError()))
    mxnet.base.MXNetError: [09:52:25] src/io/local_filesys.cc:39: Check failed: std::fwrite(ptr, 1, size, fp) == size FileStream.Write incomplete

I can't find a suitable solution. Could you please tell me how to solve this problem?

Super-1123, Apr 10 '19 02:04

Sorry, I haven't run into this problem before. It looks like an error while saving the checkpoint to a file. In any case, the code is based on an old version of MXNet, and its results cannot match the results reported in the paper. We'll release our new PyTorch code very soon. Stay tuned.

xilaili, Apr 10 '19 18:04
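For anyone hitting the same traceback: the failed check `std::fwrite(ptr, 1, size, fp) == size` means fewer bytes were written than requested while saving the checkpoint. A full disk or an exhausted quota is a common cause of short writes, but the thread does not confirm that this was the case here, so treat it as an assumption. Below is a minimal sketch of a free-space guard around an epoch-end callback. The names `has_free_space`, `guarded_callback`, and the `required_bytes` threshold are hypothetical helpers for illustration and are not part of MXNet or this repository.

```python
import errno
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def guarded_callback(callback, checkpoint_dir, required_bytes):
    """Wrap a callback so it raises a clear error before attempting a write
    when the checkpoint directory is low on space (hypothetical helper)."""
    def _wrapped(*args, **kwargs):
        if not has_free_space(checkpoint_dir, required_bytes):
            raise OSError(
                errno.ENOSPC,
                "not enough free space in %r to save a checkpoint" % checkpoint_dir,
            )
        # Enough space is available: run the original callback unchanged.
        return callback(*args, **kwargs)
    return _wrapped
```

Under the same assumption, the wrapped callback could be passed wherever the original checkpoint callback is used (for example as `epoch_end_callback`), with `required_bytes` set somewhat above the size of one saved parameter file. Checking `df -h` on the checkpoint directory is a quicker first step.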