VIE
No model files
Hello, when I start training directly by running run_training.sh, it reports that there is no model.ckpt-50000 file. Can you help me solve this problem? Thanks!
The training first trains an IR model for 50000 steps. This should happen as the first step of run_training.sh. I guess that step failed for some reason. You may just need to rerun run_training.sh and see what happens in the first step.
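Before rerunning, it can help to confirm whether the first stage wrote anything at all. The following is a minimal sketch, not part of the repository: the directory "./ir_model" is a placeholder, and the assumption is that the checkpoint is saved as files named model.ckpt-50000 plus a suffix (for example .index or .meta), as TensorFlow checkpoints usually are.

```python
import os


def find_checkpoint_files(model_dir, step=50000, prefix="model.ckpt"):
    """Return the sorted file names in model_dir that belong to one checkpoint.

    Matches "<prefix>-<step>" exactly or followed by a dot suffix, so that
    step 50000 does not accidentally match files from step 500000.
    """
    base = "{}-{}".format(prefix, step)
    if not os.path.isdir(model_dir):
        return []
    return sorted(
        name for name in os.listdir(model_dir)
        if name == base or name.startswith(base + ".")
    )


if __name__ == "__main__":
    # "./ir_model" is a hypothetical path; use the directory your run writes to.
    files = find_checkpoint_files("./ir_model")
    if files:
        print("Found checkpoint files:", ", ".join(files))
    else:
        print("No model.ckpt-50000 files found; the first stage "
              "probably did not finish.")
```

If the list is empty, the log of the first stage in run_training.sh is the place to look next.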
Thanks for your help. When I train the IR model, I run into a small problem: something goes wrong with the image loading. Can you help me solve it? Thanks!
From the error message, it seems that you are training the model on CPU, which does not support some of the operations. Are you training on CPUs? I would not suggest doing so, as it can take forever for the model to train.
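A quick way to spot an accidental CPU-only run is to check the environment before launching. This is a rough sketch under stated assumptions, not something the repository provides: it only looks at the CUDA_VISIBLE_DEVICES variable and whether nvidia-smi is on the PATH, and the function name is made up for illustration. It does not prove that the framework can actually use the GPU.

```python
import os
import shutil


def gpu_visibility_hints(env=None):
    """Return a list of reasons why a process might not see any GPU."""
    env = os.environ if env is None else env
    hints = []

    # An empty value or "-1" hides every GPU from CUDA applications.
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is not None and visible.strip() in ("", "-1"):
        hints.append("CUDA_VISIBLE_DEVICES hides all GPUs")

    # nvidia-smi ships with the NVIDIA driver; its absence suggests
    # the driver is not installed or not on PATH.
    if shutil.which("nvidia-smi") is None:
        hints.append("nvidia-smi not found on PATH")

    return hints


if __name__ == "__main__":
    problems = gpu_visibility_hints()
    print("\n".join(problems) if problems else "No obvious GPU visibility problem")
```

An empty result only means these two checks passed; a CPU-only build of the deep learning framework would still fall back to CPU.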
Thanks for your instructions. I can now use GPUs, but when training the model, the progress bar did not move for a very long time, as shown in the picture below. I do not know if I am missing some operation.
Hello, when training the IR model for 50000 steps, the progress bar does not move. I found that the code at line 170 of framework.py: res = self.validation_params[val_key]['valid_loop']['func'](self.sess, self.all_val_targets[val_key]) does not seem to execute. I do not know if I am missing some operation or doing something wrong. Can you give me some advice?
Sorry for the late reply. I don't quite know why this is happening, especially as other people are able to run the validation and training without any problem, and based on your log, your training can run for at least one step. One possibility is that you do not have enough computational resources for the validation data loading, for example not enough CPUs. You can try lowering the val_num_workers parameter to a smaller number such as 5 by adding --val_num_workers 5 to the training command.
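To choose a value that fits your machine, here is a small sketch. It is only a heuristic of mine, not something the training code does: it leaves a couple of cores free for the main process and never goes above 5, the value suggested above. The function names, the reserve of 2 cores, and the cap are all assumptions.

```python
import os


def suggest_val_num_workers(cpu_count=None, reserve=2, cap=5):
    """Pick a worker count that leaves `reserve` cores free, between 1 and `cap`."""
    if cpu_count is None:
        # os.cpu_count() can return None on some platforms.
        cpu_count = os.cpu_count() or 1
    return max(1, min(cap, cpu_count - reserve))


def val_workers_flag(num_workers):
    """Format the flag to append to the training command."""
    return "--val_num_workers {}".format(num_workers)


if __name__ == "__main__":
    print(val_workers_flag(suggest_val_num_workers()))
```

If validation still hangs with the suggested value, try going lower, down to 1, to see whether the data loading is really the bottleneck.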