kaggle_ndsb2017
run code on multiple GPUs
Hi, Julian,
I just started running your step3_predict_nodules.py
using your trained model.
I found that it ran on only one GPU even though I assigned two GPUs to it with
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
I also commented out config.gpu_options.per_process_gpu_memory_fraction = 0.5
because I have both GPUs entirely to myself, but prediction was still slow.
Could you let me know how to run the code on multiple GPUs? Thanks.
Hello, I think TensorFlow sees no way to distribute the network over multiple GPUs, although in theory it should be smart enough to split the batch into two parts and run each part on a separate GPU.
You could do this manually, however. I cannot type it out for you, but every patient needs roughly 30x30x30 (~900) predictions. If you predict half of them on GPU1 with one instance of the network and the other half on GPU2 with another instance, you should get close to a 2x speedup.
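One way to sketch this manual split is shown below. It is not code from the repository: `shard`, `predict_on_gpu`, and `run_parallel` are hypothetical names, and the loop body is a placeholder for the existing prediction code. It also divides the work by patient rather than within a single patient, which is a simpler variant of the same idea. Each worker process sets CUDA_VISIBLE_DEVICES before any framework import, so it only ever sees its own GPU.

```python
import os
from multiprocessing import get_context


def shard(items, num_shards):
    """Split items into num_shards roughly equal lists (round-robin)."""
    return [items[i::num_shards] for i in range(num_shards)]


def predict_on_gpu(gpu_id, patient_ids):
    # Restrict this process to a single GPU. This must happen before
    # TensorFlow/Keras is imported in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # Placeholder (assumption): load the trained model here and run the
    # existing prediction loop over `patient_ids` only.
    for patient_id in patient_ids:
        print(f"GPU {gpu_id}: would predict patient {patient_id}")


def run_parallel(patient_ids, gpu_ids=(0, 1)):
    # "spawn" starts a fresh interpreter per worker, so no GPU state is
    # inherited from the parent process.
    ctx = get_context("spawn")
    shards = shard(list(patient_ids), len(gpu_ids))
    procs = [
        ctx.Process(target=predict_on_gpu, args=(gpu_id, ids))
        for gpu_id, ids in zip(gpu_ids, shards)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

Because the workers are spawned, `run_parallel([...])` should be called from under an `if __name__ == "__main__":` guard in the launching script. The speedup approaches 2x only if both shards take about the same time.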
Hi Julian, lines 348-349 of step4_train_submissions.py read as follows:
if level == 1:
    dst_dir += "level2/"
Why not level1?
Indeed, I also had to look twice at this after all this time.
The level 1 models are combined into the level2 folder. The models in level2 are then combined into the submission folder.
It also gives an error on line 23 of step4_train_submissions.py: mass_df = pandas.read_csv(settings.BASE_DIR + "masses_predictions.csv"). It cannot find the masses_predictions.csv file. I searched for this file name in the code of the first three steps but could not find it. Where do you generate this file?
step2_train_mass_segmenter.py also has a predict phase, which generates this file.
You can also leave it out. It will not change the score very much.