kaggle_ndsb2017

run code on multiple GPUs

Open shu-hai opened this issue 7 years ago • 5 comments

Hi Julian, I just started running your step3_predict_nodules.py using your trained model. I found that it only ran on 1 GPU even though I assigned 2 GPUs to it with os.environ["CUDA_VISIBLE_DEVICES"] = "0,1". I also commented out config.gpu_options.per_process_gpu_memory_fraction = 0.5, because I am allowed to use both GPUs in full, but it was still slow.

Could you let me know how to run the code on multiple GPUs? Thanks.

shu-hai Jun 15 '17 20:06

Hello, I think TensorFlow sees no way to distribute the network over multiple GPUs, although in theory it should be smart enough to split the batch into 2 parts and run each part on a separate GPU.

You could do this manually, however. I cannot type it out for you, but every patient needs roughly 30x30x30 (~900) predictions. If you predict half of them on GPU1 with one instance of the network and the other half on GPU2 with another instance, you should get close to a 2x speedup.
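A minimal sketch of such a manual split, under some assumptions: it divides the work by patient rather than splitting the predictions within one patient, which is a simpler variant of the idea above, and it starts one process per GPU with CUDA_VISIBLE_DEVICES restricted to that GPU. The script name predict_worker.py and the helpers split_work, gpu_env and launch_workers are hypothetical and not part of this repository.

```python
import os
import subprocess
import sys


def split_work(items, num_workers):
    """Deal items round-robin into num_workers lists of near-equal size."""
    return [items[i::num_workers] for i in range(num_workers)]


def gpu_env(gpu_id):
    """Return a copy of the environment in which only one GPU is visible."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_workers(patient_ids, gpu_ids, script="predict_worker.py"):
    """Start one process per GPU and wait for all of them to finish.

    `script` is a hypothetical worker that loads its own copy of the model
    and predicts the patient ids (strings) given on its command line.
    Returns the exit code of each worker process.
    """
    shards = split_work(patient_ids, len(gpu_ids))
    procs = [
        subprocess.Popen([sys.executable, script] + shard, env=gpu_env(gpu))
        for gpu, shard in zip(gpu_ids, shards)
        if shard  # skip GPUs that received no patients
    ]
    return [p.wait() for p in procs]
```

With two GPUs this would be called as launch_workers(patient_ids, [0, 1]). Each worker sees only its own GPU, so the network instances do not compete for the same device memory.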

juliandewit Jun 16 '17 10:06

Hi Julian, lines 348-349 of step4_train_submissions.py read as follows:

    if level == 1:
        dst_dir += "level2/"

Why not level1?

shu-hai Jun 17 '17 04:06

Indeed, I also had to look twice after all this time.

The level 1 models are combined into the level2 folder. The models in level2 are then combined into the submission folder.

juliandewit Jun 17 '17 05:06

It also gives an error on line 23 of step4_train_submissions.py: mass_df = pandas.read_csv(settings.BASE_DIR + "masses_predictions.csv"). It cannot find the masses_predictions.csv file. I searched for this file name in the code of the first three steps, but could not find it. Where do you generate this file?

shu-hai Jun 17 '17 07:06

step2_train_mass_segmenter.py also has a predict phase. That phase generates this file.

You can also leave it out. It will not change the score very much.

juliandewit Jun 18 '17 18:06