
predicting/evaluating issue

Open XYAskWhy opened this issue 6 years ago • 9 comments

When predicting or evaluating with python main.py -- predict (or -- evaluate) --pipeline_name unet --chunk_size 5000, the following error occurs:

neptune: Executing in Offline Mode.
neptune: Executing in Offline Mode.
2018-05-30 21-01-50 mapping-challenge >>> predicting
/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py:895: DtypeWarning: Columns (6,7) have mixed types. Specify dtype option on import or set low_memory=False.
  return ctx.invoke(self.callback, **ctx.params)
neptune: Executing in Offline Mode.
  0%| | 0/13 [00:00<?, ?it/s]
2018-05-30 21-01-56 steps >>> step xy_inference adapting inputs
2018-05-30 21-01-56 steps >>> step xy_inference loading transformer...
2018-05-30 21-01-56 steps >>> step xy_inference transforming...
2018-05-30 21-01-56 steps >>> step xy_inference adapting inputs
2018-05-30 21-01-56 steps >>> step xy_inference loading transformer...
2018-05-30 21-01-56 steps >>> step xy_inference transforming...
2018-05-30 21-01-56 steps >>> step loader adapting inputs
2018-05-30 21-01-56 steps >>> step loader loading transformer...
2018-05-30 21-01-56 steps >>> step loader transforming...
2018-05-30 21-01-56 steps >>> step unet unpacking inputs
2018-05-30 21-01-56 steps >>> step unet loading transformer...
2018-05-30 21-01-58 steps >>> step unet transforming...
2018-05-30 21-01-58 steps >>> step mask_resize adapting inputs
2018-05-30 21-01-58 steps >>> step mask_resize loading transformer...
2018-05-30 21-01-58 steps >>> step mask_resize transforming...
2018-05-30 21-01-58 steps >>> step category_mapper adapting inputs
2018-05-30 21-01-58 steps >>> step category_mapper loading transformer...
2018-05-30 21-01-58 steps >>> step category_mapper transforming...
2018-05-30 21-01-58 steps >>> step mask_erosion adapting inputs
2018-05-30 21-01-58 steps >>> step mask_erosion loading transformer...
2018-05-30 21-01-58 steps >>> step mask_erosion transforming...
2018-05-30 21-01-58 steps >>> step labeler adapting inputs
2018-05-30 21-01-58 steps >>> step labeler loading transformer...
2018-05-30 21-01-58 steps >>> step labeler transforming...
2018-05-30 21-01-58 steps >>> step mask_dilation adapting inputs
2018-05-30 21-01-58 steps >>> step mask_dilation loading transformer...
2018-05-30 21-01-58 steps >>> step mask_dilation transforming...
2018-05-30 21-01-59 steps >>> step xy_inference adapting inputs
2018-05-30 21-01-59 steps >>> step xy_inference loading transformer...
2018-05-30 21-01-59 steps >>> step xy_inference transforming...
2018-05-30 21-01-59 steps >>> step xy_inference adapting inputs
2018-05-30 21-01-59 steps >>> step xy_inference loading transformer...
2018-05-30 21-01-59 steps >>> step xy_inference transforming...
2018-05-30 21-01-59 steps >>> step loader adapting inputs
2018-05-30 21-01-59 steps >>> step loader loading transformer...
2018-05-30 21-01-59 steps >>> step loader transforming...
2018-05-30 21-01-59 steps >>> step unet unpacking inputs
2018-05-30 21-01-59 steps >>> step unet loading transformer...
2018-05-30 21-01-59 steps >>> step unet transforming...
2018-05-30 21-01-59 steps >>> step mask_resize adapting inputs
2018-05-30 21-01-59 steps >>> step mask_resize loading transformer...
2018-05-30 21-01-59 steps >>> step mask_resize transforming...
2018-05-30 21-01-59 steps >>> step score_builder adapting inputs
2018-05-30 21-01-59 steps >>> step score_builder fitting and transforming...
2018-05-30 21-08-33 steps >>> step score_builder saving transformer...
2018-05-30 21-08-33 steps >>> step output adapting inputs
Traceback (most recent call last):
  File "main.py", line 282, in <module>
    action()
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 158, in predict
    _predict(pipeline_name, dev_mode, submit_predictions, chunk_size)
  File "main.py", line 169, in _predict
    prediction = generate_prediction(meta_test, pipeline, logger, CATEGORY_IDS, chunk_size)
  File "main.py", line 238, in generate_prediction
    return _generate_prediction_in_chunks(meta_data, pipeline, logger, category_ids, chunk_size)
  File "main.py", line 271, in _generate_prediction_in_chunks
    output = pipeline.transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 155, in transform
    step_inputs = self.adapt(step_inputs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 192, in adapt
    raw_inputs = [step_inputs[step_name][step_var] for step_name, step_var in step_mapping]
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 192, in <listcomp>
    raw_inputs = [step_inputs[step_name][step_var] for step_name, step_var in step_mapping]
KeyError: 'images'

The error above may be caused by --chunk_size 5000, since the program crashes exactly after 5000 iterations(?). But even if I don't specify chunk_size and just run python main.py -- predict --pipeline_name unet, another error occurs, which is the same error I get when I simply run python main.py -- train_evaluate_predict --pipeline_name unet --chunk_size 5000 as the README suggests.

neptune: Executing in Offline Mode.
neptune: Executing in Offline Mode.
2018-05-30 21-45-10 mapping-challenge >>> predicting
/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py:895: DtypeWarning: Columns (6,7) have mixed types. Specify dtype option on import or set low_memory=False.
  return ctx.invoke(self.callback, **ctx.params)
neptune: Executing in Offline Mode.
2018-05-30 21-45-14 steps >>> step xy_inference adapting inputs
2018-05-30 21-45-14 steps >>> step xy_inference loading transformer...
2018-05-30 21-45-14 steps >>> step xy_inference transforming...
2018-05-30 21-45-14 steps >>> step xy_inference adapting inputs
2018-05-30 21-45-14 steps >>> step xy_inference loading transformer...
2018-05-30 21-45-14 steps >>> step xy_inference transforming...
2018-05-30 21-45-14 steps >>> step loader adapting inputs
2018-05-30 21-45-14 steps >>> step loader loading transformer...
2018-05-30 21-45-14 steps >>> step loader transforming...
2018-05-30 21-45-14 steps >>> step unet unpacking inputs
2018-05-30 21-45-14 steps >>> step unet loading transformer...
2018-05-30 21-45-17 steps >>> step unet transforming...
2018-05-30 21-45-17 steps >>> step mask_resize adapting inputs
2018-05-30 21-45-17 steps >>> step mask_resize loading transformer...
2018-05-30 21-45-17 steps >>> step mask_resize transforming...
2018-05-30 21-45-17 steps >>> step category_mapper adapting inputs
2018-05-30 21-45-17 steps >>> step category_mapper loading transformer...
2018-05-30 21-45-17 steps >>> step category_mapper transforming...
2018-05-30 21-45-17 steps >>> step mask_erosion adapting inputs
2018-05-30 21-45-17 steps >>> step mask_erosion loading transformer...
2018-05-30 21-45-17 steps >>> step mask_erosion transforming...
2018-05-30 21-45-17 steps >>> step labeler adapting inputs
2018-05-30 21-45-17 steps >>> step labeler loading transformer...
2018-05-30 21-45-17 steps >>> step labeler transforming...
2018-05-30 21-45-17 steps >>> step mask_dilation adapting inputs
2018-05-30 21-45-17 steps >>> step mask_dilation loading transformer...
2018-05-30 21-45-17 steps >>> step mask_dilation transforming...
2018-05-30 21-45-17 steps >>> step xy_inference adapting inputs
2018-05-30 21-45-17 steps >>> step xy_inference loading transformer...
2018-05-30 21-45-17 steps >>> step xy_inference transforming...
2018-05-30 21-45-17 steps >>> step xy_inference adapting inputs
2018-05-30 21-45-17 steps >>> step xy_inference loading transformer...
2018-05-30 21-45-17 steps >>> step xy_inference transforming...
2018-05-30 21-45-17 steps >>> step loader adapting inputs
2018-05-30 21-45-17 steps >>> step loader loading transformer...
2018-05-30 21-45-17 steps >>> step loader transforming...
2018-05-30 21-45-17 steps >>> step unet unpacking inputs
2018-05-30 21-45-17 steps >>> step unet loading transformer...
2018-05-30 21-45-17 steps >>> step unet transforming...
2018-05-30 21-45-17 steps >>> step mask_resize adapting inputs
2018-05-30 21-45-17 steps >>> step mask_resize loading transformer...
2018-05-30 21-45-17 steps >>> step mask_resize transforming...
2018-05-30 21-45-17 steps >>> step score_builder adapting inputs
2018-05-30 21-45-17 steps >>> step score_builder loading transformer...
2018-05-30 21-45-17 steps >>> step score_builder transforming...
Traceback (most recent call last):
  File "main.py", line 282, in <module>
    action()
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 158, in predict
    _predict(pipeline_name, dev_mode, submit_predictions, chunk_size)
  File "main.py", line 169, in _predict
    prediction = generate_prediction(meta_test, pipeline, logger, CATEGORY_IDS, chunk_size)
  File "main.py", line 240, in generate_prediction
    return _generate_prediction(meta_data, pipeline, logger, category_ids)
  File "main.py", line 252, in _generate_prediction
    output = pipeline.transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 152, in transform
    step_inputs[input_step.name] = input_step.fit_transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 109, in fit_transform
    step_output_data = self._cached_fit_transform(step_inputs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 117, in _cached_fit_transform
    step_output_data = self.transformer.transform(**step_inputs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 127, in transform
    for image, image_probabilities in tqdm(zip(images, probabilities)):
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/tqdm/_tqdm.py", line 941, in __iter__
    for obj in iterable:
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 200, in _transform
    for image in tqdm(images):
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/tqdm/_tqdm.py", line 941, in __iter__
    for obj in iterable:
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 137, in _transform
    for i, image in enumerate(images):
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 174, in _transform
    yield erode_image(image, self.selem_size)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 267, in erode_image
    eroded_image = binary_erosion(mask, selem=selem)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/skimage/morphology/misc.py", line 37, in func_out
    return func(image, selem=selem, *args, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/skimage/morphology/binary.py", line 42, in binary_erosion
    ndi.binary_erosion(image, structure=selem, output=out)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/scipy/ndimage/morphology.py", line 370, in binary_erosion
    output, border_value, origin, 0, brute_force)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/scipy/ndimage/morphology.py", line 227, in _binary_erosion
    if numpy.product(structure.shape, axis=0) < 1:
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1897, in product
    return um.multiply.reduce(a, axis=axis, dtype=dtype, out=out, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 175, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 6930) is killed by signal: Killed.

XYAskWhy avatar May 30 '18 14:05 XYAskWhy

@XYAskWhy the error above is caused by a mistake on our part in the inference mode of unet. We always run unet_padded or unet_padded_tta in inference mode ourselves and didn't catch that typo. I suggest you run evaluate again with unet_padded and --chunk_size 5000, or with unet_padded_tta and a smaller chunk size, so that the combined TTA predictions fit in memory. My advice is to go with --chunk_size 200 and unet_padded_tta, as it gives the best results.
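For example, reusing the command format from the report above (pipeline names and flags are the ones already mentioned in this thread; nothing else is assumed), that would be something like:

python main.py -- evaluate --pipeline_name unet_padded_tta --chunk_size 200
python main.py -- predict --pipeline_name unet_padded_tta --chunk_size 200

or, with the padded pipeline and the larger chunk size:

python main.py -- evaluate --pipeline_name unet_padded --chunk_size 5000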

jakubczakon avatar May 30 '18 16:05 jakubczakon

@jakubczakon Many thanks, but the default training configuration might not be practical: most mainstream GPUs now have about 10 GB of memory, while a batch of 20 images only uses about 2 GB, so training is very slow. What's your suggestion for a larger batch_size and the corresponding learning rate?

XYAskWhy avatar May 31 '18 08:05 XYAskWhy

Very simple: just change batch_size_train in neptune.yaml. You can change everything else there too, including the encoder network (from resnet34 to resnet101 or resnet152), learning rates, the training schedule and other settings.
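As for the learning rate, a common rule of thumb (not stated in this thread, so treat it as an assumption) is to scale it roughly linearly when you increase batch_size_train. A minimal sketch with purely hypothetical base numbers:

def scale_lr(base_lr, base_batch_size, new_batch_size):
    # Linear scaling heuristic: grow the learning rate in proportion to the batch size.
    return base_lr * new_batch_size / base_batch_size

# Hypothetical example: going from the 20-image batches mentioned above to 80-image batches.
print(scale_lr(base_lr=1e-4, base_batch_size=20, new_batch_size=80))  # -> 0.0004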

jakubczakon avatar May 31 '18 09:05 jakubczakon

You can also train on multiple GPUs. Remember to set num_workers to a higher number, because data loading is usually the bottleneck.
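As a minimal, generic PyTorch sketch of those two knobs (the model, dataset and numbers below are placeholders, not the repository's actual training code):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and dataset, only to illustrate the two settings discussed above.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 2, 1))
dataset = TensorDataset(torch.randn(64, 3, 256, 256), torch.randint(0, 2, (64, 256, 256)))

# Multi-GPU training: DataParallel splits each batch across the visible devices.
if torch.cuda.is_available():
    model = model.cuda()
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)

# num_workers controls how many subprocesses load and augment data; raising it
# usually removes the data-loading bottleneck (at the cost of extra RAM).
loader = DataLoader(dataset, batch_size=20, shuffle=True, num_workers=8, pin_memory=True)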

jakubczakon avatar May 31 '18 09:05 jakubczakon

How can you run Neptune in offline mode?

hs0531 avatar May 20 '20 08:05 hs0531

Hi @hs0531

You can do something like this:

import neptune
from neptune import OfflineBackend

# Run the client fully offline; nothing is sent to the Neptune servers.
neptune.init(backend=OfflineBackend())
...

as [explained here](https://docs.neptune.ai/neptune-client/docs/neptune.html?highlight=offline).

In that case, nothing will be logged to Neptune; I usually use it for debugging purposes.

jakubczakon avatar May 20 '20 09:05 jakubczakon

thank you


hs0531 avatar May 20 '20 09:05 hs0531


Which version of neptune did you install? I followed your example but I get "cannot import name OfflineBackend".

hs0531 avatar May 21 '20 01:05 hs0531
