open-solution-mapping-challenge
KeyError: "['file_path_mask_eroded_3'] not in index"
When running locally in pure Python with python main.py -- train_evaluate_predict --pipeline_name unet --chunk_size 5000, the following error occurs. Any help?
neptune: Executing in Offline Mode.
neptune: Executing in Offline Mode.
2018-05-29 16-16-52 mapping-challenge >>> training
neptune: Executing in Offline Mode.
2018-05-29 16-16-55 steps >>> step xy_train adapting inputs
2018-05-29 16-16-55 steps >>> step xy_train fitting and transforming...
Traceback (most recent call last):
  File "main.py", line 282, in <module>
    action()
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 79, in train
    _train(pipeline_name, dev_mode)
  File "main.py", line 106, in _train
    pipeline.fit_transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
    step_inputs[input_step.name] = input_step.fit_transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
    step_inputs[input_step.name] = input_step.fit_transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
    step_inputs[input_step.name] = input_step.fit_transform(data)
  [Previous line repeated 5 more times]
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 109, in fit_transform
    step_output_data = self._cached_fit_transform(step_inputs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 120, in _cached_fit_transform
    step_output_data = self.transformer.fit_transform(**step_inputs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 253, in fit_transform
    return self.transform(*args, **kwargs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/preprocessing/misc.py", line 17, in transform
    y = meta[self.y_columns].values
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/frame.py", line 2133, in __getitem__
    return self._getitem_array(key)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/frame.py", line 2177, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1269, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "['file_path_mask_eroded_3'] not in index"
@XYAskWhy Hi, did you generate metadata?
You can do that by running python main.py -- prepare_metadata. You also need to prepare the masks by running python main.py -- prepare_masks
If you have already done that then open your metadata csv and check which columns are available. Remember that you can choose how to generate your target masks so your csv may contain different columns. You can choose which column should be used as target masks in pipeline_config.py:
Y_COLUMNS = ['file_path_mask_eroded_0_dilated_0']
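To see which target-mask columns a metadata csv actually contains before setting Y_COLUMNS, something like the following may help. It is only a minimal sketch using the standard library: the helper name mask_columns is hypothetical, and the file_path_mask prefix is assumed from the column names mentioned in this thread.

```python
import csv


def mask_columns(csv_path, prefix="file_path_mask"):
    """Return the header names in csv_path that start with prefix.

    Hypothetical helper, not part of the repository. The default prefix
    is an assumption based on the column names quoted in this thread.
    """
    with open(csv_path, newline="") as f:
        # Only the first row (the header) is read; an empty file yields [].
        header = next(csv.reader(f), [])
    return [name for name in header if name.startswith(prefix)]
```

Calling mask_columns on the path to your stage1_metadata.csv should return an empty list if no mask columns were written, which is the situation that produces the KeyError above.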
I also encountered this problem. I will give it a try. Thanks!
Thanks @jakubczakon, I had run prepare_metadata and then prepare_masks. The problem is that the masks must be prepared first.
@XYAskWhy After I executed python main.py -- prepare_masks, this error still exists. What columns are in your stage1_metadata.csv ? There are only the following columns in my file: ImageId, file_path_image, is_train, is_valid, is_test, n_buildings. Is there anything wrong? What else do I need to do?
@dslwz2008 If you prepared the metadata first, you need to redo it after you prepare the masks. The newly generated csv file will then include an extra column like 'file_path_mask_eroded_0_dilated_0'.
@XYAskWhy @dslwz2008 I will fix the readme today, but yes, as @XYAskWhy said, when the metadata is created it looks for the folders with target masks and creates the columns based on what it finds. It may seem over the top at first glance, but creating target masks for this problem is far from trivial. The following ideas are all viable options:
- overlay target masks
- erode masks first and overlay
- erode large masks but dilate small masks and overlay (to increase signal for the small objects)
- drop border masks that are very thin (<2 pixels) and then overlay to decrease false signals from mislabeled edge objects
I hope this helps!
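To illustrate why eroding before overlaying can matter, here is a toy sketch in pure Python. It is not the repository's implementation: erode and overlay are hypothetical helpers, masks are nested lists of 0/1, and the erosion uses a simple 4-neighbour cross with everything outside the image treated as background.

```python
def erode(mask, iterations=1):
    """Binary erosion with a 4-neighbour cross (toy version).

    A pixel survives only if it and its four direct neighbours are set.
    Border pixels are always cleared, i.e. outside counts as background.
    """
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [[0] * w for _ in range(h)]
        for r in range(1, h - 1):
            for c in range(1, w - 1):
                if (mask[r][c] and mask[r - 1][c] and mask[r + 1][c]
                        and mask[r][c - 1] and mask[r][c + 1]):
                    out[r][c] = 1
        mask = out
    return mask


def overlay(masks):
    """Pixel-wise union of equally sized binary masks."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(w)]
            for r in range(h)]
```

With two adjacent buildings, overlay([a, b]) merges them into one blob, while overlay([erode(a), erode(b)]) leaves a gap between them, at the cost of shrinking each object.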
I re-executed the following commands in order:
python main.py -- prepare_masks
neptune experiment run main.py -- prepare_metadata \
--train_data \
--valid_data \
--test_data
However, there is still no file_path_mask_eroded_0_dilated_0 column in my file stage1_metadata.csv. I am using the master branch. What else do I need to do? @XYAskWhy @jakubczakon
@dslwz2008 What are your paths in neptune.yaml?
data_dir: /path/to/data
meta_dir: /path/to/data
masks_overlayed_dir: /path/to/masks_overlayed
masks_overlayed_eroded_dir: /path/to/masks_overlayed_eroded
experiment_dir: /path/to/work/dir
Can you confirm that your masks were actually generated? The overlayed masks folder should be around 100 GB.
This is my neptune.yaml:
data_dir: /home/shenshen/Programs/mc_data
meta_dir: /home/shenshen/Programs/mc_data
masks_overlayed_dir: /home/shenshen/Programs/mc_dat_eroded_2_dilated_3
masks_overlayed_eroded_dir: /home/shenshen/Programs/mc_dat_eroded_2_dilated_3
experiment_dir: /home/shenshen/Programs/open-solution-mapping-challenge
I am not sure if the masks_overlayed_dir setting is correct. @jakubczakon
Ok I see. You just need to have something like:
masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/
and it will create this particular setting with eroded_2_dilated_3 automatically. Below is the piece of code that deals with this part:
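# Excerpt; relies on os and on names defined elsewhere in the repository.
images_path_to_write = images_path
# Drop the last character (expected to be the trailing "/") before splitting.
masks_overlayed_dir_ = masks_overlayed_dir[:-1]
masks_dir_prefix = os.path.split(masks_overlayed_dir_)[1]
masks_overlayed_sufix_to_write = []
# Collect the suffix of every entry in meta_dir whose name starts with the prefix.
for masks_dir in os.listdir(meta_dir):
    masks_dir_name = os.path.split(masks_dir)[1]
    if masks_dir_name.startswith(masks_dir_prefix):
        masks_overlayed_sufix_to_write.append(masks_dir_name[len(masks_dir_prefix):])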
So you need to define your path ending with /. I will submit an issue to clean that up right away, but I am not sure I will have time to change it today, as I want to do some last-minute postprocessing of the newest models and start generating the final submission.
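The effect of the trailing slash can be reproduced in isolation. The sketch below mimics the quoted snippet but takes the directory listing as an argument, so it runs without any data. mask_suffixes and the /data/... paths are hypothetical, and building the column name as "file_path_mask" plus the suffix is an assumption inferred from the names in this thread. Note also that, as far as the quoted snippet shows, the folders are looked up with os.listdir(meta_dir), so the mask folders would need to be visible there.

```python
import os


def mask_suffixes(masks_overlayed_dir, dir_names):
    """Mimic the quoted snippet: strip the last character, use the basename
    as a prefix, and collect what follows the prefix in each matching name."""
    prefix = os.path.split(masks_overlayed_dir[:-1])[1]
    return [name[len(prefix):] for name in dir_names if name.startswith(prefix)]


# Hypothetical directory listing, standing in for os.listdir(meta_dir).
listing = ["images", "masks_overlayed_eroded_2_dilated_3"]

with_slash = mask_suffixes("/data/masks_overlayed/", listing)
without_slash = mask_suffixes("/data/masks_overlayed", listing)

print(with_slash)     # ['_eroded_2_dilated_3']
print(without_slash)  # ['d_eroded_2_dilated_3']

# Assumed naming scheme for the metadata columns (inferred, not confirmed).
columns = ["file_path_mask" + suffix for suffix in with_slash]
print(columns)        # ['file_path_mask_eroded_2_dilated_3']
```

Without the trailing slash the last letter of the folder name is cut off, so the prefix still matches but the extracted suffix picks up a stray leading "d".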
@dslwz2008 @XYAskWhy by the way I updated the readme.
The most important part is that the best training results were achieved when training with a distance- and size-weighted loss, so the pipeline to choose is unet_weighted instead of unet. Also, running predictions with replication padding plus test-time augmentation gave us significant improvements. The pipeline for that is called unet_padded_tta.
@dslwz2008 also I would change the
experiment_dir: /home/shenshen/Programs/open-solution-mapping-challenge
to something particular to this experiment. All the models will be saved in that directory so I am not sure if you want to have it as generic as open-solution-mapping-challenge. I usually have something like this:
experiment_dir: ...mapping-challenge/experiments/resnet34_crop256_erode2_dilate_3
or something like that.
OK. Thanks!
After python main.py -- prepare_masks, folder mc_dat_eroded_2_dilated_3 was generated. According to my statistics, it takes up 123.1GB of space.
When was this folder generated?
masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/
Well, you misspecified
masks_overlayed_dir:
so I don't think you have that folder.
Now there are 2 options. You could either specify it correctly:
masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/
and rerun generation of the masks (takes time)
or you could simply go
mv /home/shenshen/Programs/mc_dat_eroded_2_dilated_3 /home/shenshen/Programs/masks_overlayed
and rerun metadata creation
Thank you very much! I understand. I did not create masks_overlayed folder before prepare_masks. So how is this item set up? When is the content in it generated? masks_overlayed_eroded_dir: ???
This one should actually be dropped. It is a remnant of older days when we only thought of 2 configurations of those target masks :) I will drop it from the readme and yamls.
Well, you generated all those masks with prepare_masks; you just put them in the wrong directory.
I finally configured it correctly! Unfortunately, I did not find the column file_path_mask_eroded_0_dilated_0 in the generated stage1_metadata.csv file... The first time I created the metadata it took several hours, but now it is generated in less than a minute. So I wonder: is this site (neptune.ml) caching something?
Well, metadata generation should be pretty fast; it's only filepath munging. Also, since you are generating masks with erosion 2 and dilation 3, your column is actually file_path_mask_eroded_2_dilated_3, if I am correct.

This is the head of stage1_metadata.csv. No similar column appears.
Did you change the masks_overlayed_dir in neptune.yaml to
masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/
and recreate the metadata? Can you remove stage1_metadata.csv and run it again?
Yes, I have changed the masks_overlayed_dir in neptune.yaml and deleted stage1_metadata.csv. Then I ran prepare_metadata again. The result is the same as in the picture above.
What does the folder /home/shenshen/Programs/masks_overlayed/ contain?

Okay, I checked my setup and I actually have folders like:
.../masks_overlayed_eroded_3_dilated_2
So I believe you should change the name of your directory to
../masks_overlayed_eroded_2_dilated_3
and rerun metadata generation and you will be ready to go.
Still no column file_path_mask_eroded_2_dilated_3. I'm going to carefully analyze the code and try again. Thank you very much.
Ok, cool. But one last try:
Change the folder name by:
mv ../masks_overlayed ../masks_overlayed_eroded_2_dilated_3
But LEAVE the name in the neptune.yaml as:
masks_overlayed_dir: ../masks_overlayed/
Rerun the metadata generation.
I think @XYAskWhy got it to work pretty quickly. Any advice?
Still not working... This is really weird. How about re-cloning this repo and starting over? Which branch do you recommend? @XYAskWhy How did you get it to work?
Got the local training running using the older master version, but still struggling with evaluating/predicting. The updated master version should be OK as well. @dslwz2008
Master should work, and dev too, as I am generating final predictions with it right now. @XYAskWhy are you running unet_padding_tta? Check the evaluate checkpoint.py script to see how to add missing transformers (just run touch transformer_name in the transformers dir).
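For anyone who prefers to script the touch step, here is a minimal sketch using pathlib. touch_transformers is a hypothetical helper, and which transformer names are missing depends on your pipeline, so the names you pass in are up to you.

```python
from pathlib import Path


def touch_transformers(transformers_dir, names):
    """Create an empty file for each name that does not exist yet.

    Hypothetical helper equivalent to running `touch name` for each name
    inside transformers_dir. Returns the names that were actually created.
    """
    directory = Path(transformers_dir)
    created = []
    for name in names:
        path = directory / name
        if not path.exists():
            path.touch()
            created.append(name)
    return created
```

Existing files are left alone, so the function can be rerun safely; it does not create the directory itself, which helps catch a mistyped path early.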