IndexError: index 8 is out of bounds for axis 0 with size 7

Open robmarkcole opened this issue 2 years ago • 10 comments

Code:

from pylabel import importer

# Specify the path to the coco.json file
path_to_annotations = "data/coco.json"
# Specify the path to the images (if they are in a different folder than the annotations)
path_to_images = "data/image_patches/"

# Import the dataset into the pylabel schema
dataset = importer.ImportCoco(path_to_annotations, path_to_images=path_to_images, name="tanks_coco")
dataset.df.head(5)

Error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/robin/Documents/GitHub/oil-storage-tank/pre-processing.ipynb Cell 7' in <cell line: 7>()
      4 path_to_images = "data/image_patches/"
      6 #Import the dataset into the pylable schema 
----> 7 dataset = importer.ImportCoco(path_to_annotations, path_to_images=path_to_images, name="tanks_coco")
      8 dataset.df.head(5)

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py:94, in ImportCoco(path, path_to_images, name)
     89 df.ann_category_id = df.ann_category_id.astype(str)
     91 df[
     92     ["ann_bbox_xmin", "ann_bbox_ymin", "ann_bbox_width", "ann_bbox_height"]
     93 ] = pd.DataFrame(df.ann_bbox.tolist(), index=df.index)
---> 94 df.insert(8, "ann_bbox_xmax", df["ann_bbox_xmin"] + df["ann_bbox_width"])
     95 df.insert(10, "ann_bbox_ymax", df["ann_bbox_ymin"] + df["ann_bbox_height"])
     97 # debug print(df.info())
     98 
     99 # Join the annotions with the information about the image to add the image columns to the dataframe

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/frame.py:4439, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4436     raise TypeError("loc must be int")
   4438 value = self._sanitize_column(value)
-> 4439 self._mgr.insert(loc, column, value)

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py:1230, in BlockManager.insert(self, loc, item, value)
   1220 """
   1221 Insert item at selected position.
   1222 
   (...)
   1227 value : np.ndarray or ExtensionArray
   1228 """
   1229 # insert to the axis; this could possibly raise a TypeError
-> 1230 new_axis = self.items.insert(loc, item)
   1232 if value.ndim == 2:
   1233     value = value.T

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:6602, in Index.insert(self, loc, item)
   6595 if arr.dtype != object or not isinstance(
   6596     item, (tuple, np.datetime64, np.timedelta64)
   6597 ):
   6598     # with object-dtype we need to worry about numpy incorrectly casting
   6599     # dt64/td64 to integer, also about treating tuples as sequences
   6600     # special-casing dt64/td64 https://github.com/numpy/numpy/issues/12550
   6601     casted = arr.dtype.type(item)
-> 6602     new_values = np.insert(arr, loc, casted)
   6604 else:
   6605     # No overload variant of "insert" matches argument types
   6606     # "ndarray[Any, Any]", "int", "None"  [call-overload]
   6607     new_values = np.insert(arr, loc, None)  # type: ignore[call-overload]

File <__array_function__ internals>:180, in insert(*args, **kwargs)

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py:5280, in insert(arr, obj, values, axis)
   5278 index = indices.item()
   5279 if index < -N or index > N:
-> 5280     raise IndexError(f"index {obj} is out of bounds for axis {axis} "
   5281                      f"with size {N}")
   5282 if (index < 0):
   5283     index += N

IndexError: index 8 is out of bounds for axis 0 with size 7

robmarkcole avatar Mar 11 '22 14:03 robmarkcole

Hello @robmarkcole , could you share your dataset with me? That will help me to debug the issue

alexheat avatar Mar 11 '22 15:03 alexheat

Hi @alexheat it is https://www.kaggle.com/towardsentropy/oil-storage-tanks

robmarkcole avatar Mar 11 '22 15:03 robmarkcole

Thank you. I will try to check it today

alexheat avatar Mar 11 '22 15:03 alexheat

Hello @robmarkcole I figured out what the problem is. The .json file for that dataset is missing some of the fields that are usually in a COCO annotation file. For example it doesn't have the height or width of the images:

    {
        "id": 5142,
        "file_name": "52_4_2.jpg"
    },

It also doesn't have some of the usual metadata about the annotations.

Unfortunately this library won't work with that dataset. If you are really stuck I can try and help work around the issue. (Which would require some more Python code to either augment the JSON or borrow some code from this library or add some additional logic to this library.)

alexheat avatar Mar 13 '22 04:03 alexheat

@alexheat do you know of any validators that could be run against the file? Did a brief search but did not find one, although https://github.com/levan92/cocojson looks useful here

robmarkcole avatar Mar 13 '22 06:03 robmarkcole

I am not aware of any validation tool or even an official place where the schema is documented. But the image height and width are included in the original COCO dataset. The height and width are required information to convert to other formats like YOLO because it requires calculating the size of the bounding box relative to the size of the image.
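To illustrate why the image size is needed, here is a minimal sketch of the COCO-to-YOLO box conversion described above. It assumes the COCO bbox is [x_min, y_min, width, height] in pixels and that the YOLO box is (x_center, y_center, width, height) relative to the image size. The function name and the sample numbers are hypothetical and are not taken from pylabel.

```python
def coco_bbox_to_yolo(bbox, img_width, img_height):
    """Convert a COCO pixel box [x_min, y_min, width, height] to a YOLO box
    (x_center, y_center, width, height), each relative to the image size."""
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image width and height must be positive")
    x_min, y_min, box_w, box_h = bbox
    x_center = (x_min + box_w / 2) / img_width
    y_center = (y_min + box_h / 2) / img_height
    return (x_center, y_center, box_w / img_width, box_h / img_height)


# Hypothetical 512x512 image with a 100x50 px box whose top-left corner is (200, 100).
yolo_box = coco_bbox_to_yolo([200, 100, 100, 50], img_width=512, img_height=512)
print(yolo_box)
```

Without width and height in the images section, the divisions in this conversion cannot be computed.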

alexheat avatar Mar 13 '22 23:03 alexheat

This looks like the spec: https://cocodataset.org/#format-data

For this issue, I suggest raising an exception if the keys are missing.
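A rough sketch of what such a check could look like, run on the loaded JSON before importing. The set of required keys is an assumption based on the fields mentioned in this thread and on the COCO format page; it is not taken from pylabel's source, and the function names are hypothetical.

```python
# Assumed minimum set of keys per section (an illustration, not pylabel's actual requirements).
REQUIRED_KEYS = {
    "images": {"id", "file_name", "width", "height"},
    "annotations": {"id", "image_id", "category_id", "bbox", "area"},
}


def find_missing_keys(coco):
    """Return a list of human-readable problems; the list is empty if none are found."""
    problems = []
    for section, required in REQUIRED_KEYS.items():
        if section not in coco:
            problems.append(f"missing top-level section '{section}'")
            continue
        for i, entry in enumerate(coco[section]):
            missing = sorted(required - set(entry))
            if missing:
                problems.append(f"{section}[{i}] is missing {missing}")
    return problems


def validate_coco(coco):
    """Raise a ValueError that names the first few missing keys."""
    problems = find_missing_keys(coco)
    if problems:
        raise ValueError("COCO file is incomplete: " + "; ".join(problems[:5]))


# The image entry quoted earlier in this thread has no width or height.
sample_coco = {
    "images": [{"id": 5142, "file_name": "52_4_2.jpg"}],
    "annotations": [],
}
sample_problems = find_missing_keys(sample_coco)
print(sample_problems)
```

Raising early with the names of the missing keys would be easier to act on than the IndexError from deep inside pandas.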

robmarkcole avatar Mar 14 '22 09:03 robmarkcole

I have updated the JSON file but still get the error. The file is at https://gist.github.com/robmarkcole/286951efe0830d3e205666f68c2b490e

Possibly due to a mismatch between the id and image_id keys. Update: the following did not resolve it:

# Copy each annotation's image_id into its id field
for entry in in_json['annotations']:
    entry['id'] = entry['image_id']

Possibly some conflict as raised in https://github.com/cocodataset/cocoapi/issues/95

Alternatively, it might be due to some images having no annotations? Nope, I removed those and still get the error.

robmarkcole avatar Mar 14 '22 11:03 robmarkcole

OK, the issue was the missing area field in the annotations.
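For anyone hitting the same problem, one possible way to fill in the field is sketched below. It assumes that, for annotations that only have a bounding box, using bbox width times height as the area is acceptable; the COCO area normally refers to the segmentation area, so this is an approximation. The function name and sample data are hypothetical.

```python
def add_missing_area(coco):
    """Fill in 'area' from the bbox for annotations that lack it.

    Returns the number of annotations that were updated.
    """
    filled = 0
    for ann in coco.get("annotations", []):
        if "area" not in ann:
            # COCO bbox layout: [x_min, y_min, width, height]
            _, _, box_w, box_h = ann["bbox"]
            ann["area"] = box_w * box_h
            filled += 1
    return filled


# Hypothetical annotation without an area field.
sample_annotations = {
    "annotations": [
        {"id": 1, "image_id": 5142, "category_id": 1, "bbox": [10, 20, 30, 40]},
    ]
}
filled_count = add_missing_area(sample_annotations)
print(filled_count, sample_annotations["annotations"][0]["area"])
```

The updated dictionary can then be written back with json.dump before running the import again.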

robmarkcole avatar Mar 14 '22 11:03 robmarkcole

Thank you. Glad you got it working. I will keep the issue open as a reminder to add better error handling.

alexheat avatar Mar 16 '22 04:03 alexheat