pylabel
pylabel copied to clipboard
IndexError: index 8 is out of bounds for axis 0 with size 7
Code:
#Specify path to the coco.json file
path_to_annotations = "data/coco.json"
#Specify the path to the images (if they are in a different folder than the annotations)
path_to_images = "data/image_patches/"
#Import the dataset into the pylable schema
dataset = importer.ImportCoco(path_to_annotations, path_to_images=path_to_images, name="tanks_coco")
dataset.df.head(5)
Error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/Users/robin/Documents/GitHub/oil-storage-tank/pre-processing.ipynb Cell 7' in <cell line: 7>()
[4](vscode-notebook-cell:/Users/robin/Documents/GitHub/oil-storage-tank/pre-processing.ipynb#ch0000010?line=3)[ path_to_images = "data/image_patches/"
]()[6](vscode-notebook-cell:/Users/robin/Documents/GitHub/oil-storage-tank/pre-processing.ipynb#ch0000010?line=5)[ #Import the dataset into the pylable schema
----> ]()[7](vscode-notebook-cell:/Users/robin/Documents/GitHub/oil-storage-tank/pre-processing.ipynb#ch0000010?line=6)[ dataset = importer.ImportCoco(path_to_annotations, path_to_images=path_to_images, name="tanks_coco")
]()[8](vscode-notebook-cell:/Users/robin/Documents/GitHub/oil-storage-tank/pre-processing.ipynb#ch0000010?line=7)[ dataset.df.head(5)
File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py:94, in ImportCoco(path, path_to_images, name)
]()[89](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=88)[ df.ann_category_id = df.ann_category_id.astype(str)
]()[91](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=90)[ df[
]()[92](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=91)[ ["ann_bbox_xmin", "ann_bbox_ymin", "ann_bbox_width", "ann_bbox_height"]
]()[93](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=92)[ ] = pd.DataFrame(df.ann_bbox.tolist(), index=df.index)
---> ]()[94](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=93)[ df.insert(8, "ann_bbox_xmax", df["ann_bbox_xmin"] + df["ann_bbox_width"])
]()[95](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=94)[ df.insert(10, "ann_bbox_ymax", df["ann_bbox_ymin"] + df["ann_bbox_height"])
]()[97](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=96)[ # debug print(df.info())
]()[98](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=97)[
]()[99](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py?line=98)[ # Join the annotions with the information about the image to add the image columns to the dataframe
File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/frame.py:4439, in DataFrame.insert(self, loc, column, value, allow_duplicates)
]()[4436](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/frame.py?line=4435)[ raise TypeError("loc must be int")
]()[4438](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/frame.py?line=4437)[ value = self._sanitize_column(value)
-> ]()[4439](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/frame.py?line=4438)[ self._mgr.insert(loc, column, value)]()
File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py:1230, in BlockManager.insert(self, loc, item, value)
[1220](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1219)[ """
]()[1221](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1220)[ Insert item at selected position.
]()[1222](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1221)[
(...)
]()[1227](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1226)[ value : np.ndarray or ExtensionArray
]()[1228](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1227)[ """
]()[1229](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1228)[ # insert to the axis; this could possibly raise a TypeError
-> ]()[1230](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1229)[ new_axis = self.items.insert(loc, item)
]()[1232](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1231)[ if value.ndim == 2:
]()[1233](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py?line=1232)[ value = value.T
File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:6602, in Index.insert(self, loc, item)
]()[6595](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6594)[ if arr.dtype != object or not isinstance(
]()[6596](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6595)[ item, (tuple, np.datetime64, np.timedelta64)
]()[6597](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6596)[ ):
]()[6598](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6597)[ # with object-dtype we need to worry about numpy incorrectly casting
]()[6599](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6598)[ # dt64/td64 to integer, also about treating tuples as sequences
]()[6600](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6599)[ # special-casing dt64/td64 https://github.com/numpy/numpy/issues/12550
]()[6601](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6600)[ casted = arr.dtype.type(item)
-> ]()[6602](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6601)[ new_values = np.insert(arr, loc, casted)
]()[6604](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6603)[ else:
]()[6605](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6604)[ # No overload variant of "insert" matches argument types
]()[6606](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6605)[ # "ndarray[Any, Any]", "int", "None" [call-overload]
]()[6607](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py?line=6606)[ new_values = np.insert(arr, loc, None) # type: ignore[call-overload]
File <__array_function__ internals>:180, in insert(*args, **kwargs)
File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py:5280, in insert(arr, obj, values, axis)
]()[5278](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py?line=5277)[ index = indices.item()
]()[5279](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py?line=5278)[ if index < -N or index > N:
-> ]()[5280](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py?line=5279)[ raise IndexError(f"index {obj} is out of bounds for axis {axis} "
]()[5281](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py?line=5280)[ f"with size {N}")
]()[5282](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py?line=5281)[ if (index < 0):
]()[5283](file:///Users/robin/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py?line=5282)[ index += N
IndexError: index 8 is out of bounds for axis 0 with size 7]()
Hello @robmarkcole , could you share your dataset with me? That will help me to debug the issue
Hi @alexheat it is https://www.kaggle.com/towardsentropy/oil-storage-tanks
Thank you. I will try to check it today
On Fri, Mar 11, 2022 at 7:52 AM Robin Cole @.***> wrote:
Hi @alexheat https://github.com/alexheat it is https://www.kaggle.com/towardsentropy/oil-storage-tanks
— Reply to this email directly, view it on GitHub <pylabel-project/pylabel#33>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC5OL775ZZRSRVMYNJKC5ODU7NT3NANCNFSM5QPYAFWA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you were mentioned.Message ID: @.***>
Hello @robmarkcole I figured out what the problem is. The .json file for that dataset is missing some of the fields that are usually in a COCO annotation file. For example it doesn't have the height or width of the images:
{
"id": 5142,
"file_name": "52_4_2.jpg"
},
It also doesn't have some of the usual metadata about the annotations.
Unfortunately this library won't work with that dataset. If you are really stuck I can try and help work around the issue. (Which would require some more Python code to either augment the JSON or borrow some code from this library or add some additional logic to this library.)
@alexheat do you know of any validators that could be run against the file? Did a brief search but did not find one, although https://github.com/levan92/cocojson looks useful here
I am not aware of any validation tool or even an official place where the schema is documented. But the image height and width are included in the original COCO dataset. The height and width are required information to convert to other formats like YOLO because it requires calculating the size of the bounding box relative to the size of the image.
this looks like the spec https://cocodataset.org/#format-data
for this issue, I suggest raising an exception if the keys are missing.
I have updated the json file but still get the error... file on https://gist.github.com/robmarkcole/286951efe0830d3e205666f68c2b490e
Possibly due to mismatch of keys using id
and image_id
.
Update, the following did not resolve:
for entry in in_json['annotations']:
id = entry['image_id']
entry['id'] = id
Possibly some conflict as raised in https://github.com/cocodataset/cocoapi/issues/95
Alternatively, might be due to some images having no annotations? Nope, remove those and still error
OK the issue was missing area
in the annotations
this looks like the spec https://cocodataset.org/#format-data
for this issue, I suggest raising an exception if the keys are missing.
Thank you. Glad you got it working. I will keep the issue open as a reminder to add the better error handling.