tensorflow-recorder
Add guard for non-image files in image directory input
It would be good to add some check in case there are non-image files in an image directory.
Describe the solution you'd like
A simple filter would suffice, e.g.
    if not image file:
        skip
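The filter described above could be sketched roughly as follows. This is only an illustration, not the library's code: the helper names `is_image_file` and `filter_image_files` and the extension set are assumptions, and checking the extension says nothing about whether the file contents actually decode as an image.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever formats the tool supports.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def is_image_file(path):
    """Return True if the path has a recognized image extension (hypothetical helper)."""
    return Path(path).suffix.lower() in IMAGE_EXTENSIONS


def filter_image_files(paths):
    """Keep only paths that look like images, preserving input order."""
    return [p for p in paths if is_image_file(p)]
```

For example, `filter_image_files(["a.jpg", "notes.txt", "b.PNG", ".DS_Store"])` keeps only `"a.jpg"` and `"b.PNG"`.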
Describe alternatives you've considered
A: Do nothing - potential for the tool to fail while processing data, which could waste the user's time
B: Filter at the DataFrame level - best not to propagate errors downstream
Additional context
See client._read_image_directory.
@cfezequiel I wonder if it makes sense to switch to tf.io.gfile.glob. This way we could provide a pattern for images, like
'dataset-folder/*/**/*.jpg'
This change would also simplify the code quite a bit.
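To illustrate the pattern-based idea, here is a minimal sketch that uses Python's standard-library `glob` as a stand-in for `tf.io.gfile.glob`. Whether `tf.io.gfile.glob` treats `**` the same way is not verified here, and `list_images` is a hypothetical helper name.

```python
import glob
import os


def list_images(pattern):
    """Return sorted file paths matching a glob pattern (hypothetical helper)."""
    # recursive=True lets '**' match zero or more nested directories.
    matches = glob.glob(pattern, recursive=True)
    # Drop directories that happen to match the pattern.
    return sorted(p for p in matches if os.path.isfile(p))
```

With a pattern such as `'dataset-folder/*/**/*.jpg'`, only `.jpg` files at least one directory below `dataset-folder` are returned, so non-image files are excluded without a separate check.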
Additionally, it would be beneficial to process filenames more robustly. I had a dataset collected from the internet whose filenames contained `,` and could potentially contain `"`.
For now I have fixed it locally, but it could be addressed in the library itself.
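The thread does not say what the local fix was. One possible reading, assuming the trouble arises when filenames are written to and read back from CSV, is that hand-built rows break on `,` and `"` while the standard `csv` module quotes them correctly. The sketch below only demonstrates that round trip; the function names and the example row are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text, quoting fields that contain ',' or '"'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text)))
```

A row such as `["images/cat, \"fluffy\".jpg", "cat"]` survives `csv_to_rows(rows_to_csv(...))` unchanged, whereas joining the fields with `","` by hand would split the filename into two columns.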
another ping @cfezequiel @mbernico
Hi @lc0, thanks for the feedback, and apologies for the delay in response. It seems I wasn't getting any notifications for non-PR comments in this repo. That's an interesting idea. I can see how it could simplify parsing, since the image files would all be in one list. I think it would put a bit more burden on the user to specify a glob instead of just the directory path, but that shouldn't be a big deal. We were also thinking of possibly supporting other directory structures (e.g. label/image only) for flexibility.
Regarding filename processing, could you elaborate on the problem a bit more and the solution that you came up with? Feel free to send a PR btw and I'll be happy to review it.