tensorflow-recorder

Add guard for non-image files in image directory input

Open cfezequiel opened this issue 5 years ago • 4 comments
It would be good to add some check in case there are non-image files in an image directory.

Describe the solution you'd like
A simple filter would suffice, e.g.

If not image file: 
    skip
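
The filter above could be sketched roughly as follows. This is only an illustration, not the library's implementation: the helper names (is_image_file, filter_image_files) and the extension list are assumptions, and the check looks only at the file extension rather than the file contents.

```python
from pathlib import Path

# Assumed set of extensions; adjust to the formats the tool actually supports.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def is_image_file(path):
    """Return True if the path has a recognized image extension (case-insensitive)."""
    return Path(path).suffix.lower() in IMAGE_EXTENSIONS


def filter_image_files(paths):
    """Keep only paths that look like image files and skip everything else."""
    return [p for p in paths if is_image_file(p)]
```

For example, filter_image_files(["a.jpg", "notes.txt", "c.PNG"]) keeps only "a.jpg" and "c.PNG".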

Describe alternatives you've considered
A: Do nothing - potential for the tool to fail while processing data, which could waste the user's time
B: Filter at the DataFrame level - best not to propagate errors downstream

Additional context
See client._read_image_directory.

cfezequiel avatar Oct 01 '20 21:10 cfezequiel

@cfezequiel I wonder if it makes sense to switch to tf.io.gfile.glob. This way we could provide a pattern for images, like

'dataset-folder/*/**/*.jpg'

This change would also simplify the code quite a bit.
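
To illustrate the idea, here is a minimal sketch of pattern-based file listing. It uses Python's standard glob module as a dependency-free stand-in for tf.io.gfile.glob, so the exact matching rules (in particular how '**' is treated) may differ from TensorFlow's. The function name list_images is hypothetical.

```python
import glob


def list_images(pattern):
    """Return sorted paths matching a glob pattern.

    With recursive=True, the standard library treats '**' as
    "zero or more nested directories".
    """
    return sorted(glob.glob(pattern, recursive=True))
```

A call such as list_images('dataset-folder/*/**/*.jpg') would return every .jpg file at any depth below the first-level subdirectories, and non-image files never enter the list.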

lc0 avatar Oct 22 '20 09:10 lc0

Additionally, it would be beneficial to process filenames more robustly. I had a dataset collected from the internet whose filenames contained a comma (,) and could potentially contain a double quote (").

For now I have fixed it locally, but it could be addressed in the library itself.
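
The comment does not say where these characters caused trouble. Assuming the problem is CSV-style serialization of file paths, one possible approach is to let Python's csv module handle the quoting instead of joining fields by hand. The helper names write_rows and read_rows are hypothetical, and this is not the local fix mentioned above.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows to CSV text; csv.writer quotes fields containing ',' or '"'."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text)))
```

With this, a filename containing a comma or a double quote survives a write and read round trip unchanged.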

lc0 avatar Oct 22 '20 10:10 lc0

Another ping @cfezequiel @mbernico

lc0 avatar Oct 29 '20 10:10 lc0

Hi @lc0, thanks for the feedback, and apologies for the delayed response. It seems I wasn't getting any notifications for non-PR comments in this repo. That's an interesting idea. I can see how it could simplify parsing, since the image files would all be in one list. I think it would put a bit more burden on the user to specify a glob instead of just the directory path, but that shouldn't be a big deal. We were also thinking of possibly supporting other directory structures (e.g. label/image only) for flexibility.
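
As a rough sketch of how more than one directory structure could be supported once the files are in a flat list, the function below infers the fields from the number of path components. It assumes the two layouts are split/label/image and label/image. The function name parse_image_path and the return format are hypothetical.

```python
from pathlib import PurePosixPath


def parse_image_path(path, root):
    """Infer (split, label, filename) from a path relative to root.

    Accepts 'split/label/image' or 'label/image'; split is None for the latter.
    """
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) == 3:
        split, label, name = parts
    elif len(parts) == 2:
        split = None
        label, name = parts
    else:
        raise ValueError(f"Unexpected directory layout: {path}")
    return split, label, name
```

Paths that fit neither layout raise an error early instead of propagating bad rows downstream.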

Regarding filename processing, could you elaborate on the problem a bit more and the solution that you came up with? Feel free to send a PR btw and I'll be happy to review it.

cfezequiel avatar Nov 10 '20 17:11 cfezequiel