vaex
✨ Lazy image reading support (reading, converting, resizing)
Example:
cc @xdssio
I think printing the image in different sizes is overkill. Having the image always look as it does in the first "image" column is fine, but the resize is definitely important.
printing of the image in different sizes
This is not something vaex is doing. If you look at the code, you will see I created 3 image columns just for a demo.
We might want a helper function to find all files in a dir.
Something like this:
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)


def get_paths(path, suffix=None, resize=None):  # resize is not used yet
    # Collect candidate files from a single file or, recursively, a directory.
    files = []
    if os.path.isfile(path):
        files = [path]
    elif os.path.isdir(path):
        if suffix is not None:
            files = [str(p) for p in Path(path).rglob(f"*{suffix}")]
        else:
            for ext in ['jpg', 'png', 'jpeg', 'ppm', 'thumbnail']:
                files.extend(str(p) for p in Path(path).rglob(f"*{ext}"))
    # Keep only files with a JFIF marker in the header. This check is
    # JPEG-specific, so any non-JFIF file (e.g. a PNG) is skipped as well.
    ignores = set()
    for file in files:
        with open(file, "rb") as fobj:
            is_jfif = b"JFIF" in fobj.peek(10)
        if not is_jfif:
            logger.error(f"file {file} is corrupted - ignore")
            ignores.add(file)
    return [file for file in files if file not in ignores]
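The helper above only accepts files that contain a JFIF marker, which is specific to JPEG. As a rough sketch of a format-aware alternative, the header could be compared against the well-known leading bytes of each format. The names `MAGIC_BYTES` and `sniff_image_format` are hypothetical and not part of vaex, and a matching header does not guarantee the rest of the file is intact.

```python
# Leading "magic" bytes of a few common image formats.
# MAGIC_BYTES and sniff_image_format are illustrative names, not vaex API.
MAGIC_BYTES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "ppm": (b"P3", b"P6"),
}


def sniff_image_format(path):
    """Return the detected format name, or None if the header is not recognized."""
    with open(path, "rb") as fobj:
        header = fobj.read(8)  # the longest signature above is 8 bytes
    for name, signatures in MAGIC_BYTES.items():
        # bytes.startswith accepts a tuple of candidate prefixes
        if header.startswith(signatures):
            return name
    return None
```

A path helper could then drop every file for which `sniff_image_format` returns `None`, instead of rejecting everything that is not a JFIF JPEG.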
Hi @maartenbreddels @xdssio are there plans to continue/merge these efforts? We use Vaex for a lot of processing now and plan to use it for images in the near future :-]
@JovanVeljanoski what do you think?
Yeah, I think this is definitely worth looking into at some point soon. I think it would be quite cool to do various types of pre-processing on all images instead of per-batch, especially for deep NN stuff. I wonder what impact that would have.
We are working hard on the next major version, and the roadmap for that is pretty much fixed, I believe. It revolves around stabilizing features like shift and diff, as well as major improvements to the internal "pipeline" of vaex dataframes, together with various bugfixes, performance improvements and the like.
After that I do not know what the plan is yet, so we could look into this. We typically look at what is most in demand or has the highest impact. Or if someone is willing to fund/sponsor the development of certain features, it would get priority.
Do you agree @maartenbreddels ?
I agree, I'd like to have a feature for displaying images, but let's focus on those things you mention first, and work on an example that uses/requires images, unless funding ups the priority.
To continue the discussion. I propose an "Image" column.
df = vaex.open_images("dir/with/images/*.png", column_name='image')  # or so
df = vaex.from_arrays(image=[np.frombuffer(image_data, dtype=np.uint8)])  # this is for loading an image on a server
df['image'].path
df['image'].shape
df['image'].pixels
# or
df['image'].array
# or
df['image'].matrix
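To make the proposed accessors a bit more concrete, here is a minimal sketch of how one cell of such a column might defer all file I/O until `shape` or `pixels` is requested. It is not vaex code: the `LazyImage` class is hypothetical, and it only handles binary PPM (`P6`) files without comment lines, chosen because that format can be parsed with the standard library alone.

```python
import re
from functools import cached_property


class LazyImage:
    """Hypothetical stand-in for one cell of an image column (not vaex API)."""

    def __init__(self, path):
        self.path = str(path)  # only the path is stored; nothing is read yet

    @cached_property
    def _header(self):
        # Read just the start of the file and parse "P6 <width> <height> <maxval>".
        # Assumes a binary PPM without comment lines in the header.
        with open(self.path, "rb") as fobj:
            head = fobj.read(64)
        match = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", head)
        if match is None:
            raise ValueError(f"{self.path} is not a supported P6 PPM file")
        width, height, _maxval = (int(g) for g in match.groups())
        return width, height, match.end()  # match.end() is where pixel data starts

    @property
    def shape(self):
        width, height, _ = self._header
        return (height, width, 3)  # rows, columns, RGB channels

    @cached_property
    def pixels(self):
        # The pixel payload is only loaded on first access, then cached.
        width, height, offset = self._header
        with open(self.path, "rb") as fobj:
            fobj.seek(offset)
            return fobj.read(width * height * 3)  # assumes maxval < 256 (1 byte per channel)
```

With this layout, asking for `shape` touches only the first few bytes of the file, while `pixels` (or an `array`/`matrix` variant built on top of it) pays the full read cost once.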