✨ Lazy image reading support (reading, converting, resizing)

Open maartenbreddels opened this issue 3 years ago • 8 comments

Example:

image

cc @xdssio

maartenbreddels avatar Mar 26 '21 14:03 maartenbreddels

I think printing the image in different sizes is overkill. Having the image always look as it does in the first "image" column is fine, but the resize is definitely important.

xdssio avatar Apr 13 '21 09:04 xdssio

printing of the image in different sizes

This is not something vaex is doing. If you look at the code, you will see I created three image columns just for a demo.

maartenbreddels avatar Apr 13 '21 10:04 maartenbreddels

We might want a helper function to find all files in a dir.

Something like this or so:

import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)

def get_paths(path, suffix=None, resize=None):
    # Collect image files from a single file or, recursively, from a directory.
    # Note: `resize` is accepted but not used yet.
    if os.path.isfile(path):
        files = [str(path)]
    elif os.path.isdir(path):
        suffixes = [suffix] if suffix is not None else ['jpg', 'png', 'jpeg', 'ppm', 'thumbnail']
        files = [str(p) for s in suffixes for p in Path(path).rglob(f"*{s}")]
    else:
        raise FileNotFoundError(f"{path} is not a file or directory")
    ignores = set()
    for file in files:
        # Only JPEG files carry a JFIF marker, so other formats are not checked.
        if not file.lower().endswith(('jpg', 'jpeg')):
            continue
        with open(file, "rb") as fobj:
            is_jfif = b"JFIF" in fobj.peek(10)
        if not is_jfif:
            logger.error(f"file {file} is corrupted - ignore")
            ignores.add(file)
    return [file for file in files if file not in ignores]
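
A possible variation on the check above, as a rough sketch only: instead of looking for the JFIF marker, compare the first bytes of each file against well-known file signatures, so PNG files can be validated too. The names `SIGNATURES`, `sniff_image_format` and `demo` are made up for illustration and are not part of vaex.

```python
import tempfile
from pathlib import Path

# Leading "magic bytes" of two common formats (standard file signatures).
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}

def sniff_image_format(file):
    """Return 'png' or 'jpeg' based on the file's first bytes, else None."""
    with open(file, "rb") as fobj:
        head = fobj.read(8)
    for name, magic in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None

def demo():
    # Write one file with a valid PNG signature and one with junk content.
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.png"
        bad = Path(tmp) / "bad.png"
        good.write_bytes(SIGNATURES["png"] + b"\x00" * 16)
        bad.write_bytes(b"not an image")
        return sniff_image_format(good), sniff_image_format(bad)
```

Files for which `sniff_image_format` returns `None` could then be added to the ignore set, regardless of their suffix. This only checks the header, so a truncated file with a valid signature would still pass.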

xdssio avatar Apr 13 '21 10:04 xdssio

Hi @maartenbreddels @xdssio are there plans to continue/merge these efforts? We use Vaex for a lot of processing now and plan to use it for images in the near future :-]

Ben-Epstein avatar Oct 04 '21 13:10 Ben-Epstein

@JovanVeljanoski what do you think?

maartenbreddels avatar Oct 04 '21 13:10 maartenbreddels

Yeah, I think this is definitely worth looking into at some point soon. I think it would be quite cool to do various types of pre-processing on all images instead of per-batch, especially for deep NN stuff. I wonder what impact that would have.

We are working hard on the next major version, and the roadmap for that is pretty much fixed, I believe. It revolves around stabilizing features like shift and diff, as well as major improvements to the internal "pipeline" of vaex dataframes, together with various bugfixes, performance improvements and the like.

After that I do not know what the plan is yet, so we could look into this. We typically look at what is most in demand or has the highest impact. Or, if someone is willing to fund/sponsor the development of certain features, those would get priority.

Do you agree @maartenbreddels ?

JovanVeljanoski avatar Oct 04 '21 18:10 JovanVeljanoski

I agree. I'd like to have a feature for displaying images, but let's focus on the things you mention first and work on an example that uses/requires images, unless funding raises the priority.

maartenbreddels avatar Oct 05 '21 19:10 maartenbreddels

To continue the discussion, I propose an "Image" column:

df = vaex.open_images("dir/with/images/*.png", column_name='image')  # or so (proposed API)
df = vaex.from_arrays(image=[np.frombuffer(image_data, dtype=np.uint8)])  # for loading an image on a server

df['image'].path
df['image'].shape

df['image'].pixels
# or
df['image'].array
# or 
df['image'].matrix
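
To make the idea a bit more concrete, here is a minimal sketch in plain Python (no vaex) of how such a column could stay lazy: it stores only paths, and reads the PNG header when `.shape` is requested. `LazyImageColumn` and `fake_png_header` are hypothetical names for illustration, not an existing or planned vaex API, and the sketch assumes PNG files only.

```python
import struct
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

class LazyImageColumn:
    """Hypothetical stand-in for the proposed column: stores paths only."""

    def __init__(self, paths):
        self.path = [str(p) for p in paths]
        self.reads = 0  # counts header reads, to make the laziness visible

    @property
    def shape(self):
        # Read (height, width) from each PNG header only when asked.
        return [self._png_shape(p) for p in self.path]

    def _png_shape(self, path):
        self.reads += 1
        with open(path, "rb") as fobj:
            header = fobj.read(24)
        if len(header) < 24 or not header.startswith(PNG_SIGNATURE):
            raise ValueError(f"{path} is not a PNG file")
        # Bytes 16..24 hold width and height as big-endian unsigned 32-bit ints.
        width, height = struct.unpack(">II", header[16:24])
        return (height, width)

def fake_png_header(width, height):
    # Signature + IHDR chunk length (13) + chunk type + width + height.
    return (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
            + struct.pack(">II", width, height))
```

Constructing the column touches no files; the headers are read only when `.shape` is accessed. A `.pixels` / `.array` accessor could follow the same pattern, with the decoding done by an image library.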

xdssio avatar Oct 07 '21 09:10 xdssio