exif-rs icon indicating copy to clipboard operation
exif-rs copied to clipboard

`read_raw(&self, data: Vec<u8>)` should take a reference

Open flukejones opened this issue 3 years ago • 9 comments

read_raw(&self, data: Vec<u8>) should be taking a reference, eg: data: &[u8]. Is there any reason not to do so here?

flukejones avatar Sep 15 '20 22:09 flukejones

Exif fields are lazily parsed. Keeping a copy of unparsed data steam in the struct Exif makes it self-contained, and users can handle the struct without being bothered by the lifetime of the buffer.

kamadak avatar Sep 17 '20 11:09 kamadak

I understand that, but it would also be good to have an option to refer to a buffer also. Maybe both can be done?

flukejones avatar Sep 21 '20 01:09 flukejones

Ran into the same issue, and it's fatal in my case: I need to parse EXIF in uploads before saving them to disk, and copying the entire buffer (up to 50MiB in size) just to read some EXIF doesn't seem ideal. This would be a lot more palatable if I could create metadata-only copies.

1e100 avatar Nov 14 '21 23:11 1e100

There could be two possible interfaces for shared buffers:

// (1) Parses immediately, borrows nothing.
Reader::read_raw_nonlazy(..., data: &[u8]) -> Exif

// (2) Parses lazily using the borrowed buffer.
Reader::read_raw_ref<'a>(..., data: &'a [u8]) -> ExifRef<'a>

If a raw Exif block is big and you want to avoid copying data, (1) will not meet the spirit of your goal, because when 50 MiB buffer is parsed immediately, struct Exif will grow that large. So, I will go with (2) though users need to care about the lifetime of the buffer. Any opinions?

@1e100 BTW, I am curious about your use cases. Are your raw Exif blocks really that large, or are you parsing TIFF data, where TIFF image data and Exif attributes are intermixed and the total size could be 50 MiB?

kamadak avatar Dec 01 '21 13:12 kamadak

I just need a few metadata fields before I save the file that's all. Files can be quite large though. It's not just the exif block - it's the entire uploaded file

1e100 avatar Dec 01 '21 13:12 1e100

Reader::read_from_container will only keep a copy of the Exif block, so no need to worry about the size if an image is large but not the Exif block. (Except TIFF)

kamadak avatar Dec 01 '21 14:12 kamadak

Thanks for the tip, but unfortunately it could be TIFF as well. Out of curiosity (and for others to find), why is it different?

1e100 avatar Dec 01 '21 14:12 1e100

The Exif standard re-uses TIFF's tag-value format to store Exif fields. For images files other than TIFF, a small independent Exif block (which is in TIFF format) is embedded in an image file. For TIFF files, Exif fields (in TIFF format) and TIFF image data (of course in TIFF format as well) are merged. Unfortunately, some Exif fields are inherited from TIFF, and the locations of those fields are specified by the TIFF standard, thus metadata cannot be gathered in one place but are scattered in a TIFF file.

Anyway, if we have a TIFF image on memory and want to avoid copying the entire buffer, read_raw_ref could help. I will consider including such an API in the next version.

kamadak avatar Dec 01 '21 15:12 kamadak

Reader::read_raw_ref<'a>(..., data: &'a [u8]) -> ExifRef<'a>

I like this! I have &[u8] raw exif data and just need to extract and store some basic exif metadata from it, then immediately drop the Exif, the &[u8] slice, the buffer backing the slice, etc. The allocation (to_vec() on the slice) is not a big issue I guess, but it's not really needed in my use case. And I guess this is a pretty common use case.

w-flo avatar Apr 28 '23 08:04 w-flo