commons-imaging icon indicating copy to clipboard operation
commons-imaging copied to clipboard

[IMAGING-319] Fix for lost first character on String

Open StefanOltmann opened this issue 1 year ago • 17 comments

I encountered a bug while testing the Kotlin Multiplatform port (https://github.com/ashampoo/kim) of this library. The bug is a critical issue causing metadata corruption during the write process.

This bug has been reported at https://issues.apache.org/jira/browse/IMAGING-319. Fortunately, a functional fix was identified by @gwlucastrig, although, for some reason, it hasn't been incorporated into a pull request. All credit for the fix goes to him.

Here is the equivalent code in my library: https://github.com/Ashampoo/kim/blob/01fbb7825d3ccf53170c4625aad63b713614919e/src/commonMain/kotlin/com/ashampoo/kim/format/tiff/write/TiffWriterLossless.kt#L257-L261

I can confirm that the fix is effective. Removing that specific line causes my unit tests to fail, particularly with the test image photo_28.jpg.

Unfortunately, I'm unable to include this image in the pull request since I don't own it. The image falls under the Unsplash license, requiring attribution, and I cannot make decisions regarding this for Apache.

StefanOltmann avatar Feb 04 '24 12:02 StefanOltmann

@StefanOltmann Thank you for your PR. You'll need a unit test to avoid future regressions and prove this fixes something.

garydgregory avatar Feb 04 '24 12:02 garydgregory

@garydgregory Do you want me add the image and add Unsplash attribution to your NOTICE file like this?

https://github.com/Ashampoo/kim/blob/5d1b28e4423e6bc240196d3ad7db25495da095c9/NOTICE.txt#L13-L15

StefanOltmann avatar Feb 04 '24 13:02 StefanOltmann

@StefanOltmann Sounds good to me, ty!

garydgregory avatar Feb 04 '24 13:02 garydgregory

Alright, this might take a bit of time. I've added it to the regular test files, but none of the existing tests are failing. I'm not entirely sure what they cover, but it seems like they are overlooking some aspects.

In my library, the JpegRewriterTest is able to identify this issue. Unfortunately, the author of ExifRewriterRoundtripTest (https://github.com/apache/commons-imaging/pull/275 by @charleshope) never had the chance to complete it. I believe that completing this test would help to showcase problem and proof that this is the correct fix.

This pull request is also related to https://issues.apache.org/jira/browse/IMAGING-351, as it addresses the same underlying problem.

Also it's worth to note that it does not always affect String fields. It can corrupt any field / offset. In the provided sample the MakerNotes get corrupted, due to an offset shift.

@kinow FYI

StefanOltmann avatar Feb 04 '24 13:02 StefanOltmann

@StefanOltmann I don't see any new tests here.

garydgregory avatar Feb 04 '24 13:02 garydgregory

I don't see any new tests here.

I did not commit it. I added the file above locally to the test folders hoping that one of the rewrite tests will pick it up and fail.

It gets actually processed, but the existing tests do not fail, because they don’t detect the issue. There is apparently a lack of coverage here.

The test that might pick it up is still disabled, because the author never finished it.

See this comment: https://github.com/apache/commons-imaging/pull/275#pullrequestreview-1305362377

The offsets are actually expected to change and must be ignored. The wrong orientation value is most likely a problem that came from changing the byte order.

I guess I need to fix that test first to have a proper unit test, before I can add an image that makes it fail.

StefanOltmann avatar Feb 04 '24 13:02 StefanOltmann

@StefanOltmann Thanks for the update. We're glad to get help to improve it all 😃

garydgregory avatar Feb 04 '24 14:02 garydgregory

@garydgregory I have fixed the test first. As soon you pulled the change I will refresh this PR.

StefanOltmann avatar Feb 24 '24 09:02 StefanOltmann

@garydgregory I have fixed the test first. As soon you pulled the change I will refresh this PR.

See my comments in your other PR. To summarize, please provide your main change and test in the same PR.

garydgregory avatar Feb 24 '24 16:02 garydgregory

See my comments in your other PR. To summarize, please provide your main change and test in the same PR.

As I commented on the other PR you will not see a proof that it fixes something then.

You demanded a proof, because you won't believe my words. So I fixed the roundtrip test that it actually tests something and will show the problem.

If I commit the test and the fix together you won't see it.

This is why I instructed you to drop the test file locally to see it fail.

StefanOltmann avatar Feb 24 '24 17:02 StefanOltmann

@garydgregory As you requested #367 is a combined PR of bugfix and repaired unit test plus a file that shows the problem.

StefanOltmann avatar Feb 24 '24 18:02 StefanOltmann

A PR should include main and test changes, not one PR for test changes, and a different PR for main changes.

garydgregory avatar Feb 28 '24 15:02 garydgregory

A PR should include main and test changes, not one PR for test changes, and a different PR for main changes.

Yes, but the bugfix and the test are unrelated.

One PR fixes a bug without a test. The other PR completes a test that’s right now disabled, because it wasn’t working.

StefanOltmann avatar Feb 28 '24 15:02 StefanOltmann

@kinow

Oh, I just see that @gwlucastrig once provided a PR for his fix in #203.

For some reason he closed the PR a few months later, but the fix is still valid.

Looks like a unit test was missing. I provided that with #366, but unfortunately we don't have a test image that can be used to proof the test. You will need to verify that locally without checking the image in.

You could also pull his PR, because he is the one who found and fixed the bug.

StefanOltmann avatar Mar 10 '24 21:03 StefanOltmann

Oh, I will have to re-read the issues and pull requests as I'm a bit out of the loop, but thanks for pinging me here, @StefanOltmann ! I know @gwlucastrig from previous issues and emails. Let's give him some days, and I can ping him by email too if needed. I had a look and the PR was indeed closed... but I cannot recall if there was any reason for that :disappointed_relieved:

You could also pull his PR, because he is the one who found and fixed the bug.

Sounds good too, although I am sure he wouldn't mind, will keep it in mind to see if I can pull the branches locally and maybe merge commits or combine to give credit where it's due. I will add something in the changes.xml that's used to publish the website as well.

Thanks @StefanOltmann !

kinow avatar Mar 10 '24 21:03 kinow

(in the meantime, approved the GH Actions to run -- gotta go :runner: )

kinow avatar Mar 10 '24 21:03 kinow

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 71.58%. Comparing base (fd2b5b8) to head (416ce7e). Report is 79 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #359      +/-   ##
============================================
+ Coverage     71.48%   71.58%   +0.09%     
- Complexity     3580     3584       +4     
============================================
  Files           352      352              
  Lines         16919    16907      -12     
  Branches       2648     2648              
============================================
+ Hits          12095    12103       +8     
+ Misses         3785     3764      -21     
- Partials       1039     1040       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Mar 10 '24 21:03 codecov-commenter

Still no unit test. I'm not sure how we can avoid a future regression without one. 😞

garydgregory avatar Mar 25 '24 11:03 garydgregory

Still no unit test. I'm not sure how we can avoid a future regression without one. 😞

Wrong. There is a unit test in #366.

StefanOltmann avatar Mar 25 '24 14:03 StefanOltmann

Reviewed #366, I think that one can be merged. That pull request fixes a @Disabled test that does the roud-trip test for EXIF data.

Once that's merged, and the test is re-enabled, there is a test image in #366 that should fail. I think here we have to:

  1. Wait for #366 to be merged
  2. Use the test image to confirm the test fails for that image
  3. Check out this branch, re-run the test with the new image
  4. Review the license for that image, and if we really cannot use it, then decide what to do (I would spend some time checking if a similar image can be crafted or found somewhere else)
  5. Once the image is added, confirm the code here is being covered by the test+new image, and approve and merge! :rocket:

Cheers

kinow avatar Mar 29 '24 16:03 kinow

@kinow Great news, thanks!

Maybe we can use the image attached to https://issues.apache.org/jira/browse/IMAGING-319. I just asked the reporter.

StefanOltmann avatar Mar 29 '24 18:03 StefanOltmann

@kinow Great news, thanks!

Maybe we can use the image attached to https://issues.apache.org/jira/browse/IMAGING-319. I just asked the reporter.

Excellent idea! That'd save us a lot of time, thanks for the initiative, @StefanOltmann :+1:

kinow avatar Mar 29 '24 20:03 kinow

Merged your other PR, @StefanOltmann (thanks very much for that), I'll push a new commit here but that'll be just to rebase it for now :+1:

kinow avatar Apr 13 '24 16:04 kinow

@kinow Yes, you can pull this one or the original #203, which are both the same change @gwlucastrig proposed in JIRA. Credit for this should solely go to him. :)

Unfortunately the author of https://issues.apache.org/jira/browse/IMAGING-319 did not respond. So we could ask legal if attaching an image to the JIRA ticket can be seen as a permission to use that one. Just to make progress and have this nasty bug fixed for 1.0-alpha5

StefanOltmann avatar Apr 13 '24 16:04 StefanOltmann

Thanks @StefanOltmann . I searched my hard drive for JPEG images to check if I could reproduce the issue with any, but only the IMAGING-319 had this issue (maybe it happened in another field, but I only tested for the Offset Time like the issue reported).

package org.apache.commons.imaging;

import org.apache.commons.imaging.common.ImageMetadata;
import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata;
import org.apache.commons.imaging.formats.jpeg.exif.ExifRewriter;
import org.apache.commons.imaging.formats.tiff.TiffImageMetadata;
import org.apache.commons.imaging.formats.tiff.write.TiffOutputSet;
import org.apache.commons.io.IOUtils;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Search for images that cause the bug in IMAGING-319.
 */
public class LibraryTest {

    public static void main(String[] args) throws ImagingException, IOException, InterruptedException {
        File path = new File("/home/kinow/Desktop/haystack/");
        String needle = "Offset Time\\s+:\\s+";
        Pattern pattern = Pattern.compile(String.format("^%s$", needle), Pattern.MULTILINE);
        for (File source : Objects.requireNonNull(path.listFiles(s -> s.getName().toLowerCase().endsWith(".jpg")))) {
            File result = Files.createTempFile("test_", ".jpg").toFile();
            try {
                final ImageMetadata metadata = Imaging.getMetadata(source);
                final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
                if (jpegMetadata != null) {
                    final TiffImageMetadata exif = jpegMetadata.getExif();
                    if (exif != null) {
                        TiffOutputSet outputSet = exif.getOutputSet();
                        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
                        new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);

                        String[] cmd = {
                                "/bin/sh",
                                "-c",
                                String.format("exiftool %s | grep \"Offset Time\"", result.getAbsolutePath())
                        };
                        Process process = Runtime.getRuntime().exec(cmd);
                        process.waitFor();
                        String output = IOUtils.toString(process.getInputStream(), Charset.defaultCharset());
                        // System.out.println(output);

                        Matcher m = pattern.matcher(output);
                        if (m.find()) {
                            System.out.printf("Bug detected in %s%n", source.getAbsolutePath());
                        }
                    }
                }
            } catch (RuntimeException|ImagingException ignore) {
                ignore.printStackTrace();
            } // invalid metadata or bad format
        }
    }
}

I will debug the code again to either craft a test image by hand, or write a test code + mock. Let's see if that works...

kinow avatar Apr 13 '24 21:04 kinow

@kinow I spent some time looking around different sources, too. Luckily I just found another test image.

It's https://github.com/drewnoakes/metadata-extractor-images/blob/main/jpg/Sony%20DigitalMavica.jpg

There is no LICENSE file in the repository, but the README says You are free to use these media files however you wish.

For my metadata library I used some other images of that repository, too. I added them to my NOTICE for completeness: https://github.com/Ashampoo/kim/blob/9ab5fb627d3d450286c94d1d7bb0f3335629c6c2/NOTICE.txt#L17C1-L19C73

But I believe this is not strictly necessary in this case.

What do you think? Sounds like "public domain with no attribution required" to me.

StefanOltmann avatar Apr 13 '24 21:04 StefanOltmann

Thanks for searching @StefanOltmann !

I modified the script to search for any field in the diff output that is empty, and ran on some personal photos and found one!

Code:

package org.apache.commons.imaging;

import org.apache.commons.imaging.common.ImageMetadata;
import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata;
import org.apache.commons.imaging.formats.jpeg.exif.ExifRewriter;
import org.apache.commons.imaging.formats.tiff.TiffImageMetadata;
import org.apache.commons.imaging.formats.tiff.write.TiffOutputSet;
import org.apache.commons.io.IOUtils;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Search for images that cause the bug in IMAGING-319.
 */
public class LibraryTest {
//
//    public static void main2(String[] args) throws ImagingException, IOException {
//        File source = new File("/home/kinow/Desktop/haystack/20220514_102409.jpg");
//        File result = new File("/home/kinow/Desktop/haystack/editted-20220514_102409.jpg");
//        final ImageMetadata metadata = Imaging.getMetadata(source);
//        final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
//        final TiffImageMetadata exif = jpegMetadata.getExif();
//        TiffOutputSet outputSet = exif.getOutputSet();
//        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
//        new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);
//    }

    public static void main(String[] args) throws ImagingException, IOException, InterruptedException {
        File path = new File("/home/kinow/Desktop/haystack/");
        String needle = ".*([\\s+])*\\s+:\\s+";
        Pattern pattern = Pattern.compile(String.format("^%s$", needle), Pattern.MULTILINE);
        for (File source : Objects.requireNonNull(path.listFiles(s -> s.getName().toLowerCase().endsWith(".jpg")))) {
            File result = Files.createTempFile("test_", ".jpg").toFile();
            try {
                final ImageMetadata metadata = Imaging.getMetadata(source);
                final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
                if (jpegMetadata != null) {
                    final TiffImageMetadata exif = jpegMetadata.getExif();
                    if (exif != null) {
                        TiffOutputSet outputSet = exif.getOutputSet();
                        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
                        new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);

                        String[] cmd = {
                                "/bin/sh",
                                "-c",
                                String.format("/home/kinow/Desktop/haystack/test.sh %s %s", source.getAbsolutePath(), result.getAbsolutePath())
                                // String.format("exiftool %s | grep \"Offset Time\"", result.getAbsolutePath())
                        };
                        Process process = Runtime.getRuntime().exec(cmd);
                        process.waitFor();
                        String output = IOUtils.toString(process.getInputStream(), Charset.defaultCharset());
                        // System.out.println(output);

                        Matcher m = pattern.matcher(output);
                        if (m.find()) {
                            System.out.printf("Bug detected in %s%n", source.getAbsolutePath());
                        }
                    }
                }
            } catch (RuntimeException|ImagingException ignore) {
                ignore.printStackTrace();
            } // invalid metadata or bad format
        }
    }
}

Test image (I drew something some years ago, covered parts of it that were irrelevant).

20220514_102409

To confirm this is causing the issue, I tried the following:

  • Copied it in data.images.jpg.8, ran ExifRewriterRoundtripTest and it failed for that file (master)
  • Changed to the code of this pull request, all tests passed! :tada:
  • Then re-ran the sample code from JIRA, then compared EXIF metadata with exiftool + diff
kinow@ranma:~/Desktop/haystack$ diff <(exiftool 20220514_102409.jpg) <(exiftool editted-20220514_102409.jpg)
2c2
< File Name                       : 20220514_102409.jpg
---
> File Name                       : editted-20220514_102409.jpg
5,8c5,8
< File Modification Date/Time     : 2024:04:14 00:29:34+02:00
< File Access Date/Time           : 2024:04:14 00:29:37+02:00
< File Inode Change Date/Time     : 2024:04:14 00:29:34+02:00
< File Permissions                : -rwxrwxrwx
---
> File Modification Date/Time     : 2024:04:14 00:36:13+02:00
> File Access Date/Time           : 2024:04:14 00:36:13+02:00
> File Inode Change Date/Time     : 2024:04:14 00:36:13+02:00
> File Permissions                : -rw-rw-r--
15c15
< Camera Model Name               : SM-S901E
---
> Camera Model Name               : 
29c29
< Offset Time                     : +12:00
---
> Offset Time                     : +12:00.
40c40
< Thumbnail Offset                : 586
---
> Thumbnail Offset                : 38

(Camera Model Name ended up being replaced by empty string, like the JIRA example)

  • Finally, added a quick test code in TiffImageWriterLossless.java (fixed in this branch) with the condition where the bug was fixed, see below
                if (offset != length && (offset & 1L) != 0 && length > outputItemLength) {
                    int i = 1; // breakpoint here to just confirm the file triggers this...
                }

And the breakpoint was reached for the iPhone photo from JIRA, and also for my image file :tada:

I will clean up my local modifications, and add my test image to this branch, then rebase + squash. Feel free to check the image file and confirm it if you'd like, @StefanOltmann. Thank you for the help, and for the JIRA OP for providing a useful code snippet to verify the issue.

kinow avatar Apr 13 '24 22:04 kinow

I am glad that we finally have a usable test image for this PR! 🎉💪

StefanOltmann avatar Apr 13 '24 22:04 StefanOltmann

@kinow By porting this library to Kotlin I found and fixed many minor bugs, but this was the only really serious one as it corrupts users data in a non-recoverable way. (For example the problem of dropping the GpsVersionId, which is optional and has always the same value if the file is written/modified by ExifTool, is not a serious bug.)

I don’t know what the planned release cycles are, but the last release was two years ago. I would recommend against allowing this bug to stay for another two years. People might use this library in production, so I think the next release should be soon. You may agree that the problem is serious enough.

(I don’t understand why Gary made a release with knowledge of this bug anyway.)

StefanOltmann avatar Apr 13 '24 23:04 StefanOltmann

@kinow By porting this library to Kotlin I found and fixed many minor bugs, but this was the only really serious one as it corrupts users data in a non-recoverable way. (For example the problem of dropping the GpsVersionId, which is optional and has always the same value if the file is written/modified by ExifTool, is not a serious bug.)

:+1:

I don’t know what the planned release cycles are, but the last release was two years ago. I would recommend against allowing this bug to stay for another two years. People might use this library in production, so I think the next release should be soon. You may agree that the problem is serious enough.

We do not really have planned release cycles. Sometimes volunteers have time for cutting releases, sometimes they get busy with other projects, $work, life, etc. There are times where we have a lot of people active, and times of low bandwidth.

For me it is more or less based on weather and some yearly milestones (cold? ==> open source for me, taxes season or good weather ==> no open source for a while for me).

And for the past few years I have been working with Python on HPC with climate models. So I don't have $work time to dedicate to open source Java projects like Commons. But if I work again on Java projects, I will probably be more active around here.

(I don’t understand why Gary made a release with knowledge of this bug anyway.)

RERO, release early, release often (as often as we can). If we have unreleased changes, and one of us (Apache committers) has time, it's common that we just go around releasing multiple components so users can use at least what was not released but fixed.

Unfortunately, I think Gary is the only one that has been doing that, so I must thank him for going around and keeping the release train going on Apache Commons components. We may be slow to fix/release, but believe me, without what Gary is doing, we would definitely be much slower!

Thanks!

kinow avatar Apr 13 '24 23:04 kinow