[IMAGING-319] Fix for lost first character on String
I encountered a bug while testing the Kotlin Multiplatform port (https://github.com/ashampoo/kim) of this library. The bug is a critical issue causing metadata corruption during the write process.
This bug has been reported at https://issues.apache.org/jira/browse/IMAGING-319. Fortunately, a functional fix was identified by @gwlucastrig, although, for some reason, it hasn't been incorporated into a pull request. All credit for the fix goes to him.
Here is the equivalent code in my library: https://github.com/Ashampoo/kim/blob/01fbb7825d3ccf53170c4625aad63b713614919e/src/commonMain/kotlin/com/ashampoo/kim/format/tiff/write/TiffWriterLossless.kt#L257-L261
I can confirm that the fix is effective. Removing that specific line causes my unit tests to fail, particularly with the test image photo_28.jpg.
Unfortunately, I'm unable to include this image in the pull request since I don't own it. The image falls under the Unsplash license, requiring attribution, and I cannot make decisions regarding this for Apache.
@StefanOltmann Thank you for your PR. You'll need a unit test to avoid future regressions and prove this fixes something.
@garydgregory Do you want me add the image and add Unsplash attribution to your NOTICE file like this?
https://github.com/Ashampoo/kim/blob/5d1b28e4423e6bc240196d3ad7db25495da095c9/NOTICE.txt#L13-L15
@StefanOltmann Sounds good to me, ty!
Alright, this might take a bit of time. I've added it to the regular test files, but none of the existing tests are failing. I'm not entirely sure what they cover, but it seems like they are overlooking some aspects.
In my library, the JpegRewriterTest is able to identify this issue. Unfortunately, the author of ExifRewriterRoundtripTest (https://github.com/apache/commons-imaging/pull/275 by @charleshope) never had the chance to complete it. I believe that completing this test would help to showcase problem and proof that this is the correct fix.
This pull request is also related to https://issues.apache.org/jira/browse/IMAGING-351, as it addresses the same underlying problem.
Also it's worth to note that it does not always affect String fields. It can corrupt any field / offset. In the provided sample the MakerNotes get corrupted, due to an offset shift.
@kinow FYI
@StefanOltmann I don't see any new tests here.
I don't see any new tests here.
I did not commit it. I added the file above locally to the test folders hoping that one of the rewrite tests will pick it up and fail.
It gets actually processed, but the existing tests do not fail, because they don’t detect the issue. There is apparently a lack of coverage here.
The test that might pick it up is still disabled, because the author never finished it.
See this comment: https://github.com/apache/commons-imaging/pull/275#pullrequestreview-1305362377
The offsets are actually expected to change and must be ignored. The wrong orientation value is most likely a problem that came from changing the byte order.
I guess I need to fix that test first to have a proper unit test, before I can add an image that makes it fail.
@StefanOltmann Thanks for the update. We're glad to get help to improve it all 😃
@garydgregory I have fixed the test first. As soon you pulled the change I will refresh this PR.
@garydgregory I have fixed the test first. As soon you pulled the change I will refresh this PR.
See my comments in your other PR. To summarize, please provide your main change and test in the same PR.
See my comments in your other PR. To summarize, please provide your main change and test in the same PR.
As I commented on the other PR you will not see a proof that it fixes something then.
You demanded a proof, because you won't believe my words. So I fixed the roundtrip test that it actually tests something and will show the problem.
If I commit the test and the fix together you won't see it.
This is why I instructed you to drop the test file locally to see it fail.
@garydgregory As you requested #367 is a combined PR of bugfix and repaired unit test plus a file that shows the problem.
A PR should include main and test changes, not one PR for test changes, and a different PR for main changes.
A PR should include main and test changes, not one PR for test changes, and a different PR for main changes.
Yes, but the bugfix and the test are unrelated.
One PR fixes a bug without a test. The other PR completes a test that’s right now disabled, because it wasn’t working.
@kinow
Oh, I just see that @gwlucastrig once provided a PR for his fix in #203.
For some reason he closed the PR a few months later, but the fix is still valid.
Looks like a unit test was missing. I provided that with #366, but unfortunately we don't have a test image that can be used to proof the test. You will need to verify that locally without checking the image in.
You could also pull his PR, because he is the one who found and fixed the bug.
Oh, I will have to re-read the issues and pull requests as I'm a bit out of the loop, but thanks for pinging me here, @StefanOltmann ! I know @gwlucastrig from previous issues and emails. Let's give him some days, and I can ping him by email too if needed. I had a look and the PR was indeed closed... but I cannot recall if there was any reason for that :disappointed_relieved:
You could also pull his PR, because he is the one who found and fixed the bug.
Sounds good too, although I am sure he wouldn't mind, will keep it in mind to see if I can pull the branches locally and maybe merge commits or combine to give credit where it's due. I will add something in the changes.xml that's used to publish the website as well.
Thanks @StefanOltmann !
(in the meantime, approved the GH Actions to run -- gotta go :runner: )
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 71.58%. Comparing base (
fd2b5b8) to head (416ce7e). Report is 79 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #359 +/- ##
============================================
+ Coverage 71.48% 71.58% +0.09%
- Complexity 3580 3584 +4
============================================
Files 352 352
Lines 16919 16907 -12
Branches 2648 2648
============================================
+ Hits 12095 12103 +8
+ Misses 3785 3764 -21
- Partials 1039 1040 +1
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Still no unit test. I'm not sure how we can avoid a future regression without one. 😞
Still no unit test. I'm not sure how we can avoid a future regression without one. 😞
Wrong. There is a unit test in #366.
Reviewed #366, I think that one can be merged. That pull request fixes a @Disabled test that does the roud-trip test for EXIF data.
Once that's merged, and the test is re-enabled, there is a test image in #366 that should fail. I think here we have to:
- Wait for #366 to be merged
- Use the test image to confirm the test fails for that image
- Check out this branch, re-run the test with the new image
- Review the license for that image, and if we really cannot use it, then decide what to do (I would spend some time checking if a similar image can be crafted or found somewhere else)
- Once the image is added, confirm the code here is being covered by the test+new image, and approve and merge! :rocket:
Cheers
@kinow Great news, thanks!
Maybe we can use the image attached to https://issues.apache.org/jira/browse/IMAGING-319. I just asked the reporter.
@kinow Great news, thanks!
Maybe we can use the image attached to https://issues.apache.org/jira/browse/IMAGING-319. I just asked the reporter.
Excellent idea! That'd save us a lot of time, thanks for the initiative, @StefanOltmann :+1:
Merged your other PR, @StefanOltmann (thanks very much for that), I'll push a new commit here but that'll be just to rebase it for now :+1:
@kinow Yes, you can pull this one or the original #203, which are both the same change @gwlucastrig proposed in JIRA. Credit for this should solely go to him. :)
Unfortunately the author of https://issues.apache.org/jira/browse/IMAGING-319 did not respond. So we could ask legal if attaching an image to the JIRA ticket can be seen as a permission to use that one. Just to make progress and have this nasty bug fixed for 1.0-alpha5
Thanks @StefanOltmann . I searched my hard drive for JPEG images to check if I could reproduce the issue with any, but only the IMAGING-319 had this issue (maybe it happened in another field, but I only tested for the Offset Time like the issue reported).
package org.apache.commons.imaging;
import org.apache.commons.imaging.common.ImageMetadata;
import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata;
import org.apache.commons.imaging.formats.jpeg.exif.ExifRewriter;
import org.apache.commons.imaging.formats.tiff.TiffImageMetadata;
import org.apache.commons.imaging.formats.tiff.write.TiffOutputSet;
import org.apache.commons.io.IOUtils;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Search for images that cause the bug in IMAGING-319.
*/
public class LibraryTest {
public static void main(String[] args) throws ImagingException, IOException, InterruptedException {
File path = new File("/home/kinow/Desktop/haystack/");
String needle = "Offset Time\\s+:\\s+";
Pattern pattern = Pattern.compile(String.format("^%s$", needle), Pattern.MULTILINE);
for (File source : Objects.requireNonNull(path.listFiles(s -> s.getName().toLowerCase().endsWith(".jpg")))) {
File result = Files.createTempFile("test_", ".jpg").toFile();
try {
final ImageMetadata metadata = Imaging.getMetadata(source);
final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
if (jpegMetadata != null) {
final TiffImageMetadata exif = jpegMetadata.getExif();
if (exif != null) {
TiffOutputSet outputSet = exif.getOutputSet();
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);
String[] cmd = {
"/bin/sh",
"-c",
String.format("exiftool %s | grep \"Offset Time\"", result.getAbsolutePath())
};
Process process = Runtime.getRuntime().exec(cmd);
process.waitFor();
String output = IOUtils.toString(process.getInputStream(), Charset.defaultCharset());
// System.out.println(output);
Matcher m = pattern.matcher(output);
if (m.find()) {
System.out.printf("Bug detected in %s%n", source.getAbsolutePath());
}
}
}
} catch (RuntimeException|ImagingException ignore) {
ignore.printStackTrace();
} // invalid metadata or bad format
}
}
}
I will debug the code again to either craft a test image by hand, or write a test code + mock. Let's see if that works...
@kinow I spent some time looking around different sources, too. Luckily I just found another test image.
It's https://github.com/drewnoakes/metadata-extractor-images/blob/main/jpg/Sony%20DigitalMavica.jpg
There is no LICENSE file in the repository, but the README says You are free to use these media files however you wish.
For my metadata library I used some other images of that repository, too. I added them to my NOTICE for completeness: https://github.com/Ashampoo/kim/blob/9ab5fb627d3d450286c94d1d7bb0f3335629c6c2/NOTICE.txt#L17C1-L19C73
But I believe this is not strictly necessary in this case.
What do you think? Sounds like "public domain with no attribution required" to me.
Thanks for searching @StefanOltmann !
I modified the script to search for any field in the diff output that is empty, and ran on some personal photos and found one!
Code:
package org.apache.commons.imaging;
import org.apache.commons.imaging.common.ImageMetadata;
import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata;
import org.apache.commons.imaging.formats.jpeg.exif.ExifRewriter;
import org.apache.commons.imaging.formats.tiff.TiffImageMetadata;
import org.apache.commons.imaging.formats.tiff.write.TiffOutputSet;
import org.apache.commons.io.IOUtils;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Search for images that cause the bug in IMAGING-319.
*/
public class LibraryTest {
//
// public static void main2(String[] args) throws ImagingException, IOException {
// File source = new File("/home/kinow/Desktop/haystack/20220514_102409.jpg");
// File result = new File("/home/kinow/Desktop/haystack/editted-20220514_102409.jpg");
// final ImageMetadata metadata = Imaging.getMetadata(source);
// final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
// final TiffImageMetadata exif = jpegMetadata.getExif();
// TiffOutputSet outputSet = exif.getOutputSet();
// BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
// new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);
// }
public static void main(String[] args) throws ImagingException, IOException, InterruptedException {
File path = new File("/home/kinow/Desktop/haystack/");
String needle = ".*([\\s+])*\\s+:\\s+";
Pattern pattern = Pattern.compile(String.format("^%s$", needle), Pattern.MULTILINE);
for (File source : Objects.requireNonNull(path.listFiles(s -> s.getName().toLowerCase().endsWith(".jpg")))) {
File result = Files.createTempFile("test_", ".jpg").toFile();
try {
final ImageMetadata metadata = Imaging.getMetadata(source);
final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
if (jpegMetadata != null) {
final TiffImageMetadata exif = jpegMetadata.getExif();
if (exif != null) {
TiffOutputSet outputSet = exif.getOutputSet();
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);
String[] cmd = {
"/bin/sh",
"-c",
String.format("/home/kinow/Desktop/haystack/test.sh %s %s", source.getAbsolutePath(), result.getAbsolutePath())
// String.format("exiftool %s | grep \"Offset Time\"", result.getAbsolutePath())
};
Process process = Runtime.getRuntime().exec(cmd);
process.waitFor();
String output = IOUtils.toString(process.getInputStream(), Charset.defaultCharset());
// System.out.println(output);
Matcher m = pattern.matcher(output);
if (m.find()) {
System.out.printf("Bug detected in %s%n", source.getAbsolutePath());
}
}
}
} catch (RuntimeException|ImagingException ignore) {
ignore.printStackTrace();
} // invalid metadata or bad format
}
}
}
Test image (I drew something some years ago, covered parts of it that were irrelevant).
To confirm this is causing the issue, I tried the following:
- Copied it in
data.images.jpg.8, ranExifRewriterRoundtripTestand it failed for that file (master) - Changed to the code of this pull request, all tests passed! :tada:
- Then re-ran the sample code from JIRA, then compared EXIF metadata with
exiftool+diff
kinow@ranma:~/Desktop/haystack$ diff <(exiftool 20220514_102409.jpg) <(exiftool editted-20220514_102409.jpg)
2c2
< File Name : 20220514_102409.jpg
---
> File Name : editted-20220514_102409.jpg
5,8c5,8
< File Modification Date/Time : 2024:04:14 00:29:34+02:00
< File Access Date/Time : 2024:04:14 00:29:37+02:00
< File Inode Change Date/Time : 2024:04:14 00:29:34+02:00
< File Permissions : -rwxrwxrwx
---
> File Modification Date/Time : 2024:04:14 00:36:13+02:00
> File Access Date/Time : 2024:04:14 00:36:13+02:00
> File Inode Change Date/Time : 2024:04:14 00:36:13+02:00
> File Permissions : -rw-rw-r--
15c15
< Camera Model Name : SM-S901E
---
> Camera Model Name :
29c29
< Offset Time : +12:00
---
> Offset Time : +12:00.
40c40
< Thumbnail Offset : 586
---
> Thumbnail Offset : 38
(Camera Model Name ended up being replaced by empty string, like the JIRA example)
- Finally, added a quick test code in
TiffImageWriterLossless.java(fixed in this branch) with the condition where the bug was fixed, see below
if (offset != length && (offset & 1L) != 0 && length > outputItemLength) {
int i = 1; // breakpoint here to just confirm the file triggers this...
}
And the breakpoint was reached for the iPhone photo from JIRA, and also for my image file :tada:
I will clean up my local modifications, and add my test image to this branch, then rebase + squash. Feel free to check the image file and confirm it if you'd like, @StefanOltmann. Thank you for the help, and for the JIRA OP for providing a useful code snippet to verify the issue.
I am glad that we finally have a usable test image for this PR! 🎉💪
@kinow By porting this library to Kotlin I found and fixed many minor bugs, but this was the only really serious one as it corrupts users data in a non-recoverable way. (For example the problem of dropping the GpsVersionId, which is optional and has always the same value if the file is written/modified by ExifTool, is not a serious bug.)
I don’t know what the planned release cycles are, but the last release was two years ago. I would recommend against allowing this bug to stay for another two years. People might use this library in production, so I think the next release should be soon. You may agree that the problem is serious enough.
(I don’t understand why Gary made a release with knowledge of this bug anyway.)
@kinow By porting this library to Kotlin I found and fixed many minor bugs, but this was the only really serious one as it corrupts users data in a non-recoverable way. (For example the problem of dropping the GpsVersionId, which is optional and has always the same value if the file is written/modified by ExifTool, is not a serious bug.)
:+1:
I don’t know what the planned release cycles are, but the last release was two years ago. I would recommend against allowing this bug to stay for another two years. People might use this library in production, so I think the next release should be soon. You may agree that the problem is serious enough.
We do not really have planned release cycles. Sometimes volunteers have time for cutting releases, sometimes they get busy with other projects, $work, life, etc. There are times where we have a lot of people active, and times of low bandwidth.
For me it is more or less based on weather and some yearly milestones (cold? ==> open source for me, taxes season or good weather ==> no open source for a while for me).
And for the past few years I have been working with Python on HPC with climate models. So I don't have $work time to dedicate to open source Java projects like Commons. But if I work again on Java projects, I will probably be more active around here.
(I don’t understand why Gary made a release with knowledge of this bug anyway.)
RERO, release early, release often (as often as we can). If we have unreleased changes, and one of us (Apache committers) has time, it's common that we just go around releasing multiple components so users can use at least what was not released but fixed.
Unfortunately, I think Gary is the only one that has been doing that, so I must thank him for going around and keeping the release train going on Apache Commons components. We may be slow to fix/release, but believe me, without what Gary is doing, we would definitely be much slower!
Thanks!