metadata-extractor icon indicating copy to clipboard operation
metadata-extractor copied to clipboard

Update all date-time code to java.util

Open cowwoc opened this issue 3 years ago • 6 comments

Please update all the date-time code (e.g. Date, DateFormat and so on) to use newer java.util classes.

Example of guilty code: https://github.com/drewnoakes/metadata-extractor/blob/41e0950402e719d381c57038e220b2090b0aa5e3/Source/com/drew/metadata/Directory.java#L908

Justification: https://stackoverflow.com/q/65984112/14731

cowwoc avatar Jan 31 '21 21:01 cowwoc

As an aside, any method that returns a date (such as https://github.com/drewnoakes/metadata-extractor/blob/41e0950402e719d381c57038e220b2090b0aa5e3/Source/com/drew/metadata/exif/ExifSubIFDDirectory.java#L129) should probably return null if the string representation is not a valid date (e.g. 0000:00:00 00:00:00).

This is easy to detect using DateTimeFormatter. For example DateTimeFormatter.ofPattern("yyyy:MM:dd HH:mm:ss").parse("0000:00:00 00:00:00") throws DateTimeParseException("Text '0000:00:00 00:00:00' could not be parsed: Invalid value for YearOfEra (valid values 1 - 999999999/1000000000): 0").

cowwoc avatar Jan 31 '21 21:01 cowwoc

It would be a shame make the library incompatible for all pre Java 8 projects for this. The lack of thread safety for these classes shouldn't be an argument, that goes for most classes in Java. I never expect anything to be thread-safe unless explicitly stated to be. It's only a problem if you share instances between threads, which is the same rule as with any other object..?

When it comes to convenience, it should be easy for those that wish to convert them to another type before further processing.

Nadahar avatar Jan 31 '21 21:01 Nadahar

Definitely reluctant to raise the required Java version.

Seems like that particular invalid date could receive special treatment somehow. Open to proposals.

drewnoakes avatar Feb 01 '21 01:02 drewnoakes

@Nadahar @drewnoakes No one is asking you to raise the required Java version. The stackoverflow article I linked to mentions a backport: https://www.threeten.org/threetenbp/

Further, I think you misunderstood the problem. This has nothing to do with thread safety. This has to do with silent failures. The old date classes fail silently or do the wrong thing for many edge cases. java.util results in more correct, up-to-date behavior and throws a clear exception if you're parsing an invalid date/time which is very useful for end-users.

cowwoc avatar Feb 01 '21 15:02 cowwoc

Thread safety came up because I followed your link and tried to find the reason for the "recommendation" to avoid java.util.Date and SimpleDateFormat. I see such claims all the time, and never accept them on their face value. Many people seems to "recommend" newer versions of things merely because they are newer, as if things magically become "better" as a consequence of being written at a later stage. I don't subscribe to this idea at all, so I tried to find what the arguments were, and ended up finding thread-safety and "convenience" as the only arguments.

From what I can understand, you don't seem to actually care about any of this, you just want to avoid having invalid timestamps seem legit. There should be a simpler way to achieve this, and returning null seems reasonable to me. The "challenge" is how to recognize such invalid values without writing a full parser.

I can think of checks like checking for 0 values in month or day, but it would still require some amount of parsing to determine what part is what. A "simple" check might involve checking for only zeroes, separators and white-space for example. Yet another possibility is, considering that we're dealing with media files, and photographs have only been with us for some 200 years, to simply avoid anything older than 1800 or so to be invalid.

I don't know what is the best way to check for invalid input, but I think that's what's needed. Looking at how the parser is made, it's no wonder that it will come up with "something" when processing invalid input:

https://github.com/drewnoakes/metadata-extractor/blob/41e0950402e719d381c57038e220b2090b0aa5e3/Source/com/drew/metadata/Directory.java#L874-L886

https://github.com/drewnoakes/metadata-extractor/blob/41e0950402e719d381c57038e220b2090b0aa5e3/Source/com/drew/metadata/Directory.java#L906-L919

When the parsing results in an Exception it will simply move on to the next "pattern" and try again. The parser doesn't parse the whole string, it only parses from the beginning until it has a "valid result". It's easy to see other invalid inputs that would result in date and times that aren't indicated by the input data. So, from all we know, it might actually be the last "pattern" (yyyy) that actually produce the result, and it will return something for any 4 digits from the beginning of the string.

As I see it, Directory.getDate() might be a little too eager to find some timestamp, while I'd say that no result is preferable to a wrong result.

Nadahar avatar Feb 01 '21 21:02 Nadahar

@Nadahar I don't think my particular case has to do with Directory.getDate() being too eager to return some timestamp (though I can see how this could be the case sometimes).

Regarding detecting invalid timestamps, you get this (and so much more) for free if you migrate to the java.util backport.

The way I see it, you want to loop over the list of available parsers but if one of them indicates that the date is invalid (not just a parsing error) then the code should fail right away instead of continuing to try other patterns.

cowwoc avatar Feb 01 '21 22:02 cowwoc

I have created issue #609 that is related, but I believe we only need to address the proper handling of this placeholder value or enable the lenient option.

StefanOltmann avatar Apr 05 '23 11:04 StefanOltmann

@drewnoakes I think you can close this as „not planned“. The problem has been solved and @Nadahar gave good reasons to not migrate the date library.

StefanOltmann avatar Feb 04 '24 08:02 StefanOltmann