DateFormat garbage characters in output
What are you trying to do?
Format the date/time with the following code:
DateFormat.getTimeInstance(DateFormat.SHORT, Locale.US).format(date)
Expected behaviour: The result string contains the properly formatted date and no garbage/extraneous characters.
Observed behaviour: The result string contains the date, but also contains garbage characters instead of a space preceding the "AM" / "PM" text.
Any other comments: Tested with the following:
OpenJDK Runtime Environment Temurin-21.0.2+13 (build 21.0.2+13-LTS)
OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS)
Also tested SUCCESSFULLY (i.e. no garbage in the output) with the following, and other JVMs:
OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)
Here is example output using cat -v and xxd:
4:49M-bM-^@M-/PM
00000000: 343a 3530 e280 af50 4d0a 4:50...PM.
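For readers following along: the three bytes e2 80 af in the xxd dump are the UTF-8 encoding of a single code point. A minimal sketch (the class and method names here are made up for illustration) that decodes them and asks the JDK for the character's name:

```java
import java.nio.charset.StandardCharsets;

final class SeparatorBytes {
    // Decodes the three bytes that xxd shows between the minutes and "PM".
    static int decodeSeparator() {
        byte[] bytes = {(byte) 0xE2, (byte) 0x80, (byte) 0xAF};
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        int codePoint = decodeSeparator();
        // Prints the code point in U+XXXX form followed by its Unicode name.
        System.out.printf("U+%04X %s%n", codePoint, Character.getName(codePoint));
    }
}
```

This should report U+202F NARROW NO-BREAK SPACE, so the "garbage" is one valid character that the terminal tools are showing byte by byte.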
Here is the code of a complete test program:
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

public class FormatDateSimplest {
    public static void main(String[] args) {
        DateFormat timeFormat = DateFormat.getTimeInstance(DateFormat.SHORT, Locale.US);
        Date date = new Date();
        System.out.println(timeFormat.format(date));
    }
}
I'm going to transfer this to the support repository - temurin-build is for the scripts that build and distribute Temurin.
Also tested SUCCESSFULLY (i.e. no garbage in the output) with the following, and other JVMs:
When you say "other JVMs" are you suggesting it passes with an equivalent OpenJDK version from other vendors? Can you say which ones?
Thank you @sxa. Other JVMs are the following:
Eclipse Temurin 17 JDK
openjdk version "17.0.9" 2023-10-17
OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)
OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, sharing)
Eclipse Temurin 11 JDK
openjdk version "11.0.24" 2024-07-16
OpenJDK Runtime Environment Temurin-11.0.24+8 (build 11.0.24+8)
OpenJDK 64-Bit Server VM Temurin-11.0.24+8 (build 11.0.24+8, mixed mode)
Java 1.8.0_261 (Oracle)
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
The labels are mine - the lines under the labels are the output of "java -version".
@artnaseef What you are observing is another version of https://bugs.openjdk.org/browse/JDK-8324308, caused by the CLDR 42.0 update done in JDK 20 (also included in JDK 21). If you need a simple space before AM/PM in JDK 21+, rather than the narrow no-break space (U+202F) that CLDR now specifies, you will need to use a custom formatter. Hope that helps.
Is there a straightforward way to get the plain-text / ASCII-compatible DateFormat.SHORT equivalent?
I'm having a little trouble wrapping my head around java.text date formatting including non-breaking spaces. I've never heard of CLDR before.
BTW, I notice the issue is labeled "WAITING ON OP". Is there something more I need to do here?
I haven't dug into this too much, but a quick query to Copilot gives me:
To ensure that the output format has a simple space instead of any unexpected characters before AM/PM, you can use SimpleDateFormat from the java.text package. Here’s the updated code:
import java.text.SimpleDateFormat;
import java.util.Date;

public class FormatDateSimplest {
    public static void main(String[] args) {
        // Define the custom format
        SimpleDateFormat timeFormat = new SimpleDateFormat("h:mm a");
        // Create a new date instance for the current time
        Date date = new Date();
        // Format the date and print it
        System.out.println(timeFormat.format(date));
    }
}
Explanation:
1. Pattern "h:mm a":
• h - Hour in 12-hour format (1-12).
• mm - Minutes (00-59).
• a - AM/PM marker.
• There is a single space between the time and the AM/PM marker.
2. Why SimpleDateFormat?
• SimpleDateFormat allows you to define custom formatting patterns explicitly, so there is no ambiguity with locale-based formatting issues (such as non-breaking spaces).
When you run this code, the output will look like this:
4:49 PM
with a regular space before AM/PM.
Thank you for the response. In my case, the formatted date is going to individuals who may be anywhere geographically, so I don't want to use fixed date and time formats - I want to use the formats that are specific to their locale. Ignore the hard-coded locale in my snippet please.
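One possible way to keep the locale-specific format while still getting a plain space is to format as before and then replace the no-break space variants afterwards. This is only a sketch under that assumption: the class and method names are hypothetical, and it only handles U+202F and U+00A0, so other locales may use other separators that it leaves untouched.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

final class PlainSpaceTime {
    // Formats with the locale's SHORT time style, then swaps the narrow
    // no-break space (U+202F) and no-break space (U+00A0) for an ASCII space.
    static String formatShortTime(Date date, Locale locale) {
        String formatted = DateFormat.getTimeInstance(DateFormat.SHORT, locale).format(date);
        return formatted.replace('\u202F', ' ').replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        System.out.println(formatShortTime(new Date(), Locale.US));
    }
}
```

The trade-off is that this discards the no-break behaviour CLDR intended, so a line wrap can again fall between the time and the AM/PM marker.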
You could try if -Djava.locale.providers=COMPAT works, but that option is gone in later JDKs.
Thanks Severin. So the standard (CLDR?) does not address this?
Perhaps this is just my lack of understanding of UTF-8. Is it reasonable to expect standard regular expression processors (e.g. java.util.regex.Matcher) to treat this non-breaking space as a space (e.g. matching the \s predefined character class in a Java regex)?
Art
On Tue, Jan 21, 2025 at 8:29 AM Severin Gehwolf @.***> wrote:
You could try if -Djava.locale.providers=COMPAT works, but that option is gone in later JDKs.
Just tested it, and the regex failed to match the non-breaking character.
Art
Any thoughts on how to pursue this further?
It feels to me like the JDK is doing the wrong thing here since some of the text tools seem to make use of the full UTF-8 space (e.g. the date formatting), while others ignore it (e.g. regex).
If there is a desire to go all-in with UTF-8, then shouldn't the regex handle it? This is a breaking issue.
Is there another / more-appropriate place to raise this concern?
Feel free to raise this issue on core-libs-dev on the OpenJDK project.
treat this non-breaking space as a space (e.g. matching with \s predefined character class in a java regex)?
The \s is defined in javadoc as:
\s A whitespace character: [ \t\n\x0B\f\r] if UNICODE_CHARACTER_CLASS is not set. See Unicode Support.
That doesn't include a narrow non-breaking space, AFAIK.
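To make the javadoc wording concrete, here is a small sketch (class and field names are invented for illustration) comparing three patterns against U+202F: the default \s, \s compiled with Pattern.UNICODE_CHARACTER_CLASS, and the \h horizontal-whitespace class that java.util.regex has offered since Java 8.

```java
import java.util.regex.Pattern;

final class NarrowSpaceRegex {
    // U+202F NARROW NO-BREAK SPACE, the separator seen in the JDK 21 output.
    static final String NNBSP = "\u202F";

    // Default mode: the whitespace class covers only the ASCII set quoted above.
    static final Pattern ASCII_S = Pattern.compile("\\s");
    // With the flag set, the same class follows the Unicode White_Space property.
    static final Pattern UNICODE_S = Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);
    // Horizontal whitespace class, which lists U+202F explicitly in the javadoc.
    static final Pattern HORIZONTAL = Pattern.compile("\\h");

    static boolean matches(Pattern pattern) {
        return pattern.matcher(NNBSP).matches();
    }

    public static void main(String[] args) {
        System.out.println("default \\s:  " + matches(ASCII_S));
        System.out.println("unicode \\s:  " + matches(UNICODE_S));
        System.out.println("\\h:          " + matches(HORIZONTAL));
    }
}
```

On the JDKs discussed here this should print false, true, true, which suggests the regex engine can match the character, just not with \s in its default mode.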
The worst part of this, in my opinion, is that I am now seriously considering abandoning UTF-8 and going back to older standards, like good old ASCII. Solving problems is great, but at what expense? The complexity here is unclear, and the change is a breaking one.
We are marking this issue as stale because it has not been updated for a while. This is just a way to keep the support issues queue manageable. It will be closed soon unless the stale label is removed by a committer, or a new comment is made.
I'm going to close this as this is not specific to Eclipse Temurin builds. You can post your challenge with UTF-8 handling at core-libs-dev (https://mail.openjdk.org/mailman/listinfo/core-libs-dev)