guava
guava copied to clipboard
Release SourceCodeEscapers
Original issue created by [email protected] on 2013-12-19 at 07:40 PM
Tasks:
- Figure out the proper names for everything. (Internal bug 3102010, comment 10 has some principles for this.)
- Determine which of the escapers for each language we actually want. One possible decision is that we want none of the current escapers. For example, does it make sense for our Java escaper to escape printable non-ASCII characters, as it currently does? At least inside Google, our .java files are UTF-8. Considerations: If it does make sense to escape them, then the escaper should perhaps be named "javaAsciiCharEscaper." Also, is there any chance that the set of characters that we would need to escape would change over time? (For example, aren't there some other "newline-like" characters?) Or might it make sense to escape any confusable characters? Is there a standard enough set of confusable characters (maybe for domain names? though of course it might change), or would it be best to avoid non-ASCII altogether? Maybe this is reason to have both javaAsciiCharEscaper and javaCharEscaper ("javaUnicodeCharEscaper?").
Hmm, is there any plan yet to open source the source code escapers? If not, we may want to consider closing this issue, because at this rate it looks like it's never going to happen. :)
@cushon has done so in GoogleJavaFormat (or ErrorProne, I don't remember?) but package-private. I think it's a question of API and not feasibility.
I wonder if we'd be willing to release it in a separate, small standalone library. It seems niche enough that maybe we could do that as we experiment with the API? But perhaps it needs some package-private details that would make that challenging.
GoogleJavaFormat (or ErrorProne, I don't remember?)
@cpovirk has a good sense of the API questions that would need to be answered. These are the copies I know of:
- https://github.com/google/error-prone/blob/8beb267881dacf3146e47a8b366d013a4c94c127/check_api/src/main/java/com/google/errorprone/util/SourceCodeEscapers.java
- https://github.com/google/turbine/blob/fdf5ba9682da33dcb5544e19f1511685c1bfa817/java/com/google/common/escape/SourceCodeEscapers.java
- https://github.com/google/closure-compiler/blob/cd22079fd65b57995352b92695f1a1ad60145012/src/com/google/javascript/jscomp/deps/SourceCodeEscapers.java
Here's a slightly edited version of what I wrote internally just recently:
-
Do we really want to continue to have a single
javaCharEscaper()
/javaStringEscaper()
that escapes both"
and'
characters, or should we instead rename the existing method tojavaCharOrStringEscaper()
, migrate everyone to that, and then introduce separatejavaCharEscaper()
andjavaStringEscaper()
methods for those use specific cases? Users might be annoyed if we insert\'
into their strings unnecessarily? -
It's also not 100% clear that users want non-ASCII characters escaped in all cases. But we need to be careful about exotic line breaks and perhaps about easily confused characters.
-
If we do keep one escaper, is
javaCharEscaper()
the right name for it when probably most users will be escaping for strings?
(Of course we continue to recommend that most people use higher-level APIs. But someone has to write the low-level APIs that those people use :))
I don't imagine we'll do this unless someone comes along internally who is willing to survey the existing internal callers and see what features they actually want.
Just so I'm clear on this, if Google decides to keep javaCharEscaper()
and open sources it, and/or does the same for javaStringEscaper()
, would those escapers just escape '
into \'
and "
into \"
respectively? Or would they affect other characters as well in some way?
And would you mind clarifying for me which APIs you're referring to when you say "higher-level APIs" and "low-level APIs"?
The escapers would escape other characters, too, like \n
and control characters. We'd have to figure out exact details.
We encourage people to use tools like JavaPoet rather than manually escape strings.
Ah okay, understood and understood. :)
any update on opensourcing the source code escapers? We use it in an internal project that we are going to opensource
Sadly no. Cloning them to other projects remains sad but fairly easy and harmless (since they're static methods).
Another possible consideration here is that with Java 15 and newer, text blocks can be used to represent strings that contain '
and "
without escaping, e.g.::
System.err.print(
"""
hello 'text' "blocks"
""");
It would be nice for SourceCodeEscapers
to provide a way to escape everything except '
and "
.
So I think this is a vote for separate methods for different levels of escaping (javaCharEscaper()
/ javaStringEscaper()
/ javaTextBlockEscaper()
), instead of having a one-size-fits-all javaCharOrStringEscaper()
that escapes '
and "
.
Cloning them to other projects remains sad but fairly easy and harmless (since they're static methods).
One thing to keep in mind for any projects considering this is that cloning them under a new package name should be preferred (e.g. com.example.foo.escape
, ~~not com.google.common.escape
~~).
That avoids a duplicate definition of the class if the cloned copy includes a package-info
(or when Guava's version of the class is eventually open-sourced). Most of the clones I've seen are already doing that, but the turbine one wasn't, which came up here.