guava Release SourceCodeEscapers

Original issue created by [email protected] on 2013-12-19 at 07:40 PM

Tasks:

Figure out the proper names for everything. (Internal bug 3102010, comment 10 has some principles for this.)
Determine which of the escapers for each language we actually want. One possible decision is that we want none of the current escapers. For example, does it make sense for our Java escaper to escape printable non-ASCII characters, as it currently does? At least inside Google, our .java files are UTF-8. Considerations: If it does make sense to escape them, then the escaper should perhaps be named "javaAsciiCharEscaper." Also, is there any chance that the set of characters that we would need to escape would change over time? (For example, aren't there some other "newline-like" characters?) Or might it make sense to escape any confusable characters? Is there a standard enough set of confusable characters (maybe for domain names? though of course it might change), or would it be best to avoid non-ASCII altogether? Maybe this is reason to have both javaAsciiCharEscaper and javaCharEscaper ("javaUnicodeCharEscaper?").

Oct 31 '14 17:10 gissuebot

Hmm, is there any plan yet to open source the source code escapers? If not, we may want to consider closing this issue, because at this rate it looks like it's never going to happen. :)

Feb 23 '18 01:02 jbduncan

@cushon has done so in GoogleJavaFormat (or ErrorProne, I don't remember?) but package-private. I think it's a question of API and not feasibility.

I wonder if we'd be willing to release it in a separate, small standalone library. It seems niche enough that maybe we could do that as we experiment with the API? But perhaps it needs some package-private details that would make that challenging.

Feb 23 '18 03:02 ronshapiro

GoogleJavaFormat (or ErrorProne, I don't remember?)

@cpovirk has a good sense of the API questions that would need to be answered. These are the copies I know of:

https://github.com/google/error-prone/blob/8beb267881dacf3146e47a8b366d013a4c94c127/check_api/src/main/java/com/google/errorprone/util/SourceCodeEscapers.java
https://github.com/google/turbine/blob/fdf5ba9682da33dcb5544e19f1511685c1bfa817/java/com/google/common/escape/SourceCodeEscapers.java
https://github.com/google/closure-compiler/blob/cd22079fd65b57995352b92695f1a1ad60145012/src/com/google/javascript/jscomp/deps/SourceCodeEscapers.java

Feb 23 '18 03:02 cushon

Here's a slightly edited version of what I wrote internally just recently:

Do we really want to continue to have a single javaCharEscaper() / javaStringEscaper() that escapes both " and ' characters, or should we instead rename the existing method to javaCharOrStringEscaper(), migrate everyone to that, and then introduce separate javaCharEscaper() and javaStringEscaper() methods for those use specific cases? Users might be annoyed if we insert \' into their strings unnecessarily?
It's also not 100% clear that users want non-ASCII characters escaped in all cases. But we need to be careful about exotic line breaks and perhaps about easily confused characters.
If we do keep one escaper, is javaCharEscaper() the right name for it when probably most users will be escaping for strings?

(Of course we continue to recommend that most people use higher-level APIs. But someone has to write the low-level APIs that those people use :))

I don't imagine we'll do this unless someone comes along internally who is willing to survey the existing internal callers and see what features they actually want.

Feb 23 '18 20:02 cpovirk

Just so I'm clear on this, if Google decides to keep javaCharEscaper() and open sources it, and/or does the same for javaStringEscaper(), would those escapers just escape ' into \' and " into \" respectively? Or would they affect other characters as well in some way?

Feb 23 '18 20:02 jbduncan

And would you mind clarifying for me which APIs you're referring to when you say "higher-level APIs" and "low-level APIs"?

Feb 23 '18 20:02 jbduncan

The escapers would escape other characters, too, like \n and control characters. We'd have to figure out exact details.

We encourage people to use tools like JavaPoet rather than manually escape strings.

Feb 23 '18 20:02 cpovirk

Ah okay, understood and understood. :)

Feb 23 '18 20:02 jbduncan

any update on opensourcing the source code escapers? We use it in an internal project that we are going to opensource

Jul 17 '19 15:07 jDramaix

Sadly no. Cloning them to other projects remains sad but fairly easy and harmless (since they're static methods).

Jul 17 '19 15:07 cpovirk

Another possible consideration here is that with Java 15 and newer, text blocks can be used to represent strings that contain ' and " without escaping, e.g.::

System.err.print(
    """
    hello 'text' "blocks"
    """);

It would be nice for SourceCodeEscapers to provide a way to escape everything except ' and ".

So I think this is a vote for separate methods for different levels of escaping (javaCharEscaper() / javaStringEscaper() / javaTextBlockEscaper()), instead of having a one-size-fits-all javaCharOrStringEscaper() that escapes ' and ".

Feb 23 '24 23:02 cushon

Cloning them to other projects remains sad but fairly easy and harmless (since they're static methods).

One thing to keep in mind for any projects considering this is that cloning them under a new package name should be preferred (e.g. com.example.foo.escape, ~~not com.google.common.escape~~).

That avoids a duplicate definition of the class if the cloned copy includes a package-info (or when Guava's version of the class is eventually open-sourced). Most of the clones I've seen are already doing that, but the turbine one wasn't, which came up here.

Sep 11 '24 15:09 cushon

guava guava copied to clipboard

Release SourceCodeEscapers

guava
guava copied to clipboard