guava icon indicating copy to clipboard operation
guava copied to clipboard

Release SourceCodeEscapers

Open gissuebot opened this issue 10 years ago • 12 comments

Original issue created by [email protected] on 2013-12-19 at 07:40 PM


Tasks:

  • Figure out the proper names for everything. (Internal bug 3102010, comment 10 has some principles for this.)
  • Determine which of the escapers for each language we actually want. One possible decision is that we want none of the current escapers. For example, does it make sense for our Java escaper to escape printable non-ASCII characters, as it currently does? At least inside Google, our .java files are UTF-8. Considerations: If it does make sense to escape them, then the escaper should perhaps be named "javaAsciiCharEscaper." Also, is there any chance that the set of characters that we would need to escape would change over time? (For example, aren't there some other "newline-like" characters?) Or might it make sense to escape any confusable characters? Is there a standard enough set of confusable characters (maybe for domain names? though of course it might change), or would it be best to avoid non-ASCII altogether? Maybe this is reason to have both javaAsciiCharEscaper and javaCharEscaper ("javaUnicodeCharEscaper?").

gissuebot avatar Oct 31 '14 17:10 gissuebot

Hmm, is there any plan yet to open source the source code escapers? If not, we may want to consider closing this issue, because at this rate it looks like it's never going to happen. :)

jbduncan avatar Feb 23 '18 01:02 jbduncan

@cushon has done so in GoogleJavaFormat (or ErrorProne, I don't remember?) but package-private. I think it's a question of API and not feasibility.

I wonder if we'd be willing to release it in a separate, small standalone library. It seems niche enough that maybe we could do that as we experiment with the API? But perhaps it needs some package-private details that would make that challenging.

ronshapiro avatar Feb 23 '18 03:02 ronshapiro

GoogleJavaFormat (or ErrorProne, I don't remember?)

@cpovirk has a good sense of the API questions that would need to be answered. These are the copies I know of:

  • https://github.com/google/error-prone/blob/8beb267881dacf3146e47a8b366d013a4c94c127/check_api/src/main/java/com/google/errorprone/util/SourceCodeEscapers.java
  • https://github.com/google/turbine/blob/fdf5ba9682da33dcb5544e19f1511685c1bfa817/java/com/google/common/escape/SourceCodeEscapers.java
  • https://github.com/google/closure-compiler/blob/cd22079fd65b57995352b92695f1a1ad60145012/src/com/google/javascript/jscomp/deps/SourceCodeEscapers.java

cushon avatar Feb 23 '18 03:02 cushon

Here's a slightly edited version of what I wrote internally just recently:

  • Do we really want to continue to have a single javaCharEscaper() / javaStringEscaper() that escapes both " and ' characters, or should we instead rename the existing method to javaCharOrStringEscaper(), migrate everyone to that, and then introduce separate javaCharEscaper() and javaStringEscaper() methods for those use specific cases? Users might be annoyed if we insert \' into their strings unnecessarily?

  • It's also not 100% clear that users want non-ASCII characters escaped in all cases. But we need to be careful about exotic line breaks and perhaps about easily confused characters.

  • If we do keep one escaper, is javaCharEscaper() the right name for it when probably most users will be escaping for strings?

(Of course we continue to recommend that most people use higher-level APIs. But someone has to write the low-level APIs that those people use :))

I don't imagine we'll do this unless someone comes along internally who is willing to survey the existing internal callers and see what features they actually want.

cpovirk avatar Feb 23 '18 20:02 cpovirk

Just so I'm clear on this, if Google decides to keep javaCharEscaper() and open sources it, and/or does the same for javaStringEscaper(), would those escapers just escape ' into \' and " into \" respectively? Or would they affect other characters as well in some way?

jbduncan avatar Feb 23 '18 20:02 jbduncan

And would you mind clarifying for me which APIs you're referring to when you say "higher-level APIs" and "low-level APIs"?

jbduncan avatar Feb 23 '18 20:02 jbduncan

The escapers would escape other characters, too, like \n and control characters. We'd have to figure out exact details.

We encourage people to use tools like JavaPoet rather than manually escape strings.

cpovirk avatar Feb 23 '18 20:02 cpovirk

Ah okay, understood and understood. :)

jbduncan avatar Feb 23 '18 20:02 jbduncan

any update on opensourcing the source code escapers? We use it in an internal project that we are going to opensource

jDramaix avatar Jul 17 '19 15:07 jDramaix

Sadly no. Cloning them to other projects remains sad but fairly easy and harmless (since they're static methods).

cpovirk avatar Jul 17 '19 15:07 cpovirk

Another possible consideration here is that with Java 15 and newer, text blocks can be used to represent strings that contain ' and " without escaping, e.g.::

System.err.print(
    """
    hello 'text' "blocks"
    """);

It would be nice for SourceCodeEscapers to provide a way to escape everything except ' and ".

So I think this is a vote for separate methods for different levels of escaping (javaCharEscaper() / javaStringEscaper() / javaTextBlockEscaper()), instead of having a one-size-fits-all javaCharOrStringEscaper() that escapes ' and ".

cushon avatar Feb 23 '24 23:02 cushon

Cloning them to other projects remains sad but fairly easy and harmless (since they're static methods).

One thing to keep in mind for any projects considering this is that cloning them under a new package name should be preferred (e.g. com.example.foo.escape, ~~not com.google.common.escape~~).

That avoids a duplicate definition of the class if the cloned copy includes a package-info (or when Guava's version of the class is eventually open-sourced). Most of the clones I've seen are already doing that, but the turbine one wasn't, which came up here.

cushon avatar Sep 11 '24 15:09 cushon