protobuf FR: Text proto format could support multi-line strings

In a text proto file like:

something: <
  description: "this\nwould\n naturally be a\n multiline string"
  ...
>

It would be awesome if we could e.g.:

something: <
  description: `
this
would
naturally be a
 multiline string
`
  ...
>

This could be used for e.g. attaching markdown documentation to configuration entries which could then be displayed nicely. I'm not 100% on backticks (implies quoting in the parser)... Other things like a bash style <<EOM .... EOM demarcation could also work and would be easier to parse.

Thoughts?

Mar 05 '16 00:03 riannucci

+1 this idea (whatever syntax it takes). Having the ability to write free-flowing text in the text protobuf is great for people who want to use protobufs but still have it be human readable when you have multiline text.

Mar 22 '16 23:03 bwendling

FWIW, we did actually implement this as a crappy pre-processor over text proto (using the <<EOM ... EOM syntax). The tests ("examples") are here: https://github.com/luci/luci-go/blob/master/common/proto/multiline_test.go

It also will do a python-style textwrap.dedent on it, so you can do:

something: <
  description: <<EOM
    this
    would
    naturally be a
      multiline string
  EOM
  ...
>

Which would be the equivalent of:

something: <
  description: "this\nwould\nnaturally be a\n  multiline string"
  ...
>

(note the beautiful indentation and lack of weird prefix spaces in the translated version :wink:)

Mar 23 '16 00:03 riannucci

Hmm...it does appear to be an option, but it's discouraged. I don't know the reasoning for it.

Mar 23 '16 00:03 bwendling

This would be a great feature, we're looking at needing to do large blocks of text in a proto for api documentation, and it would be great to do this! In lieu of this we'll probably make a preprocessor as @riannucci is using, but that is much less elegant. Thanks!

May 05 '17 19:05 aselle

+1 for this.

Mar 15 '18 06:03 laike9m

+1 again.

Mar 23 '18 00:03 DonGar

So, what's the process to getting this implemented? Does someone simply submit a PR?

May 03 '18 19:05 maguro

I'm going to take a stab at this. I'm going to use backticks. Not sure when, or if, I'll get this done. Feel free to ping me.

May 04 '18 21:05 maguro

So, using backticks was a breeze to implement, but I realized it’s a huge pain for people who want to embed multi-line markdown strings. The current tokenizer only looks ahead one character; I suspect that this is why there's been no movement on this feature request. I’m extending it to look ahead three characters to accommodate triple single and double quotes.

May 16 '18 13:05 maguro

Multi-line string value is already supported as far as I can tell, see: https://github.com/google/protobuf/blob/ed4321d1cb33199984118d801956822842771e7e/src/google/protobuf/text_format_unittest.cc#L748

You will need to format it like this:

something: <
  description:
    "this\n"
    "would\n"
    " naturally be a\n"
    " multiline string"
  ...
>

Probably not as convenient as the proposed ones, but it's also less magical (i.e., don't magically remove indentation and line wraps here and there).

@maguro Before you put more work into it, please make sure someone on protobuf-team is supportive of this feature (+a few other team members @acozzette @liujisi ).

May 16 '18 19:05 xfxyjwf

The unit test TEST_F(TextFormatTest, ParseConcatenatedString) ensures that multi-part strings concatenate correctly. I don't think that it has anything to do with multi-line strings.

I understand that that one can use a list of multi-part strings with each part suffixed with \n. It's not just inconvenient, it's brittle and makes things pretty illegible. Looking at the comments and emoticons posted on this issue, it seems that the wider community has a very poor opinion on that workaround as well.

My proposal is not to remove indentation, etc. Whatever is in the string is what you get; any formatting can and should be done after compilation. It also is backward compatible.

I posted my comments are several weeks ago and the ticket is several years old. How can we proceed in a more timely manner?

May 20 '18 17:05 maguro

For changes that will affect multiple languages, there needs to be a formal proposal and it needs to be approved before proceeding. There is no fast path and the process is Google internal only. This change unfortunately falls into this category and has to be carried through by a Google engineer. It doesn't need to be a protobuf team member though. If you are a Google engineer and is motivated enough to push this feature through all the proto text parsers in Google, feel free to send us a proposal doc. When the proposal is approved, we will accept code changes (from anyone, internally or on github). For external contributors, your best bet is probably to convince someone in Google to go through the process on your behalf.

My personal take on this proposal is that it doesn't provide enough benefit to justify the effort. To me multi-line strings aren't so different from multi-part strings, and there are a lot of text proto parsers we need to update if we are to support it.

May 22 '18 00:05 xfxyjwf

What are "text proto parsers"?

May 22 '18 20:05 maguro

@maguro Protobuf implementations that support text format apparently have implemented such parsers. Some implementations are opensourced (as in this github repository), but many of them are not and only exist inside Google. Also protobuf implementations are not the only text proto parsers. We have tools that process/convert/format text protos. For example, if you check-in a text proto file in our internal code base, a tool will parse and format the file according to a formatting guideline, which by the way has specific rules about how to format multi-part strings.

May 22 '18 21:05 xfxyjwf

So, these tools parse raw proto source files? If that is the case, then if a proto file did not use the new multi-line string, then they should be fine, correct?

May 22 '18 22:05 maguro

@maguro If nobody uses the new multi-line string, they will be fine.

May 22 '18 22:05 xfxyjwf

So, there doesn't seem to be a problem on that front then; if there is then one could add a flag to turn it off. I'll try to find a Google engineer to shepherd this on your side.

FWIW, I really don't how one would prefer

string scope = 2 [(niantic.cfg.fld) = {
      description: "To get an ID Token to authenticate a user, you are required to specify\n"
                   "the scope identifier `openid`. For example: `scope=openid`.\n"
                   "\n"
                   "Additionally, to access private user data from the Yahoo APIs, include\n"
                   "the relevant [API scope\n"
                   "identifiers](https://developer.yahoo.com/oauth2/guide/yahoo_scopes/index.html#scope-identifiers).\n"
                   "The scopes can be delimited by a\n"
                   "space or comma. In the example below, the scope identifier is specified\n"
                   "for requesting the ID Token and an Access Token that provides read\n"
                   "access to the Yahoo Mail API:\n"
                   "\n"
                   "* `scope=openid mail-r`\n"
                   "* `scope=openid,mail-r`"
  }];

over the much cleaner

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   To get an ID Token to authenticate a user, you are required to specify
                   the scope identifier `openid`. For example: `scope=openid`.

                   Additionally, to access private user data from the Yahoo APIs, include
                   the relevant [API scope
                   identifiers](https://developer.yahoo.com/oauth2/guide/yahoo_scopes/index.html#scope-identifiers).
                   The scopes can be delimited by a
                   space or comma. In the example below, the scope identifier is specified
                   for requesting the ID Token and an Access Token that provides read
                   access to the Yahoo Mail API:

                   * `scope=openid mail-r`
                   * `scope=openid,mail-r`
                   """
  }];

May 23 '18 00:05 maguro

Good news. I got a Google manager to shepherd this. I was also able to maintain the zero-copy stream semantics. I'll try to get the patch submitted later this week, more likely the weekend.

Jun 06 '18 14:06 maguro

@maguro

over the much cleaner [...]

From your example, what would this become?

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   foo
                   """

Would the description be: foo or \n\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\sfoo\n\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s

And what about this?

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   foo
      bar
                   """

Aug 15 '18 18:08 Goodwine

It would be the latter. The compiler should not make any asssumptions about the intentions of the author.

Aug 18 '18 11:08 maguro

What is the conclusion on this? Does this feature get implemented?

Apr 03 '19 16:04 zzhang2019

Yeah, let me dust this off and chat w/ our lawyers.

Apr 15 '19 20:04 maguro

Looking good:

Apr 23 '19 08:04 maguro

The plugin is aware of two kinds of strings, ones that are for string values and ones that are for string names, e.g. import paths:

When a string value is used somewhere a string name is required, the proper error occurs:

Apr 23 '19 14:04 maguro

Ok, as you can see above, it supports both backtick and triple-double-quotes to delimit multiline strings. I'm just making that clear.

Apr 28 '19 14:04 maguro

Coming from a Python background, I was surprised by the backtick syntax. Then I read about the JavaScript Template Strings. But wouldn't it be confusing for users, since this is not really a template?

May 03 '19 22:05 amauryfa

Coming from a Python background, I was surprised by the backtick syntax. Then I read about the JavaScript Template Strings. But wouldn't it be confusing for users, since this is not really a template?

One is free to use """ if happens to be stuck in a JavaScript universe and gets confused as to when they are editing a proto file vs a JavaScript file. :)

Seriously though, I'm not married to backticks and I'm happy to remove them if that's the consensus here.

May 11 '19 14:05 maguro

Interested in this. @maguro was this agreed on internally inside of google in the end?

For others, more info here: https://github.com/protocolbuffers/protobuf/issues/1297#issuecomment-390825524

May 15 '19 10:05 mikelnrd

Please first sign cla for #6078 to move further.

Jun 11 '19 23:06 BSBandme

@maguro, I'm a little confused that what you are describing, as well as the PR (#6078) is about adding a new syntax for multi-line string literals in the protobuf source syntax. However, this issue is about adding multi-line string literal support to the protobuf text format.

I guess you only looked at changing the compiler since that is the only piece that can be changed without impacting all of the languages/runtimes (which all have their own implementation of the text format)?

Jul 24 '19 19:07 jhump