protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

Parser erroneously allows whitespace in package declaration.

Open JakeWharton opened this issue 8 years ago • 9 comments

We had a case where protoc allowed a package declaration that had newlines in it but other tooling based on the spec failed to parse the file. I believe this should have failed to parse inside protoc as well.

Relevant parts of the spec:

package = "package" fullIdent ";"
fullIdent = ident { "." ident }
ident = letter { letter | decimalDigit | "_" }
letter = "A" … "Z" | "a" … "z"
decimalDigit = "0" … "9"

Example showing protoc allowing newlines:

$ echo "package example.

foo

;

message Foo {}" > test.proto

$ protoc --version
libprotoc 3.0.0

$ protoc --java_out=. test.proto
[libprotobuf WARNING google/protobuf/compiler/parser.cc:547] No syntax specified for the proto file: test.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)

$ tree
.
├── example
│   └── foo
│       └── Test.java
└── test.proto

JakeWharton avatar Sep 16 '16 20:09 JakeWharton

Yes this seems like a bug.

haberman avatar Mar 08 '17 03:03 haberman

@anandolee It's the syntax spec that needs a fix here. A package name spreading across multiple lines is needed when the name is very long. For example:

message Foo {
  optional a.very.very.long.name.that.can.not.fit.into.one.line.without.exceeding
      .80.char.limit.Bar value = 1;
}

xfxyjwf avatar Feb 14 '18 19:02 xfxyjwf

@xfxyjwf Is the syntax spec checked in anywhere? What actually needs to be fixed here?

haberman avatar Jun 11 '18 20:06 haberman

The syntax spec here: https://developers.google.com/protocol-buffers/docs/reference/proto2-spec

The spec needs to specify where spaces are allowed and where not.

xfxyjwf avatar Jun 11 '18 22:06 xfxyjwf

I suspect this is still a bug...

fowles avatar Jul 01 '22 21:07 fowles

There is definitely code in the wild that expects qualified names (dot-separated sequence of identifiers, such as in type names, extension names in message literals and custom options) to allow whitespace, including newlines, in between name components. (Admittedly, this might be less likely in package declarations.) But either way, this would not be a backwards compatible change.

There is definitely an issue with the clarity in the spec (or lack thereof). Which productions are lexical elements (where whitespace is generally disallowed between adjacent rule references)? Which are syntax elements (where whitespaces are not only allowed but sometimes required between adjacent rule references)? The whole dichotomy of tokenization vs. parsing is not mentioned at all, and there is nothing in the spec about whitespace or comments.

jhump avatar Aug 15 '22 21:08 jhump

FWIW, most languages that use dots as a "selector" operator for building qualified names do support whitespaces around the dots, even if most users don't ever write code that way. C++, Java, Go, etc...

Admittedly, Go does not allow you to define a package name with a dot -- it must be a simple identifier (and the fully-qualified name is its import path). But still, the following are both valid:

// test.go
package main
import "fmt"
import "runtime"
func main() {
 fmt . Print ( runtime. GOMAXPROCS ( -1))
}
// test.java
package com . bufbuild . parser;

import java . util .Arrays;

class Test {
  public static void main(String[] args) {
    System . out . println ( Arrays . asList(args) );
  }
}

Having said all that, there is one place where the extra whitespace is more likely to be considered a bug: type URLs in the text format for Any messages.

// this is valid
syntax = "proto3";

package foo.bar;

import "google/protobuf/any.proto";
import "google/protobuf/descriptor.proto";

extend google.protobuf.MessageOptions {
  google.protobuf.Any xtra = 10101;
}

message Foo {
  string s = 1;
  uint64 i = 2;

  option (xtra) = {
    [ type . googleapis . com / foo . bar . Foo ] < s: "foo",  i: 123 >
  };
}

jhump avatar Sep 23 '22 13:09 jhump

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] avatar Jun 23 '24 10:06 github-actions[bot]

AFAIK, this hasn't been fixed and should remain open.

That said, I'm not sure that this is solely a documentation issue. In the original description, @JakeWharton points out that "protoc allowed a package declaration that had newlines in it but other tooling based on the spec failed to parse the file." That's seems to point to a functionality issue along with the documentation issues.

If we do want to update the spec to reflect the ability to have whitespace characters between the package name elements, I'll need some direction on how to show that. I'm not EBNF proficient.

Logofile avatar Jun 23 '24 16:06 Logofile