dmd icon indicating copy to clipboard operation
dmd copied to clipboard

Fix Issue 23999 - literal suffixes dont mix well with template instan…

Open ntrel opened this issue 1 year ago • 15 comments

…tiations

ntrel avatar Jun 21 '23 12:06 ntrel

Thanks for your pull request and interest in making D better, @ntrel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
23999 enhancement literal suffixes dont mix well with template instantiations

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15339"

dlang-bot avatar Jun 21 '23 12:06 dlang-bot

@dkorpel Then can it just be a dmd diagnostic error rather than a language error?

ntrel avatar Jun 21 '23 15:06 ntrel

Is the current dangling else error part of the grammar?

Edit: I see that's actually a warning, not an error.

ntrel avatar Jun 21 '23 15:06 ntrel

This is a special case that's impossible to express in the lexical grammar.

It could be expressed in the lexical grammar by allowing arbitrary identifiers as a suffix and checking it later:

StringPostfix:
    Identifier

The lexer or another part of the compiler could later verify that the suffix is supported.

tim-dlang avatar Jun 22 '23 16:06 tim-dlang

@tim-dlang That might work for strings, but for an integer literal and float literal to use too (to also solve the comment in bugzilla), that would make it ambiguous for a digit character followed by an IdentifierStart token, from the grammar POV.

That said, we already have 2 special case rules, which are both for floating point literals which don't obey maximal munch (see end of this section): https://dlang.org/spec/lex.html#source_text

Those rules actually change the meaning of the tokens! So we could add a special rule saying:

$(P A $(GLINK StringLiteral), $(GLINK IntegerLiteral) or $(GLINK FloatLiteral)
which ends with an $(GLINK IdentifierStart) character cannot be followed by an 
*IdentifierStart* character at the start of the next token.)

And that rule would only forbid certain patterns rather than redefining them.

I would much rather do that and make it an error, as gcc and clang do. That's because people probably often don't use the -w switch to show warnings.

More importantly, we can never add any new literal suffixes without breaking code if we don't have an error.

ntrel avatar Jun 22 '23 17:06 ntrel

@ntrel I don't think it would be ambigous for integer or float literals. The lexer would accept as many identifier characters as possible, but later error on invalid characters.

For comparison, C/C++ use multiple phases in the compiler. First a number is lexed as a pp-number, which allows an arbitrary suffix. A later phase distinguishes between integer-literal and floating-literal, which is more strict for the suffix, but can not split the token. Multiple phases are not necessary for D, because there is no preprocessor.

But I also think it would be best to just add a special case and make it an error.

tim-dlang avatar Jun 22 '23 17:06 tim-dlang

It could be expressed in the lexical grammar by allowing arbitrary identifiers as a suffix and checking it later:

That's pretty smart!

dkorpel avatar Jun 22 '23 17:06 dkorpel

More importantly, we can never add any new literal suffixes without breaking code if we don't have an error.

Well, you would be breaking code now without adding new literal suffixes. That being said, I'd much rather add an error than a warning. Warnings are bad.

dkorpel avatar Jun 22 '23 18:06 dkorpel

@tim-dlang The grammar you linked for floating-literal seems to require either a . or e, in D you can have float literals that don't: 1F - that would conflict with e.g. 1L integer literals. I think I have solved this: https://github.com/dlang/dlang.org/pull/3646

That grammar change doesn't try to disallow a hex literal from having an identifier immediately following it, because:

  • I would only disallow those if the last digit is a-f, not when it's a decimal digit.
  • It's not as confusing as the string/decimal suffix running into an identifier, as at least you have the leading 0x to remind you it's hex.

So if the spec pull is OK, I need to update this pull to remove the hex literal warning and make the others errors again.

ntrel avatar Jun 22 '23 21:06 ntrel

@tim-dlang The grammar you linked for floating-literal seems to require either a . or e, in D you can have float literals that don't: 1F - that would conflict with e.g. 1L integer literals. I think I have solved this: dlang/dlang.org#3646

Yes, 1F makes it more complicated.

That grammar change doesn't try to disallow a hex literal from having an identifier immediately following it, because:

I think it would be more consistent if hex literals behave the same. Consider the following example:

  Foo!0x321On;

Depending on the font, you may not easily see if it is 0x231 On or 0x3210 n, so an error message could be helpful. The same problem exists for normal integer literals. Currently the pull request does not reject 321On.

tim-dlang avatar Jun 23 '23 16:06 tim-dlang

@tim-dlang I'm going to focus just on suffixed literals for this pull. Also good point about the unicode whitespace characters, I think I'll just drop the unicode detection.

ntrel avatar Jun 23 '23 16:06 ntrel

This is ready to go now.

ntrel avatar Jul 14 '23 18:07 ntrel

I'll ask what Walter thinks of this

dkorpel avatar Jul 16 '23 18:07 dkorpel

I note that the way this is implemented, we can never add any additional suffix characters. I suggest instead that the check should be for any suffixes that are not valid suffixes.

WalterBright avatar Mar 21 '24 06:03 WalterBright

this code is fairly out of sync with the current lexer. Please rebase.

WalterBright avatar Mar 21 '24 06:03 WalterBright