$_r_-a_-k_u-_'_s-_ identifiers

Open 0rir opened this issue 5 years ago • 9 comments

https://docs.raku.org/syntax/identifiers says:

An ordinary identifier is composed of a leading alphabetic character which may be followed by one or more alphanumeric characters. It may also contain isolated, embedded apostrophes ' and/or hyphens -, provided that the next character is each time alphabetic.

I suggest this as more nearly correct:

An ordinary identifier is composed following two rules, and from four kinds of characters. The character kinds are:

  1. alphabetic characters,
  2. numeric characters,
  3. underscore,
  4. and the separators which are the apostrophe and the hyphen.

The rules:

  1. Each separator separates two sections.
  2. Every section must start with an alphabetic character or underscore.

That empty sections are not allowed is implied, which might be made explicit. <alpha> is not the same as alphabetic. The topic var supersedes the above.
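The two rules above can be sketched as a small check. This is only a Python model of the proposed wording, not how Rakudo parses identifiers: the function name `is_ordinary_identifier` is made up for illustration, and "alphabetic" and "numeric" are approximated with Python's `str.isalpha()` and `str.isdecimal()`, which may not agree with Raku on every Unicode character.

```python
import re

def is_ordinary_identifier(name: str) -> bool:
    """Model of the proposed rules (hypothetical helper, not Rakudo's parser)."""
    # Rule 1: each separator (apostrophe or hyphen) separates two sections.
    sections = re.split(r"['-]", name)
    for section in sections:
        # An empty section means a leading, trailing, or doubled separator.
        if not section:
            return False
        # Rule 2: every section starts with an alphabetic character or underscore.
        first = section[0]
        if not (first == "_" or first.isalpha()):
            return False
        # The rest of a section is alphabetic, numeric, or underscore.
        for ch in section[1:]:
            if not (ch == "_" or ch.isalpha() or ch.isdecimal()):
                return False
    return True

# The name from this issue's title passes: its sections are
# _r_, a_, k_u, _, _s and _.
assert is_ordinary_identifier("_r_-a_-k_u-_'_s-_")
assert not is_ordinary_identifier("a-1")   # section "1" starts with a digit
assert not is_ordinary_identifier("a--b")  # empty section between the hyphens
```

Under this model a lone underscore is also accepted, because it is a single section that starts with an underscore.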

0rir avatar Oct 18 '20 23:10 0rir

Presuming that Rakudo behavior is correct, the passage quoted is wrong.

I started this issue as just a language cleanup. I subscribe to the "people don't like to read" proposition. I would prefer that the first sentence more strongly motivate the reader to read further.

During that effort I noticed that the docs are wrong or Rakudo is wrong.

The major nit is that a learner is very apt to want to create identifiers long before they care to learn about regexes and grammars. So conflating <alpha> with alphabetic or <alnum> with alphanumeric is wrong.

My phrase "The topic var supersedes the above" was a waste of words.

0rir avatar Oct 19 '20 14:10 0rir

Perhaps:

An ordinary identifier is a group of characters. The group is a leading underscore or alphabetic character which may be followed by one or more underscores or alphanumeric characters. It may also be multiple such groups separated by single apostrophes ' or hyphens -.

0rir avatar Nov 22 '21 16:11 0rir

Maybe?

An ordinary_identifier is one or more characters. Beyond single-character ordinary_identifiers (which are alphabetic but not underscore, i.e. <alpha> minus _, or more simply <:L>), characters are composed into groups wherein a leading underscore or alphabetic character (i.e. <alpha>+) may be followed by zero or more underscores or alphanumeric characters (i.e. <alnum>*). Even more expressively, an ordinary_identifier may also be multiple such <alpha>+<alnum>* groups separated by a single apostrophe ' or hyphen -, leading to quite natural and concise variable identification.

Note (from the docs): Somewhat confusingly, a predefined regex <ident> is known as a "Basic identifier" and is found at https://docs.raku.org/language/regexes#Predefined_Regexes . It has no support for ' or -:

<ident> Basic identifier (no support for C<'> or C<->). Same as C« <.alpha> \w* »
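To make the difference concrete, here is a rough Python model of the two shapes: the basic identifier (`<.alpha> \w*`, as quoted from the docs) and the same shape repeated across single `'` or `-` separators. It is a sketch under assumptions, not Raku's grammar: the character classes are ASCII-only stand-ins, whereas Raku's `<alpha>` and `\w` are Unicode-aware, and the names `BASIC`, `ORDINARY`, `is_basic`, and `is_ordinary` are invented for this example.

```python
import re

# ASCII-only stand-ins (assumption): Raku's <alpha> and \w cover Unicode too.
ALPHA = r"[A-Za-z_]"    # models <alpha>: a letter or underscore
WORD = r"[A-Za-z0-9_]"  # models \w: a letter, digit, or underscore

# Basic identifier: <.alpha> \w*  (no apostrophe or hyphen support).
BASIC = re.compile(rf"{ALPHA}{WORD}*")

# Ordinary identifier: basic groups joined by single ' or - separators,
# so every separator must be followed by a letter or underscore.
ORDINARY = re.compile(rf"{ALPHA}{WORD}*(?:['-]{ALPHA}{WORD}*)*")

def is_basic(name: str) -> bool:
    return BASIC.fullmatch(name) is not None

def is_ordinary(name: str) -> bool:
    return ORDINARY.fullmatch(name) is not None

assert is_basic("snake_case") and is_ordinary("snake_case")
assert not is_basic("my-var") and is_ordinary("my-var")
assert not is_ordinary("my-1")  # a digit may not follow a separator
```

In this model every basic identifier is also an ordinary identifier, and the two differ only in whether separators are allowed.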

jubilatious1 avatar Nov 22 '21 21:11 jubilatious1

Do we (should we?) distinguish a basic_identifier from an ordinary_identifier?

https://docs.raku.org/language/regexes#Predefined_Regexes

<ident> Basic identifier (no support for C<'> or C<->). Same as C« <.alpha> \w* »

A basic_identifier is one or more characters. Beyond single-character basic_identifiers (which are alphabetic but not underscore, i.e. <alpha> minus _, or more simply <:L>), characters are composed into one-or-more groups wherein a leading underscore or alphabetic character (i.e. <alpha>+) may be followed by zero or more underscores or alphanumeric characters (i.e. <alnum>*). Conceptually one may think of basic_identifiers as [<alpha>+<alnum>*]+, which includes various "camelCase" and "snake_case" forms.

Even more expressively, an ordinary_identifier may also be multiple such <alpha>+<alnum>* groups separated by a single apostrophe ' or hyphen -, leading to quite natural and concise variable identification.

jubilatious1 avatar Nov 23 '21 21:11 jubilatious1

Beyond single-character ordinary_identifiers (which are alphabetic but not underscore (i.e. <alpha> minus _, or more simply <:L>)

I don't believe that the second part of that is correct: _ can be an identifier by itself, for example in my \_ = 42 or sub _ { }. ($_, @_, and %_ are already taken, but that doesn't mean that _ is an invalid identifier.) So I don't think we need to describe single-characters as a special case at all, which simplifies things a bit.

codesections avatar Nov 23 '21 22:11 codesections

Wow, I never would have guessed.

I'm not sure I agree with that design decision (presuming it was a conscious decision), to re-make the entire language and still allow something like my \_ = 42 or sub _ { }. I would have guessed my \_ would have been disallowed, or possibly reserved. Same with sub _ { }.

But I'd better stop here, before I offend someone.

jubilatious1 avatar Nov 24 '21 04:11 jubilatious1

I'm not sure I agree with that design decision (presuming it was a conscious decision), to re-make the entire language and still allow something like my \_ = 42

I agree that my \_ = 42 is a bad idea – but we also allow even worse/more unreadable identifiers such as my \ᱹ = 42 (that's \x[1C79] in case it doesn't display properly in your font – a glyph that represents a letter according to Unicode). The thing about ᱹ and _ is that they're obviously bad choices for single-character variable names in virtually all cases, so I'm not too bothered by the fact that Raku allows them. That's especially true because forbidding _ as a single character variable would add complexity to the rules for what characters are allowed (as we've just been discussing).

Personally, I'm kind of glad Raku keeps the rules fairly simple and then trusts users not to abuse them other than as a prank.

codesections avatar Nov 24 '21 16:11 codesections

FYI, I did a somewhat extensive analysis of <ident> and "identifier" in this SO answer. The first section may be of interest starting at:

The built in ident rule does precisely the same as if it were declared as ...

and the section The rest of this answer provides a ToC to orient readers on what might be of interest in the rest of the answer.

raiph avatar Dec 18 '21 23:12 raiph

My primary complaint against this doc is that alphanumeric and alphabetic do not equate to <alnum> and <alpha>.

The bare fix:

An ordinary identifier is composed of a leading alphabetic or underscore character which may be followed by one or more alphanumeric and/or underscore characters. It may also contain isolated, embedded apostrophes ' and/or hyphens -, provided that the next character is each time alphabetic or the underscore.

Or, trying for a smoother ride and attempting to be less ASCII:

An ordinary identifier can be composed in two ways. The short form is a letter or underscore which may be followed by more letters, underscores, and/or digits. The other form is multiple short forms separated by isolated apostrophes ' and/or hyphens -.

0rir avatar Jan 15 '22 22:01 0rir