Make stropped (backticked) identifiers style sensitive

Open omentic opened this issue 3 years ago • 16 comments

Abstract

Make stropped (enclosed in backticks) identifiers literal identifiers. This proposal would make writing low-level 1:1 C wrappers a smoother process while not breaking past code* and also not affecting general usage of style insensitivity.

A literal identifier, for this purpose, is any string of letters, digits, and underscores, with no restrictions on beginning with a letter, ending with an underscore, or having repeated neighboring underscores. Literal identifiers are not subject to identifier equality.

*: see Backwards Compatibility section

Motivation

CC. @xigoi: I hope you don't mind me turning your idea into an RFC before you 😄

One of the problems with style insensitivity brought up in the (very long) #456 is that it does not mesh well with writing 1:1 C wrappers or generating them automatically, despite Nim's great FFI. Making stropped identifiers style sensitive would alleviate one of the pains of working with low-level (and other) libraries like OpenGL/Xkbcommon/etc. that are bound to have conflicting identifiers.

Currently, conflicting identifiers have to be manually renamed to follow some sort of convention, leading to horrors like so: my xkbcommon wrapper. If stropping made it possible to refer to those constants in a style-sensitive manner, having to manually distinguish them could be avoided.

Note: "stropping", for those unfamiliar, means enclosing an identifier in a pair of backticks. See Nim's Wikipedia article for examples.

Note: "style sensitive" in this RFC means equivalent-with-C style sensitive: including trailing underscores being valid, etc.
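
To make the conflict concrete, here is a minimal sketch that runs the xkbcommon names used in this RFC through `nimIdentNormalize` from `std/strutils`, which should mirror Nim's identifier equality rule (first character compared exactly, the rest compared ignoring case and underscores). The `names` array and the `normalized` set are only illustrative.

```nim
import std/[strutils, sets]

# Illustrative list: six distinct C constants from xkbcommon.
const names = ["XKB_KEY_ch", "XKB_KEY_Ch", "XKB_KEY_CH",
               "XKB_KEY_c_h", "XKB_KEY_C_h", "XKB_KEY_C_H"]

var normalized = initHashSet[string]()
for name in names:
  # Keep the first character; lowercase the rest and drop underscores.
  let key = nimIdentNormalize(name)
  echo name, " -> ", key
  normalized.incl key

# Number of distinct Nim identifiers left after normalization.
echo normalized.len
```

Every name normalizes to `Xkbkeych`, so the set ends up with a single entry: six distinct C constants are one identifier as far as Nim is concerned.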

Description

One benefit of style insensitivity is that it discourages poorly named identifiers: for example, NvColorFormat_V8_U8 and NvColorFormat_V8U8 being different constants in C OpenGL is, frankly, stupid, and should be avoided in idiomatic Nim code. A good argument could be made that style sensitivity should not be allowed even when wrapping C, so as not to carry poor design decisions over to Nim code.

A counterargument to this point is that standard C function names (my_long_type_prefix_then_the_actual_function_name()) are also bad practice and the best thing to do is write a higher-level wrapper. Writing that higher-level wrapper, then, is made much easier if you can have a 1:1 equivalent low-level C wrapper to work with and don't have to bother manually renaming conflicting identifiers.

A downside is that this complicates Nim's style insensitivity rules. But hey, stropping's pretty obscure anyway, and this would be used almost exclusively for C wrapping, so I don't think it'd be too much of a problem.

Code Examples

# General usage
let `style_sensitive` = "Hello world"
echo `style_sensitive` # valid
echo `styleSensitive` # fails
echo `stylesensitive` # fails
echo style_sensitive # fails

# Expected use (while writing wrappers)
const
  `XKB_KEY_ch`* = 0xfea0
  `XKB_KEY_Ch`* = 0xfea1
  `XKB_KEY_CH`* = 0xfea2
  `XKB_KEY_c_h`* = 0xfea3
  `XKB_KEY_C_h`* = 0xfea4
  `XKB_KEY_C_H`* = 0xfea5

echo `XKB_KEY_Ch` # stropped identifiers are not ambiguous

# Nothing changes for non-stropped identifiers
let foo = helloWorld()
let bar = hello_world()

Backwards Compatibility

Stropping is currently used for the following: making keywords usable as identifiers, defining operators, and template identifier concatenation.

Literal identifiers are an extension of making keywords usable as normal variables. There is no conflict with defining operators. Template concatenation may coexist alongside stropping for literal identifiers: as template concatenation will always have at least one space, and literal identifiers are prohibited from having spaces.

Because style insensitivity applies to keywords, and template concatenations may contain arbitrary identifiers, this would technically be a breaking change. However, as stropped identifiers are typically not exported to be used by other programmers, I think it is highly unlikely to break much code, if any.
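
For reference, here is a minimal sketch of the three existing uses of backticks that this section refers to. The names `+++`, `declare`, and `firstName` are made up for illustration and do not come from any library.

```nim
# 1. Keyword as an identifier.
var `type` = "a keyword used as a variable name"
echo `type`

# 2. Defining an operator, and calling it like a normal proc.
proc `+++`(a, b: int): int = a * 10 + b
echo `+++`(1, 2)   # same as: 1 +++ 2

# 3. Template identifier concatenation: the parts are separated by a space.
template declare(prefix, suffix: untyped, value: int) =
  # inject makes the generated name visible at the call site.
  let `prefix suffix` {.inject.} = value

declare(first, Name, 12)
echo firstName     # under current rules, first_name names the same symbol
```

Only the first use involves a single identifier between the backticks, which is the case this RFC would change.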

omentic avatar Aug 20 '22 01:08 omentic

@j-james, I believe you meant to say my idea posted back in June. Which is totally understandable given xigoi omitted credit. I was happy to see xigoi mention making an RFC out of it and even happier to see that you did 👍.

Zectbumo avatar Aug 20 '22 01:08 Zectbumo

Backticks can also be used for template identifier concatenation -

template thing(a, b, c: untyped): untyped {.dirty.} = 
  let `a b` = c

thing(first, Name, 12)
echo first_name
echo firstName

What about identifiers that were originally introduced in this context?

Another similar concern

import macros
macro bar(fooBar) =
  result = quote do:
    var `foobar` = 1 # foobar vs fooBar ? what if there are other identifiers in scope ?
bar(z)
echo z

I brought up a similar question about a year ago https://discord.com/channels/371759389889003530/768367394547957761/881803458624229406 and those were counterexamples - I'm not sure if there is a way to meaningfully tackle them, but maybe you will figure out a better solution.


Might also add support for __c_identifier - make stropping just mean "read the thing in backticks as I wrote it, and treat it as an identifier wholly" instead of "we allow keywords, but still check for leading _ etc." - after all, if we are talking interop, this is also a concern.

var `__c`: int # Does not work, triggers 'invalid token: _ (\95)'
var `a_a-12_f`: int # Does not work, triggers 'double underscore error'
var `a:::b`: int # Works
var `a::b`: int # Does not work, lexing error with 'identifier expected'

haxscramper avatar Aug 20 '22 20:08 haxscramper

It would be yet another feature but one could add double-backtick syntax to mean verbatim identifier.

var ``__c``: int
var ``a_a-12_f``: int
var ``a:::b``: int
var ``a::b``: int

metagn avatar Aug 20 '22 20:08 metagn

Last two items are bugs IMO, just like the second one - there is no "double underscore" in it anyway.

var `^&S%DF%^&:::::::SDF` = 12

This is allowed now. Why isn't a::b?

haxscramper avatar Aug 21 '22 09:08 haxscramper

The backticks are also used for the operator/procedure call syntax. You can use the backticks to call an operator like a normal proc.

Frankly, I would prefer having a pragma that I could push to make everything in a C wrapper module style sensitive. This implies new overload resolution rules, however, since what happens when a style insensitive identifier overloads a style sensitive one must be addressed.

As for overload resolution, I think the only real option is to eliminate style sensitive identifiers from the overload set completely when they do not fully match the style of the call. Otherwise, simply adding a style insensitive equivalent in some other module could make the style sensitive identifier become, in practice, style insensitive if both modules are imported.

barcharcraz avatar Aug 23 '22 00:08 barcharcraz

It would be yet another feature but one could add double-backtick syntax to mean verbatim identifier.

var ``__c``: int
var ``a_a-12_f``: int
var ``a:::b``: int
var ``a::b``: int

What does verbatim mean to you, and how is this different from {.codegendecl.}? I would love some mechanism that allowed constructing identifiers that are illegal in C, but legal in the executable/object format (like ?square@@YAHXZ, MSVC C++ mangling). I think you would have to do this with weakrefs/weak aliases (with MSVC this involves using an assembler, and in MASM you can only define one weakref per asm file due to assembler bugs), so maybe it isn't worth a special Nim feature.

barcharcraz avatar Aug 23 '22 01:08 barcharcraz

Basically

var ``a_a-12_f``: int

is equal to

macro foo =
  let name = ident("a_a-12_f")
  result = quote do:
    var `name`: int
foo()

This is how stropping works in other languages. The way normal accents work is like a stream of expressions, in this case it would just be like a string. Normal accents do not support spaces or stuff like 2D.

This would not affect codegen, it would be mangled like any other identifier. However, it could enforce style sensitivity if desired.

metagn avatar Aug 23 '22 08:08 metagn

I made the suggestion in the other RFC of using double backticks for byte-identical lookups. That leaves all the single backtick semantics the same and it does not matter if this looks ugly because it's only there for very specific ABI reasons which should be buried in a wrapper that nimizes the interface.

IcedQuinn avatar Aug 23 '22 08:08 IcedQuinn

What does verbatim mean to you, and how is this different from {.codegendecl.}

I'm talking about frontend level, not backend-specific details of how things are generated in the C/C++/Js/vmgen etc. Verbatim means - exactly as written, without any alterations and alternatives.

haxscramper avatar Aug 23 '22 16:08 haxscramper

Which is totally understandable given xigoi omitted credit.

Sorry, I totally forgot where I'd seen the idea :P

xigoi avatar Sep 03 '22 21:09 xigoi

There should be a mention in this RFC about how literals behave inside backticks. I presume this would still work the same:

var `2.0`: int
assert `2d`.addr == `2.0`.addr

var `"\L"`: int
assert `"\10"`.addr == `'\l'`.addr

thx @metagn for pointing out that there is processing of literals inside backticks. btw, backticks do support spaces and 2d, you just have to write it like this: `"my space"` and `"2d"`

The following code would act differently if this RFC is implemented:

var `"2d"`: int
assert `"2d"`.addr == `"2D"`.addr # throws error in the future

In the future, the compiler would then report Error: undeclared identifier: '2D'.

Zectbumo avatar Sep 23 '22 08:09 Zectbumo

Revisiting this RFC: while I still like it and think it would be extremely useful for C FFI, I don't quite know how to handle the cases that @haxscramper brought up (also, I agree with support for __c_identifier).

omentic avatar Apr 04 '23 21:04 omentic

@haxscramper I have updated the abstract and backwards compatibility sections to address conflicts with the existing language features you brought up.

I am conflicted about whether to define a literal identifier as a sequence of non-whitespace characters or as a C identifier, though. On one hand, supporting `-`, among other things, would make this useful for Racket FFI and other languages with different identifier restrictions. On the other hand, that adds more complexity, and we would have to deal with what @Zectbumo brought up.

I am also conflicted on whether this RFC is useful with the revelation of https://github.com/nim-lang/RFCs/issues/484#issuecomment-1493272439 (`" "` and `""" """` have similar functionality). If not, then the exact semantics of those should be documented, preferably in the FFI section.

omentic avatar May 18 '23 04:05 omentic

This RFC is still helpful. Quote-backticks remain style insensitive.

proc `"verbatium"`() =
  echo "verbatium"

# Error: redefinition of 'verbatium'; previous declaration above
proc `"verBaTium"`() =
  echo "verBaTium"

proc `"__verBa_tiU_m"`() =
  echo "__verBa_tiU_m"

`"verbatium"`()
`"verBaTium"`()
`"__verBa_tiU_m"`()

omentic avatar May 21 '23 03:05 omentic

The way to interface with C or Racket etc. is with pragmas. The Nim name doesn't have to have any connection with the Racket name and can adhere to the Nim style guide, which is one of the few guides that understands that human brains translate written words into sounds internally. And an underscore has no sound.
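
A minimal sketch of the pragma approach, assuming the C backend and a standard `<stdio.h>`: the C function `puts` is bound under a Nim-style name via `importc`. The name `putLine` is made up for illustration.

```nim
# The Nim-side name is free to follow the Nim style guide;
# the string given to importc is the exact C name.
proc putLine(s: cstring): cint {.importc: "puts", header: "<stdio.h>".}

discard putLine("hello from C")
```

The C spelling appears only once, inside the pragma string, so it never has to be a valid or unique Nim identifier.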

Araq avatar May 21 '23 04:05 Araq

I agree, except the problem is that writing down those sounds can get pretty complicated when you're dealing with case-and-underscore-sensitive identifiers. And an unfortunate number of C libraries - especially those working with keyboard input - don't distinguish their identifiers in any way aside from case.

I envision this RFC as providing a uniform way for tools like Futhark (or manual wrappers) to provide the raw C API, which then is wrapped into a more idiomatic, Nim-like interface by higher-level functions calling those C functions and data types that will be exported.

The problem is, pragmas alone are not enough: you have to call those functions in order to create the more idiomatic ones, and in order to call those functions you currently have to come up with Nim-compliant names.

const # FIXME: absolutely disgusting
  XKB_KEY_ch_nocap_nocap* = 0xfea0
  XKB_KEY_Ch_cap_nocap* = 0xfea1
  XKB_KEY_CH_cap_cap* = 0xfea2
  XKB_KEY_c_h_nocap_underscore_nocap* = 0xfea3
  XKB_KEY_C_h_cap_underscore_nocap* = 0xfea4
  XKB_KEY_C_H_cap_underscore_cap* = 0xfea5
# Expected use (while writing wrappers)
const
  `XKB_KEY_ch`* = 0xfea0
  `XKB_KEY_Ch`* = 0xfea1
  `XKB_KEY_CH`* = 0xfea2
  `XKB_KEY_c_h`* = 0xfea3
  `XKB_KEY_C_h`* = 0xfea4
  `XKB_KEY_C_H`* = 0xfea5
  

omentic avatar May 21 '23 05:05 omentic