Potential to increase the identifier character limit?
Description
The Naming Rules state that the character limit for identifiers is 31 characters. This low character limit starts to feel very restrictive as a project grows, especially when working with or translating existing interfaces. This issue has been created to identify if there is a demand for increasing this limit, and to discuss the viability of doing so.
Example 1
One example of this is working with Objective-C and Metal. It is common practice to register Objective-C selectors during program initialisation as an optimisation, avoiding the runtime overhead of repeated lookups.
There are over 1400 Metal selectors alone, a particularly long one being:
drawPatches:patchStart:patchCount:patchIndexBuffer:patchIndexBufferOffset:instanceCount:baseInstance:tessellationFactorBuffer:tessellationFactorBufferOffset:tessellationFactorBufferInstanceStride:
Storing this in a global selector variable, which is also how Apple handles it in its own metal-cpp bindings, might look like this:
#include <objc/runtime.h>  // declares SEL and sel_registerName
// Global selector, registered once during start-up (C++ static initialisation, as in metal-cpp)
SEL drawPatches_patchStart_patchCount_patchIndexBuffer_patchIndexBufferOffset_instanceCount_baseInstance_tessellationFactorBuffer_tessellationFactorBufferOffset_tessellationFactorBufferInstanceStride_ = sel_registerName("drawPatches:patchStart:patchCount:patchIndexBuffer:patchIndexBufferOffset:instanceCount:baseInstance:tessellationFactorBuffer:tessellationFactorBufferOffset:tessellationFactorBufferInstanceStride:");
To get this within the 31-character limit, we have to sacrifice much of the identifier's meaning. One way is to shorten the first word and keep only the first letter of each remaining selector part, resulting in: s_drawPtch_ppppibttt_
While an abbreviation scheme like this can work well internally, it gives a less than desirable experience for users when the code is consumed as a library, as the identifiers need to be looked up manually to confirm what they refer to.
Another, smaller issue with this scheme is that code completion becomes much less reliable because the first word is abbreviated: typing s_drawPatches would not suggest the identifier above.
Example 2
Another example is where there are a lot of similarly named selectors:
"drawIndexedPrimitives:indexCount:indexType:indexBuffer:indexBufferLength:"
"drawIndexedPrimitives:indexCount:indexType:indexBuffer:indexBufferLength:instanceCount:"
"drawIndexedPrimitives:indexCount:indexType:indexBuffer:indexBufferLength:instanceCount:baseVertex:baseInstance:"
"drawIndexedPrimitives:indexCount:indexType:indexBuffer:indexBufferOffset:"
"drawIndexedPrimitives:indexCount:indexType:indexBuffer:indexBufferOffset:instanceCount:"
"drawIndexedPrimitives:indexCount:indexType:indexBuffer:indexBufferOffset:instanceCount:baseVertex:baseInstance:"
"drawIndexedPrimitives:indexType:indexBuffer:indexBufferLength:indirectBuffer:"
"drawIndexedPrimitives:indexType:indexBuffer:indexBufferOffset:indirectBuffer:indirectBufferOffset:"
It becomes very difficult to abbreviate each item, while still maintaining a way to uniquely identify each one in a way that conveys its intent.
Additional
I have not had any compilation errors or side effects when compiling with long identifiers. I am not sure if this is a bug, or if the behaviour is simply undefined.
It is my understanding that the C standard (C99 onward) only requires implementations to treat the first 31 characters of an external identifier and the first 63 characters of an internal identifier as significant, while LLVM doesn't place any restriction on identifier length.
@Book-reader found (on Discord) that the standard library already contains a number of identifiers longer than 31 characters.
There may be compiler optimisations that rely on the 31-character limit, which would need to be considered.
The Microsoft C compiler allows up to 247 characters for internal and external identifiers.
Reading through the C Standard, I came across the following under "Future language directions":
6.11.3 External names: "Restriction of the significance of an external name to fewer than 255 characters (considering each universal character name or extended source character as a single character) is an obsolescent feature that is a concession to existing implementations."
I am pretty neutral on this topic, but I think that enforcing a low character limit promotes writing cleaner code and actually thinking about naming your stuff. Paired with the 80 character rule of thumb, I think that it is a solid restriction.
I don't think this works the way you think. Do you have a C3 example that doesn't work?
Module names are limited to 31 characters. This example works totally fine:
module test;
import std;

fn void main()
{
    // Far longer than 31 characters, yet this compiles and runs.
    String drawPatches_patchStart_patchCount_patchIndexBuffer_patchIndexBufferOffset_instanceCount_baseInstance_tessellationFactorBuffer_tessellationFactorBufferOffset_tessellationFactorBufferInstanceStride = "Ok\n";
    io::print(drawPatches_patchStart_patchCount_patchIndexBuffer_patchIndexBufferOffset_instanceCount_baseInstance_tessellationFactorBuffer_tessellationFactorBufferOffset_tessellationFactorBufferInstanceStride);
}
I don't have a non-working example, but I have confirmed that the documented limit applies to all identifiers, not just module names. The fact that longer names compile anyway seems like a bug, which is a concern if it gets fixed in the future.
True, the limit is not enforced. We could possibly go with something like a 127 or 255 limit? I want it limited, though.
Note that the cname doesn't have a limit, just the identifier name. In this case:
drawPatches:patchStart:patchCount:patchIndexBuffer:patchIndexBufferOffset:instanceCount:baseInstance:tessellationFactorBuffer:tessellationFactorBufferOffset:tessellationFactorBufferInstanceStride:
You'd probably do:
extern fn void drawPatches(Patches* patches, NSInteger patchStart, NSInteger patchCount ... ) @cname("drawPatches:patchStart:patchCount:...")
In the case of overloads one might consider a macro to dispatch to the right function.
Thoughts on 127 limit?
Regarding the cname, this isn't applicable to my example, as the selectors are stored in global variables that are populated at runtime; they aren't linked to existing C functions. It's a pretty niche use case and not something I would expect to be addressed specifically, but it does provide a good example of where a lower character limit can cause friction and reduce library quality due to the need for heavy abbreviation.
Are there any optimisations that can be gained from a lower limit? If not, my preference would be to allow the full 255 characters, as I am an advocate of descriptive naming. Identifier limits are something that can be enforced with static-analysis tools if desired.
Is it planned to throw a compile-time error when going over the limit?