style guide: switch to snake case for functions
I understand the general rule is not to use snake_case for functions. Why do eql_slice_u8 and hash_slice_u8 break this rule?
There are actually quite a few places where the std lib breaks formatting guidelines or is generally inconsistent. For example, many fields in the structs produced by @typeInfo are snake_case despite being of type type, std.mem has readIntBE and readInt while std.io has readIntBe and readIntVar, etc.
I think it is just a case of being pretty low on everyone's priority list until zig nears 1.0.0.
I would also want to standardize names (printAlloc vs allocPrint, isXXXX setXXXX getXXXX findXXXX) and common param orders (i.e. is alloc always the first param except when passing self?). Also the stdlib should set a good example and use enums judiciously even for simple flags.
I am amending this proposal with how to handle initialisms and acronyms in TitleCase things. In summary: Don't mix-mangle the case of initialisms or acronyms.
Before:
fn readU32Be() u32 {}
openZ() open0()
Os
Json
Url
XmlHttpRequest
ServeHttp
ServeHttpOs
After:
fn read_u32_be() u32 {}
open_z() open0()
OS
JSON
URL
XML_HTTP_Request (use underscore even in TitleCase to indicate border between initialisms/acronyms)
XML_HTTP_Request2
XML_HTTP_RequestRead_JSON (no underscore between Request and Read)
XML_HTTP_RequestRead_JSON_u32 (don't mangle primitive type names for the sake of TitleCase)
XML_HTTP_Request_u32 (need underscore to show where Request ends and u32 begins)
XML_HTTP_Request_u32_2
ServeHTTP
ServeHTTP_OS
If those things were functions:
os()
json()
url()
xml_http_request()
xml_http_request2()
xml_http_request_read_json()
xml_http_request_read_json_u32()
xml_http_request_u32()
xml_http_request_u32_2()
serve_http()
serve_http_os()
Reasoning:
Our previous camelCase style has flaws:
- The information it attempts to communicate does not work for single-word function names.
- It requires case-mangling acronyms/initialisms.
This is unfortunately a widely breaking change and feels annoying to update code just for an arbitrary style decision, so my suggestion for how to roll this out, if accepted, would be to wait until we have the ability to do identifier-renaming (the feature that some IDEs provide) and it works well. That feature would ease the burden of transitioning a codebase from one style to the other, and we could even provide a script for people to run which would make the changes automatically (the script would have listed all the std lib identifiers that changed; it would not arbitrarily change users' own identifiers).
I came here to issues hoping to find there was a plan to fix the style guide.
Awesome to see this will be addressed.
The ayes would seem to have it (good, IMHO).
@andrewrk For those of us busy building Zig libraries, it would be rather helpful to know soonest whether this proposal--per the examples you gave above--will be accepted or not, so as to not have to frustrate downstream users with massive renames later on.
Note to future self in about 2 weeks: don't postpone this one to 0.8.0; make a decision!!
Right now the style guide helps you see whether something is a function, a variable, or a namespace; with this proposal, only types would be easily recognisable. In an IDE you would see the kind (and much more) on hover, but it is nice to have that communicated in plain text.
That's true, and was the main motivation for status quo style, but it doesn't work for functions with only 1 word in them, so there was always that cognitive dissonance.
BTW did you know that ZLS+vscode has semantic highlighting? It makes functions a different color, so you don't even have to hover.
I haven't tried ZLS yet.
Before Zig I mostly used Java, which has a similar preferred style, and I never once had a problem naming a function that contains an acronym or an abbreviation. For me readU32BE is okay, though a more idiomatic name in Zig would be readU32(.Big, ...). Single-word functions are not a big problem, and certainly not one worth changing all function names for. When making single-word variables of function type I often add a suffix, as in readFn. Also, it seems that Java's preferred style is no longer all-caps acronyms: see, for example, the recently added classes java.time.chrono.IsoChronology and java.net.http.HttpClient.
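The `readU32(.Big, ...)` shape mentioned in this comment could look roughly like the following. This is only a minimal sketch: `Endian` and `readU32` are hypothetical names defined here for illustration, not the standard library's API, and the tag names are lowercase only to suit this example.

```zig
const std = @import("std");

/// Hypothetical byte-order tag for this sketch (not std's own type).
pub const Endian = enum { big, little };

/// Reads a u32 from four bytes. The byte order is an argument, so the
/// function name needs no `BE`/`Be` suffix to argue about.
pub fn readU32(endian: Endian, bytes: *const [4]u8) u32 {
    const b0: u32 = bytes[0];
    const b1: u32 = bytes[1];
    const b2: u32 = bytes[2];
    const b3: u32 = bytes[3];
    return switch (endian) {
        // Big-endian: the first byte is the most significant.
        .big => (b0 << 24) | (b1 << 16) | (b2 << 8) | b3,
        // Little-endian: the first byte is the least significant.
        .little => (b3 << 24) | (b2 << 16) | (b1 << 8) | b0,
    };
}
```

With this shape the acronym question moves out of the identifier and into an enum value, which is the point the comment is making.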
While semantic syntax highlighting is nice it requires editor support and the experience for non LSP users is degraded. Basic completion methods work rather well the minute the second word is hit which would be made less effective by snake_case matching functions, variables, and so on. The shape change is, in my opinion, much easier to follow as TitleCaseTypes clearly separate from regularFunctions and variables_with_snake_case.
On the topic of semantic highlighting: switching the highlighting based on what you want to focus on, as in https://buttondown.email/hillelwayne/archive/syntax-highlighting-is-a-waste-of-an-information/, would (arguably) work better with a shape difference, since the highlight focus changes with the task and the other modes would otherwise lose the distinction between functions and other things.
FWIW I'm a fan of camelCase for function/value distinction, and a fan of case mangling to avoid awkwardness of things like XMLHTTPRFCReader.
Talked with @andrewrk and @thejoshwolfe. We decided to keep approximately the current naming conventions, but clear up any ambiguous cases with the following specific algorithm:
To produce a name, first write out the name all lowercase with spaces. Then, consider the type of variable:
- instantiated (non-namespace) type => capitalize each word and delete the spaces.
- function => capitalize each word except the first and delete the spaces.
- anything else => replace the spaces with underscores.
If at any time you would delete a space and the character to the right of the space is a number, instead replace the space with an underscore. anytype values fall into the "anything else" category.
Some examples:
xml http request => XmlHttpRequest
read u32 be => readU32Be
read u 32 id => readU_32Id
open u32 2 => openU32_2
os handle => OsHandle
a b test => ABTest
ab test => AbTest
value as u32 => value_as_u32
my u32 module => my_u32_module
This transformation is both unambiguous and invertible.
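The algorithm above can be sketched in a few lines of Zig. This is an illustration under stated assumptions, not an official tool: `Kind` and `makeName` are hypothetical names, the input is assumed to be lowercase ASCII words separated by spaces, and the result is written into a caller-supplied buffer.

```zig
const std = @import("std");

/// Hypothetical classification of the thing being named.
pub const Kind = enum { type_name, function, other };

/// Builds an identifier from space-separated lowercase words.
pub fn makeName(buf: []u8, words: []const u8, kind: Kind) error{NoSpaceLeft}![]const u8 {
    var len: usize = 0;
    var pending_space = false;
    for (words) |c| {
        if (c == ' ') {
            // Remember the word boundary; leading spaces are ignored.
            pending_space = len > 0;
            continue;
        }
        var out: u8 = c;
        if (pending_space) {
            // "Anything else" always gets an underscore; types and functions
            // get one only when a digit follows the deleted space.
            if (kind == .other or std.ascii.isDigit(c)) {
                if (len == buf.len) return error.NoSpaceLeft;
                buf[len] = '_';
                len += 1;
            }
            // Types and functions capitalize every word after the first.
            if (kind != .other) out = std.ascii.toUpper(c);
            pending_space = false;
        } else if (len == 0 and kind == .type_name) {
            // Only types capitalize the first word.
            out = std.ascii.toUpper(c);
        }
        if (len == buf.len) return error.NoSpaceLeft;
        buf[len] = out;
        len += 1;
    }
    return buf[0..len];
}
```

Called with a 64-byte buffer, `makeName(&buf, "open u32 2", .function)` follows the digit rule and yields `openU32_2`, matching the example list.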
@SpexGuy any recommendations for if the name starts with a number?
Uh, taking off my "official" hat for a second, but maybe don't? I don't think it's unreasonable to disallow identifiers which start with numbers in the standard library.
@SpexGuy one such example might be crypto functions; at the moment we are going to have:
pub const aead = struct {
    pub const aegis = struct {
        pub const Aegis128L = @import("crypto/aegis.zig").Aegis128L;
        pub const Aegis256 = @import("crypto/aegis.zig").Aegis256;
    };
    .....
};
It would make a lot of sense to have aead.aegis.128L instead of aead.aegis.Aegis128L, where we only really keep the Aegis prefix so that the name doesn't start with a number.
AEGIS128l is the name of the algorithm, not "128l". Splitting that name wouldn't make sense, and would break grepability as well.
Oh, yeah, if you would capitalize a word and it starts with a number, leave it as is, and the space to the left of it will always become an underscore because it's a number to the right of a space. So under these rules it would be
aegis 128 l => Aegis_128L. That's a little weird but unambiguous.
@SpexGuy @andrewrk should we link the answer above in the FAQ, so the current style guide is easily discoverable?
any recommendations for if the name starts with a number?
Identifiers can't start with numbers unless you use @"123" syntax:
IDENTIFIER
<- !keyword [A-Za-z_] [A-Za-z0-9_]* skip
/ "@\"" string_char* "\"" skip
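To make the grammar rule concrete, here is a small sketch of what a digit-leading name would look like with the `@"..."` syntax. The `aegis` namespace and its contents are placeholders invented for this example; they only stand in for the crypto declarations discussed earlier in the thread.

```zig
const std = @import("std");

/// Placeholder namespace standing in for the aegis namespace above.
const aegis = struct {
    /// Placeholder type; the real implementation lives in crypto/aegis.zig.
    pub const Aegis128L = struct {
        pub const key_length = 16;
    };

    /// A declaration whose name starts with a digit must be written
    /// with the quoted-identifier syntax, both here and at every use.
    pub const @"128L" = Aegis128L;
};
```

Every use site then has to spell `aegis.@"128L"`, which is part of why avoiding digit-leading names is the simpler option.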
I realize this proposal has been closed, but I started using Zig a while ago and I find camelCase for functions very disturbing.
I don't want to start a debate, but I'll just share my experience as a developer.
In short, I can read TitleCase and snake_case fine, but I have a lot of trouble with camelCase. I've been coding for 30 years, and it's always been a pain point. In my head, when I see a camelCase word there is a long pause, as if it were written camel. Case. That makes the code much harder for me to read.
Apparently, I am not alone, I found a study about this https://ieeexplore.ieee.org/document/5521745 but I do not know how significant this is.
Again, I do not want to start a debate, nor even think of changing anything in zig. But I do use snake_case for functions in all my code (and TitleCase for types). And I hope the code I write (libraries (I have a lot of projects)..) will be accepted by the zig community without being frowned upon as "bad style".
Seeing this reopened has given me renewed hope that the official style guide might be switched to snake_case for functions. I personally find snake_case significantly more readable than camelCase, especially for longer multi-word identifiers and at small font sizes/greater viewing distance.
If anyone would like to see a significant example of mission-critical Zig code written with snake_case functions, check out tigerbeetle. In particular, the Viewstamped Replication protocol implementation in replica.zig is good reading. Consider the readability of e.g. count_message_and_receive_quorum_exactly_once() vs countMessageAndReceiveQuorumExactlyOnce() at different font sizes and viewing distances. Also try taking off your glasses if you wear them; if you don't, use your empathy and imagination.
As pointed out above, using camelCase to distinguish function identifiers from other snake_case identifiers only works for identifiers with more than one word which makes me question the entire premise. A rule like this is a lot less valuable when all it can say about single word identifiers is "maybe it's a function, maybe not."
I'd also like to note that most if not all open source C projects I have contributed to are written with snake_case functions. Perhaps this is largely influenced by the Linux kernel style guide. In any case, switching to snake_case function names for Zig code may make working on projects using a mix of C and Zig easier.
I've never been a fan of camel case either.
Given how it's used, it's fairly obvious to understand if an identifier is a local identifier or a function, especially since Zig doesn't allow shadowing.
So, camel case doesn't improve readability, and doesn't help avoid confusion/bugs either.
I just stumbled on an HN article that points to a study about this.
https://whatheco.de/2013/02/16/camelcase-vs-underscores-revisited/
The conclusion is that snake_case is about 20% faster to read.
Prior studies found that it was a bit slower to write, but if we want to emphasize reading, this may matter.
--
Nicolas Goy https://www.kuon.ch
In reply to Frank Denis (January 23, 2023), Re: [ziglang/zig] style guide: switch to snake case for functions (#1097), quoted in full above.
Anyone have any ideas how to roll out a change like this with minimal pain and suffering?
Write a script that can be used with other codebases too, and do it in one commit per branch. The script should traverse the codebase, produce a list of identifiers for manual checking (this can take several iterations), and then replace them everywhere, docs included.
--
Nicolas Goy https://www.kuon.ch
In reply to Andrew Kelley (January 26, 2023), Re: [ziglang/zig] style guide: switch to snake case for functions (#1097), quoted in full above.
It'd be nice to have the old and new names coexist for a release cycle so that third-party code can reasonably target both the latest tagged version of Zig and development builds. As an example:
| Zig Version | shaveYakHerd | shave_yak_herd |
|---|---|---|
| v0.10.0 | Present | N/A |
| v0.11.0-dev | Deprecated (Alias) | Added |
| v0.11.0 | Deprecated (Alias) | Present |
| v0.12.0-dev | Removed | Present |
I think const shaveYakHerd = shave_yak_herd; // Deprecated in v0.11.0. would be sufficient. But maybe there's an opportunity for something fancier like const shaveYakHerd = std.meta.deprecated(shave_yak_herd, semver_0110) that can @compileLog something helpful after a certain version is exceeded or a "log deprecations" flag is set.
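Both variants from this comment can be sketched as follows. The function body is a placeholder, and `deprecated` is a hypothetical helper written for this example in the spirit of the `std.meta.deprecated` idea; it takes a plain boolean instead of a version, and no such function is assumed to exist in the standard library.

```zig
const std = @import("std");

/// New snake_case name (the body is a placeholder for illustration).
pub fn shave_yak_herd(yaks: u32) u32 {
    return yaks;
}

/// Hypothetical helper: while `removed` is false it simply forwards `new`;
/// once it is true, evaluating the alias becomes a compile error.
fn deprecated(comptime new: anytype, comptime removed: bool, comptime msg: []const u8) @TypeOf(new) {
    if (removed) @compileError(msg);
    return new;
}

/// Variant 1: a plain alias, so old and new names coexist.
/// Deprecated in v0.11.0.
pub const shaveYakHerd = shave_yak_herd;

/// Variant 2: the same alias routed through the helper, so a later
/// release can flip `removed` to true and get a useful message.
pub const shaveYakHerdChecked = deprecated(
    shave_yak_herd,
    false,
    "shaveYakHerd was removed; use shave_yak_herd",
);
```

The plain alias costs one line per renamed declaration; the helper variant mainly buys a readable error message at removal time.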
Update zig fmt to do the conversion, announce a quick code freeze on many channels, merge as many PRs as possible, run zig fmt on the main zig repo, then allow PRs again.
As Zig is still what we could call "in development", I think it is reasonable not to have a compatibility layer but to provide a script that updates third-party codebases in a single run.
For example, root.log broke and is now std_options.logFn, so the community is used to this kind of change.
We can provide the script and ask the community to test it to gather feedback. I think that would be easier to deal with it this way. With a clear before and after.
Deprecation is more justified when there is a rewrite involved to upgrade the code, if all there is to it is to run a script, I think it is simpler without it.
--
Nicolas Goy https://www.kuon.ch
Assuming this proposal applies to zig's builtin functions as well, I think those should be automatically updated by zig fmt without any user intervention (i.e. transform @intCast() to @int_cast()). I don't think zig fmt should convert identifiers from camelCase to snake_case, we've explicitly stated that such conversion is out of scope for zig fmt in the past.
I personally don't think trying to expose both camelCase and snake_case identifiers for a full release cycle is worth the effort. As @kuon says, semantics aren't changing and updating should be as simple as a find and replace. I think the best thing we can do to reduce pain is to provide good tooling to do this find and replace at scale.
To put my money where my mouth is, I am currently writing a prototype program to convert camelCase to snake_case. Here is my planned CLI, based on zig fmt:
pub const usage_camel2snake =
\\Usage: zig camel2snake [file]...
\\
\\ Converts camelCase identifiers to snake_case in the input
\\ files and modifies them in-place. Arguments can be files
\\ or directories, which are searched recursively.
\\
\\ Only identifiers appearing in the whitelist and not appearing
\\ in the blacklist are converted. By default both lists are empty
\\ and nothing will be converted.
\\
\\Options:
\\ -h, --help Print this help and exit
\\ --color [auto|off|on] Enable or disable colored error messages
\\ --exclude [file] Exclude file or directory from processing
\\ --whitelist fooBar Add identifier fooBar to the whitelist
\\ --blacklist fooBar Add identifier fooBar to the blacklist
\\ --whitelist-all Whitelist all identifiers
\\
;
It may also be desirable to have --whitelist-file and --blacklist-file options that read a list of newline-separated identifiers from a file. Another interesting option would be --whitelist-std which would whitelist a hardcoded list of camelCase symbols exposed by the standard library at the point in time that the standard library was converted to snake_case.
Regardless of how the whitelist/blacklist are constructed, the method of operation I intend to use is as follows:
- Tokenize the target .zig files using a custom tokenizer that ignores the Zig grammar and splits the input into contiguous chunks of upper/lower case ASCII letters plus digits, keeping only chunks that contain at least one capital ASCII letter. For example, the following code would be tokenized as sliceTo, fooBar, fooBar:
/// see also std.mem.sliceTo()
pub fn fooBar() []const u8 {
    return "fooBar";
}
This has the advantage of finding camelCase identifiers that appear in comments and string literals, which should also be updated.
- Write the raw source code to the output file without doing any formatting. The only change is to convert tokens from the above tokenization that appear in the whitelist but not the blacklist to snake case. This will use the following simple algorithm:
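const std = @import("std");

/// Writes `camel` to `writer`, turning each uppercase ASCII letter into
/// an underscore followed by its lowercase form.
fn camel2snake(camel: []const u8, writer: anytype) !void {
    for (camel) |c| {
        if (std.ascii.isUpper(c)) {
            try writer.writeByte('_');
            try writer.writeByte(std.ascii.toLower(c));
        } else {
            try writer.writeByte(c);
        }
    }
}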
I'm sure there are some edge cases that will be missed by this relatively simple approach. What I don't know is how prevalent they will be and whether or not the complexity of handling them automatically will be worth it given their prevalence. I think the best thing to do is write the script and try it out to see what goes wrong and what the pain points are.
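To get a feel for those edge cases, the two steps can be sketched together in a form that is easy to test. This is an assumption-laden illustration rather than the prototype itself: `CamelTokenizer` and `camelToSnake` are hypothetical names, and the conversion writes into a caller-supplied buffer instead of a writer, applying the same per-character rule as described above.

```zig
const std = @import("std");

/// Yields maximal runs of ASCII letters/digits that contain at least one
/// uppercase letter. Zig syntax, comments and strings are not treated
/// specially; everything else is skipped.
pub const CamelTokenizer = struct {
    src: []const u8,
    pos: usize = 0,

    pub fn next(self: *CamelTokenizer) ?[]const u8 {
        while (self.pos < self.src.len) {
            // Skip separators (anything that is not a letter or digit).
            while (self.pos < self.src.len and
                !std.ascii.isAlphanumeric(self.src[self.pos])) self.pos += 1;
            const start = self.pos;
            var has_upper = false;
            while (self.pos < self.src.len and
                std.ascii.isAlphanumeric(self.src[self.pos])) : (self.pos += 1)
            {
                if (std.ascii.isUpper(self.src[self.pos])) has_upper = true;
            }
            if (has_upper) return self.src[start..self.pos];
        }
        return null;
    }
};

/// Same rule as the simple algorithm: '_' plus the lowercase letter for
/// every uppercase letter, all other bytes copied unchanged.
pub fn camelToSnake(buf: []u8, camel: []const u8) error{NoSpaceLeft}![]const u8 {
    var len: usize = 0;
    for (camel) |c| {
        if (std.ascii.isUpper(c)) {
            if (len == buf.len) return error.NoSpaceLeft;
            buf[len] = '_';
            len += 1;
        }
        if (len == buf.len) return error.NoSpaceLeft;
        buf[len] = std.ascii.toLower(c);
        len += 1;
    }
    return buf[0..len];
}
```

Under this rule `sliceTo` becomes `slice_to`, but `readIntBE` becomes `read_int_b_e` and a type name such as `Aegis128L` would become `_aegis128_l`. The tokenizer also picks up TitleCase names, since they contain a capital letter, so the whitelist/blacklist has to do the work of keeping types and acronyms out of the conversion.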
I personally don't think trying to expose both camelCase and snake_case identifiers for a full release cycle is worth the effort.
Maybe I misunderstand the effort. I thought some [pub] const oldFunc = new_func lines would be a simple way to ease the migration, especially if it's a hypothetical rename tool inserting them. A one-and-done is certainly easier for the Zig project, but downstream packages will have to choose "which side" of the rename to support. I maintain the Ubuntu package of Zig, and the install base is fairly split. One-third of installations follow master and two-thirds follow tagged releases.