String comparison

Open sterchelen opened this issue 1 year ago • 8 comments

Hi, the strings exercises don't teach how to compare two strings. I'm sure a lot of Zig learners would love to know that. wdyt?

sterchelen avatar Apr 07 '23 09:04 sterchelen

Unfortunately, that's actually a pretty big topic.

This is pretty much all the official documentation has to say about strings - it makes creating them easy:

https://ziglang.org/documentation/master/#String-Literals-and-Unicode-Code-Point-Literals

One way to compare strings using just the Zig language is to compare them as an array of u8 bytes. You could use a loop for that.
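As a rough sketch of the loop approach described above: the helper name `bytesEqual` is hypothetical (it is not part of Zig or Ziglings), and the code assumes both strings are plain `[]const u8` slices.

```zig
const std = @import("std");

// Hypothetical helper for illustration: compares two byte slices
// one element at a time.
fn bytesEqual(a: []const u8, b: []const u8) bool {
    // Slices of different lengths can never be equal.
    if (a.len != b.len) return false;

    var i: usize = 0;
    while (i < a.len) : (i += 1) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

test "manual byte-by-byte comparison" {
    try std.testing.expect(bytesEqual("zig", "zig"));
    try std.testing.expect(!bytesEqual("zig", "zag"));
    try std.testing.expect(!bytesEqual("zig", "zigs"));
}
```

Saved to a file, this should be checkable with `zig test`.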

To avoid writing the loop manually, Zig has comparison functions in the standard library for comparing memory.
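One such standard library function is `std.mem.eql`, which comes up later in this thread. A minimal sketch, assuming the strings are `u8` slices:

```zig
const std = @import("std");

test "std.mem.eql compares slices element by element" {
    const a: []const u8 = "hello";
    const b: []const u8 = "hello";
    const c: []const u8 = "help!";

    // The first argument is the element type; the result is a bool.
    try std.testing.expect(std.mem.eql(u8, a, b));
    try std.testing.expect(!std.mem.eql(u8, a, c));

    // Slices of different lengths are never equal.
    try std.testing.expect(!std.mem.eql(u8, a, "hell"));
}
```

Note that this only answers "are the bytes identical?"; it says nothing about ordering or case.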

But actually comparing text strings first requires deciding how those strings are encoded. If they're UTF-8 (as Zig string literals are), then comparing bytes will tell you whether they're identical (using either the u8 comparison loop or the standard library's memory comparison functions).

But comparing actual Unicode glyphs against each other is a much larger task. Some people have written Zig libraries for UTF-8 and I believe those have string comparison functions. (Including converting upper and lower case for Latin characters and that sort of thing.)

There are speed-vs-accuracy decisions to take into account when handling strings (if you assume it's all 7-bit ASCII, you can do string stuff super fast!) and for better or worse Zig has opted to not take a stance on how a developer should tackle them.
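To illustrate the ASCII shortcut, here is a small sketch using `std.ascii.eqlIgnoreCase` from the standard library. It assumes the input is ASCII; as the second check suggests, anything outside that range is compared byte for byte without case folding.

```zig
const std = @import("std");

test "ASCII-only case-insensitive comparison" {
    // Only the letters A-Z / a-z are folded.
    try std.testing.expect(std.ascii.eqlIgnoreCase("Ziglings", "ZIGLINGS"));

    // U+00C9 (uppercase E with acute) and U+00E9 (lowercase e with acute)
    // are multi-byte sequences in UTF-8, so the ASCII function does not
    // treat them as a case pair.
    try std.testing.expect(!std.ascii.eqlIgnoreCase("\u{c9}", "\u{e9}"));
}
```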

Maybe someone else could chime in with additional information.

ratfactor avatar Apr 07 '23 11:04 ratfactor

To me we could have two exercises:

> One way to compare strings using just the Zig language is to compare them as an array of u8 bytes. You could use a loop for that.

The first one.

> To avoid writing the loop manually, Zig has comparison functions in the standard library for comparing memory.

The second one.

And indeed, for now, we could skip the Unicode part.

wdyt @ratfactor?

sterchelen avatar Apr 08 '23 06:04 sterchelen

@sterchelen I agree that would be a good way to structure it. But I'm not sure where that would fit in the Ziglings "lesson plan" such as it is. The truth is, the Zig language has basically side-stepped the issue entirely, and so Ziglings has (so far) too.

Which is not to say that it's not an important thing to learn how to do. To the contrary, I find that most of my own personal programming involves strings, not numbers. So I'm very sympathetic to the importance! That's absolutely one of the first things I'd want to know, too. :-(

Perhaps, when we're confident we've covered the language in full, Ziglings could have extended lessons that cover "cookbook" material for common issues like how to compare strings?

ratfactor avatar Apr 08 '23 13:04 ratfactor

> That's absolutely one of the first things I'd want to know, too. :-(

Indeed, while porting host to Zig I struggled to find relevant information on how to compare two strings 🤷🏼‍♂️. I didn't find any reference in the documentation or in ziglearn, until I found a Reddit post that basically explains how to compare two u8 slices.

> Perhaps, when we're confident we've covered the language in full,

You mean that once the official documentation has made a clear statement about how to compare two strings, the Ziglings project could extend its lessons?

That's not part of our conversation, but we still have access to the C stdlib :)...

sterchelen avatar Apr 09 '23 06:04 sterchelen

> Perhaps, when we're confident we've covered the language in full,

> You mean that once the official documentation has made the statement clear about how to compare two strings, ziglings project could extend lessons?

Oh, we don't have to wait for that. I think it's reasonable to imagine the official language docs might never cover comparing strings. What I'm saying is that other than the literal syntax (which I suspect only exists for programmer convenience), Zig really doesn't have the concept of a string at all!

(That's a bold statement, so I'm hoping somebody will see this and jump in if I'm wrong.)

Here's the first of a three-part article series by Jose Colon Rodriguez (aka "Dude the Builder"):

https://zig.news/dude_the_builder/unicode-basics-in-zig-dj3

> That's not part of our conversation but we still have access to C stdlib :)...

Oh sure. But you'd be better off with a Zig string library.

Here's Rodriguez's Ziglyph lib: https://github.com/jecolon/ziglyph

ratfactor avatar Apr 10 '23 12:04 ratfactor

> Unfortunately, that's actually a pretty big topic.

> This is pretty much all the official documentation has to say about strings - it makes creating them easy:

> https://ziglang.org/documentation/master/#String-Literals-and-Unicode-Code-Point-Literals

Comparing strings using std.mem.eql should be fine most of the time, unless you need Unicode case-folding. Unlike Go, Zig currently does not contain the Unicode Database.

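Another issue is Unicode normalization: the same visible text can be encoded as different byte sequences, which a byte comparison treats as unequal.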
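A small sketch of why normalization matters for byte comparison: the same visible "é" can be written as one precomposed code point or as "e" plus a combining accent. The variable names are illustrative only.

```zig
const std = @import("std");

test "byte equality does not account for Unicode normalization" {
    // "é" as a single precomposed code point (U+00E9).
    const precomposed = "\u{e9}";
    // "é" as "e" followed by a combining acute accent (U+0301).
    const decomposed = "e\u{301}";

    // The two forms have different UTF-8 encodings and lengths...
    try std.testing.expect(precomposed.len == 2);
    try std.testing.expect(decomposed.len == 3);

    // ...so a plain byte comparison reports them as different.
    try std.testing.expect(!std.mem.eql(u8, precomposed, decomposed));
}
```

Treating these as equal would require normalizing both strings first, which is the kind of job a Unicode library is for.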

perillo avatar Apr 25 '23 19:04 perillo

Would it be within scope of this project to have a couple exercises on very basic std usage / patterns? Things like std.mem.eql, std.ArrayList, a simple example with an allocator...

Arya-Elfren avatar Apr 26 '23 20:04 Arya-Elfren

@perillo and @Arya-Elfren Yeah, I'm increasingly open to the idea that those sorts of basics would probably be useful. We'll want to keep the name of the exercise(s) as general as possible so that even major stdlib changes won't invalidate the whole thing (e.g. xxx_compare_memory rather than xxx_std_mem_eql).

ratfactor avatar Apr 27 '23 23:04 ratfactor