Add bytes.to_lower and bytes.to_upper
The strings package may already implement this.
Any particular motivation, @laytan?
- I don't know whether it makes sense for the bytes package to have case conversion. The package doesn't assume any particular encoding, and the strings package already provides this.
foo := []u8{65, 66, 67} // ABC
bar := strings.to_lower(string(foo))
- Does Unicode guarantee that the upper- and lowercase versions of a glyph encode to the same length? If it can't guarantee that (including for future glyphs), then converting in place like this isn't safe, as the result may expand.
I'm curious what @gingerBill thinks, but I'm reluctant to add this change considering point 1 especially.
Most of core:bytes is a 1:1 to core:strings. Maybe we should remove all the duplicated procedures?
> Most of core:bytes is a 1:1 to core:strings. Maybe we should remove all the duplicated procedures?
There is a case to be made to have both, and they could have subtly different behaviour based on strings having an encoding and bytes not assuming one.
The motivation was that this avoids converting to a string and then back to bytes when you want bytes in and bytes out. It also does the conversion in place, while the strings package allocates.
The bytes package already has a couple of procedures that use runes/unicode, so I thought it fit, and most procedures in the bytes package already have a counterpart in the strings package anyway.
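To illustrate the bytes-in, bytes-out case, here is a minimal sketch of an in-place conversion. It handles ASCII letters only, which sidesteps the length question, and `to_lower_ascii_in_place` is a hypothetical name used for illustration, not the procedure this PR adds:

```odin
package main

import "core:fmt"

// Hypothetical helper (not the PR's implementation): lowercases ASCII
// letters in place and leaves every other byte untouched, so the length
// of the buffer never changes and nothing is allocated.
to_lower_ascii_in_place :: proc(buf: []u8) {
	for b, i in buf {
		if b >= 'A' && b <= 'Z' {
			buf[i] = b + 0x20 // distance between 'A' and 'a' in ASCII
		}
	}
}

main :: proc() {
	foo := []u8{65, 66, 67} // ABC
	to_lower_ascii_in_place(foo)
	fmt.println(string(foo))
}
```

Bytes outside `A`..`Z`, including the bytes of multi-byte UTF-8 sequences, pass through unchanged, so this makes no claim about non-ASCII case conversion.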
As for your 2nd point @Kelimion, upon further investigation, Unicode has a couple of cases where the byte length changes between upper- and lowercase, reference.
I don't think the unicode package implements these cases though (unimplemented or a bug, I don't know). I wrote this small script to verify:
package main

import "core:fmt"
import "core:unicode"
import "core:unicode/utf8"

chars :: []string{
	"ǰ", // Latin Small Letter J with Caron
	"ﬀ", // Latin Small Ligature Ff
	"ῗ", // Greek Small Letter Iota with Dialytika and Perispomeni
}

main :: proc() {
	for tc in chars {
		// Decode the first rune of the test string and re-encode it.
		ch, ch_size := utf8.decode_rune(tc)
		ch_bytes, _ := utf8.encode_rune(ch)
		fmt.printf("ch: %v\n", ch)
		fmt.printf("ch_bytes: %v\n", ch_bytes)
		fmt.printf("ch_size: %v\n", ch_size)

		// Uppercase the rune and compare its encoded size with the original.
		upper_ch := unicode.to_upper(ch)
		fmt.printf("upper_ch: %v\n", upper_ch)
		upper_bytes, upper_size := utf8.encode_rune(upper_ch)
		fmt.printf("upper_bytes: %v\n", upper_bytes)
		fmt.printf("upper_size: %v\n", upper_size)
		fmt.println()
	}
}
All of these characters come out exactly the same as they went in, while the document referenced above says otherwise.
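For comparison, here is a small sketch of what the byte lengths would be if the full uppercase mappings were applied. The mappings are hard-coded from my reading of Unicode's SpecialCasing.txt rather than computed by core:unicode, and `Pair` is a hypothetical type used only for this example. Since the script above gets a single rune back from `unicode.to_upper`, a multi-rune mapping like these could not be returned through that call as used there:

```odin
package main

import "core:fmt"

// Hypothetical type for this example: a lowercase character and its full
// uppercase mapping, hard-coded from SpecialCasing.txt.
Pair :: struct {
	lower: string,
	upper: string,
}

main :: proc() {
	pairs := []Pair{
		{lower = "\u01F0", upper = "J\u030C"},            // ǰ -> J + combining caron
		{lower = "\uFB00", upper = "FF"},                 // ﬀ -> FF
		{lower = "\u1FD7", upper = "\u0399\u0308\u0342"}, // ῗ -> Ι + combining diaeresis + combining perispomeni
	}
	for p in pairs {
		// len on a string is its length in bytes, not in runes.
		fmt.printf("%s: %d bytes, %s: %d bytes\n", p.lower, len(p.lower), p.upper, len(p.upper))
	}
}
```

In UTF-8 the first mapping grows from 2 to 3 bytes, the second shrinks from 3 to 2, and the third grows from 3 to 6, so an in-place conversion that followed the full mappings could not keep the buffer length fixed.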
Closing stale PR.