chibi-scheme
srfi.130: can't split string on NUL characters
This works:
chibi-scheme -m chibi.string
> (string-split "foo\x00;bar\x00;baz\x00;" #\null)
("foo" "bar" "baz" "")
This doesn't:
chibi-scheme -m srfi.130
> (string-split "foo\x00;bar\x00;baz\x00;" "\x00;")
("" "" "" "" "" "" "" "" "" "" "" "" "")
Technically this is an invalid string, and future versions of Chibi may reject its creation to begin with.
(chibi string) string-split uses a manual loop in Scheme with a char predicate.
(srfi 130) allows arbitrary strings as delimiters, so it uses string-contains, which in turn calls strstr. I could replace this with memmem, but that's less portable.
Let me think about it.
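For readers following along, here is a minimal C sketch of why a strstr-based search behaves this way. It is not Chibi's actual code; `strstr_offset` is a hypothetical helper written only for illustration. Because strstr treats both arguments as NUL-terminated, a delimiter consisting of a NUL byte looks like an empty needle, and an empty needle matches at the very start of the haystack.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Chibi): returns the offset of the first
   match that strstr reports for `needle` in `haystack`, or -1 if there is
   none. Both arguments are read as NUL-terminated C strings, so anything
   after the first NUL byte is invisible to the search. */
static long strstr_offset(const char *haystack, const char *needle) {
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1;
}
```

With the 12-byte buffer `"foo\0bar\0baz\0"`, `strstr_offset(data, "\0")` returns 0 (the needle is seen as empty), while `strstr_offset(data, "bar")` returns -1 because the search stops at the first NUL. That would be consistent with the run of empty strings in the report above.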
Ah, I see. Some things that come to mind are 1) use musl's memmem implementation (e.g. extracted here https://github.com/leahneukirchen/mblaze/blob/master/mymemmem.c ) which is portable, efficient and permissively licensed, 2) detect NULs and fall back to a naive memcmp loop (which is the same for a 1-byte needle really).
In any case, I'd strongly recommend allowing NUL bytes in strings, which also is needed for proper roundtripping of UTF-8 and other things.
(The actual problem I had was reading output of a program that prints NUL-separated records, but there is no read-line with a custom record separator.)
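Option 2 above could look roughly like the following C sketch. It is a naive length-aware search, not musl's memmem and not anything taken from Chibi; the name `find_bytes` and its signature are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of `needle` (nlen bytes)
   in `hay` (hlen bytes). Lengths are explicit, so NUL bytes are treated
   like any other byte. Worst case is O(hlen * nlen), which is fine for
   short delimiters. */
static const char *find_bytes(const char *hay, size_t hlen,
                              const char *needle, size_t nlen) {
    if (nlen == 0) return hay;      /* same convention as strstr/memmem */
    if (nlen > hlen) return NULL;
    for (size_t i = 0; i + nlen <= hlen; i++) {
        /* cheap first-byte check before the full comparison */
        if (hay[i] == needle[0] && memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    }
    return NULL;
}
```

For a 1-byte needle this degenerates into a plain byte scan, so `memchr` would do the same job. The sketch only shows that the fallback needs nothing beyond standard C.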
Gnulib also has a mature memmem module (if LGPLv2+ is an option).
On Thu, 12 Aug 2021 at 11:30, Leah Neukirchen wrote:
In any case, I'd strongly recommend allowing NUL bytes in strings, which also is needed for proper roundtripping of UTF-8 and other things.
+1
Chibi was designed to be embedded in C, with a close connection to the standard C data types and libraries. As far as the FFI is concerned strings are NUL terminated. Pretending otherwise will always leave holes where some things won't work.
So we have three options:
- error early and not allow embedded NUL to begin with
- leave things as they are to allow roundtrip I/O but fail on many common operations
- patch up some cases to make the failures more rare and consequently more surprising
Of the three options, IMO the first seems favorable in that programming errors are caught early. It also helps interoperability with other R7RS implementations because NUL bytes in strings do not have to be supported.
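As a rough illustration of the first option, rejecting embedded NULs at string-creation time could be as simple as the check below. This is only a sketch: `has_embedded_nul` is a hypothetical name, and where such a check would live in Chibi is not something this thread specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: true if any of the first `len` bytes of `buf` is NUL.
   A string constructor could raise an error when this returns true, so the
   problem surfaces at creation rather than in a later strstr-based call. */
static bool has_embedded_nul(const char *buf, size_t len) {
    return memchr(buf, '\0', len) != NULL;
}
```

The cost is one linear scan per string creation.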
To mitigate the loss of some applications of strings that are currently possible with Chibi, one can use UTF-8-encoded bytevectors instead. In the long run, these can be accompanied by procedures providing the most important string operations for them. (See also SRFI 207.)