chibi-scheme srfi.130: can't split string on NUL characters

This works:

chibi-scheme -m chibi.string 
> (string-split "foo\x00;bar\x00;baz\x00;" #\null)
("foo" "bar" "baz" "")

This doesn't:

chibi-scheme -m srfi.130
> (string-split "foo\x00;bar\x00;baz\x00;" "\x00;")
("" "" "" "" "" "" "" "" "" "" "" "" "")

Aug 11 '21 22:08 leahneukirchen

Technically this is an invalid string, and future versions of Chibi may reject its creation to begin with.

Aug 11 '21 23:08 ashinn

(chibi string) string-split uses a manual loop in Scheme with a char predicate.

(srfi 130) allows full string delimiters, so it uses string-contains, which in turn calls strstr. I could replace this with memmem, but that's less portable.

Let me think about it.
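For readers following along, here is a minimal C sketch of why strstr cannot see a NUL delimiter. It is not taken from Chibi's source, and the helper name `strstr_offset` is made up for illustration. Because strstr treats its needle as a NUL-terminated C string, a needle consisting of a single NUL byte looks like the empty string, and an empty needle matches at the very start of the haystack. That is consistent with the run of empty strings in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (illustration only): offset of the first match of
   `needle` in `hay` as reported by strstr, or -1 if strstr finds none. */
static long strstr_offset(const char *hay, const char *needle)
{
    assert(hay != NULL && needle != NULL);
    const char *hit = strstr(hay, needle);
    return hit ? (long)(hit - hay) : -1;
}
```

With this helper, `strstr_offset("foo\0bar\0baz\0", "\0")` is 0 rather than 3, and `strstr_offset("foo\0bar", "bar")` is -1, because strstr never looks past the first NUL in the haystack.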

Aug 12 '21 01:08 ashinn

Ah, I see. Some things that come to mind are 1) use musl's memmem implementation (e.g. extracted here https://github.com/leahneukirchen/mblaze/blob/master/mymemmem.c ) which is portable, efficient and permissively licensed, 2) detect NULs and fall back to a naive memcmp loop (which is the same for a 1-byte needle really).
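Option 2 could look roughly like the following sketch. It is a naive, length-aware search built on memcmp, not musl's memmem and not Chibi's code; the names `find_bytes` and `count_fields` are hypothetical. It assumes the caller knows the byte lengths of both buffers, so embedded NUL bytes are treated as ordinary data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first occurrence of needle[0..nlen)
   in hay[0..hlen), or -1 if there is none. Lengths are explicit, so NUL
   bytes are not special. Worst case is O(hlen * nlen). */
static long find_bytes(const char *hay, size_t hlen,
                       const char *needle, size_t nlen)
{
    assert(hay != NULL && needle != NULL);
    if (nlen == 0)
        return 0;               /* empty needle matches at the start */
    if (nlen > hlen)
        return -1;
    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: number of fields obtained by splitting hay on a
   non-empty needle, counting a trailing empty field. */
static size_t count_fields(const char *hay, size_t hlen,
                           const char *needle, size_t nlen)
{
    assert(nlen > 0);
    size_t fields = 1, pos = 0;
    long hit;
    while ((hit = find_bytes(hay + pos, hlen - pos, needle, nlen)) >= 0) {
        fields++;
        pos += (size_t)hit + nlen;   /* resume just past the delimiter */
    }
    return fields;
}
```

For the 12-byte input from the report, `find_bytes` locates the first NUL at offset 3 and `count_fields` yields 4, matching the four fields that (chibi string) returns.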

In any case, I'd strongly recommend allowing NUL bytes in strings, which also is needed for proper roundtripping of UTF-8 and other things.

(The actual problem I had was reading output of a program that prints NUL-separated records, but there is no read-line with a custom record separator.)

Aug 12 '21 09:08 leahneukirchen

Gnulib also has a mature memmem module (if LGPLv2+ is an option).


Aug 12 '21 09:08 mnieper

In any case, I'd strongly recommend allowing NUL bytes in strings, which also is needed for proper roundtripping of UTF-8 and other things.

+1

Aug 12 '21 09:08 lassik

Chibi was designed to be embedded in C, with a close connection to the standard C data types and libraries. As far as the FFI is concerned strings are NUL terminated. Pretending otherwise will always leave holes where some things won't work.
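To make the FFI point concrete, here is a small C sketch (the helper name `visible_to_c` is invented for illustration and is not part of Chibi). Given a buffer with an explicit length, it reports how many bytes a NUL-terminated C API such as strlen would actually see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of leading bytes of buf[0..len) that a
   NUL-terminated C API would treat as the string, i.e. everything before
   the first embedded NUL, or all of it if there is none. */
static size_t visible_to_c(const char *buf, size_t len)
{
    assert(buf != NULL);
    const char *nul = (const char *)memchr(buf, '\0', len);
    return nul ? (size_t)(nul - buf) : len;
}
```

For the 7-byte buffer `"foo\0bar"`, the result is 3, the same as `strlen("foo\0bar")`; the trailing `bar` is invisible to any C function that relies on the terminator.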

So we have three options:

1) error early and not allow embedded NUL to begin with
2) leave things as they are to allow roundtrip I/O but fail on many common operations
3) patch up some cases to make the failures more rare and consequently more surprising

Aug 13 '21 08:08 ashinn

Of the three options, IMO the first seems favorable in that programming errors are caught early. It also helps interoperability with other R7RS implementations because NUL bytes in strings do not have to be supported.

To mitigate the loss of some applications of strings that are currently possible with Chibi, one can use UTF-8-encoded bytevectors instead. In the long run, these can be accompanied by procedures providing the most important string operations for them. (See also SRFI 207.)

Aug 13 '21 08:08 mnieper
