mir Add MIR_scan_string

Pass all test.

I'm not sure what the best API would be. At least MIR_scan_string_s should be safer.

More improvement opportunities: Once it is sure that no \0 is in input string, maybe other code can be simplified too?

Jun 04 '23 19:06 iacore

I am not sure about this PR. I am trying not to change API w/o significant advantages. But I'll think more about this.

Jun 06 '23 20:06 vnmakarov

it would appear this PR makes code slower for no good reason other than to appear as "safe" as Annex K

Jun 06 '23 20:06 rofl0r

I made this PR because I don't want to append the final \0 for mmaped files.

\0 can appear inside a normal file. The current parser stop parsing at first \0. Maybe it should be an error, or ignore the \0?

Jun 07 '23 16:06 iacore

Not all strings are nul-terminated. Raw text files are a common case, as mentioned above. And this comes up a lot in FFI: I’ve worked with several languages whose strings are passed into C as unterminated (pointer, length) pairs. IIRC both Go and Python do this. And then there’s C++’s std::string_view which can hold arbitrary substrings.

If the overhead of calling strlen is a problem, that could be removed; instead, just keep the nul check, and set the max len to a huge value in the existing function so the nul byte will be hit first. (There’s no valid reason to have a 00 byte in either ASCII or UTF-8 text.)

On the other hand, I see the string API as mostly for debugging, so is it really important to save the overhead of copying the text into a nul-terminated buffer?

Jul 31 '23 17:07 snej

If the overhead of calling strlen is a problem, that could be removed; instead, just keep the nul check, and set the max len to a huge value in the existing function so the nul byte will be hit first. (There’s no valid reason to have a 00 byte in either ASCII or UTF-8 text.)

I have applied the suggestion in the last commit.

Nov 30 '23 19:11 iacore

mir
mir copied to clipboard

Add MIR_scan_string_s

mir mir copied to clipboard

Add MIR_scan_string_s

mir
mir copied to clipboard