mir
Add MIR_scan_string_s
All tests pass.
I'm not sure what the best API would be. At least MIR_scan_string_s should be safer.
Further improvement opportunities: once it is certain that the input string contains no \0, maybe other code can be simplified too?
I am not sure about this PR. I am trying not to change API w/o significant advantages. But I'll think more about this.
It would appear this PR makes the code slower for no good reason other than to appear as "safe" as Annex K.
I made this PR because I don't want to append the final \0 for mmapped files.
\0 can appear inside a normal file. The current parser stops parsing at the first \0. Maybe it should be an error, or should the \0 be ignored?
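To illustrate the embedded-\0 question, here is a minimal sketch of how a caller could detect a NUL byte inside a buffer of known length before handing it to a parser. The helpers `first_nul_offset` and `has_embedded_nul` are hypothetical names invented for this example and are not part of MIR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of MIR: returns the offset of the first
   NUL byte in buf[0..len), or len if the buffer contains none. */
static size_t first_nul_offset(const char *buf, size_t len) {
  assert(buf != NULL);
  const char *nul = memchr(buf, '\0', len);
  return nul == NULL ? len : (size_t)(nul - buf);
}

/* Hypothetical helper: nonzero if a NUL byte occurs before the end of the
   buffer, so the caller can report an error instead of silently stopping. */
static int has_embedded_nul(const char *buf, size_t len) {
  return first_nul_offset(buf, len) != len;
}
```

With a check like this, a 5-byte buffer holding `ab\0cd` would be reported as containing an embedded NUL at offset 2, rather than being parsed as just `ab`.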
Not all strings are nul-terminated. Raw text files are a common case, as mentioned above. And this comes up a lot in FFI: I’ve worked with several languages whose strings are passed into C as unterminated (pointer, length) pairs. IIRC both Go and Python do this. And then there’s C++’s std::string_view, which can hold arbitrary substrings.
If the overhead of calling strlen is a problem, that could be removed; instead, just keep the nul check, and set the max len to a huge value in the existing function so the nul byte will be hit first. (There’s no valid reason to have a 00 byte in either ASCII or UTF-8 text.)
On the other hand, I see the string API as mostly for debugging, so is it really important to save the overhead of copying the text into a nul-terminated buffer?
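The strlen-free suggestion above could look roughly like the following sketch. It does not use MIR's actual scanner: `scan_bounded` and `scan_terminated` are hypothetical stand-ins that only count the bytes a scanner would consume, and the real signatures in the PR may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a length-bounded scanner: returns the number of
   bytes it would consume, stopping at the first NUL byte or after max_len
   bytes, whichever comes first. The bound is tested before the byte is read,
   so nothing beyond text[max_len - 1] is ever accessed. */
static size_t scan_bounded(const char *text, size_t max_len) {
  assert(text != NULL);
  size_t n = 0;
  while (n < max_len && text[n] != '\0') n++;
  return n;
}

/* Hypothetical stand-in for the existing nul-terminated entry point: no
   strlen call, just a bound so large that the NUL byte is always hit first. */
static size_t scan_terminated(const char *text) {
  return scan_bounded(text, SIZE_MAX);
}
```

In this arrangement the nul-terminated function pays only for the extra bound comparison per byte, while callers with an unterminated (pointer, length) pair call the bounded variant directly.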
> If the overhead of calling strlen is a problem, that could be removed; instead, just keep the nul check, and set the max len to a huge value in the existing function so the nul byte will be hit first. (There’s no valid reason to have a 00 byte in either ASCII or UTF-8 text.)
I have applied the suggestion in the last commit.