bstr
bstr copied to clipboard
A from_str(_radix) analogue
Working with mostly-UTF-8 byte strings, the two biggest things my current project is missing are substring matching (which this crate provides) and integer parsing (which std
provides only for str
).
This is easily worked around by calling str::from_utf8
first, because if the input is not valid UTF-8 then it would never make it through a from_str
/parse
or from_str_radix
anyway.
But it would be convenient to have these functions in bstr
as well, to avoid the overhead from that redundant from_utf8
call.
Yeah, I do kind of think this sort of thing is in scope. I've certainly wanted functions like this in the past as well.
Adding these sorts of routines for integers seems pretty reasonable, but this does open the door for wanting analogous parsing routines for floating point. That is definitely not a road I want to go down. It looks like the lexical project is already meeting this need anyway: https://docs.rs/lexical-core/0.7.4/lexical_core/
So I'd probably want to draw a line in the sand here: integer parsing APIs are okay, but not floats.
This feels like a good "first issue" to me. In particular, routines could be added that are implemented by first converting the input to a &str
and then reusing std's integer parsing. This isn't optimal, but we can always optimize it later.
For anyone who wants to work on this, it would probably be good to start by posting an API proposal in this issue. The API is probably the hardest part of this issue to be honest. For example, do we want a new sub-module with a bunch of concrete functions for each integer? Do we want to define a trait that we implement for all of the integer APIs? I kind of lean towards the latter.
API Proposal: impl FromBStrRadix for {integer}
Rationale
Currently, we call e.g. i32::from_str_radix(src, radix)
.
By analogy, it seems correct to call i32::from_bstr_radix(src, radix)
.
Defining a new trait with that function and then implementing it for the integers makes that work (at the small cost of importing the trait).
It seems reasonable to name the trait after the function.
Implementation
I'm not quite ready to make a PR but I do have a fork up: https://github.com/alexjago/bstr/commit/34877866fa9502881d33e4333daede44f3cb476c
What's in it: a new file src/from_bstr_radix.rs
with the trait, impl and tests; plus two lines of changes to src/lib.rs
to use the module and export the trait.
There are a couple of slight differences to the stdlib function.
- Obviously, it takes
src: &BStr
rather thansrc: &str
. Question: should this be anAsRef<[u8]>
instead? -
radix
is the output type rather than alwaysu32
(easily changeable if desired). - The error type is slightly different. The stdlib returns
Err(ParseIntError {kind: IntErrorKind})
but sincekind
is private and there's no constructor, I just returnErr(IntErrorKind::{variant})
directly. IMHO this is actually more useful.
edit to add: well, IntErrorKind
would be more useful if it didn't require 1.55. Without it, might as well just wrap the std
integer parsing.
@alexjago Thanks! Nice work. Some responses:
- It should definitely take
AsRef<[u8]>
, not&BStr
.&[u8]
is what all of the APIs inbstr
are defined on. - I'm not sure why
radix
would be tied to the output type?u32
feels right to me here. - The error type should definitely not be just a bare
IntErrorKind
. We'll want to define our ownParseIntError
. We can expose thekind
through a method, just like std does.
The "bstr" in the FromBStrRadix
name bugs me a bit, particularly since we're not taking a BStr
but a &[u8]
. Probably a better name is FromBytesRadix::from_bytes_radix
. I don't love it, but I like it better than FromBStrRadix
.
Feel free to submit a PR and we can work through the actual code itself.
Requiring Rust 1.55 is fine. I'm going to bump the MSRV of bstr
up higher. See the 1.0 issue.
There's another API question which occurred to me around panicking and allocating.
The stdlib function (and therefore this one, for now) has an assert
on the radix value... which includes the erroneous value in the error message.
That's an allocation, right? I don't think any other part of the functionality really needs to allocate.
So it's probably fixable with some clever #[cfg]
work (non-alloc
could just... not have a message).
But it also might be an option to redesign the API so that out-of-range radices are a recoverable error, presumably with an additional variant of IntErrorKind
. Obviously if we do that then we can't just re-export the stdlib enum.
You're saying an assert is an alloc? No, it isn't.
Let's stick with how std does things I think wherever we can. The radix is almost always going to be a static input determined by the programmer, so it seems sensible to panic for invalid radix values.
Good to know that asserts don't alloc. (I got this misunderstanding because the format!
macro returns String, but I suppose the underlying machinery just writes to some buffer...)
And yes, I agree that sticking to how std does things makes life more straightforward for everyone.
Well, the assert might alloc when the assert fails. Sure. But that's okay.