bstr A from_str(_radix) analogue

Working with mostly-UTF-8 byte strings, the two biggest things my current project is missing are substring matching (which this crate provides) and integer parsing (which std provides only for str).

This is easily worked around by calling str::from_utf8 first, because if the input is not valid UTF-8 then it would never make it through a from_str/parse or from_str_radix anyway.

But it would be convenient to have these functions in bstr as well, to avoid the overhead from that redundant from_utf8 call.

Jun 21 '20 21:06 rpjohnst

Yeah, I do kind of think this sort of thing is in scope. I've certainly wanted functions like this in the past as well.

Adding these sorts of routines for integers seems pretty reasonable, but this does open the door for wanting analogous parsing routines for floating point. That is definitely not a road I want to go down. It looks like the lexical project is already meeting this need anyway: https://docs.rs/lexical-core/0.7.4/lexical_core/

So I'd probably want to draw a line in the sand here: integer parsing APIs are okay, but not floats.

Jun 22 '20 15:06 BurntSushi

This feels like a good "first issue" to me. In particular, routines could be added that are implemented by first converting the input to a &str and then reusing std's integer parsing. This isn't optimal, but we can always optimize it later.

For anyone who wants to work on this, it would probably be good to start by posting an API proposal in this issue. The API is probably the hardest part of this issue to be honest. For example, do we want a new sub-module with a bunch of concrete functions for each integer? Do we want to define a trait that we implement for all of the integer APIs? I kind of lean towards the latter.

Jul 05 '22 23:07 BurntSushi

API Proposal: `impl FromBStrRadix for {integer}`

Rationale

Currently, we call e.g. i32::from_str_radix(src, radix).

By analogy, it seems correct to call i32::from_bstr_radix(src, radix).

Defining a new trait with that function and then implementing it for the integers makes that work (at the small cost of importing the trait).

It seems reasonable to name the trait after the function.

Implementation

I'm not quite ready to make a PR but I do have a fork up: https://github.com/alexjago/bstr/commit/34877866fa9502881d33e4333daede44f3cb476c

What's in it: a new file src/from_bstr_radix.rs with the trait, impl and tests; plus two lines of changes to src/lib.rs to use the module and export the trait.

There are a couple of slight differences to the stdlib function.

Obviously, it takes src: &BStr rather than src: &str. Question: should this be an AsRef<[u8]> instead?
radix is the output type rather than always u32 (easily changeable if desired).
The error type is slightly different. The stdlib returns Err(ParseIntError {kind: IntErrorKind}) but since kind is private and there's no constructor, I just return Err(IntErrorKind::{variant}) directly. IMHO this is actually more useful.

edit to add: well, IntErrorKind would be more useful if it didn't require 1.55. Without it, might as well just wrap the std integer parsing.

Jul 19 '22 23:07 alexjago

@alexjago Thanks! Nice work. Some responses:

It should definitely take AsRef<[u8]>, not &BStr. &[u8] is what all of the APIs in bstr are defined on.
I'm not sure why radix would be tied to the output type? u32 feels right to me here.
The error type should definitely not be just a bare IntErrorKind. We'll want to define our own ParseIntError. We can expose the kind through a method, just like std does.

The "bstr" in the FromBStrRadix name bugs me a bit, particularly since we're not taking a BStr but a &[u8]. Probably a better name is FromBytesRadix::from_bytes_radix. I don't love it, but I like it better than FromBStrRadix.

Feel free to submit a PR and we can work through the actual code itself.

Jul 20 '22 00:07 BurntSushi

Requiring Rust 1.55 is fine. I'm going to bump the MSRV of bstr up higher. See the 1.0 issue.

Jul 20 '22 00:07 BurntSushi

There's another API question which occurred to me around panicking and allocating.

The stdlib function (and therefore this one, for now) has an assert on the radix value... which includes the erroneous value in the error message.

That's an allocation, right? I don't think any other part of the functionality really needs to allocate.

So it's probably fixable with some clever #[cfg] work (non-alloc could just... not have a message).

But it also might be an option to redesign the API so that out-of-range radices are a recoverable error, presumably with an additional variant of IntErrorKind. Obviously if we do that then we can't just re-export the stdlib enum.

Jul 21 '22 02:07 alexjago

You're saying an assert is an alloc? No, it isn't.

Let's stick with how std does things I think wherever we can. The radix is almost always going to be a static input determined by the programmer, so it seems sensible to panic for invalid radix values.

Jul 21 '22 10:07 BurntSushi

Good to know that asserts don't alloc. (I got this misunderstanding because the format! macro returns String, but I suppose the underlying machinery just writes to some buffer...)

And yes, I agree that sticking to how std does things makes life more straightforward for everyone.

Jul 22 '22 06:07 alexjago

Well, the assert might alloc when the assert fails. Sure. But that's okay.

Jul 22 '22 09:07 BurntSushi

bstr bstr copied to clipboard

A from_str(_radix) analogue

API Proposal: impl FromBStrRadix for {integer}

Rationale

Implementation

bstr
bstr copied to clipboard

API Proposal: `impl FromBStrRadix for {integer}`