polars
polars copied to clipboard
Remove fast-float crate?
Description
The Eisel-Lemire algorithm at the heart of the fast-float crate has been merged into std.
https://github.com/rust-lang/rust/pull/86761
Does polars still need fast-float?
See https://github.com/aldanor/fast-float-rust/issues/34
I can run a quick benchmark today to see if there's a significant performance difference if we remove the dependency, and if not we can just remove it.
A bit late, I did do the benchmark and there was essentially no difference between the stdlib parse and fast_float
parse.
There is however another problem:
unsafe {
// SAFETY: this is technically not ok, but the first line of
// the stdlib parse implementation calls s = s.as_bytes()...
std::str::from_utf8_unchecked(bytes).parse().ok()
}
We directly want to parse bytestrings which fast_float
supports but the stdlib doesn't. So either we do this thing which isn't really kosher, or keep using the stdlib. @ritchie46 Thoughts?
Relevant stdlib proposal: https://github.com/rust-lang/libs-team/issues/287.
I believe the bytes is the reason we use fast-float indeed. I'd say let's wait.
To summarize, I understand that polars still depends, and only depends on fast_float
because the std library does not yet support (being discussed, but last update four months ago) parsing bytestrings, something fast_float
offers, and polars relies on (here and here, for instance, if I read the code correctly)?
Why am I asking? I'm interested because fast_float
crate has led to some friction in the environment where I'm contributing to adopting polars, and if polars' dependency on fast_float
went away, so would this friction. Knowing that I got the summary right would help me to figure out what to do next.
Yes, if std
would accept bytes we could remove fast floats. We parse floats directly from bytes that may have invalid utf8. Having to do utf8 checking first makes it more costly than needed. Why is the fast_float
dep a problem?
I went through a code review process when importing dependencies of Polars. Some questions/cocnerns popped up about fast_float
, around some of the unsafe
uses in fast_float
. This then prompted me to find out about the state of this issue: since if the fast_float
dependency were removed, we would need to look no further. Thanks for helping me understand.
@ritchie46 would it be acceptable to make fast-float
an optional dependency and use checked from_utf8
as the alternative? This way people can have safe-but-slightly-slower code.
Or even a manual impl copied from the stdlib (i think this might be too much code)
To my knowledge fast-float
is sound. There are many dependencies that use unsafe
and to create feature flags and alternatives for all of them would be a maintenance nightmare. What is it specific on fast-float
? Is there something unsound?
Someone needs to file an issue about it with detail (sorry, IMO that should have happened first), but basically in the process of an unsafe audit we discovered that fast-float
is rather careless with unsafe: it checks safety invariants with debug_assert!()
, invariants that might be upheld elsewhere in the codebase but not in a proximate, obvious manner. It's unclear to me at this time if there is any actual UB in the crate, but it is a bit of an outsize burden on a dependency tree to have crates that have nonlocal safety invariants (since you have to check them again every time you update)
Filed an issue with some detail: https://github.com/aldanor/fast-float-rust/issues/37
Doing a proper safety audit of that code will take much longer and is not really worth it for me, at least.
I'm closing this issue because it is not actionable right now. When the Rust standard library provides a method to directly parse bytes into a float we can switch to that.