polars icon indicating copy to clipboard operation
polars copied to clipboard

Remove fast-float crate?

Open jqnatividad opened this issue 6 months ago • 7 comments

Description

The Eisel-Lemire algorithm at the heart of the fast-float crate has been merged into std.

https://github.com/rust-lang/rust/pull/86761

Does polars still need fast-float?

See https://github.com/aldanor/fast-float-rust/issues/34

jqnatividad avatar Dec 13 '23 18:12 jqnatividad

I can run a quick benchmark today to see if there's a significant performance difference if we remove the dependency, and if not we can just remove it.

orlp avatar Dec 15 '23 13:12 orlp

A bit late, I did do the benchmark and there was essentially no difference between the stdlib parse and fast_float parse.

There is however another problem:

        unsafe {
            // SAFETY: this is technically not ok, but the first line of
            // the stdlib parse implementation calls s = s.as_bytes()...
            std::str::from_utf8_unchecked(bytes).parse().ok()
        }

We directly want to parse bytestrings which fast_float supports but the stdlib doesn't. So either we do this thing which isn't really kosher, or keep using the stdlib. @ritchie46 Thoughts?

orlp avatar Dec 21 '23 15:12 orlp

Relevant stdlib proposal: https://github.com/rust-lang/libs-team/issues/287.

orlp avatar Dec 21 '23 15:12 orlp

I believe the bytes is the reason we use fast-float indeed. I'd say let's wait.

ritchie46 avatar Dec 21 '23 17:12 ritchie46

To summarize, I understand that polars still depends, and only depends on fast_float because the std library does not yet support (being discussed, but last update four months ago) parsing bytestrings, something fast_float offers, and polars relies on (here and here, for instance, if I read the code correctly)?

Why am I asking? I'm interested because fast_float crate has led to some friction in the environment where I'm contributing to adopting polars, and if polars' dependency on fast_float went away, so would this friction. Knowing that I got the summary right would help me to figure out what to do next.

jeadorf avatar Mar 25 '24 14:03 jeadorf

Yes, if std would accept bytes we could remove fast floats. We parse floats directly from bytes that may have invalid utf8. Having to do utf8 checking first makes it more costly than needed. Why is the fast_float dep a problem?

ritchie46 avatar Mar 25 '24 18:03 ritchie46

I went through a code review process when importing dependencies of Polars. Some questions/cocnerns popped up about fast_float, around some of the unsafe uses in fast_float. This then prompted me to find out about the state of this issue: since if the fast_float dependency were removed, we would need to look no further. Thanks for helping me understand.

jeadorf avatar Mar 25 '24 19:03 jeadorf

@ritchie46 would it be acceptable to make fast-float an optional dependency and use checked from_utf8 as the alternative? This way people can have safe-but-slightly-slower code.

Or even a manual impl copied from the stdlib (i think this might be too much code)

Manishearth avatar Mar 29 '24 17:03 Manishearth

To my knowledge fast-float is sound. There are many dependencies that use unsafe and to create feature flags and alternatives for all of them would be a maintenance nightmare. What is it specific on fast-float? Is there something unsound?

ritchie46 avatar Mar 31 '24 08:03 ritchie46

Someone needs to file an issue about it with detail (sorry, IMO that should have happened first), but basically in the process of an unsafe audit we discovered that fast-float is rather careless with unsafe: it checks safety invariants with debug_assert!(), invariants that might be upheld elsewhere in the codebase but not in a proximate, obvious manner. It's unclear to me at this time if there is any actual UB in the crate, but it is a bit of an outsize burden on a dependency tree to have crates that have nonlocal safety invariants (since you have to check them again every time you update)

Manishearth avatar Apr 01 '24 21:04 Manishearth

Filed an issue with some detail: https://github.com/aldanor/fast-float-rust/issues/37

Doing a proper safety audit of that code will take much longer and is not really worth it for me, at least.

Manishearth avatar Apr 01 '24 21:04 Manishearth

I'm closing this issue because it is not actionable right now. When the Rust standard library provides a method to directly parse bytes into a float we can switch to that.

orlp avatar Apr 17 '24 15:04 orlp