
Proposal: Do not raise an exception on None

Open argenisleon opened this issue 4 years ago • 4 comments

Hi @SethMMorton,

Would it be possible for fastnumbers to handle None instead of raising an exception?

import fastnumbers
fastnumbers.fast_float(None)
TypeError: float() argument must be a string or a number, not 'NoneType'

Maybe return None?
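For illustration only, the requested behaviour can be approximated with a thin wrapper around any converter. This is a minimal sketch, not part of fastnumbers: the name none_passthrough is hypothetical, and the built-in float stands in for fastnumbers.fast_float so the snippet runs without extra dependencies.

```python
from typing import Any, Callable, Optional


def none_passthrough(convert: Callable[[Any], Any]) -> Callable[[Any], Optional[Any]]:
    """Wrap a converter so that None is returned unchanged instead of raising.

    Hypothetical helper for illustration; it is not a fastnumbers API.
    """
    def wrapper(value: Any) -> Optional[Any]:
        # Short-circuit on None so the wrapped converter never sees it.
        if value is None:
            return None
        return convert(value)

    return wrapper


# Assumption: float is used here in place of fastnumbers.fast_float.
safe_float = none_passthrough(float)
results = [safe_float(v) for v in ["1.5", None, 3]]
```

The explicit `is None` check avoids paying for a try-except on every missing element, at the cost of one extra Python-level call per value.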

argenisleon, Jun 02 '20 21:06

Can you explain your use case?

SethMMorton, Jun 07 '20 00:06

I am using fastnumbers to count the floats in a Dask series. The series can have millions of elements, any number of which may be None. Wrapping every call in try-except to handle the exception can have a big impact on performance.

argenisleon, Jun 07 '20 18:06

If you are trying to just detect floats, would not isfloat be a better choice? But I'm guessing there is more to it than you describe.

Since it sounds like you are using a Pandas-like structure, it would probably be better to do something roughly equivalent to s[s == None] = float("nan"). Even if this functionality is supported, using a vectorized methodology should outperform fastnumbers.
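A rough sketch of what that vectorized approach might look like in pandas follows. The sample data is made up for illustration, and isna() is used to build the mask of missing values instead of an element-wise comparison with None; pd.to_numeric with errors="coerce" then turns both missing and unparseable entries into NaN.

```python
import pandas as pd

# Made-up sample data: a string column containing a missing value.
s = pd.Series(["1.5", None, "2"], dtype=object)

# Boolean mask of the missing (None/NaN) entries.
mask = s.isna()
none_count = int(mask.sum())

# Vectorized conversion; anything that cannot be parsed becomes NaN.
converted = pd.to_numeric(s, errors="coerce")
```

Here `converted` has a float dtype with NaN in place of the None entry, so no per-element exception handling is involved.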

I'm not saying this functionality would not be useful, I just am wondering if the fact that fastnumbers is not vectorized will actually be a hindrance to the types of manipulations you are trying to optimize.

Before you ask, I would not be against having fastnumbers be vectorized, but I simply do not have the time to implement that myself - I would accept a pull request enabling it, though. But that's another issue that would need to be filed, separate from this issue.

SethMMorton, Jun 08 '20 05:06

Yes, I am trying to convert a pandas Series of strings to numbers. For the record, in pandas 1.0.1 I compared to_numeric against fastnumbers for converting strings to floats, and fastnumbers seems to be almost 2x faster.

import pandas as pd
from fastnumbers import fast_float

pdf = pd.read_csv("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/crime.csv")

%timeit pdf["OFFENSE_CODE"].map(fast_float)
86.2 ms ± 2.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit pd.to_numeric(pdf["OFFENSE_CODE"], errors="coerce")
162 ms ± 7.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

BTW, it would be amazing if we could vectorize fastnumbers. I would be more than happy to contribute.

argenisleon, Jun 25 '20 17:06

> BTW, it would be amazing if we could vectorize fastnumbers. I would be more than happy to contribute.

@argenisleon I realize so much time has passed that you may no longer need or want this, but fastnumbers is now vectorized.

SethMMorton, Feb 26 '23 04:02