Wes McKinney
Yeah, having checked/unchecked functions is probably the easiest thing. You can use C++ exceptions for errors that you want to propagate to the user, but not for routine internal "failures"...
We have developed a parallel line-delimited JSON reader in Apache Arrow, see https://github.com/apache/arrow/blob/master/python/pyarrow/_json.pyx
"First class" here means "not implemented using Python lists". You can interpret any array of type `T` as `Array[T]` by adding an array of offsets that encode size and position....
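To make the offsets idea concrete, here is a minimal pure-Python sketch. The helper names `encode_lists` and `get_item` are hypothetical and exist only for illustration; they are not part of pandas or Arrow. A flat array of values plus an array of offsets is enough to represent variable-length lists without nesting Python lists.

```python
# Hypothetical helpers illustrating the "values + offsets" encoding.
# Not pandas or Arrow API; a sketch under that assumption.

def encode_lists(lists):
    """Flatten a sequence of lists into (values, offsets)."""
    values = []
    offsets = [0]  # offsets[i] is where element i starts in `values`
    for item in lists:
        values.extend(item)
        offsets.append(len(values))  # end of element i, start of i + 1
    return values, offsets


def get_item(values, offsets, i):
    """Recover element i as a slice of the flat values array."""
    return values[offsets[i]:offsets[i + 1]]


values, offsets = encode_lists([[1, 2], [], [3, 4, 5]])
third = get_item(values, offsets, 2)
```

Element `i` has size `offsets[i + 1] - offsets[i]`, so the offsets array has one more entry than there are elements.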
Yeah, the idea behind an "expression VM" is similar to the design of APL interpreters. This is a bigger topic than this issue, but normal pandas operations would be implemented...
I'm -0 on this because these tools are NumPy-centric and do not have good support for non-numeric data.
I honestly might even go so far as disabling implicit broadcasting in favor of `df.add(series, axis='index')`. Perennial source of problems.
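For context, a small sketch of the difference in current pandas; the frame and series below are made-up illustration data. With `df + series`, the Series index is aligned against the DataFrame's columns, while `df.add(series, axis='index')` states the row-wise intent explicitly.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])
series = pd.Series([100, 200, 300], index=["x", "y", "z"])

# Implicit broadcasting: the Series index is matched against df's *columns*.
# No labels overlap here, so every cell of the result is NaN.
implicit = df + series

# Explicit form: match the Series index against df's row index.
explicit = df.add(series, axis="index")
```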
@datnamer either way, pandas needs to have its own metadata implementation (see the logical/physical decoupling discussion in https://pydata.github.io/pandas-design/internal-architecture.html#logical-types-and-physical-storage-decoupling). We do not want to delegate metadata details to a third party...
The implicit casting behavior that exists now (really just to permit nulls to be inserted into dtypes that don't support them) makes me super uncomfortable. I guess I would need...
This problem also extends to other analytics, like `value_counts`:

```
In: s = pd.Series([1, 2, np.nan, 1, 1, 2, np.nan])
    s.value_counts()
Out:
1.0    3
2.0    2
dtype: int64
```

Here,...
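The comment above is cut off, so which aspect it goes on to discuss is not known. As a hedged sketch, two observable effects of the NaN-driven cast in the same Series can be checked directly: the integers are stored as floats, and `value_counts` omits the nulls unless `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 1, 1, 2, np.nan])

# The NaNs force the integer inputs into a float64 Series.
series_dtype = s.dtype

# By default the nulls are dropped from the counts.
counts = s.value_counts()

# Passing dropna=False keeps a separate entry for the nulls.
counts_with_na = s.value_counts(dropna=False)
```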
I'm +1 on [36] being NA. [33] is probably correct. At minimum it would be useful to document this behavior carefully.