polars icon indicating copy to clipboard operation
polars copied to clipboard

`Series.set_at_idx` requires NumPy

Open stinodego opened this issue 3 years ago • 1 comments

Originally discussed in #3890

Polars no longer has NumPy as a dependency. But interop with NumPy obviously still requires NumPy to be installed.

There is one more case where NumPy is required, which is for Series.set_at_idx, and by extension, Series.__setitem__. It would be great if we can adjust this function to work without NumPy.

Ritchie's left some pointers for this in the original thread:

We must update the function at the rust side to accept a Series. That series should then be cast on to IdxSize and checked for null values. If null values present we should throw a ValueError. We could update the macro. Instead of idx: &[u32] we can accept a PySeries.

macro_rules! impl_set_at_idx {
    ($name:ident, $native:ty, $cast:ident, $variant:ident) => {
        fn $name(series: &Series, idx: &[u32], value: Option<$native>) -> Result<Series> {
            let ca = series.$cast()?;
            let new = ca.set_at_idx(idx.iter().map(|idx| *idx as usize), value)?;
            Ok(new.into_series())
        }

        #[pymethods]
        impl PySeries {
            pub fn $name(&self, idx: &PyArray1<u32>, value: Option<$native>) -> PyResult<Self> {
                let idx = unsafe { idx.as_slice().unwrap() };
                let series = $name(&self.series, &idx, value).map_err(PyPolarsErr::from)?;
                Ok(Self::new(series))
            }
        }
    };
}

Eventually I want to remove this macro and re-implement setting with our new mutable API and proper python conversions, but that is not an easy "first issue", so I'd recommend to first make it work with the old approach.

Originally posted by @ritchie46 in https://github.com/pola-rs/polars/issues/3890#issuecomment-1175984672

stinodego avatar Jul 31 '22 16:07 stinodego

For arrays backed by numeric types (numbers and logical types like dates), this has been done in #4121.

The changes in the dispatch to the implementation is another pointer of how we should achieve this for the other datatypes. The FFI dictionary lookup should not be used anymore. That's an archaic solution dating from the time I did not understand the pyo3 lib good enough.

ritchie46 avatar Jul 31 '22 16:07 ritchie46

scatter no longer uses numpy

stinodego avatar Jan 23 '24 13:01 stinodego