
Schema error when vstacking a null-typed column with a typed column

Open kenshuri opened this issue 1 year ago • 0 comments

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
null_typed = pl.DataFrame({'str_col': [None], 'f64_col': None})
typed = pl.DataFrame({'str_col': ['Hello'], 'f64_col': 4.2})
stacked = null_typed.vstack(typed)

Log output

Traceback (most recent call last):
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-143-9da1bcb123b1>", line 1, in <module>
    stacked = null_typed.vstack(typed)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\venv\Lib\site-packages\polars\dataframe\frame.py", line 6504, in vstack
    return self._from_pydf(self._df.vstack(other._df))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.SchemaError: cannot extend/append Null with Utf8

Issue description

This issue is related to closed issue #11824 and to feature #12771, which was merged in Python Polars 0.19.18.

So far, I have worked around this issue with exception handling:

try:
    stacked_ok = null_typed.vstack(typed)
except pl.exceptions.SchemaError:
    # Fall back to stacking in the reverse order (the typed frame comes first)
    stacked_ok = typed.vstack(null_typed)

Expected behavior

I expect it to be possible to vstack the two DataFrames, with the schema of the initial null-typed DataFrame being modified on the fly.

stacked_ok = null_typed.vstack(typed)
shape: (2, 2)
┌─────────┬─────────┐
│ str_col ┆ f64_col │
│ ---     ┆ ---     │
│ str     ┆ f64     │
╞═════════╪═════════╡
│ Hello   ┆ 4.2     │
│ null    ┆ null    │
└─────────┴─────────┘

Installed versions

--------Version info---------
Polars:               0.20.2
Index type:           UInt32
Platform:             Windows-10-10.0.19045-SP0
Python:               3.11.7 (tags/v3.11.7:fa7a6f2, Dec  4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]
----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
matplotlib:           3.8.2
numpy:                1.25.2
openpyxl:             <not installed>
pandas:               2.1.4
pyarrow:              12.0.1
pydantic:             2.5.3
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

kenshuri, Jan 11 '24 11:01