polars
ColumnNotFoundError on concat of LazyFrames; works fine if collected first
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Sadly, the construction of these dataframes is long and complex, so I'll have to figure out how to create a minimal reproducible example. In the meantime, I'm hoping someone has an intuition about the problem.
If there are any other attributes on Polars objects that would narrow this down, please let me know.
assert type(df1) == type(df2) == pl.LazyFrame
assert df1.schema == df2.schema
pl.concat([df1.collect(), df2.collect()]) # Works fine if collected first.
pl.concat([df1, df2]).collect() # Fails with output below.
Log output
avg line length: 57.054688
std. dev. line length: 8.579314
initial row estimate: 2116
no. of chunks: 8 processed by: 8 threads.
file < 128 rows, no statistics determined
no. of chunks: 1 processed by: 1 threads.
file < 128 rows, no statistics determined
no. of chunks: 1 processed by: 1 threads.
dataframe filtered
dataframe filtered
keys/aggregates are not partitionable: running default HASH AGGREGATION
avg line length: 117.78516
std. dev. line length: 14.399422
initial row estimate: 10842
no. of chunks: 8 processed by: 8 threads.
found multiple sources; run comm_subplan_elim
join parallel: false
join parallel: false
dataframe filtered
dataframe filtered
UNION: `parallel=false` union is run sequentially
CACHE SET: cache id: 0
---------------------------------------------------------------------------
File ~/miniconda3/lib/python3.11/site-packages/polars/lazyframe/frame.py:1943, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager)
1940 if background:
1941 return InProcessQuery(ldf.collect_concurrently())
-> 1943 return wrap_df(ldf.collect())
ColumnNotFoundError: Level
Issue description
I have two LazyFrames created from different functions, but with identical schemas. concat works fine if I collect them first. However, concat on the LazyFrames fails.
The column Level in the error was present during both LazyFrames' construction, but was dropped from both before the concat.
Expected behavior
pl.concat([df1, df2]).collect() produces the same result as pl.concat([df1.collect(), df2.collect()]).
Installed versions
--------Version info---------
Polars: 0.20.18
Index type: UInt32
Platform: Linux-5.15.0-94-generic-x86_64-with-glibc2.35
Python: 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: <not installed>
connectorx: <not installed>
deltalake: <not installed>
fastexcel: <not installed>
fsspec: 2023.6.0
gevent: <not installed>
hvplot: <not installed>
matplotlib: 3.8.2
nest_asyncio: 1.5.6
numpy: 1.26.1
openpyxl: 3.1.2
pandas: 2.1.2
pyarrow: 11.0.0
pydantic: 2.3.0
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: <not installed>
xlsx2csv: 0.8.2
xlsxwriter: <not installed>
Does it work on polars==0.20.16? I'm encountering a similar-but-possibly-different issue (still investigating/trying to make an MRE) and mine works on 0.20.16 but not later versions...
@aut0clave
You could try to narrow down which optimization is causing it, which may help, e.g. .collect(projection_pushdown=False)
There was some significant work done after 0.20.16 which caused a few issues:
- https://github.com/pola-rs/polars/issues/15442#issuecomment-2032419969
Possibly related to #12917?
.collect(projection_pushdown=False)
@aut0clave @cmdlineluser Your intuition was correct. collect after the concat works fine in 0.20.16 (throws an error in 0.20.18).
In 0.20.18, .collect(projection_pushdown=False) works fine.
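For anyone hitting something similar, the narrowing-down step suggested above can be sketched roughly as follows. This is only a sketch: find_offending_optimizations is a hypothetical helper, not part of Polars, and the flag names are copied from the LazyFrame.collect signature in the traceback for 0.20.18, so other versions may accept a different set. It disables one optimization at a time and reports which ones let the query succeed.

```python
# Optimization flags taken from the LazyFrame.collect signature shown in the
# traceback above (Polars 0.20.18); other versions may differ.
OPTIMIZATION_FLAGS = (
    "type_coercion",
    "predicate_pushdown",
    "projection_pushdown",
    "simplify_expression",
    "slice_pushdown",
    "comm_subplan_elim",
    "comm_subexpr_elim",
)


def find_offending_optimizations(lazy_frame, flags=OPTIMIZATION_FLAGS):
    """Return the flags that, when disabled one at a time, let collect() succeed.

    `lazy_frame` is any object with a `collect(**kwargs)` method, for example
    the result of pl.concat([df1, df2]). Hypothetical helper, not a Polars API.
    """
    try:
        lazy_frame.collect()
    except Exception:
        pass  # Expected here: the fully optimized query fails.
    else:
        return []  # The query already works, so there is nothing to narrow down.

    culprits = []
    for flag in flags:
        try:
            lazy_frame.collect(**{flag: False})
        except Exception as exc:
            print(f"{flag}=False still fails: {type(exc).__name__}: {exc}")
        else:
            culprits.append(flag)
    return culprits
```

Called as find_offending_optimizations(pl.concat([df1, df2])), and given the report above that projection_pushdown=False works in 0.20.18, that flag should appear in the returned list; whether any others do is not known from this thread.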
@acowlikeobject Did you use struct.field or struct[xxx]?
@reswqa I didn't use structs at all in the construction of these dataframes.
This is not a p-high issue if OP hasn't come up with a reproducible example. We cannot do anything atm. @acowlikeobject can you try to come up with an example? We need to be able to reproduce it to be able to fix it.
@ritchie46 Sadly, I'm not likely to have time in the next few days. I'm happy to close this out, and maybe it'll get resolved when one of the related issues tagged in this thread is fixed.