
Stack overflow on join on high CPU count machine

Open DylanZA opened this issue 1 year ago • 2 comments

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

NIDS = 500
NAXISA = 1000
NVALS = 100
NAXISB = 5000

ids = pl.Series("id", range(NIDS), pl.Int64)
vals = pl.Series("val", range(NVALS), pl.Int64)
extraa = pl.Series("axis", range(NAXISA), pl.Int64)
extrab = pl.Series("axis", range(NAXISB), pl.Int64)

# Build a 50M-row frame `a` (id x axis x val) and a 2.5M-row frame `b` (id x axis).
a = pl.DataFrame(ids).join(pl.DataFrame(extraa), how='cross').join(pl.DataFrame(vals), how='cross')
b = pl.DataFrame(ids).join(pl.DataFrame(extrab), how='cross')

print("a={:,} b={:,}".format(len(a), len(b)))

# Repeatedly inner-join on (id, axis); the crash is intermittent.
for i in range(30):
    print(i)
    a.join(b, on=['id', 'axis'])
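For reference, the sizes printed in the log below follow directly from the cross joins: a has NIDS * NAXISA * NVALS rows and b has NIDS * NAXISB rows. A quick sanity check that could be appended to the script above (assuming a and b from the example are still in scope):

# Sanity check on the row counts produced by the cross joins above.
assert len(a) == NIDS * NAXISA * NVALS   # 500 * 1000 * 100 = 50,000,000
assert len(b) == NIDS * NAXISB           # 500 * 5000       = 2,500,000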

Log output

join parallel: true
CROSS join dataframes finished
join parallel: true
CROSS join dataframes finished
join parallel: true
CROSS join dataframes finished
'a=50,000,000 b=2,500,000'
0
join parallel: true
INNER join dataframes finished
1
join parallel: true
INNER join dataframes finished
2
join parallel: true
INNER join dataframes finished
3
join parallel: true
INNER join dataframes finished
4
join parallel: true
INNER join dataframes finished
5
join parallel: true
INNER join dataframes finished
6
join parallel: true
INNER join dataframes finished
7
join parallel: true
INNER join dataframes finished
8
join parallel: true
Segmentation fault (core dumped)

Issue description

Joining two fairly large DataFrames (50 million and 2.5 million rows) occasionally overflows the stack somewhere deep inside rayon.

Note that this only seems to happen on my machine with 128 logical cores, not on my smaller machine.
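One mitigation worth testing, since the problem appears tied to the number of cores, is capping the Polars thread pool via the POLARS_MAX_THREADS environment variable before polars is first imported. This is only a hedged sketch of a workaround, not a confirmed fix for the underlying rayon recursion, and the cap of 32 is an arbitrary example value:

import os
os.environ["POLARS_MAX_THREADS"] = "32"  # example cap; tune for your machine, not a confirmed fix

import polars as pl  # import must happen after the env var is set

# ...build `a` and `b` exactly as in the reproducible example above...
# a.join(b, on=['id', 'axis'])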

Expected behavior

The join completes without crashing.

Installed versions

--------Version info---------
Polars:               0.20.10
Index type:           UInt32
Platform:             Linux-6.5.0-14-generic-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.1
numpy:                1.26.1
openpyxl:             <not installed>
pandas:               2.1.2
pyarrow:              14.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

DylanZA avatar Feb 22 '24 14:02 DylanZA

Hmm.. Could you see in GDB where you are at that point? Maybe make a debug compilation. (I am on vacation and have poor internet, let alone a 128 core machine. :))

ritchie46 avatar Feb 23 '24 00:02 ritchie46

> Hmm.. Could you see in GDB where you are at that point? Maybe make a debug compilation. (I am on vacation and have poor internet, let alone a 128 core machine. :))

Here is a full GDB thread backtrace, compressed as it's 15 MB uncompressed.

gdb.txt.gz

DylanZA avatar Feb 23 '24 09:02 DylanZA

FWIW, you can replicate the behaviour by setting RUST_MIN_STACK. On my 16-core machine I need to use 100000.
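A sketch of how that reproduction could be scripted (assuming the environment variable is picked up as long as it is set before polars is imported and its thread pool is created; alternatively set it in the shell, e.g. `RUST_MIN_STACK=100000 python repro.py`):

import os
os.environ["RUST_MIN_STACK"] = "100000"  # ~100 kB per spawned thread, value from the comment above

import polars as pl  # import after setting the env var so worker threads get the small stack

# ...run the reproducible example from the top of the issue...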

DylanZA avatar Mar 10 '24 21:03 DylanZA