datatable
sort() with na_position="remove" crashes (segmentation fault)
- Did you find a bug in datatable, or maybe the bug found you?
I found a bug.
- How to reproduce the bug?
wget https://raw.githubusercontent.com/h2oai/db-benchmark/cf255c174647ac437aa7a85751f6e65732a3cb9a/_data/groupby-datagen.R
Rscript groupby-datagen.R 1e8 1e2 5 0
## activate your pydt env
source ~/git/db-benchmark/pydatatable/py-pydatatable/bin/activate
python
import datatable as dt
from datatable import f, by, sort
x = dt.fread('G1_1e8_1e2_5_0.csv', na_strings=[''])
print(x.nrows, flush=True)
#100000000
ans = x[:2, {"largest2_v3": f.v3}, by(f.id6), sort(-f.v3, na_position="remove")]
#Segmentation fault
The crash is not reproducible with 1e7 rows, only with 1e8 rows and larger.
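For cross-checking on a small sample, the intended result of the query (the two largest non-missing v3 values per id6) can be sketched in plain Python. This is only a reference under stated assumptions: `largest2_per_group` is a hypothetical helper and not part of datatable, missing values are represented as `None`, and a group whose v3 values are all missing is simply dropped, which may or may not match what datatable does.

```python
from collections import defaultdict
import heapq


def largest2_per_group(rows, k=2):
    """Return the k largest non-missing v3 values for each id6.

    rows is an iterable of (id6, v3) pairs; v3 may be None (missing).
    Hypothetical helper for illustration, not a datatable API.
    """
    groups = defaultdict(list)
    for id6, v3 in rows:
        if v3 is None:
            # Assumed to mirror na_position="remove": skip missing values.
            continue
        groups[id6].append(v3)
    # nlargest returns the values in descending order, like sort(-f.v3).
    return {key: heapq.nlargest(k, vals) for key, vals in groups.items()}


sample = [(1, 3.5), (1, None), (1, 9.0), (1, 1.25), (2, None), (2, 4.0)]
print(largest2_per_group(sample))
```

Comparing this against the datatable output on the 1e7 file, where the query does not crash, would confirm that the expression itself is well-formed and that only the larger input triggers the segfault.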
- Your environment?
pydatatable 9bc7d05, Python 3.6.7, Ubuntu 16.04