[R][C++] "negative buffer resize" error with arrow and dplyr in R
Hi everyone,
I am working with a large dataset of over 1 billion observations and 41 variables, stored in 3040 Parquet files. I read the data with open_dataset() and then tried to apply dplyr functions:
individual_positions %>%
group_by(user_id) %>%
summarize(n_positions = n()) %>%
count(n_positions, sort = TRUE) %>%
collect()
individual_positions is my dataset; each row is a job position that a user held at some point in their career. I want to understand the distribution of the number of positions each user has ever held. I got the following error message:
Error in `compute.arrow_dplyr_query()`:
! Invalid: Negative buffer resize: -2147483584
Backtrace:
1. ... %>% collect()
3. arrow:::collect.arrow_dplyr_query(.)
4. arrow:::compute.arrow_dplyr_query(x)
I searched for what "negative buffer resize" means but found nothing useful. Can anyone help me interpret the error and suggest a solution? I know it's possible to process the dataset in SAS, but I'm an R lover and I really want to stick with it. Thanks a lot!
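One observation about the number in the message, offered as a hint rather than a diagnosis: -2147483584 sits exactly 64 above -2^31, the smallest value a signed 32-bit integer can hold. That is the pattern one would expect if a size slightly above the 32-bit limit wrapped around, although nothing in this thread confirms where in Arrow that would happen. The arithmetic can be checked in base R (the variable names are purely illustrative):

```r
# Value reported in the "Negative buffer resize" message.
reported <- -2147483584

# Smallest value representable as a signed 32-bit integer.
int32_min <- -2^31

# Distance of the reported value from that lower bound.
offset <- reported - int32_min
print(offset)

# Assumption (not confirmed in this thread): if a 32-bit size counter
# overflowed, the intended size would be the reported value plus 2^32.
hypothetical_size <- reported + 2^32
print(hypothetical_size)

# Check whether that hypothetical size exceeds R's own 32-bit integer maximum.
print(hypothetical_size > .Machine$integer.max)
```

If the overflow hypothesis holds, the requested buffer would have been just over 2 GiB, which would fit a query that has to gather a very large amount of data in one place.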
Important update: the observations seem to be distributed randomly across the Parquet files (i.e., a user's position-level observations may be spread over different files), so I think group_by() has to read all the files together instead of picking only the few it needs. This might overwhelm the memory. Do I have to repartition the data? Thanks!
Component(s)
R, C++
Thanks for reporting this @chenyiwrites.
That error comes from somewhere in the Arrow C++ codebase, to do with memory allocation, though not somewhere I'm personally familiar with. I can ping one of the C++ folks to see if anything looks familiar. Which version of the package are you using?
Thanks for following up, Nic! I am using arrow 14.0.0.2 and tidyverse 2.0.0. Please see the detailed session info below:
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31 ucrt)
os Windows 11 x64 (build 22621)
system x86_64, mingw32
ui RStudio
language (EN)
collate Chinese (Simplified)_China.utf8
ctype Chinese (Simplified)_China.utf8
tz Asia/Shanghai
date 2024-02-03
rstudio 2023.12.1+402 Ocean Storm (desktop)
pandoc NA
─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
arrow * 14.0.0.2 2023-12-02 [1] CRAN (R 4.3.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.3.2)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.2)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.2)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2)
clipr 0.8.0 2022-02-22 [1] CRAN (R 4.3.2)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.2)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.2)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.2)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.2)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.2)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.2)
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.2)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.2)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.2)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.2)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.2)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.2)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.2)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.2)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.2)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.2)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.2)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.2)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.2)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.2)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.2)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2)
xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2)
[1] C:/Users/sibo/AppData/Local/R/win-library/4.3
[2] C:/Program Files/R/R-4.3.2/library
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
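For anyone who has the same issue: I added to_duckdb() to the pipeline (the function is exported by arrow and requires the duckdb package to be installed), and the query now runs successfully!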
individual_positions %>%
to_duckdb() %>%
group_by(user_id) %>%
summarize(n_positions = n()) %>%
count(n_positions, sort = TRUE) %>%
collect()
Hi @chenyiwrites, is there any chance this is a publicly available dataset? I think the best next step here is for us to reproduce your issue. Repartitioning your dataset may make the issue go away, but what you ran into is definitely a bug and it needs fixing.
If you aren't able to share the dataset, could you give us a bit more information about the structure of it? Two bits of information would be useful:
- Your schema: call individual_positions$schema and share the output.
- Statistics on your files and their number of rows and row groups. The output of the code below will be long, so let us know if every value is the same or what the range/distribution is.
num_rows <- vapply(individual_positions$files, function(f) {
  ParquetFileReader$create(f)$num_rows
}, 0, USE.NAMES = FALSE)
num_row_groups <- vapply(individual_positions$files, function(f) {
  ParquetFileReader$create(f)$num_row_groups
}, 0, USE.NAMES = FALSE)
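Since the full vectors will be long, a compact summary may be enough to share. Here is a minimal base R sketch, assuming num_rows and num_row_groups are the numeric vectors produced by the code above. summarize_counts is a hypothetical helper, and the example values are placeholders, not real data:

```r
# Hypothetical helper: condense a long per-file vector into a few numbers.
summarize_counts <- function(x) {
  c(
    n = length(x),
    min = min(x),
    median = median(x),
    max = max(x),
    total = sum(x)
  )
}

# Placeholder values for illustration only; substitute the real vectors.
num_rows_example <- c(400000, 420000, 430000, 700000)
num_row_groups_example <- c(5, 7, 7, 4)

# Range and total of rows per file.
rows_summary <- summarize_counts(num_rows_example)
print(rows_summary)

# How many files have each row-group count.
row_group_table <- table(num_row_groups_example)
print(row_group_table)
```

Calling summarize_counts(num_rows) and table(num_row_groups) on the real vectors would give the range and distribution without pasting every value.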
Hi @amoeba, thanks for reaching out! Sorry that I did not check GitHub for a while.
I'm sorry that I'm unable to share the data at the moment, as it is part of academic work that is still in progress. But of course I can provide aggregate information about the dataset to help fix the bug.
- The schema:
Schema
user_id: int64
position_id: int64
company_raw: string
company_linkedin_url: string
company_cleaned: string
location_raw: string
region: string
country: string
state: string
metro_area: string
startdate: string
enddate: string
jobtitle_raw: string
mapped_role: string
job_category: string
role_k50: string
role_k150: string
role_k300: string
role_k500: string
role_k1000: string
remote_suitability: float
weight: float
description: string
start_mean_sampled_salary: double
end_mean_sampled_salary: double
seniority: int16
salary: float
rn: int16
rcid: int64
company_name: string
ultimate_parent_rcid: int64
ultimate_parent_company_name: string
onet_code: string
onet_title: string
ticker: string
exchange: string
cusip: string
naics: string
naics_desc: string
final_parent_factset_id: string
final_parent_factset_name: string
- Statistics on the parquet files:
> dput(num_rows)
c(720896, 421888, 438272, 454656, 503808, 376832, 380928, 700416,
491520, 430080, 458752, 405504, 471040, 425984, 425984, 598016,
425984, 569344, 434176, 327680, 425984, 368640, 425984, 446464,
425984, 417792, 483328, 454656, 413696, 479232, 491520, 487424,
430080, 315392, 586567, 413696, 450560, 438272, 491520, 430080,
421888, 409600, 424274, 442368, 430080, 421888, 438272, 507904,
667648, 458752, 479232, 425984, 421888, 430080, 409600, 344064,
430080, 421888, 413696, 503808, 606208, 442368, 421888, 435194,
450722, 434176, 421888, 700416, 430080, 692224, 405504, 249856,
483328, 495616, 434176, 397312, 438272, 634880, 421888, 421888,
425984, 418277, 618496, 434176, 421888, 421888, 417792, 413696,
430080, 425984, 442368, 409600, 466944, 413696, 385024, 430080,
581632, 425984, 24576, 413696, 425984, 471040, 466944, 421888,
413696, 425984, 425984, 463557, 548864, 417792, 421367, 411289,
200704, 512256, 458752, 430080, 421888, 425728, 421888, 421888,
413696, 425984, 425984, 405504, 421888, 409600, 458752, 94208,
356352, 65536, 438355, 365346, 425984, 453920, 671744, 409600,
416410, 458752, 540672, 442368, 413696, 446464, 434176, 421888,
434176, 421888, 258048, 417102, 700416, 401408, 466944, 422917,
458752, 430080, 425984, 421888, 417792, 438272, 425984, 421888,
393216, 380928, 421888, 446464, 298752, 438272, 435689, 614656,
454656, 430080, 434720, 417792, 442368, 496960, 430080, 421888,
368640, 491776, 440563, 430080, 433920, 466944, 417792, 438272,
427535, 413696, 552960, 761856, 425984, 442368, 523107, 430080,
421888, 425984, 421888, 434176, 593920, 372736, 421888, 430080,
430080, 415313, 430080, 458752, 425984, 376832, 442368, 425984,
417792, 430080, 413696, 442368, 385024, 344064, 421888, 466944,
540672, 373889, 438272, 438272, 413696, 442368, 413696, 536576,
421888, 638976, 467578, 421888, 487424, 413696, 405504, 475136,
544768, 356352, 425984, 143360, 430080, 430080, 438272, 282624,
442368, 425984, 717858, 385024, 413696, 430080, 425059, 356352,
499712, 428485, 421888, 385024, 339968, 425984, 421888, 602112,
421888, 421888, 425984, 417792, 438272, 380928, 643072, 422417,
212992, 372736, 412213, 466944, 434176, 430080, 434176, 638976,
417792, 450560, 405504, 643072, 417433, 425984, 466944, 421888,
442368, 491520, 430080, 425984, 356352, 430080, 421888, 425984,
442368, 421888, 421888, 540672, 446464, 438272, 462848, 421888,
421888, 425984, 491776, 429498, 422579, 409344, 393216, 372736,
421888, 405504, 389120, 397312, 446464, 421888, 430080, 614400,
413696, 422760, 520192, 471040, 425984, 470110, 417792, 434176,
118784, 421888, 421888, 442368, 421888, 417792, 413696, 757760,
421888, 450560, 421888, 430080, 425984, 520192, 417792, 425984,
430080, 573440, 421888, 430080, 430080, 421888, 413696, 446464,
389120, 462848, 647168, 479232, 446464, 438272, 532480, 417792,
430080, 557900, 421888, 425984, 454656, 516096, 425984, 442368,
479232, 421888, 409600, 569344, 425984, 425984, 433471, 446464,
421888, 417792, 397312, 438272, 368640, 749568, 421888, 458752,
401408, 466944, 421888, 593920, 438272, 421888, 405504, 327680,
434176, 471040, 548864, 425984, 442780, 389120, 389120, 430080,
425984, 421888, 425984, 598016, 438272, 454656, 552359, 704512,
393216, 401408, 397728, 413696, 401408, 487424, 544768, 409600,
356352, 479232, 422057, 462848, 425984, 421888, 421888, 413696,
446464, 420633, 368640, 458752, 417792, 397312, 516096, 389120,
401408, 503808, 413696, 413696, 434176, 425984, 446464, 548864,
425984, 356352, 442368, 572061, 442368, 421888, 614400, 466944,
420842, 458752, 438272, 462848, 303104, 573440, 430080, 421888,
446464, 417792, 413696, 417792, 434176, 430080, 528384, 442368,
425984, 425984, 520192, 438272, 425984, 409600, 425984, 421888,
434176, 421415, 421888, 450560, 425984, 430080, 237568, 376832,
418061, 479232, 430080, 430698, 348160, 421888, 413696, 430080,
434176, 569344, 409600, 425984, 475136, 438272, 192512, 401853,
372736, 525229, 499712, 454656, 479232, 122880, 417792, 425984,
421888, 708608, 430080, 430080, 491520, 421888, 462848, 503808,
425984, 425984, 278528, 475136, 434176, 434176, 425984, 425984,
425984, 471040, 438272, 425984, 434176, 397312, 466944, 421888,
696320, 421888, 483328, 463369, 348160, 331776, 393216, 419225,
397312, 548864, 450560, 458694, 421888, 512000, 413696, 425984,
540672, 425984, 450560, 417792, 420504, 417792, 430080, 417792,
428813, 131072, 409600, 452236, 464803, 421888, 415714, 430080,
348160, 442368, 425984, 734870, 544768, 446464, 425984, 532480,
417792, 507904, 462848, 442368, 438272, 577536, 393216, 430080,
430080, 385024, 449275, 438272, 749568, 454656, 417792, 401408,
453173, 364544, 417792, 417792, 442368, 315392, 430080, 428711,
400467, 409600, 417792, 438272, 430080, 417792, 417792, 589824,
466944, 409600, 393216, 491520, 413696, 405504, 466944, 430080,
483328, 450560, 417792, 204800, 540112, 417792, 454656, 434176,
700416, 417792, 421888, 458752, 434176, 421888, 425984, 430080,
413696, 589824, 417792, 421888, 77824, 421888, 425984, 430080,
389120, 430080, 425984, 425984, 454656, 425984, 577536, 491776,
425984, 421888, 397056, 421888, 417792, 430080, 425984, 425732,
438272, 438272, 458752, 471040, 417792, 397312, 401408, 507648,
425984, 417792, 467200, 458752, 454656, 430080, 425984, 434176,
393216, 454656, 421888, 155648, 425984, 425984, 450560, 430080,
421888, 438272, 516096, 471040, 425984, 643072, 430080, 430080,
430080, 417792, 434176, 446704, 512000, 421888, 249856, 434176,
466746, 438272, 385024, 425984, 430080, 458752, 544768, 485954,
446464, 626688, 417792, 401644, 425984, 456211, 421888, 454656,
413696, 450560, 20480, 430080, 430080, 499712, 430080, 740116,
413696, 360448, 421888, 430080, 512000, 561152, 421888, 466944,
482292, 425984, 424987, 430080, 450560, 446464, 430080, 417792,
28672, 431465, 409600, 446464, 491520, 602112, 417792, 458752,
454656, 438272, 430080, 425984, 417792, 430080, 430080, 430080,
491520, 69632, 393216, 471040, 483328, 425984, 507904, 425984,
424300, 291527, 434176, 630784, 458752, 446464, 409600, 398362,
413696, 413696, 434176, 421888, 540672, 417792, 421888, 413696,
438272, 466944, 462848, 425984, 483328, 462848, 454656, 434176,
442368, 421888, 479232, 421888, 471040, 434176, 425984, 393216,
417233, 421888, 413696, 487424, 438272, 495616, 397312, 581632,
397312, 413696, 496964, 417792, 421888, 499712, 352256, 416727,
253952, 421888, 438272, 434176, 417792, 418176, 462848, 430080,
425984, 412697, 425984, 298752, 421888, 434176, 602368, 434176,
421888, 413696, 409600, 655360, 417792, 417792, 409600, 503808,
434176, 520192, 45056, 601136, 410686, 375029, 405504, 442368,
425984, 417792, 450560, 413818, 466944, 417792, 430080, 348160,
438272, 430080, 462848, 466944, 425984, 430080, 425984, 430080,
417792, 430080, 434176, 434176, 454656, 532480, 352256, 319232,
446464, 425984, 454912, 425984, 434176, 438272, 655360, 430080,
409600, 98304, 749568, 540672, 462848, 446464, 434176, 425984,
471040, 397312, 437445, 290816, 428294, 434176, 491520, 430080,
430080, 434176, 425984, 421888, 434176, 425984, 462848, 434176,
425984, 413696, 421888, 425984, 559851, 421888, 618496, 430945,
409600, 466944, 421888, 393216, 409600, 421888, 491520, 278528,
393216, 524288, 405504, 425984, 421888, 409600, 417792, 442368,
430080, 434176, 65536, 475136, 450560, 417792, 446464, 446464,
440646, 438272, 430080, 425984, 458752, 417792, 421888, 421888,
737280, 405504, 458752, 454656, 430080, 548864, 421888, 421888,
417792, 417792, 434176, 630784, 102400, 761856, 94208, 417792,
446464, 418637, 401408, 573440, 483328, 425984, 28672, 368640,
412617, 442368, 421888, 425984, 425984, 385024, 417769, 380928,
446464, 421888, 421888, 426906, 421193, 430080, 425984, 503808,
421888, 450560, 442368, 421888, 438272, 442368, 454656, 409600,
499712, 532480, 487424, 45056, 561152, 151552, 430080, 647168,
421888, 397312, 77824, 438272, 421261, 651264, 376832, 454656,
73728, 434176, 442368, 540672, 421888, 421888, 651264, 430080,
430080, 73728, 425984, 421888, 655360, 393216, 495983, 643072,
614400, 436224, 657152, 466944, 421888, 461293, 438272, 483328,
458752, 450560, 397312, 409600, 427575, 430080, 512000, 352256,
405504, 413696, 458752, 425984, 413696, 438272, 425984, 425984,
430080, 425984, 458752, 457880, 421888, 434176, 417792, 434176,
409600, 487424, 405504, 454656, 393984, 507648, 458752, 388352,
495872, 479232, 417792, 438272, 471040, 421888, 441819, 425984,
503808, 253952, 548864, 446464, 405504, 419033, 454656, 503808,
409600, 380928, 405504, 421888, 470173, 421888, 421888, 458752,
425984, 425984, 425984, 438272, 417792, 417792, 450560, 440363,
389120, 430080, 417792, 536576, 507648, 364544, 393216, 430336,
446464, 426661, 415116, 425984, 578203, 417792, 577536, 81920,
311303, 221184, 398772, 462848, 417792, 438272, 421888, 417792,
368640, 430080, 409600, 421888, 516096, 434176, 430080, 420699,
430080, 450560, 425984, 417792, 417792, 402417, 417792, 430080,
458752, 430080, 454656, 561152, 671744, 483328, 409600, 544768,
593920, 602112, 475136, 192512, 131072, 364544, 421888, 421888,
425984, 569344, 420923, 417792, 422128, 446464, 430080, 430080,
430080, 425984, 438272, 421888, 446464, 417792, 425984, 454656,
421888, 417792, 422329, 425984, 425984, 417792, 528128, 479232,
344064, 393472, 479232, 585728, 409600, 421888, 471040, 790528,
364544, 589824, 131072, 98304, 237568, 462848, 446464, 438272,
421888, 471040, 417792, 448915, 446464, 417792, 417792, 442368,
413696, 413696, 442368, 434176, 425984, 430080, 421888, 413696,
462848, 438272, 430080, 446464, 429863, 372736, 733184, 442368,
487424, 409600, 581632, 411594, 462848, 413696, 421888, 503808,
499712, 262144, 417792, 401408, 385024, 417792, 461470, 441557,
434176, 421888, 417792, 413696, 421888, 413422, 409600, 430080,
434176, 434176, 430080, 428964, 425984, 430080, 430080, 417792,
426286, 425984, 339968, 589824, 600209, 458752, 434176, 436157,
430080, 643072, 409600, 634880, 131072, 626688, 282624, 421888,
45056, 417792, 430080, 475136, 434176, 430080, 430080, 442368,
434176, 454656, 434176, 430080, 475136, 438272, 421888, 430080,
434176, 471040, 421888, 438272, 413696, 430592, 421888, 724992,
409600, 516096, 442531, 368640, 421888, 516513, 548864, 434176,
208896, 438272, 438272, 430080, 425984, 237568, 438272, 311296,
409600, 421888, 454656, 438272, 417792, 499712, 430080, 438272,
442368, 425984, 421888, 430080, 417792, 421888, 434176, 479232,
434176, 409600, 438272, 692224, 417792, 434176, 499712, 454656,
581632, 425984, 543570, 364544, 409600, 450560, 430080, 479232,
452692, 438272, 425984, 458752, 135168, 437541, 413696, 430080,
413696, 487424, 430080, 421888, 421888, 413696, 421888, 438272,
417792, 385024, 446464, 466944, 413696, 413696, 421073, 712704,
438272, 430080, 614400, 483328, 425984, 503808, 425984, 548864,
417792, 680992, 446464, 434176, 380928, 431377, 425984, 425984,
643072, 425984, 454656, 458752, 413696, 466944, 405504, 442368,
425984, 430080, 442368, 438272, 434176, 425984, 413548, 462848,
614400, 454656, 425984, 417792, 417792, 425984, 417792, 389120,
425984, 589824, 417792, 552960, 339968, 606208, 442368, 430080,
507904, 368640, 438272, 503808, 454656, 438272, 602112, 65536,
434176, 49152, 405504, 430080, 450560, 393216, 450560, 417792,
425984, 421888, 421888, 430080, 515840, 451959, 413696, 454912,
425984, 432009, 438272, 417792, 430080, 475136, 434176, 425984,
389120, 389120, 773847, 421888, 778240, 466944, 409600, 389120,
638976, 434176, 577536, 114688, 426956, 110592, 413696, 413696,
450560, 417792, 430080, 440875, 421888, 430080, 434176, 425984,
430080, 430080, 421888, 515840, 393216, 417792, 463104, 421888,
442368, 417792, 499712, 421888, 487424, 430080, 286929, 212992,
592466, 626688, 417792, 475136, 491520, 450560, 435566, 368640,
409600, 319488, 425984, 440099, 450560, 450560, 419516, 479232,
438272, 436390, 498325, 471040, 430080, 434176, 487424, 425984,
446464, 430080, 720896, 430080, 454656, 380928, 589824, 430080,
487424, 425984, 540672, 200704, 360448, 479232, 413696, 552960,
495616, 446464, 426474, 317236, 443456, 503808, 425984, 417792,
138851, 425984, 372736, 434176, 405504, 430080, 450257, 446464,
423933, 417792, 450560, 417792, 401408, 458752, 598460, 413914,
434176, 479232, 434176, 417792, 409600, 434176, 592145, 450560,
528453, 607922, 315778, 610319, 552960, 360448, 430080, 303104,
418905, 577536, 430080, 425984, 462848, 417792, 446464, 438272,
417792, 430080, 446464, 442368, 422333, 536162, 411825, 421291,
434501, 401408, 454656, 421888, 425984, 593920, 471040, 420710,
446464, 425984, 745107, 528384, 524288, 483328, 266240, 409600,
487424, 528384, 507904, 131072, 417792, 512000, 421888, 425984,
413696, 430080, 425984, 491520, 413696, 417792, 421888, 413696,
421888, 434176, 421888, 430080, 430080, 421156, 421888, 438272,
417792, 487424, 458752, 434176, 434176, 393216, 372736, 409600,
585728, 557056, 692430, 421888, 354234, 451701, 450560, 647168,
397312, 135168, 405504, 421888, 450560, 434176, 421888, 434176,
425984, 450560, 454656, 425984, 417792, 475136, 417792, 425984,
425984, 425984, 425984, 458752, 425984, 581632, 438272, 413696,
466944, 425984, 446464, 438272, 663552, 491520, 524288, 421888,
425984, 118784, 413270, 393612, 413696, 483328, 69632, 454656,
176128, 462848, 434176, 458752, 389120, 413696, 434176, 438272,
434176, 430080, 421888, 430080, 451001, 434176, 430080, 430080,
429233, 438272, 417792, 421888, 638976, 421888, 417792, 450560,
421888, 671744, 434176, 573440, 446464, 487358, 487424, 397007,
262144, 602112, 233472, 466944, 425984, 417792, 438272, 434176,
421888, 425984, 442368, 421888, 430080, 455980, 425984, 425984,
413696, 405504, 421888, 439802, 446891, 430080, 475136, 421888,
430080, 442368, 421888, 565248, 530256, 327680, 450560, 454656,
442368, 471040, 495771, 413696, 557056, 290816, 540672, 135168,
413696, 349347, 421888, 446464, 528384, 417792, 430080, 434176,
425984, 425984, 425984, 417792, 479232, 446464, 405504, 438272,
421888, 421888, 458752, 425984, 425984, 435013, 446464, 438272,
360448, 421888, 667648, 409600, 339968, 401408, 446464, 507904,
425984, 524288, 503808, 487424, 364544, 409600, 438272, 434176,
466944, 425984, 430080, 413696, 442368, 454656, 409600, 425984,
401408, 434176, 417792, 405504, 434176, 431797, 466944, 430080,
425984, 417792, 466944, 413696, 430080, 417792, 581632, 561152,
491520, 421888, 421888, 449283, 425984, 548864, 503808, 430080,
114688, 331776, 434176, 462848, 434176, 427280, 466944, 413696,
417792, 540672, 425984, 409600, 491520, 430080, 417792, 413696,
430080, 458752, 450560, 442368, 413696, 483328, 434176, 413696,
536576, 430080, 557056, 700416, 761856, 421888, 479232, 376832,
425984, 593607, 315392, 430080, 311296, 409600, 163840, 442368,
450560, 430080, 438272, 417792, 421888, 430080, 425984, 430080,
425984, 425064, 430080, 421888, 421888, 421888, 421888, 421888,
430080, 421888, 466944, 361864, 380928, 466944, 425984, 507904,
458752, 487424, 434176, 413696, 397312, 422989, 532480, 430080,
536576, 425984, 430080, 425984, 446464, 544768, 466944, 421888,
454656, 479232, 430080, 421888, 425984, 425984, 430080, 405504,
442368, 421888, 434029, 421888, 418731, 430080, 430080, 434176,
401408, 438272, 413696, 364544, 475136, 356352, 298752, 430080,
483328, 463104, 585728, 438272, 438272, 421888, 387301, 466747,
462848, 413696, 450560, 430080, 450560, 53248, 348160, 417792,
446464, 425984, 434176, 417792, 436428, 425984, 430080, 434176,
417792, 438272, 475136, 417792, 434176, 348160, 430080, 450560,
565248, 466944, 421888, 425984, 671744, 425984, 461031, 520192,
458752, 421888, 442368, 397312, 413696, 487424, 430080, 450560,
430080, 532480, 434176, 458752, 40960, 454656, 430080, 479232,
540672, 413696, 436192, 499712, 438272, 409600, 425984, 393216,
413696, 495360, 421888, 413696, 422144, 434176, 430080, 421888,
493444, 425984, 591240, 445319, 421888, 417792, 475136, 295168,
453194, 471040, 454400, 470289, 360448, 425984, 409937, 438272,
561152, 421888, 430080, 475136, 364544, 425984, 434176, 491520,
413696, 425984, 417792, 417792, 425984, 421888, 425984, 417792,
507904, 413696, 430080, 389120, 434176, 430080, 450560, 626688,
352256, 458583, 416347, 582519, 466944, 540672, 450560, 421888,
233472, 421888, 482941, 425984, 311296, 421888, 438272, 421888,
401408, 417792, 421888, 499712, 458480, 421888, 421888, 421888,
442368, 311040, 417792, 438272, 401664, 429807, 417792, 425984,
430080, 430080, 561152, 671744, 450560, 167936, 398811, 540148,
450560, 419136, 417792, 434176, 385024, 421888, 483328, 421888,
281589, 471040, 425984, 438272, 462848, 475136, 421888, 412290,
430080, 413696, 450560, 479232, 430080, 433388, 430080, 425984,
667648, 409600, 421888, 405504, 421888, 417792, 475136, 503808,
430080, 452694, 389120, 462848, 446464, 431122, 475136, 438272,
569344, 417792, 434176, 421888, 577536, 462848, 421888, 36864,
524288, 443754, 475136, 487424, 421888, 421888, 435432, 430080,
417792, 389120, 425984, 417792, 457990, 417792, 450560, 459008,
417792, 417792, 421888, 352256, 434176, 491520, 667648, 421888,
438272, 458752, 591404, 425984, 475136, 352256, 430080, 12288,
462848, 348160, 417792, 430080, 417792, 430129, 490096, 512000,
417792, 520192, 483328, 380928, 8192, 434176, 421888, 405504,
364544, 425984, 438272, 524288, 430080, 446464, 417792, 425984,
413696, 434176, 409344, 569344, 241664, 590080, 425984, 451253,
413696, 434176, 425984, 455365, 446464, 421888, 315392, 491520,
421036, 454656, 389120, 393257, 430080, 471040, 479232, 425984,
348160, 413696, 405504, 450560, 393216, 417792, 462848, 524288,
450560, 331776, 394766, 421888, 430080, 454656, 421888, 417792,
409600, 306944, 434176, 438272, 532736, 438272, 434176, 401408,
430080, 376832, 442368, 442368, 434176, 159744, 421888, 430080,
484741, 487424, 405504, 434176, 602112, 417792, 450560, 421888,
446464, 425984, 446464, 753664, 434176, 385024, 438272, 413696,
434721, 471040, 430080, 679936, 434687, 425984, 81920, 409600,
630442, 405504, 423788, 528384, 442368, 430080, 454656, 442368,
434176, 286720, 401408, 425984, 462848, 428217, 430080, 446464,
421888, 472926, 417792, 425984, 573440, 417792, 421888, 598016,
414414, 589824, 430080, 425984, 528384, 319488, 434176, 421888,
463851, 462848, 409600, 417792, 549204, 425984, 413696, 561152,
499712, 421888, 450556, 94208, 417792, 507728, 466944, 413696,
159744, 421888, 421888, 462848, 421888, 461822, 417792, 385024,
577536, 446464, 417792, 468012, 487424, 413696, 421888, 413696,
430080, 413696, 417792, 430080, 413696, 430080, 765952, 425984,
425984, 475715, 434176, 475136, 475136, 421888, 520192, 397312,
421888, 424754, 516096, 421888, 425984, 20480, 425984, 339968,
450560, 405644, 385484, 462848, 544768, 397537, 421888, 481140,
434176, 425984, 540672, 348160, 425984, 225280, 430080, 438903,
458752, 425984, 450560, 417792, 425984, 425984, 413696, 438272,
541787, 425984, 602112, 409600, 450560, 471040, 389120, 409600,
544657, 397312, 434176, 433597, 405504, 427994, 430080, 413696,
655360, 417792, 333050, 622592, 421888, 483328, 557056, 425984,
421888, 425984, 441361, 446464, 483328, 430080, 405504, 430080,
421888, 479232, 417792, 430080, 430080, 442368, 745472, 430080,
417792, 421888, 425984, 421888, 540672, 368640, 417792, 57344,
593920, 557056, 458752, 430080, 442368, 430080, 385024, 413696,
442368, 425984, 638976, 421888, 450560, 405504, 421888, 507904,
417792, 442368, 360448, 438272, 442368, 405504, 418880, 466944,
430080, 487424, 450560, 495616, 425984, 557056, 425984, 421888,
421888, 450560, 421888, 389120, 442368, 605309, 45056, 446464,
425984, 466944, 565248, 421888, 417792, 523246, 323584, 421888,
434176, 462848, 430080, 430080, 413696, 421888, 450560, 421888,
421888, 425984, 385024, 425984, 430080, 438272, 425984, 425984,
417792, 344064, 446464, 432996, 425984, 425984, 442368, 489108,
325607, 515840, 417792, 487424, 418048, 442368, 417792, 418557,
73728, 430080, 494894, 450560, 450560, 90112, 438272, 438272,
471040, 454656, 438272, 422198, 413696, 421888, 417792, 425984,
307200, 434176, 421888, 421888, 442368, 425984, 430080, 421888,
454656, 421888, 450560, 462848, 425984, 417792, 405504, 417792,
503808, 425984, 430080, 450560, 425984, 344064, 417792, 434176,
434176, 389120, 442368, 306944, 421888, 315743, 405760, 421888,
487424, 434176, 344064, 475136, 466944, 430080, 430080, 524288,
425984, 454656, 102400, 417792, 458752, 385024, 430080, 434176,
425984, 421888, 425984, 417792, 610304, 471040, 435610, 466944,
622592, 421888, 434176, 450560, 413696, 647168, 430080, 471040,
28672, 462848, 425984, 425984, 434176, 478976, 385024, 524288,
372992, 421888, 4096, 425984, 425984, 434176, 487424, 425984,
454656, 155648, 438272, 417792, 425984, 413696, 438272, 458752,
466944, 356352, 417792, 679936, 413696, 421888, 413696, 417792,
421888, 421888, 434176, 466944, 503808, 466944, 430080, 32768,
528384, 421888, 447392, 417792, 593920, 468730, 422271, 446464,
430080, 364544, 421888, 434717, 425820, 342737, 451475, 438272,
434176, 438272, 28672, 425984, 425984, 466944, 450560, 438272,
446464, 462848, 532480, 425984, 430080, 430080, 430080, 360448,
438272, 430080, 421888, 499712, 430080, 423324, 126976, 458752,
577536, 450560, 421888, 466944, 430080, 655360, 417792, 446464,
581632, 552960, 425984, 425984, 430080, 421888, 438272, 446464,
421888, 385024, 421888, 393216, 438272, 475136, 421888, 90112,
352256, 516096, 438272, 430080, 468421, 417792, 393216, 524288,
417792, 430080, 331776, 413696, 548634, 450560, 356352, 438272,
430080, 458752, 421888, 413696, 430080, 503808, 425984, 430080,
159744, 434176, 327680, 438272, 442368, 586945, 425984, 425984,
430080, 425984, 393216, 425984, 425984, 360448, 421888, 438272,
614400, 421888, 425984, 442368, 299008, 356352, 499741, 446464,
417792, 237870, 417792, 441796, 466944, 454656, 413440, 446464,
430080, 425984, 425984, 421888, 569344, 421888, 425984, 434176,
356352, 421888, 434176, 430080, 430080, 417792, 438272, 331776,
417792, 413696, 552960, 417792, 532480, 417792, 434176, 442368,
425984, 425984, 438272, 421888, 425984, 454656, 421888, 331520,
413696, 417792, 557312, 430080, 466944, 421888, 421888, 442368,
421888, 417792, 425984, 540672, 420006, 425984, 409600, 430080,
434176, 417792, 417792, 417792, 471040, 417792, 466944, 425984,
378453, 381987, 425984, 544768, 729088, 446464, 421888, 529136,
425984, 557056, 434176, 413696, 167936, 425984, 360448, 466944,
417792, 667648, 446464, 409600, 425984, 438272, 53248, 532480,
434176, 421888, 16384, 425984, 439270, 425984, 430080, 431489,
398672, 442368, 421888, 430080, 417792, 430080, 460523, 331776,
421888, 450560, 706684, 421888, 450560, 491520, 421888, 430080,
548864, 741376, 430080, 45056, 421888, 507648, 471040, 417792,
577792, 430080, 425984, 413696, 421888, 491971, 421888, 544768,
339968, 212992, 425984, 442368, 417792, 358024, 430080, 417792,
409600, 360448, 446464, 425984, 420348, 421888, 667275, 417792,
454656, 450764, 434176, 474735, 491520, 430080, 466944, 98908,
425984, 442368, 516096, 458752, 286720, 434176, 450257, 417792,
450560, 425984, 421888, 434176, 286720, 425984, 483328, 430080,
446464, 417792, 446464, 159744, 425984, 495616, 425984, 460118,
413696, 407356, 630784, 499711, 425984, 479232, 446464, 434176,
431936, 500160, 430080, 126976, 458752, 409600, 475136, 724992,
380928, 425984, 458752, 430080, 421888, 417792, 286084, 451700,
336060, 413696, 442368, 352256, 413696, 421888, 430080, 417792,
438272, 401408, 665750, 417792, 479232, 376832, 442368, 401408,
417792, 442368, 421888, 425984, 499712, 454656, 417792, 651264,
401408, 417792, 81912, 614400, 479232, 479232, 438272, 487424,
450560, 552960, 417792, 425783, 454656, 446464, 421428, 434176,
401408, 352256, 483328, 442368, 446464, 415296, 425984, 430080,
430080, 241664, 446464, 425984, 421888, 663552, 421888, 430080,
434176, 487424, 421888, 491520, 417792, 442368, 180224, 475136,
425984, 430080, 417792, 421888, 372736, 417792, 427554, 417792,
684032, 483328, 548864, 57344, 622592, 442368, 446464, 335872,
573440, 438272, 430080, 421888, 454656, 438272, 483328, 495616,
425984, 487424, 421888, 450560, 446464, 446464, 425984, 499712,
430080, 417792, 303104, 360448, 425984, 417792, 327680, 438272,
442368, 552960, 417792, 385024, 421888, 430080, 643072, 528384,
757760, 421888, 385024, 425984, 438272, 450560, 446464, 446464,
528384, 532480, 479232, 425984, 417792, 610304, 413696, 438272,
577536, 450560, 430080, 487424, 421888, 405504, 122880, 425984
)
> dput(num_row_groups)
c(5, 7, 7, 7, 5, 6, 6, 5, 8, 7, 7, 4, 3, 7, 7, 8, 6, 4, 7, 3,
7, 3, 6, 4, 7, 6, 8, 6, 7, 7, 8, 8, 7, 2, 5, 3, 7, 7, 3, 7, 7,
7, 7, 5, 7, 7, 4, 8, 4, 7, 7, 7, 4, 7, 5, 2, 7, 7, 6, 4, 5, 7,
7, 7, 7, 7, 7, 8, 7, 5, 7, 3, 8, 5, 7, 6, 7, 7, 7, 7, 7, 7, 5,
6, 6, 7, 3, 7, 7, 7, 6, 7, 7, 7, 4, 7, 5, 7, 1, 7, 4, 7, 6, 7,
7, 7, 6, 4, 8, 6, 7, 4, 3, 4, 6, 7, 6, 4, 7, 7, 7, 7, 7, 4, 6,
5, 7, 1, 6, 1, 7, 6, 7, 5, 4, 7, 7, 7, 5, 5, 7, 7, 7, 7, 7, 7,
3, 7, 5, 5, 7, 7, 5, 7, 7, 7, 7, 7, 7, 7, 3, 3, 7, 7, 2, 7, 7,
4, 7, 7, 7, 7, 7, 7, 7, 7, 1, 4, 7, 5, 3, 7, 7, 7, 7, 5, 6, 5,
6, 6, 4, 7, 7, 7, 7, 7, 5, 6, 7, 7, 7, 7, 7, 4, 7, 6, 3, 7, 7,
7, 6, 7, 5, 2, 7, 7, 6, 4, 7, 7, 7, 7, 7, 5, 7, 6, 3, 7, 7, 7,
6, 5, 7, 3, 7, 2, 7, 7, 7, 2, 7, 7, 3, 6, 6, 7, 7, 6, 3, 7, 7,
6, 2, 7, 7, 3, 7, 7, 7, 6, 7, 4, 4, 7, 1, 3, 6, 5, 7, 7, 7, 5,
7, 7, 6, 4, 7, 7, 5, 7, 7, 8, 7, 7, 1, 4, 7, 7, 5, 7, 7, 5, 6,
7, 4, 7, 7, 7, 4, 7, 5, 4, 6, 6, 7, 6, 6, 7, 7, 7, 5, 5, 7, 5,
6, 7, 7, 4, 7, 7, 1, 7, 7, 7, 7, 7, 6, 5, 7, 7, 6, 7, 7, 4, 7,
7, 6, 5, 7, 7, 3, 7, 7, 7, 6, 7, 8, 4, 6, 7, 4, 7, 7, 8, 7, 7,
1, 7, 7, 7, 4, 7, 6, 5, 7, 7, 7, 7, 7, 7, 6, 7, 3, 5, 7, 7, 6,
7, 7, 7, 7, 7, 7, 2, 7, 7, 5, 7, 7, 6, 6, 7, 7, 7, 7, 7, 7, 7,
5, 4, 6, 5, 6, 7, 5, 7, 4, 7, 3, 6, 5, 6, 7, 7, 7, 7, 6, 7, 4,
6, 7, 5, 4, 6, 6, 5, 7, 7, 7, 7, 7, 8, 7, 6, 7, 5, 7, 7, 5, 7,
7, 7, 7, 7, 3, 5, 7, 7, 6, 7, 7, 7, 6, 7, 3, 7, 7, 7, 6, 7, 7,
4, 7, 7, 7, 7, 7, 7, 7, 7, 3, 2, 7, 7, 3, 7, 6, 7, 7, 7, 7, 5,
6, 4, 4, 7, 2, 6, 6, 3, 8, 7, 7, 1, 7, 7, 7, 6, 7, 7, 4, 7, 7,
8, 7, 6, 4, 7, 7, 7, 7, 7, 6, 4, 7, 7, 5, 6, 7, 7, 5, 7, 7, 6,
6, 3, 6, 7, 6, 6, 7, 7, 7, 4, 7, 7, 5, 7, 6, 7, 7, 7, 6, 7, 7,
2, 7, 7, 7, 7, 7, 7, 2, 7, 7, 5, 5, 7, 7, 5, 7, 8, 7, 7, 4, 4,
6, 6, 7, 4, 7, 7, 4, 7, 7, 5, 7, 6, 7, 7, 7, 3, 7, 6, 6, 6, 7,
7, 7, 7, 7, 5, 2, 6, 5, 5, 6, 5, 7, 6, 7, 7, 7, 1, 3, 7, 6, 5,
3, 6, 7, 5, 7, 7, 7, 6, 7, 2, 7, 7, 1, 7, 7, 7, 5, 7, 7, 6, 7,
7, 4, 4, 7, 5, 3, 7, 7, 7, 7, 3, 4, 7, 7, 6, 7, 6, 6, 5, 7, 7,
4, 6, 7, 7, 7, 5, 6, 7, 7, 3, 7, 7, 7, 7, 7, 7, 4, 4, 7, 3, 3,
7, 7, 7, 7, 5, 6, 7, 3, 5, 7, 7, 6, 7, 7, 7, 4, 7, 7, 5, 7, 7,
7, 6, 7, 7, 7, 7, 1, 7, 7, 6, 7, 4, 7, 2, 5, 7, 4, 3, 7, 7, 7,
7, 7, 7, 7, 6, 5, 6, 1, 7, 6, 5, 7, 3, 7, 7, 5, 7, 7, 7, 7, 7,
7, 7, 4, 1, 6, 5, 7, 7, 7, 7, 7, 1, 7, 6, 7, 7, 5, 5, 7, 7, 7,
7, 4, 7, 7, 7, 7, 7, 7, 6, 5, 7, 7, 3, 7, 7, 7, 6, 4, 7, 7, 4,
3, 7, 7, 6, 7, 7, 6, 5, 3, 7, 6, 6, 7, 7, 2, 7, 4, 7, 7, 7, 7,
7, 6, 7, 7, 5, 7, 2, 7, 7, 5, 7, 7, 7, 6, 4, 7, 7, 4, 2, 7, 6,
1, 3, 3, 5, 5, 7, 7, 7, 7, 7, 6, 7, 7, 3, 7, 7, 7, 7, 7, 7, 7,
7, 7, 6, 7, 7, 7, 4, 6, 2, 5, 7, 3, 7, 7, 6, 1, 7, 4, 2, 6, 5,
6, 6, 4, 7, 6, 6, 7, 3, 6, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 4, 7, 5, 4, 7, 3, 7, 6, 6, 7, 5, 3, 5, 4, 6, 7, 6, 6,
7, 6, 6, 7, 1, 7, 7, 6, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 4, 6, 4,
6, 7, 3, 7, 7, 7, 7, 7, 7, 2, 5, 1, 7, 6, 5, 5, 6, 7, 7, 1, 6,
7, 7, 7, 7, 7, 6, 7, 4, 7, 7, 7, 7, 7, 7, 7, 4, 7, 4, 3, 7, 4,
7, 7, 5, 7, 2, 7, 1, 2, 1, 7, 1, 6, 6, 2, 7, 7, 1, 6, 6, 2, 7,
7, 2, 7, 4, 1, 7, 7, 2, 7, 7, 1, 5, 4, 1, 4, 5, 1, 5, 7, 4, 7,
6, 6, 4, 1, 7, 7, 7, 5, 6, 6, 7, 7, 7, 6, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 6, 8, 6, 6, 2, 4, 6, 4, 4, 4, 7, 7, 6, 7, 5, 7, 5,
2, 6, 6, 6, 7, 7, 7, 6, 5, 6, 7, 7, 6, 6, 7, 7, 7, 7, 7, 7, 7,
7, 7, 6, 7, 7, 5, 4, 2, 4, 5, 4, 7, 7, 7, 6, 6, 8, 2, 2, 1, 5,
7, 6, 7, 7, 7, 6, 7, 5, 7, 7, 7, 7, 7, 7, 6, 7, 7, 7, 5, 7, 7,
7, 7, 4, 5, 4, 5, 5, 3, 8, 7, 7, 2, 2, 3, 4, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 4, 2, 2, 3,
5, 4, 6, 7, 7, 8, 6, 8, 1, 1, 2, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7,
6, 7, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 2, 4, 4, 5, 6, 6, 7,
7, 7, 7, 4, 3, 3, 7, 4, 5, 7, 7, 7, 7, 7, 6, 7, 7, 7, 5, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 2, 5, 5, 5, 6, 3, 7, 7, 4, 7, 1, 6,
2, 7, 1, 7, 7, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6,
7, 7, 7, 7, 5, 6, 5, 7, 2, 6, 7, 4, 7, 2, 6, 5, 5, 6, 2, 7, 3,
6, 7, 6, 7, 7, 8, 7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 7, 6, 7, 6, 4,
7, 6, 5, 5, 7, 4, 2, 3, 4, 7, 7, 5, 4, 7, 7, 2, 7, 4, 6, 7, 7,
7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7, 5, 7, 7, 5, 7, 6, 6, 6,
5, 6, 5, 4, 7, 5, 7, 7, 7, 6, 7, 5, 7, 7, 7, 6, 7, 7, 7, 7, 7,
7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 7, 6, 7, 7, 7, 5, 5, 5, 5, 7, 4,
4, 7, 8, 4, 7, 5, 1, 7, 1, 6, 7, 7, 6, 7, 6, 7, 7, 7, 7, 5, 7,
7, 3, 6, 5, 7, 7, 7, 8, 7, 7, 3, 5, 5, 6, 5, 7, 5, 5, 6, 7, 6,
1, 7, 1, 7, 7, 7, 7, 7, 6, 7, 7, 6, 7, 7, 5, 7, 4, 4, 7, 4, 7,
6, 7, 6, 7, 5, 7, 2, 1, 5, 5, 6, 4, 7, 7, 5, 3, 7, 3, 7, 7, 7,
7, 7, 6, 7, 6, 8, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 4, 8, 7, 7, 7,
5, 3, 2, 5, 6, 4, 7, 7, 7, 4, 7, 4, 6, 7, 2, 7, 6, 5, 7, 7, 6,
7, 7, 7, 7, 7, 6, 7, 5, 7, 7, 4, 7, 7, 7, 7, 5, 5, 6, 4, 2, 5,
7, 6, 4, 3, 6, 7, 6, 7, 7, 7, 6, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7,
6, 7, 7, 7, 5, 6, 7, 4, 7, 5, 8, 5, 7, 3, 5, 7, 6, 7, 1, 7, 1,
7, 7, 6, 7, 6, 8, 7, 7, 7, 7, 7, 7, 7, 7, 4, 7, 7, 7, 7, 5, 7,
7, 5, 6, 2, 6, 5, 4, 6, 6, 6, 7, 7, 6, 6, 1, 6, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7, 7, 7, 4, 6, 7, 6, 7, 4, 7, 5,
4, 8, 6, 7, 1, 5, 5, 6, 8, 1, 7, 1, 7, 7, 7, 6, 7, 5, 7, 7, 6,
7, 7, 6, 7, 7, 7, 7, 7, 7, 7, 4, 7, 6, 6, 7, 4, 7, 5, 6, 7, 6,
6, 1, 7, 2, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7, 7,
7, 7, 6, 7, 6, 6, 7, 5, 4, 2, 6, 7, 3, 7, 8, 7, 8, 1, 5, 1, 6,
2, 7, 7, 3, 7, 7, 6, 7, 7, 7, 6, 6, 7, 6, 6, 7, 7, 7, 7, 7, 7,
7, 7, 2, 7, 5, 5, 2, 6, 7, 5, 7, 7, 8, 6, 1, 3, 7, 7, 5, 7, 6,
7, 7, 7, 7, 7, 6, 7, 7, 3, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 5, 5,
4, 6, 6, 5, 7, 6, 8, 4, 1, 3, 6, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 6, 7, 7, 7, 6, 6, 8, 7, 5, 5, 5, 5, 6, 5, 7, 8,
3, 7, 1, 6, 1, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 5, 5, 4, 7, 4, 4, 4, 5, 7, 5, 7, 6, 7, 5, 7, 7, 1,
5, 4, 5, 7, 6, 7, 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7, 7, 7, 7, 6,
4, 7, 6, 6, 6, 2, 7, 5, 4, 5, 4, 7, 7, 6, 7, 7, 4, 2, 7, 7, 1,
6, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 7, 7, 7, 7, 2, 7, 7, 5, 7, 7,
7, 2, 7, 4, 4, 2, 6, 7, 4, 7, 5, 7, 7, 7, 7, 7, 7, 1, 7, 7, 7,
7, 7, 7, 8, 7, 7, 7, 5, 6, 3, 7, 7, 5, 7, 7, 7, 3, 7, 7, 3, 7,
7, 7, 2, 7, 7, 4, 7, 4, 7, 6, 6, 6, 7, 7, 6, 6, 7, 7, 8, 6, 7,
7, 7, 7, 4, 7, 7, 5, 6, 7, 6, 7, 7, 4, 5, 6, 7, 6, 6, 7, 6, 4,
7, 1, 7, 6, 7, 3, 7, 7, 7, 6, 7, 7, 8, 7, 7, 7, 7, 7, 2, 7, 7,
4, 7, 7, 7, 7, 7, 7, 5, 7, 1, 6, 5, 5, 7, 4, 6, 3, 7, 6, 7, 3,
7, 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7, 7, 6, 7, 6, 7, 7, 6, 5, 7,
7, 5, 7, 2, 5, 4, 6, 7, 3, 7, 4, 7, 6, 6, 8, 6, 7, 1, 5, 7, 7,
7, 7, 5, 5, 7, 7, 4, 7, 7, 4, 7, 7, 5, 7, 7, 7, 2, 7, 5, 5, 7,
7, 7, 5, 7, 6, 2, 7, 1, 7, 6, 6, 7, 7, 7, 8, 7, 7, 4, 8, 6, 1,
7, 7, 5, 2, 7, 7, 4, 7, 7, 7, 7, 6, 7, 2, 7, 3, 4, 7, 7, 7, 7,
7, 7, 7, 7, 1, 4, 7, 6, 3, 4, 7, 7, 6, 6, 6, 6, 7, 6, 5, 7, 7,
5, 7, 4, 4, 7, 7, 7, 7, 6, 3, 2, 7, 6, 6, 6, 7, 6, 6, 6, 5, 6,
7, 3, 7, 6, 6, 7, 6, 7, 5, 7, 7, 4, 7, 7, 6, 5, 7, 6, 6, 6, 7,
5, 7, 6, 6, 7, 1, 6, 6, 6, 7, 4, 7, 7, 7, 7, 7, 3, 6, 7, 7, 7,
7, 7, 7, 7, 7, 7, 4, 7, 7, 6, 7, 5, 6, 7, 4, 2, 7, 7, 7, 7, 6,
6, 2, 7, 6, 2, 4, 7, 7, 2, 7, 3, 7, 7, 3, 7, 7, 6, 7, 7, 7, 6,
5, 7, 6, 4, 7, 7, 7, 7, 7, 4, 7, 7, 6, 7, 5, 4, 7, 6, 7, 4, 7,
6, 4, 5, 7, 7, 7, 7, 7, 1, 7, 5, 7, 7, 5, 7, 5, 5, 6, 4, 7, 7,
7, 6, 6, 1, 7, 3, 7, 7, 7, 7, 7, 7, 7, 7, 5, 7, 5, 5, 7, 4, 6,
6, 7, 6, 7, 3, 5, 7, 6, 7, 5, 5, 3, 3, 7, 6, 4, 7, 7, 7, 7, 7,
7, 7, 6, 7, 7, 7, 7, 7, 7, 7, 5, 7, 7, 4, 6, 7, 6, 2, 7, 1, 4,
5, 7, 7, 4, 7, 6, 7, 6, 7, 3, 7, 7, 4, 7, 4, 7, 7, 5, 7, 7, 6,
7, 7, 7, 7, 4, 7, 7, 4, 7, 7, 7, 6, 7, 4, 7, 5, 1, 6, 4, 6, 5,
7, 7, 4, 3, 7, 6, 7, 7, 7, 6, 7, 7, 7, 7, 5, 6, 7, 4, 7, 7, 7,
7, 2, 7, 6, 5, 7, 7, 7, 5, 5, 7, 7, 6, 4, 7, 7, 1, 7, 7, 4, 4,
1, 6, 5, 7, 7, 7, 7, 7, 7, 7, 7, 3, 7, 7, 7, 7, 7, 7, 7, 6, 6,
7, 4, 7, 7, 3, 7, 5, 7, 7, 7, 7, 6, 7, 7, 7, 6, 7, 2, 7, 4, 4,
7, 7, 7, 2, 7, 7, 7, 7, 6, 7, 7, 1, 7, 7, 3, 7, 7, 7, 7, 7, 7,
5, 7, 7, 6, 4, 7, 6, 5, 7, 1, 7, 7, 1, 7, 7, 7, 7, 4, 6, 5, 4,
7, 1, 7, 7, 7, 3, 7, 7, 1, 7, 7, 7, 7, 6, 7, 7, 6, 7, 5, 7, 7,
4, 7, 7, 7, 7, 7, 8, 4, 7, 1, 4, 7, 7, 7, 4, 7, 7, 5, 7, 6, 7,
7, 7, 4, 6, 7, 7, 7, 1, 7, 7, 4, 7, 7, 4, 7, 4, 7, 7, 6, 7, 6,
7, 7, 7, 8, 7, 6, 2, 6, 4, 7, 7, 7, 7, 4, 7, 7, 2, 4, 7, 7, 7,
7, 7, 7, 7, 6, 7, 5, 7, 7, 7, 1, 6, 5, 6, 7, 5, 7, 6, 8, 7, 7,
4, 6, 5, 7, 6, 4, 6, 6, 7, 7, 7, 8, 7, 7, 1, 7, 2, 7, 7, 4, 7,
7, 7, 7, 6, 7, 7, 2, 7, 7, 4, 6, 7, 7, 5, 6, 7, 7, 7, 3, 7, 4,
7, 7, 5, 7, 7, 7, 7, 7, 5, 7, 6, 7, 6, 7, 7, 7, 7, 6, 7, 2, 7,
7, 5, 7, 5, 7, 5, 6, 7, 7, 7, 7, 7, 7, 7, 2, 6, 5, 4, 7, 6, 7,
7, 7, 7, 7, 7, 4, 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 7, 7, 2, 6, 7,
3, 4, 7, 7, 4, 7, 6, 7, 5, 3, 7, 2, 7, 7, 3, 6, 2, 7, 6, 1, 6,
7, 7, 1, 7, 6, 5, 7, 7, 6, 6, 7, 7, 7, 7, 7, 2, 7, 7, 3, 7, 7,
8, 7, 7, 2, 5, 7, 1, 6, 4, 7, 7, 3, 7, 7, 7, 6, 8, 5, 7, 3, 2,
7, 7, 7, 6, 7, 6, 5, 6, 7, 7, 7, 7, 4, 7, 7, 3, 7, 7, 8, 7, 7,
2, 7, 7, 7, 4, 2, 7, 4, 3, 7, 7, 7, 7, 4, 7, 6, 6, 7, 7, 7, 2,
7, 8, 3, 7, 7, 7, 4, 7, 7, 6, 4, 7, 6, 8, 7, 2, 7, 7, 7, 5, 3,
7, 7, 7, 7, 7, 2, 7, 5, 3, 5, 6, 7, 7, 7, 7, 7, 5, 5, 7, 6, 3,
4, 4, 6, 3, 7, 7, 8, 7, 7, 1, 4, 6, 1, 4, 7, 7, 4, 7, 7, 8, 6,
7, 3, 4, 6, 7, 3, 6, 8, 7, 7, 7, 6, 7, 7, 2, 7, 7, 5, 4, 7, 7,
5, 8, 7, 8, 7, 7, 3, 4, 7, 7, 3, 7, 5, 7, 7, 5, 6, 7, 8, 1, 4,
7, 4, 3, 6, 7, 7, 7, 7, 4, 6, 7, 6, 4, 7, 7, 3, 6, 5, 8, 6, 7,
4, 6, 7, 6, 2, 6, 6, 4, 7, 6, 6, 7, 7, 7, 5, 7, 3, 5, 6, 6, 7,
7, 2, 5, 7, 7, 7, 5, 7, 7, 5, 7, 7, 7, 7, 7, 2, 7)
Hope these help :)
I'm getting the same error message when trying to count or run distinct() on a 1B+ row parquet dataset. Adding to_duckdb() fixed it. I also can't share the dataset now, but here is the metadata requested.
> ds$schema
Schema
cvr_id: double
precinct: string
pres: string
pid: string
column: double
item: string
choice: string
choice_id: double
office_type: string
dist: string
party: string
incumbent: double
measure: double
place: string
topic: string
unexp_term: double
num_votes: double
state: string
county: string
See $metadata for additional Schema metadata
>
> n_rows = vapply(ds$files, function(f) { ParquetFileReader$create(f)$num_rows }, 0, USE.NAMES=FALSE)
> n_rowgrps = vapply(ds$files, function(f) { ParquetFileReader$create(f)$num_row_groups }, 0, USE.NAMES=FALSE)
> summary(n_rows); sum(n_rows)
      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
      2588    177040    500624   2773980   1776367 140958860
[1] 1137331911
> summary(n_rowgrps); sum(n_rowgrps)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.00    6.00   16.00   85.16   54.75 4302.00
[1] 34916
> packageVersion("arrow")
[1] ‘17.0.0.1’
I'm seeing the same issue when filtering a ~400M row dataset to remove rows where a column is duplicated. I'm running R 4.4.1 with Arrow 17.0.0.1 on a Mac. Is there another way to do this within Arrow that I'm missing? Here's the code that produces the error:
group_by(timestamp) |>
mutate(duplicate = n()) |>
filter(duplicate == 1) |>
ungroup()
Converting with to_duckdb() gets around this error, but DuckDB then errors out due to lack of memory. One solution is to let DuckDB work on disk, but that takes more time.
This issue is definitely related to data size. Breaking it up into groups smaller than 170M rows works with my data.
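A note on why splitting the data is safe for this kind of query: per-group counts from separate chunks can simply be added together, so the final answer does not depend on how the rows were split. Below is a minimal Python sketch of that idea using only the standard library. The `chunks` and `keys_seen_once` helpers and the toy `timestamps` list are hypothetical illustrations, not Arrow API, and the chunk size is arbitrary (the 170M figure above is specific to that dataset).

```python
from collections import Counter
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(rows)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def keys_seen_once(rows, chunk_size):
    """Count keys chunk by chunk, merge the partial counts, and
    return the keys that occur exactly once overall."""
    totals = Counter()
    for block in chunks(rows, chunk_size):
        totals.update(Counter(block))  # add this chunk's partial counts
    return sorted(k for k, n in totals.items() if n == 1)


# Hypothetical stand-in for the timestamp column.
timestamps = ["t1", "t2", "t2", "t3", "t4", "t4", "t5"]
unique_once = keys_seen_once(timestamps, chunk_size=3)
print(unique_once)
```

The same merge step applies when each chunk is aggregated by a real query engine: only the per-chunk summaries need to be combined, not the raw rows.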
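Here's a working DuckDB solution:
library(arrow)
library(dplyr)
library(DBI)
library(duckdb)
# Use an on-disk database file rather than in-memory DuckDB.
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)
ds_filt <- ds_filt |>
  to_duckdb(con = con) |>
  group_by(timestamp) |>
  mutate(duplicate = n()) |>
  filter(duplicate == 1) |>  # keep timestamps that occur exactly once
  ungroup() |>
  to_arrow()
DuckDB needs an on-disk store to finish this, at least on my 32 GB Mac.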
@blongworth Since DuckDB can now read Parquet datasets natively, the arrow package may not be needed here. Would it work to run duckdb on its own, without arrow? https://duckdb.org/docs/api/r#dbplyr
@blongworth are you able to share your data? And, just to be clear, you get the same type of error as the OP? "! Invalid: Negative buffer resize: -2147483584"
Here's the full error, so same as OP:
Error in `compute.arrow_dplyr_query()`:
! Invalid: Negative buffer resize: -2147483584
Backtrace:
1. dplyr::collect(...)
2. arrow:::collect.arrow_dplyr_query(...)
3. arrow:::compute.arrow_dplyr_query(x)
Here's some info about the data:
> nrow(ds)
[1] 400909276
> schema(ds)
Schema
timestamp: timestamp[us, tz=UTC]
pressure: double
u: double
v: double
w: double
amp1: int32
amp2: int32
amp3: int32
corr1: int32
corr2: int32
corr3: int32
temp: double
DO_percent: double
ph_counts: int32
ox_umol_l: double
pH: double
duplicate: int32
pH_cal: double
ox_umol_l_cal: double
year: int32
month: int32
Hey all, ran into the same issue, and I actually can share the dataset :) 500M rows, spread across a bunch of gzipped CSV files, approx 5.x GB. Where do I put them?
Query:
ds_multi_follows_final = open_dataset("./multi_follows_final",
  format = "csv",
  schema = schema(
    did = arrow::utf8(),
    multi_follow_id = arrow::uint64(),
    follow_created_at = arrow::utf8(),
    follow_subject = arrow::utf8(),
    sp_list_uri = arrow::utf8(),
    match_score = arrow::float64()
  ),
  skip = 1)
ds_multi_follows_final %>%
  group_by(did, follow_subject) %>%
  summarize() %>%
  collect() %>%
  write_csv("all_multifollow_edges.csv.gz")
Backtrace:
> rlang::last_trace(drop=FALSE)
<error/rlang_error>
Error in `compute.arrow_dplyr_query()`:
! Invalid: Negative buffer resize: -2147483584
---
Backtrace:
▆
1. ├─... %>% write_csv("all_multifollow_edges.csv.gz")
2. ├─readr::write_csv(., "all_multifollow_edges.csv.gz")
3. │ └─readr::write_delim(...)
4. │ ├─base::stopifnot(is.data.frame(x))
5. │ └─base::is.data.frame(x)
6. ├─dplyr::collect(.)
7. └─arrow:::collect.arrow_dplyr_query(.)
8. └─arrow:::compute.arrow_dplyr_query(x)
9. └─base::tryCatch(...)
10. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
11. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
12. └─value[[3L]](cond)
13. └─arrow:::augment_io_error_msg(e, call, schema = schema())
14. └─rlang::abort(msg, call = call)
Session info:
> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] tidyjson_0.3.2 scales_1.3.0 stringr_1.5.1 readr_2.1.5 xtable_1.8-4 forcats_1.0.0
[7] lubridate_1.9.4 tidyr_1.3.1 ggplot2_3.5.1 arrow_18.1.0 pracma_2.4.4 dplyr_1.1.4
loaded via a namespace (and not attached):
[1] bit_4.5.0.1 jsonlite_1.8.9 gtable_0.3.6 crayon_1.5.3 compiler_4.4.2 renv_1.0.11
[7] tidyselect_1.2.1 parallel_4.4.2 assertthat_0.2.1 R6_2.5.1 labeling_0.4.3 generics_0.1.3
[13] tibble_3.2.1 munsell_0.5.1 pillar_1.10.1 tzdb_0.4.0 rlang_1.1.4 utf8_1.2.4
[19] stringi_1.8.4 bit64_4.5.2 timechange_0.3.0 cli_3.6.3 withr_3.0.2 magrittr_2.0.3
[25] grid_4.4.2 vroom_1.6.5 hms_1.1.3 lifecycle_1.0.4 vctrs_0.6.5 glue_1.8.0
[31] farver_2.1.2 colorspace_2.1-1 purrr_1.0.2 tools_4.4.2 pkgconfig_2.0.3
I compiled/installed arrow with renv like so:
Sys.setenv(ARROW_WITH_ZLIB="ON")
Sys.setenv("LIBARROW_MINIMAL" = FALSE)
Sys.setenv("LIBARROW_BINARY" = FALSE)
Sys.setenv("ARROW_R_DEV" = TRUE)
Sys.setenv(MAKEFLAGS = sprintf("-j%d", parallel::detectCores()))
options(renv.config.pak.enabled = TRUE)
install.packages(c("dplyr","pracma","arrow","ggplot2","tidyr","lubridate","forcats","xtable","readr","stringr","scales","tidyjson"))
That's great, thanks for the info @mrd0ll4r. You can try uploading to my Dropbox at https://www.dropbox.com/request/dR1ACeYDzZvj9b5Qsjbn. I can move them somewhere else after.
Oh, that returns "This file request has been closed, deleted, or never existed".
Whoops, can you try https://www.dropbox.com/request/oAfPalVlE9utSmVMJJMj?
Got it, thanks so much. I'm able to reproduce the issue.
Hi @zanmato1984, I initially thought I'd have time to look into this but haven't yet. Do you have any interest in taking a look at this one? I can send you a link to the files if so.
Hi @amoeba, sorry for the late reply; I was on vacation for the last few days. I can take a look, but I may have trouble reproducing this in R because I have hardly developed anything in R. Do you think it is feasible to create an equivalent C++ or Python reproduction? Or do you have a link for setting up a local R environment? Thanks.
Oh hi @amoeba, one more thing worth checking: which Arrow version did you use to reproduce? If it's not 19.0.0, could you try it? I recall a fix between 18.1.0 and 19.0.0 that solved a similar issue. Thanks.
Hi @zanmato1984, the hardest part will probably be finding a minimal dataset that produces the issue. Once we have one, reproducing in C++ or Python would help isolate it. LMK if I can help with setting up or testing in R.
For my data, I've whittled it down to the summarize step that counts elements in each group:
dsd <- ds |>
  group_by(timestamp) |>
  summarize(n = n()) |>
  collect()
I'm not sure whether it's the summarizing or the counting. I still see the problem in arrow 19.0.0.
Hi @blongworth, thanks for the information. Seeing it in 19.0.0 rules out my earlier assumption that an existing fix covers this (which is helpful as well!). I think I'll just wait for @amoeba's data and try to reproduce it locally.
Similar issue, but in Python with pyarrow 20.0.0. Works fine with 30M fewer rows.
>>> mydf.shape
(338440930, 3)
>>> mydf.dtypes
sample category
peptide category
N Int32
dtype: object
>>> mydf.to_parquet('pep_all_pandas.parquet', index=False, row_group_size=8192*8192)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/util/_decorators.py", line 333, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 3113, in to_parquet
return to_parquet(
^^^^^^^^^^^
File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 480, in to_parquet
impl.write(
File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 228, in write
self.api.parquet.write_table(
File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1909, in write_table
writer.write_table(table, row_group_size=row_group_size)
File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1115, in write_table
self.writer.write_table(table, row_group_size=row_group_size)
File "pyarrow/_parquet.pyx", line 2226, in pyarrow._parquet.ParquetWriter.write_table
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Negative buffer resize: -2008352576
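Both negative sizes reported in this thread are what you would get if a byte count slightly above 2^31 were stored in a signed 32-bit integer and wrapped around. That is an assumption about the cause, not something confirmed in this thread. The sketch below only does the arithmetic, and `as_unsigned_32` is a hypothetical helper name.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_unsigned_32(value):
    """Reinterpret a signed 32-bit integer as the unsigned value with the same bits."""
    return value % 2**32


# The two sizes quoted in the error messages in this thread.
reported = {
    "R (arrow)": -2147483584,
    "Python (pyarrow 20.0.0)": -2008352576,
}

sizes = {}
for label, negative in reported.items():
    size = as_unsigned_32(negative)
    sizes[label] = size
    print(f"{label}: {negative} -> {size} bytes "
          f"({size / 2**30:.3f} GiB), {size - INT32_MAX} past INT32_MAX")
```

Under that assumption, the value from the R reports corresponds to a request of 2^31 + 64 bytes, just past the 2 GiB mark, which would fit with the error appearing only once the data grows beyond a certain size.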