ARROW-18012: [R] Make map_batches .lazy = TRUE by default

Open paleolimbot opened this issue 1 year ago • 2 comments

This makes the default map_batches() behaviour lazy (i.e., the function is called once per batch as each batch arrives):

```r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

source <- RecordBatchReader$create(
  record_batch(a = 1:10),
  record_batch(a = 11:20)
)

mapped <- map_batches(source, function(x) {
  message("Hi! I'm being evaluated!")
  x
}, .schema = source$schema)

as_arrow_table(mapped)
#> Hi! I'm being evaluated!
#> Hi! I'm being evaluated!
#> Table
#> 20 rows x 1 columns
#> $a <int32>
```

Created on 2022-10-26 with reprex v2.0.2

Lazy evaluation was previously a confusing default, because piping the resulting RecordBatchReader into an ExecPlan failed for some ExecPlans before ARROW-17178 (#13706). This PR commits to the lazy behaviour, which is both the better-performing and the more expected option.
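To make the difference between the two settings concrete, here is a minimal sketch that counts how often the mapped function runs before and after the result is consumed. It assumes an arrow version whose `map_batches()` accepts the `.lazy` argument; `make_reader()`, `counter`, and `count_calls()` are hypothetical helper names used only for this illustration.

```r
library(arrow, warn.conflicts = FALSE)

# Hypothetical helper: build a fresh two-batch reader each time, since a
# RecordBatchReader can only be consumed once.
make_reader <- function() {
  RecordBatchReader$create(
    record_batch(a = 1:10),
    record_batch(a = 11:20)
  )
}

# Keep the call count in an environment so it is updated by reference.
counter <- new.env()
counter$n <- 0L
count_calls <- function(batch) {
  counter$n <- counter$n + 1L
  batch
}

# Lazy: the function should not run until the result is consumed.
reader <- make_reader()
lazy <- map_batches(reader, count_calls, .schema = reader$schema, .lazy = TRUE)
lazy_calls_before_read <- counter$n
lazy_table <- as_arrow_table(lazy)
lazy_calls_after_read <- counter$n

# Eager: every batch should be processed inside map_batches() itself.
counter$n <- 0L
reader <- make_reader()
eager <- map_batches(reader, count_calls, .schema = reader$schema, .lazy = FALSE)
eager_calls_before_read <- counter$n
```

If the sketch behaves as intended, `lazy_calls_before_read` stays at zero and only reaches the number of batches once `as_arrow_table()` pulls them, whereas `eager_calls_before_read` already equals the number of batches. Note that `.schema` is supplied in both cases so that the output schema does not have to be inferred from the data.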

Oct 26 '22 15:10 paleolimbot

https://issues.apache.org/jira/browse/ARROW-18012

Oct 26 '22 17:10 github-actions[bot]

:warning: Ticket has not been started in JIRA, please click 'Start Progress'.

Oct 26 '22 17:10 github-actions[bot]

Benchmark runs are scheduled for baseline = 286c263492860bf6d62b3e39c80147b787848020 and contender = 97076308d07e447ad52fd4fa026f8d92513b98c9. 97076308d07e447ad52fd4fa026f8d92513b98c9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Failed :arrow_down:0.0% :arrow_up:0.0%] test-mac-arm
[Finished :arrow_down:0.0% :arrow_up:0.0%] ursa-i9-9960x
[Finished :arrow_down:0.21% :arrow_up:0.0%] ursa-thinkcentre-m75q

Buildkite builds:
[Finished] 97076308 ec2-t3-xlarge-us-east-2
[Failed] 97076308 test-mac-arm
[Finished] 97076308 ursa-i9-9960x
[Finished] 97076308 ursa-thinkcentre-m75q
[Finished] 286c2634 ec2-t3-xlarge-us-east-2
[Failed] 286c2634 test-mac-arm
[Finished] 286c2634 ursa-i9-9960x
[Finished] 286c2634 ursa-thinkcentre-m75q

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Oct 30 '22 16:10 ursabot
