Support memory = FALSE like in Spark
Hi,
Assuming it is (even technically) possible, it would be useful to have the data indexed but not yet loaded into RAM, as in sparklyr (see https://www.rdocumentation.org/packages/sparklyr/versions/1.0.2/topics/spark_read_parquet).
That would let the user open very large parquet files but pay only for what is actually used, similar to what vroom does (https://github.com/r-lib/vroom).
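For reference, here is roughly what the sparklyr side looks like today, plus a purely hypothetical sketch of what the same option could look like in this package (the connection details, file name, and the memory argument for read_parquet are illustrative, not an existing API):

```r
library(sparklyr)

# sparklyr: register the parquet file as a Spark table without caching it in memory
sc  <- spark_connect(master = "local")
tbl <- spark_read_parquet(sc, name = "mydata", path = "data.parquet",
                          memory = FALSE)

# Hypothetical miniparquet equivalent (this argument does not exist yet):
# a <- miniparquet::read_parquet("data.parquet", memory = FALSE)
```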
What do you think? Thanks!
Yes, I plan to implement ALTREP features for the parquet reader as well, similar to vroom.
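For anyone following along, this is a rough sketch of the vroom behaviour being referenced (file and column names are made up): vroom only indexes the file up front, and the ALTREP-backed columns are materialised the first time they are actually used.

```r
library(vroom)

df <- vroom("big.csv")   # fast: builds an index, columns stay as ALTREP vectors
names(df)                # header/metadata only, no payload parsed
mean(df$price)           # touching a column parses just that column
```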
Great idea!! Maybe you should work with Jim Hester (@jimhester, the vroom author) on a single package that handles CSV + parquet super fast? That would be a killer package in my opinion! More devs are also needed to fix bugs and other inefficiencies. What do you think?
Check out the altrep branch in this repo... for now, it materialises everything at once, but things like this should no longer read any unrelated payload data:
a <- miniparquet::read_parquet("...")  # for now this still materialises every column at once
names(a)                               # should only need the parquet metadata, no payload
mean(a$col)                            # should eventually read only the 'col' column
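If it helps while testing the altrep branch, one informal way to check whether a column has been materialised is R's undocumented inspection helper; the file and column names below are just placeholders:

```r
a <- miniparquet::read_parquet("data.parquet")

.Internal(inspect(a$col))   # an ALTREP-backed column shows altrep wrapper info here
mean(a$col)                 # forces the values to be read
.Internal(inspect(a$col))   # a fully materialised column shows a plain vector type
```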
See also https://twitter.com/hfmuehleisen/status/1176410678967640065?s=20