
Support disk.frame objects

Open ianmcook opened this issue 4 years ago • 8 comments

See https://github.com/xiaodaigh/disk.frame/issues/196

ianmcook avatar Oct 24 '19 10:10 ianmcook

Blocked by https://github.com/xiaodaigh/disk.frame/issues/197 (using a Mac dev environment)

ianmcook avatar Oct 24 '19 18:10 ianmcook

xiaodaigh/disk.frame#197 is resolved. Now blocked by xiaodaigh/disk.frame#217

ianmcook avatar Nov 25 '19 16:11 ianmcook

Also blocked by https://github.com/xiaodaigh/disk.frame/issues/250

ianmcook avatar Jan 05 '20 17:01 ianmcook

Hey, all blockers are resolved and it is working, but with some bugs. See:

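library(disk.frame)
library(tidyquery)     # provides query()
library(nycflights13)  # assumed source of the airports data set
setup_disk.frame()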

airports.df = as.disk.frame(airports)

# this works
airports.df %>%
  query("SELECT name as name1, lat as lat1, lon as lon1 ORDER BY lat DESC") %>% 
  collect

But this doesn't:

airports.df %>%
  query("SELECT name, lat, lon as lon1 ORDER BY lat DESC LIMIT 5") %>% 
  collect

It complains with:

Error: The SELECT list includes two or more long expressions with no aliases assigned to them. You must assign aliases to these expressions
In addition: There were 17 warnings (use warnings() to see them)

and warnings() shows:

Warning messages:
1: In readChar(rc, nchars) : truncating string with embedded nuls
2: In readChar(rc, nchars) : truncating string with embedded nuls
3: In readChar(rc, nchars) : truncating string with embedded nuls
4: In readChar(rc, nchars) : truncating string with embedded nuls
5: In readChar(rc, nchars) : truncating string with embedded nuls
6: In readChar(rc, nchars) : truncating string with embedded nuls
7: In readChar(rc, nchars) : truncating string with embedded nuls
8: In readChar(rc, nchars) : truncating string with embedded nuls
9: In readChar(rc, nchars) : truncating string with embedded nuls
10: In readChar(rc, nchars) : truncating string with embedded nuls
11: In readChar(rc, nchars) : truncating string with embedded nuls
12: In readChar(rc, nchars) : truncating string with embedded nuls
13: In readChar(rc, nchars) : truncating string with embedded nuls
14: In readChar(rc, nchars) : truncating string with embedded nuls
15: In readChar(rc, 1L, useBytes = TRUE) : truncating string with embedded nuls
16: In readChar(rc, 1L, useBytes = TRUE) : truncating string with embedded nuls
17: In readChar(rc, 1L, useBytes = TRUE) : truncating string with embedded nuls
18: In arrange.disk.frame(., ...) :
  `arrange.disk.frame` is now deprecated. Please use `chunk_arrange` instead. This is in preparation for a more powerful `arrange` that sorts the whole disk.frame

xiaodaigh avatar Jul 30 '20 03:07 xiaodaigh

Thanks @xiaodaigh—I'll take a look at this soon

ianmcook avatar Jul 30 '20 03:07 ianmcook

@xiaodaigh this error is happening because colnames() is returning NULL on a disk.frame object. Should I be using names(collect(get_chunk(df, 1))) to get the column names, as you suggest at https://diskframe.com/reference/colnames.html?

ianmcook avatar Aug 01 '20 16:08 ianmcook

I see. The design of disk.frame is a little odd at this stage. So names(get_chunk(df, 1)) should suffice. But it's kind of weird to make you run this disk.frame-specific code. Let me fix colnames in disk.frame.

See https://github.com/xiaodaigh/disk.frame/issues/299
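
Until colnames is fixed, the workaround discussed above could be wrapped in a small helper. This is only a rough sketch: safe_colnames and its first_chunk argument are hypothetical names, not part of tidyquery or disk.frame. It assumes that colnames() returns NULL for a disk.frame and that the first chunk has the same columns as every other chunk.

```r
# Hypothetical helper: not part of tidyquery or disk.frame.
# `first_chunk` is an assumed hook that returns one chunk as a data frame;
# its default is only evaluated when colnames() returns NULL.
safe_colnames <- function(x,
                          first_chunk = function(x) disk.frame::get_chunk(x, 1)) {
  cols <- colnames(x)
  if (is.null(cols)) {
    # Fall back to the column names of the first chunk.
    cols <- names(first_chunk(x))
  }
  cols
}

# Ordinary data frames take the normal path and never touch disk.frame.
df_cols <- safe_colnames(data.frame(name = "a", lat = 1, lon = 2))

# A stand-in object whose colnames() is NULL exercises the fallback path.
fallback_cols <- safe_colnames(
  list(),
  first_chunk = function(x) data.frame(name = "a", lat = 1)
)
```

With a real disk.frame, safe_colnames(airports.df) would take the fallback path through get_chunk for as long as colnames() returns NULL.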

xiaodaigh avatar Aug 02 '20 05:08 xiaodaigh

Another approach, which I think might be better, is to make query an S3 generic so that this would work:

query <- function(data, ...) {
  UseMethod("query")
}

query.data.frame <- function(data, sql) {
  query_(data, sql, TRUE)
}

Then, on the {disk.frame} side, I can do something like this:

query.disk.frame = create_chunk_mapper(tidyquery::query)

airports.df %>%
  query("SELECT name, lat, lon as lon1") %>% 
  collect

To test, this should definitely work:

airports.df %>%
  query.disk.frame("SELECT name, lat, lon as lon1") %>% 
  collect

This is already on a branch on {disk.frame}'s side.
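
For anyone following along, here is a toy illustration of the dispatch idea in base R only. It is a sketch, not the code on the branch: run, as_chunked and gather are hypothetical stand-ins for query, a disk.frame and collect, and the chunked method plays the role that create_chunk_mapper would play.

```r
# Toy generic standing in for query(); `run` is a hypothetical name.
run <- function(data, ...) {
  UseMethod("run")
}

# Method for ordinary data frames: apply the function directly.
run.data.frame <- function(data, f, ...) {
  f(data, ...)
}

# Hypothetical "chunked" class: a list of data frames, one per chunk.
as_chunked <- function(chunks) {
  structure(list(chunks = chunks), class = "chunked")
}

# Method for the chunked class: re-dispatch to the data.frame method
# once per chunk and keep the result chunked.
run.chunked <- function(data, f, ...) {
  as_chunked(lapply(data$chunks, run, f, ...))
}

# Bind the chunks back into one data frame (similar in spirit to collect()).
gather <- function(x) {
  do.call(rbind, unname(x$chunks))
}

# Split a built-in data set into chunks and filter each chunk separately.
chunks <- as_chunked(split(mtcars, mtcars$cyl))
result <- gather(run(chunks, function(d) d[d$mpg > 25, c("mpg", "cyl")]))
```

The caveat is the same one the arrange.disk.frame deprecation warning hints at: anything applied per chunk (ORDER BY, LIMIT, aggregates) acts within each chunk, not across the whole data set.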

xiaodaigh avatar Aug 08 '20 14:08 xiaodaigh

Closing because {disk.frame} has been soft-deprecated.

ianmcook avatar Nov 05 '22 14:11 ianmcook