Be able to read SAS files
Is there any workaround to achieve this with the current disk.frame version?
How big is your SAS dataset?
They are about 10 GB each in sas7bdat format, and I have to repeat the same process for about 500 of them. I tried reading them one by one into memory using the haven package, but a single file doesn't fit in RAM.
Matthew Son
On Aug 30, 2021, at 6:48 PM, evalparse @.***> wrote:
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.
> 10gb each in sas7bdat format
Are they binary compressed? If not, {haven} can read them, if I remember correctly.
For example, something like this would work:
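```r
library(magrittr) # provides %>%
# list_of_paths_to_sasfiles: character vector of .sas7bdat paths
# (haven::read_sas() loads each whole file into RAM before conversion)
list_of_disk.frames = lapply(list_of_paths_to_sasfiles, function(path) {
  path %>%
    haven::read_sas() %>%
    disk.frame::as.disk.frame()
})
diskf = disk.frame::rbindlist.disk.frame(list_of_disk.frames)
```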
I think it is only compressed as ‘char’, though I'm not entirely sure, but I was able to read the first 10,000 rows of the SAS file in RStudio. So I believe haven can help here.
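Since the first 10,000 rows load fine, one way to pick a chunk size that fits in RAM is to measure that sample with `object.size()` and scale up. This is a rough sketch: `rows_for_budget()` and the 1 GB default budget are hypothetical, and `object.size()` only approximates memory use (especially for character columns), so leave headroom for copies made during processing.

```r
# Hypothetical helper: estimate how many rows fit in a memory budget by
# scaling the in-memory size of a sample of rows (a rough estimate only).
rows_for_budget <- function(sample_df, budget_bytes = 1e9) {
  stopifnot(nrow(sample_df) > 0)
  bytes_per_row <- as.numeric(utils::object.size(sample_df)) / nrow(sample_df)
  # Always return at least one row
  max(1, floor(budget_bytes / bytes_per_row))
}

# Example with a real file (path is a placeholder):
# sample_df <- haven::read_sas("file.sas7bdat", n_max = 10000)
# rows_for_budget(sample_df, budget_bytes = 1e9)
```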
But each file is too big for my machine (it's an RStudio Server, and whenever I try to load one file into RAM, it just crashes).
I guess the suggestion only works when each file fits in memory?
What I was thinking is that, since haven allows reading only user-specified rows, I could somehow combine haven and disk.frame to split each file into several .fst pieces.
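The row-range idea can be sketched with the `skip` and `n_max` arguments of `haven::read_sas()` plus `fst::write_fst()`. This is a minimal sketch, not a disk.frame feature: `sas_to_fst_chunks()` and its `reader` argument are hypothetical names, and the default of 1e6 rows per chunk is an assumption to tune to your RAM. Each `read_sas()` call reopens the file, so this may be slow on 10 GB files; it has not been benchmarked here.

```r
# Hypothetical helper: split one large SAS file into numbered .fst chunks
# (1.fst, 2.fst, ...) without loading the whole file into RAM.
# Assumes a haven version whose read_sas() has `skip` and `n_max`, plus fst.
sas_to_fst_chunks <- function(path, outdir, chunk_rows = 1e6,
                              reader = function(path, skip, n_max) {
                                df <- haven::read_sas(path, skip = skip, n_max = n_max)
                                # Drop SAS label/format attributes so chunks hold plain columns
                                haven::zap_formats(haven::zap_label(df))
                              }) {
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
  chunk_id <- 0L
  skip <- 0
  repeat {
    # Read at most `chunk_rows` rows, starting after the rows already written
    chunk <- reader(path, skip = skip, n_max = chunk_rows)
    if (nrow(chunk) == 0) break
    chunk_id <- chunk_id + 1L
    fst::write_fst(as.data.frame(chunk),
                   file.path(outdir, paste0(chunk_id, ".fst")))
    skip <- skip + nrow(chunk)
    # A short chunk means the end of the file was reached
    if (nrow(chunk) < chunk_rows) break
  }
  invisible(chunk_id) # number of chunk files written
}
```

The numbered `1.fst`, `2.fst`, ... naming is intended to match, as far as I know, the folder layout disk.frame uses for its chunks, so the output folder could then be opened with `disk.frame::disk.frame(outdir)`. The `reader` argument exists so the loop can be exercised without a real SAS file, or to pass extra `read_sas()` options such as `col_select`.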
On Aug 31, 2021, at 6:14 AM, evalparse @.***> wrote:
> What I was guessing is that since haven allows to read only user-specified rows, I can somehow mix haven and disk.frame to split them into several .fst pieces.
Yeah. Not available yet.
Looking forward to future releases! Thanks anyhow.
On Aug 31, 2021, at 8:27 AM, evalparse @.***> wrote: