archive_extract keeps working after extracting the desired files from remote .zip
I want to extract specific files from a .zip file on the web. The archive is almost 1 GB and has multiple folders. The files I am interested in are in the folder TIMSS2019_IDB_SPSS_G8/Data/. Here is the code I use.
library(archive)
d <- tempdir()
options(timeout = 50000000000)
archive_extract(archive = "https://www.iea.nl/sites/default/files/data-repository/TIMSS/TIMSS2019/TIMSS2019_IDB_SPSS_G8.zip",
                dir = d,
                files = c("TIMSS2019_IDB_SPSS_G8/Data/bcgarem7.sav",
                          "TIMSS2019_IDB_SPSS_G8/Data/bcgchlz7.sav"))
The two files are downloaded instantaneously, as they are below 150 KB. However, archive_extract keeps working for nearly seven minutes, which is about the time I would need to download the entire .zip file. It does not matter whether I want to extract just 2 or 200 files; archive_extract just keeps working. I can't figure out why, but it looks to me as if archive_extract keeps looking for more files with the same name. Is there any way to make it exit once the desired files have been extracted?
I observe exactly the same behavior when I just try to list the files in the .zip using the archive function: it provides the list of files, but then keeps working.
In addition, after archive_extract finally exits, R issues the following warning:
Warning message:
In file(archive, "rb") : NAs introduced by coercion to integer range
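A possible reading of this warning, offered as an assumption and not something confirmed in this thread, is that it comes from the very large timeout option and is unrelated to the long run time: 50000000000 is larger than the biggest value an R integer can hold, so coercing it to integer produces NA. The sketch below only demonstrates that coercion and sets a smaller timeout; the one-hour value is an arbitrary illustration.

```r
# The largest value an R integer can hold (2147483647 on current platforms).
int_max <- .Machine$integer.max

# The timeout used in the report is far above that limit.
requested_timeout <- 50000000000
exceeds_int <- requested_timeout > int_max

# Coercing it to integer yields NA; without suppressWarnings() R reports
# "NAs introduced by coercion to integer range".
coerced <- suppressWarnings(as.integer(requested_timeout))

# A generous timeout that still fits in an integer
# (assumed value: one hour, in seconds).
options(timeout = 3600)
```

If this reading is right, keeping the timeout at or below .Machine$integer.max should avoid the warning, though it would not change how long the extraction takes.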
Yeah, it should stop if all specified files are extracted and they are all files (i.e. not directories).
Thank you for your reply, Gabor. Yes, they are all files.
The archive function behaves the same way when listing the files in an archive.
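For installations that still show this behavior, one possible workaround is to download the archive once and extract from the local copy. This is only a sketch: extract_from_remote_zip is a hypothetical helper, not part of the archive package, and it assumes the extra time is spent reading the rest of the remote stream. It still costs one full download, but later extractions from the saved file would not need the network.

```r
# Hypothetical helper (not part of the archive package): download the whole
# archive once, then extract only the requested members from the local copy.
extract_from_remote_zip <- function(url, files, dir = tempdir()) {
  local_zip <- tempfile(fileext = ".zip")
  # Remove the temporary download when the function exits.
  on.exit(unlink(local_zip), add = TRUE)

  # mode = "wb" keeps the binary archive intact on Windows.
  utils::download.file(url, destfile = local_zip, mode = "wb")

  # Same call as in the report, but pointed at a local file.
  archive::archive_extract(archive = local_zip, dir = dir, files = files)

  # Return the paths where the extracted files should now be.
  file.path(dir, files)
}

# Usage (not run here; this downloads almost 1 GB):
# options(timeout = 3600)
# extract_from_remote_zip(
#   "https://www.iea.nl/sites/default/files/data-repository/TIMSS/TIMSS2019/TIMSS2019_IDB_SPSS_G8.zip",
#   files = c("TIMSS2019_IDB_SPSS_G8/Data/bcgarem7.sav",
#             "TIMSS2019_IDB_SPSS_G8/Data/bcgchlz7.sav")
# )
```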
Addressed by pull request #94
Closed by #94.