
archive_extract keeps working after extracting the desired files from remote .zip

Open pakom opened this issue 1 year ago • 3 comments

I want to extract specific files from a .zip file on the web. The archive is almost 1 GB and contains multiple folders. The files I am interested in are in the folder TIMSS2019_IDB_SPSS_G8/Data/. Here is the code I use:

library(archive)

d <- tempdir()

options(timeout = 50000000000)

archive_extract(
  archive = "https://www.iea.nl/sites/default/files/data-repository/TIMSS/TIMSS2019/TIMSS2019_IDB_SPSS_G8.zip",
  dir = d,
  files = c(
    "TIMSS2019_IDB_SPSS_G8/Data/bcgarem7.sav",
    "TIMSS2019_IDB_SPSS_G8/Data/bcgchlz7.sav"
  )
)

The two files are downloaded instantaneously, as each is below 150K. However, archive_extract keeps working for nearly seven minutes, which is about the time I would need to download the entire .zip file. It does not matter whether I want to extract 2 or 200 files; archive_extract just keeps working. I can't figure out why, but it looks to me as if archive_extract keeps looking for more files with the same names. Is there any way to make it exit as soon as the desired files are downloaded?

I observe exactly the same behavior when I just try to list the files in the .zip using the archive function: it provides the list of files, but then keeps working.

In addition, after archive_extract finally exits, R issues the following warning:

Warning message:
In file(archive, "rb") : NAs introduced by coercion to integer range
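
A note on this warning: it may be separate from the slow extraction. A minimal sketch, assuming (this is not confirmed anywhere in the thread) that the connection code coerces the `timeout` option to an R integer: the value set with options(timeout = ...) above is larger than the biggest integer R can represent, so the coercion produces NA.

```r
# Value passed to options(timeout = ...) in the report above
requested_timeout <- 50000000000

# Largest value an R integer can hold
max_int <- .Machine$integer.max

# The requested timeout is out of range, so integer coercion yields NA
# (without suppressWarnings(), R emits
#  "NAs introduced by coercion to integer range")
coerced <- suppressWarnings(as.integer(requested_timeout))
is.na(coerced)

# A timeout that fits in the integer range coerces cleanly;
# 3600 seconds is only an illustrative choice
safe_timeout <- 3600
options(timeout = safe_timeout)
```

If that assumption holds, choosing a timeout below .Machine$integer.max should make the warning disappear, although it would not by itself change how long archive_extract keeps running.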

pakom, May 21 '23 00:05

Yeah, it should stop once all specified files are extracted, provided they are all files (i.e. not directories).

gaborcsardi, May 21 '23 12:05

Thank you for your reply, Gabor. Yes, they are all files. The archive function behaves the same way when listing the files in an archive.

pakom, May 21 '23 12:05

Addressed by pull request #94.

pakom, Oct 30 '23 23:10

Closed by #94.

gaborcsardi, Apr 22 '24 09:04