
`use_zip` error in unzipping directory with a different name from the zipfile

Open burnsal opened this issue 3 months ago • 1 comments

I am running into a missing value error trying to download and unzip large files with the use_zip function.

I was able to isolate the issue to the top_directory function, which the helper function tidy_unzip calls to find the top-level directory of the zip file. Here is the reprex with output included:

## usethis::use_zip() issue reprex

# Goal: download and unzip large files stored on a Figshare platform

# long link: https://agdatacommons.nal.usda.gov/ndownloader/files/44576230
# bitly link: https://bit.ly/ctdv2sp

# attempt with existing function
> usethis::use_zip("https://bit.ly/ctdv2sp",
                 destdir = "C:/temp")
✔ Downloading from 'https://bit.ly/ctdv2sp'
Downloaded: 312.74 MB  (100%)
✔ Download stored in 'C:/temp/species_v2_3.zip'
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

# separate process into its helper functions to source the error
> usethis:::tidy_download("https://bit.ly/ctdv2sp",
                        destdir = "C:/temp")
Downloaded: 312.74 MB  (100%)

> usethis:::tidy_unzip("C:/temp/species_v2_3.zip")
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed


## Dig deeper into `tidy_unzip`

# ABnote: first, define zipfile downloaded with `tidy_download`
> zipfile <- "C:/temp/species_v2_3.zip"
> file.exists(zipfile)
[1] TRUE

> base_path <- path_dir(zipfile)
> print(base_path)
[1] "C:/temp"

> filenames <- utils::unzip(zipfile, list = TRUE)[["Name"]]
> print(filenames) # you can see that the zipfile has a single-folder directory with a different name than the zip file
[1] "species_v2/label_encoder.txt" "species_v2/model_arch.pt"     "species_v2/model_weights.pth"

# Dropbox particularities do not apply here, skip next two lines

> td <- top_directory(filenames)
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

## AH HA! Look at `top_directory` function
> in_top <- path_dir(filenames) == "."
[1] FALSE FALSE FALSE
> unique_top <- unique(filenames[in_top])
character(0)
# I think the code should extract the dir `species_v2` from my zipfile here
> is_directory <- grepl("/$", unique_top)
logical(0)

# address conditionals one at a time
> length(unique_top) > 1 
[1] FALSE
> !is_directory
logical(0)

# conditional statement is throwing the error b/c directory was not correctly extracted
> length(unique_top) > 1 || !is_directory
[1] NA
> if (length(unique_top) > 1 || !is_directory) {
+   NA_character_
+ } else {
+   unique_top
+ }
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

The code that determines the top directory does not handle the structure of my zip file. The archive lists only the three files, with no separate entry for the folder itself, so the single folder inside the zip file is never detected.

# current structure
> in_top <- path_dir(filenames) == "."
> all(in_top) == FALSE
[1] TRUE
> unique_top <- unique(filenames[in_top])
character(0)

# the folder name at the top of the directory inside the zip file should be extracted
> path_dir(filenames)
[1] "species_v2" "species_v2" "species_v2"
> unique(path_dir(filenames))
[1] "species_v2"
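
The failure can also be reproduced without downloading the 312 MB archive. The sketch below is a minimal stand-in, not the usethis code itself: it hard-codes the three entry names from the listing above and uses `base::dirname()` in place of `path_dir()` (an assumption that the two agree for these simple relative paths).

```r
# Entry names copied from the unzip listing above; no download needed.
filenames <- c(
  "species_v2/label_encoder.txt",
  "species_v2/model_arch.pt",
  "species_v2/model_weights.pth"
)

# Assumption: dirname() behaves like path_dir() for these paths.
in_top <- dirname(filenames) == "."

# No entry sits at the top level, so both vectors are empty.
unique_top <- unique(filenames[in_top])
is_directory <- grepl("/$", unique_top)

# FALSE || logical(0) evaluates to NA, which if() cannot handle.
cond <- length(unique_top) > 1 || !is_directory

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  if (cond) "not a single top directory" else "single top directory",
  error = function(e) conditionMessage(e)
)
print(msg)
```

The zero-length `is_directory` is the root cause: once no entry matches the top level, the condition has nothing to evaluate.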

A revised top_directory function makes this work:

top_directory <- function(filenames) {
  in_top <- path_dir(filenames) == "."
  unique_top <- unique(filenames[in_top])
  is_directory <- grepl("/$", unique_top)
  top_dirs <- unique(path_dir(filenames))
  if (length(unique_top) == 0 && length(top_dirs) == 1) {
    # no explicit top-level entry, but every file sits in the same folder
    top_dirs
  } else if (length(unique_top) != 1 || !is_directory) {
    # also covers zero top-level entries spread over several folders,
    # which would otherwise hit the same NA condition
    NA_character_
  } else {
    unique_top
  }
}

# run use_zip with the revised helper
> use_zip(url = "https://bit.ly/ctdv2sp",
+         destdir = "C:/temp", cleanup = TRUE)
✔ Downloading from <https://bit.ly/ctdv2sp>.
Downloaded: 312.74 MB  (100%)
✔ Download stored in C:/temp/species_v2_3.zip.
✔ Unpacking ZIP file into species_v2/ (3 files extracted).
✔ Deleting species_v2_3.zip.
✔ Opening species_v2/ in the file manager.

I have made this change on a new branch in my fork of the repository here, and it passes devtools::check(). Please let me know if this change needs to be more generalized or if it is ready for a pull request. Thank you!
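
If a more general version is wanted, one possible direction is to look at the first path component of every entry rather than at `path_dir()`, so that archives with nested subfolders are covered too. This is only a rough sketch under stated assumptions: `top_directory_sketch` is a hypothetical name, it uses base R only, it assumes entries use forward slashes, and it returns the folder with a trailing slash to mirror the `"/$"` check in the existing code. It has not been tested against the usethis test suite.

```r
# Hypothetical helper, not part of usethis: returns the single top-level
# folder shared by all entries (with a trailing slash), or NA_character_.
top_directory_sketch <- function(filenames) {
  if (length(filenames) == 0) {
    return(NA_character_)
  }
  # First path component, e.g. "species_v2" for "species_v2/sub/file.txt".
  first <- sub("/.*$", "", filenames)
  # An entry lives inside a folder (or is a folder entry) only if it has a slash.
  has_dir <- grepl("/", filenames, fixed = TRUE)
  top <- unique(first)
  if (length(top) != 1 || !all(has_dir)) {
    return(NA_character_)
  }
  paste0(top, "/")
}

# Layout from this issue: no explicit folder entry.
top_directory_sketch(c("species_v2/label_encoder.txt", "species_v2/model_arch.pt"))

# Explicit folder entry plus a nested subfolder.
top_directory_sketch(c("foo/", "foo/a.txt", "foo/sub/b.txt"))

# Loose files or several top-level folders give NA_character_.
top_directory_sketch(c("a.txt", "b.txt"))
top_directory_sketch(c("a/x.txt", "b/y.txt"))
```

Splitting on the first slash avoids the zero-length condition entirely, because the check no longer depends on whether the archive happens to contain an entry for the folder itself.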

burnsal avatar Mar 08 '24 18:03 burnsal