platypus
Warning while creating the train/test/valid split in the BCCD example
Describe the bug
This bug affects the BCCD example: https://github.com/maju116/platypus/blob/yolo3_fix/examples/Blood%20Cell%20Detection/Blood-Cell-Detection.md?fbclid=IwAR1-c-JKTEj6rCad5uCdDh84zzQ7Hv7rdXKZclQQZpAUOGiFyXNpwxj8p-Y
To Reproduce
Running the following code:
walk2(c("train", "valid", "test"), list(train_ids, valid_ids, test_ids), ~ {
  annots <- annot_paths[.y]
  images <- images_paths[.y]
  dir_name <- .x
  annots %>% walk(~ file.copy(., gsub("(BCCD)", paste0("BCCD/", dir_name), .)))
  images %>% walk(~ file.copy(., gsub("(BCCD)", paste0("BCCD/", dir_name), .)))
})
can result in a series of warnings like the following:
49: In file.create(to[okay]) :
cannot create the file '~/train_Dataset-master/BCCD/train/Annotations/BloodImage_00229.xml', reason: 'No such file or directory'
In my opinion this happens because gsub is too greedy here: it replaces every occurrence of 'BCCD' in the path, including the one inside the 'BCCD_Dataset-master' directory name, with 'BCCD/train', 'BCCD/valid' or 'BCCD/test'. This creates paths of the form:
~/train_Dataset-master/BCCD/train/Annotations/BloodImage_00229.xml
instead of
~/BCCD_Dataset-master/BCCD/train/Annotations/BloodImage_00229.xml
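The difference between the two paths can be reproduced without touching any files. The snippet below is only a sketch: the `src` path is a made-up example modelled on the warning above, and the real directory layout on your machine may differ. It contrasts the `gsub` call from the example with a replacement that matches only the `/BCCD/` directory component.

```r
# Hypothetical source path modelled on the warning in this report;
# the actual dataset location may differ.
src <- "data/BCCD_Dataset-master/BCCD/Annotations/BloodImage_00229.xml"

# gsub() replaces EVERY occurrence of "BCCD", including the one inside
# the "BCCD_Dataset-master" directory name.
broken <- gsub("(BCCD)", paste0("BCCD/", "train"), src)
print(broken)

# Matching the literal "/BCCD/" component (fixed = TRUE, first match only)
# leaves "BCCD_Dataset-master" untouched.
fixed <- sub("/BCCD/", "/BCCD/train/", src, fixed = TRUE)
print(fixed)
```

Note that this only addresses the path construction. `file.copy` still needs the target directories to exist before it is called.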
Session information:
- OS: MS Windows 8.1 64 bit
- R version: 4.0.2
- Python version: 3.7.6
- TensorFlow (Python) version (tensorflow::tf_version()): 2.0
- R session information (sessionInfo()):
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] here_0.1 abind_1.4-5 platypus_0.1.1 keras_2.3.0.0 tensorflow_2.2.0 forcats_0.5.0 stringr_1.4.0
[8] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] progress_1.2.2 reticulate_1.16 tidyselect_1.1.0 haven_2.3.1 lattice_0.20-41 colorspace_1.4-1
[7] vctrs_0.3.4 generics_0.0.2 base64enc_0.1-3 XML_3.99-0.5 blob_1.2.1 rlang_0.4.7
[13] pillar_1.4.6 glue_1.4.2 withr_2.3.0 DBI_1.1.0 dbplyr_1.4.4 RColorBrewer_1.1-2
[19] modelr_0.1.8 readxl_1.3.1 lifecycle_0.2.0 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0
[25] rvest_0.3.6 tfruns_1.4 fansi_0.4.1 broom_0.7.1 Rcpp_1.0.5 scales_1.1.1
[31] backports_1.1.10 jsonlite_1.7.1 fs_1.5.0 gridExtra_2.3 hms_0.5.3 stringi_1.5.3
[37] rprojroot_1.3-2 grid_4.0.2 cli_2.0.2 tools_4.0.2 magrittr_1.5 crayon_1.3.4
[43] whisker_0.4 pkgconfig_2.0.3 zeallot_0.1.0 ellipsis_0.3.1 Matrix_1.2-18 prettyunits_1.1.1
[49] xml2_1.3.2 reprex_0.3.0 lubridate_1.7.9 assertthat_0.2.1 httr_1.4.2 rstudioapi_0.11
[55] R6_2.4.1 compiler_4.0.2
Additional context
One possible solution is to use a dedicated package for path manipulation, e.g. pathlibr.
Replacing that walk2 with the following seems to do the trick.
walk2(c("train", "valid", "test"), list(train_ids, valid_ids, test_ids), ~ {
  annots <- annot_paths[.y]
  images <- images_paths[.y]
  dir_name <- .x
  # Substitute the full dataset root (BCCD_path) rather than the bare string "BCCD",
  # so the "BCCD" inside "BCCD_Dataset-master" is not rewritten.
  annots %>% walk(~ file.copy(., gsub(BCCD_path, paste0(BCCD_path, '/', dir_name), .)))
  images %>% walk(~ file.copy(., gsub(BCCD_path, paste0(BCCD_path, '/', dir_name), .)))
})
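The path-manipulation idea from the additional context can also be sketched with base R alone, avoiding pattern matching on the path string altogether. The helpers `split_destination` and `copy_to_split` below are hypothetical names introduced for illustration and are not part of platypus. They assume each source file sits exactly one directory below the dataset root (for example `.../BCCD/Annotations/file.xml`).

```r
# Hypothetical helper (not part of platypus): insert the split name between
# the dataset root and the file's immediate parent directory.
split_destination <- function(path, dir_name) {
  subdir <- basename(dirname(path))  # e.g. "Annotations"
  root <- dirname(dirname(path))     # e.g. ".../BCCD"
  file.path(root, dir_name, subdir, basename(path))
}

# Hypothetical helper: create the target directory if needed, then copy.
copy_to_split <- function(path, dir_name) {
  dest <- split_destination(path, dir_name)
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  file.copy(path, dest)
}

# Demonstration on a throwaway file in a temporary directory.
demo_root <- file.path(tempdir(), "BCCD_Dataset-master", "BCCD")
dir.create(file.path(demo_root, "Annotations"), recursive = TRUE, showWarnings = FALSE)
demo_file <- file.path(demo_root, "Annotations", "BloodImage_00229.xml")
writeLines("<annotation/>", demo_file)

copied <- copy_to_split(demo_file, "train")
print(split_destination(demo_file, "train"))
```

Inside the `walk2` call, `file.copy(., gsub(...))` could then become `copy_to_split(., dir_name)`, assuming the one-level layout described above holds for both annotations and images.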