How to get files' ids?
Hi, sorry for the stupid question, but I don't know how to get files' ids so that I can download individual files from a Dryad dataset.
I tried looking at our published dataset with:
> dryad_dataset("10.5061/dryad.7nt8f")
# truncated output
$`10.5061/dryad.7nt8f`$id
[1] 6817
However, if I try to use that id to get files, the result shows a different DOI for this id:
> dryad_files(6817)
# truncated output
$`6817`$`_links`$`stash:dataset`$href
[1] "/api/v2/datasets/doi%3A10.5061%2Fdryad.nf757"
i.e. the returned DOI is 10.5061/dryad.nf757 rather than 10.5061/dryad.7nt8f.
So how do I get:
- a proper ids for my dataset, to be used in functions like dryad_files?
- a link to a particular file (e.g. Appendix S2.txt in the doi link above)?
Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: elementary OS 5.1.7 Hera
Matrix products: default
BLAS/LAPACK: /home/jena/miniconda3/lib/libopenblasp-r0.3.12.so
locale:
[1] LC_CTYPE=cs_CZ.UTF-8 LC_NUMERIC=C
[3] LC_TIME=cs_CZ.UTF-8 LC_COLLATE=cs_CZ.UTF-8
[5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=cs_CZ.UTF-8
[7] LC_PAPER=cs_CZ.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rdryad_1.0.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 magrittr_2.0.1 rappdirs_0.3.1 uuid_0.1-4
[5] R6_2.5.0 rlang_0.4.8 hoardr_0.5.2 tools_3.6.1
[9] htmltools_0.5.0 ellipsis_0.3.1 digest_0.6.27 httpcode_0.3.0
[13] tibble_3.0.4 lifecycle_0.2.0 crayon_1.3.4 zip_2.1.1
[17] IRdisplay_0.7.0 repr_1.1.0 base64enc_0.1-3 vctrs_0.3.5
[21] triebeard_0.3.0 IRkernel_1.1.1 curl_4.3 crul_1.0.0
[25] evaluate_0.14 mime_0.9 pbdZMQ_0.3-3.1 compiler_3.6.1
[29] pillar_1.4.7 urltools_1.7.3 jsonlite_1.7.1 pkgconfig_2.0.3
Update
I noticed that I can use the number from a file's link on the Dryad website as ids, and it seems to work properly and get the right file. But how do I get those ids from rdryad?
For example the file Appendix S2.txt mentioned above is linked with the following url: https://datadryad.org/stash/downloads/file_stream/33893
Using 33893 as ids in functions returns the right doi, file description etc:
> dryad_files(33893)
# truncated output
$`33893`$`_links`$`stash:dataset`$href
[1] "/api/v2/datasets/doi%3A10.5061%2Fdryad.7nt8f"
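The href values in these outputs are percent-encoded, which makes them awkward to compare against a DOI by eye. A minimal sketch for decoding one with base R follows; href_to_doi is a hypothetical helper, not part of rdryad, and it assumes the href always contains a doi: prefix once decoded.

```r
# href copied from the dryad_files(33893) output above
href <- "/api/v2/datasets/doi%3A10.5061%2Fdryad.7nt8f"

# Hypothetical helper (not an rdryad function): decode the percent-encoding,
# then strip everything up to and including the "doi:" prefix.
# utils::URLdecode is applied to one string at a time here.
href_to_doi <- function(href) {
  sub("^.*doi:", "", utils::URLdecode(href))
}

doi <- href_to_doi(href)

# TRUE if the id really belongs to the dataset we asked about
identical(doi, "10.5061/dryad.7nt8f")
```

Running the same helper on the href returned for id 6817 makes the mismatch described earlier explicit.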
Thanks for opening the issue. It's quite a mystery to me too how it works. I'll have a look, though.
Sorry for the confusion on this. I hate to point fingers, but Dryad has not explained their API well at all, especially how the different ids work, and why we have to deal with their internal IDs and not just the DOI for the dataset itself. And they don't really respond to questions, so it really is a joy!
Okay, so this should work, where you have to get version information first:
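library(rdryad)
# helper: return the last element of a vector
last <- function(x) x[length(x)]
# look up the versions of the dataset by its DOI
z <- dryad_dataset_versions("10.5061/dryad.7nt8f")
# path(s) to the version(s); the internal version ID is the last path component
idpath <- z[[1]]$`_embedded`$`stash:versions`$`_links.self.href`
# note: [[1]] uses the first href if the dataset has several versions
id <- as.numeric(last(strsplit(idpath, "/")[[1]]))
# gives you information about the files, including their individual IDs
dryad_versions_files(id)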
Then you still have to extract the IDs from the strings for each file, with a regex or similar.
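That extraction step could look roughly like the sketch below. It does not depend on the exact shape of the dryad_versions_files output; it only assumes you have already collected the per-file link strings into a character vector and that each one ends in the numeric ID. extract_trailing_id is a hypothetical helper, and the /api/v2/files/... path in the example is an assumed format; only the file_stream URL comes from this thread.

```r
# Hypothetical helper (not part of rdryad): pull the trailing numeric ID
# out of each path or URL. Strings that do not end in "/<digits>" give NA.
extract_trailing_id <- function(hrefs) {
  has_id <- grepl("/[0-9]+/?$", hrefs)
  ids <- rep(NA_real_, length(hrefs))
  ids[has_id] <- as.numeric(sub("^.*/([0-9]+)/?$", "\\1", hrefs[has_id]))
  ids
}

hrefs <- c(
  "https://datadryad.org/stash/downloads/file_stream/33893",  # URL from the question
  "/api/v2/files/33893",                                      # assumed API path format
  "/api/v2/datasets/doi%3A10.5061%2Fdryad.7nt8f"              # no trailing ID, gives NA
)

file_ids <- extract_trailing_id(hrefs)
file_ids
```

The non-NA values can then be passed as ids to functions such as dryad_files.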
We really need to make this easier. Any pull requests are welcome; I don't have a lot of time to devote to this.
Thanks for your reply and tips.
Early next year I plan to work on a pipeline that starts by pulling data from Dryad, so I will work more closely with this package. I cannot promise anything, but I will see if I can help to make it work in some way.
Thanks, sounds good