rdryad: How to get files' ids?

Hi, sorry for the stupid question, but I don't know how to get files' ids so I can download individual files from a Dryad dataset.

I tried looking at our published dataset with:

> dryad_dataset("10.5061/dryad.7nt8f")
# truncated output
$`10.5061/dryad.7nt8f`$id
[1] 6817

However, if I try to use that id to get files, the output shows a different DOI for this id:

> dryad_files(6817)
# truncated output
$`6817`$`_links`$`stash:dataset`$href
[1] "/api/v2/datasets/doi%3A10.5061%2Fdryad.nf757"

i.e. the returned DOI is 10.5061/dryad.nf757 rather than 10.5061/dryad.7nt8f.

So how do I get:

the proper ids for my dataset, to be used in functions like dryad_files?
a link to a particular file (e.g. Appendix S2.txt in the doi link above)?

Session Info

R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: elementary OS 5.1.7 Hera

Matrix products: default
BLAS/LAPACK: /home/jena/miniconda3/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=cs_CZ.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=cs_CZ.UTF-8        LC_COLLATE=cs_CZ.UTF-8    
 [5] LC_MONETARY=cs_CZ.UTF-8    LC_MESSAGES=cs_CZ.UTF-8   
 [7] LC_PAPER=cs_CZ.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rdryad_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5      magrittr_2.0.1  rappdirs_0.3.1  uuid_0.1-4     
 [5] R6_2.5.0        rlang_0.4.8     hoardr_0.5.2    tools_3.6.1    
 [9] htmltools_0.5.0 ellipsis_0.3.1  digest_0.6.27   httpcode_0.3.0 
[13] tibble_3.0.4    lifecycle_0.2.0 crayon_1.3.4    zip_2.1.1      
[17] IRdisplay_0.7.0 repr_1.1.0      base64enc_0.1-3 vctrs_0.3.5    
[21] triebeard_0.3.0 IRkernel_1.1.1  curl_4.3        crul_1.0.0     
[25] evaluate_0.14   mime_0.9        pbdZMQ_0.3-3.1  compiler_3.6.1 
[29] pillar_1.4.7    urltools_1.7.3  jsonlite_1.7.1  pkgconfig_2.0.3

Dec 14 '20 10:12 janxkoci

Update

I noticed that I can use the number from a file's link on the Dryad website as ids, and it seems to work properly and get the right file. But how do I get those ids from rdryad?

For example the file Appendix S2.txt mentioned above is linked with the following url: https://datadryad.org/stash/downloads/file_stream/33893

Using 33893 as ids in the functions returns the right DOI, file description, etc.:

> dryad_files(33893)
# truncated output
$`33893`$`_links`$`stash:dataset`$href
[1] "/api/v2/datasets/doi%3A10.5061%2Fdryad.7nt8f"
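
To compare the DOI in that href with the one I expect, the percent-encoded string can be decoded in base R. This is only a minimal sketch: doi_from_href is a hypothetical helper (not part of rdryad), and it assumes the href always ends in a "doi%3A..." segment like the two shown in this thread.

```r
# href copied from the dryad_files(33893) output above
href <- "/api/v2/datasets/doi%3A10.5061%2Fdryad.7nt8f"

# hypothetical helper (not an rdryad function): take the last path segment,
# percent-decode it, and drop the leading "doi:" prefix
doi_from_href <- function(href) {
  encoded <- basename(href)
  decoded <- vapply(encoded, utils::URLdecode, character(1), USE.NAMES = FALSE)
  sub("^doi:", "", decoded)
}

doi_from_href(href)
# [1] "10.5061/dryad.7nt8f"
```

Running the same helper on the href returned by dryad_files(6817) gives 10.5061/dryad.nf757, which makes the mismatch easy to spot.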

Dec 14 '20 11:12 janxkoci

Thanks for opening the issue. It's quite a mystery to me too how it works. I'll have a look, though.

Dec 15 '20 18:12 sckott

Sorry for the confusion on this. I hate to point fingers, but Dryad has not explained their API well at all, especially how the different ids work, and why we have to deal with their internal IDs and not just the DOI for the dataset itself. And they don't really respond to questions, so it really is a joy!

Dec 15 '20 20:12 sckott

Okay, so this should work, where you have to get version information first:

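library(rdryad)

# helper: last element of a vector
last <- function(x) x[length(x)]
# look up the dataset's versions by DOI
z <- dryad_dataset_versions("10.5061/dryad.7nt8f")
# API path of each version; the numeric version id is the last path segment
idpath <- z[[1]]$`_embedded`$`stash:versions`$`_links.self.href`
# note: [[1]] uses the first version path returned
id <- as.numeric(last(strsplit(idpath, "/")[[1]]))
# gives you information about the files, including their individual IDs
dryad_versions_files(id)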

Then you still have to regex (or otherwise parse) the IDs out of the strings for each file.
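
A rough sketch of that last step, in base R only. Treat the input as an assumption: I'm guessing the per-file strings are API paths of the form "/api/v2/files/<id>", and the hrefs vector below is made up for illustration (33893 is the only ID taken from this thread). Check the actual output of dryad_versions_files() before relying on it.

```r
# hypothetical per-file hrefs; the "/api/v2/files/<id>" shape is an assumption
hrefs <- c("/api/v2/files/33893", "/api/v2/files/33894")

# keep only the trailing run of digits from each string
file_ids <- as.integer(sub("^.*/([0-9]+)/?$", "\\1", hrefs))

# the ids can be passed to dryad_files(), or dropped into the website
# download link pattern quoted earlier in this thread
download_urls <- sprintf(
  "https://datadryad.org/stash/downloads/file_stream/%d",
  file_ids
)
```

If a string does not end in digits, sub() returns it unchanged and as.integer() yields NA with a warning, so unexpected formats show up right away.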

We really need to make this easier. Any pull requests are welcome; I don't have a lot of time to devote to this.

Dec 15 '20 20:12 sckott

Thanks for your reply and tips.

Early next year I plan to work on a pipeline that starts by pulling data from Dryad, so I will work more closely with this package. I cannot promise anything, but I will see if I can help to make it work in some way.

Dec 16 '20 09:12 janxkoci

Thanks, sounds good

Dec 16 '20 16:12 sckott