Move read.paj examples to files or statnet URLs

Open CarterButts opened this issue 3 years ago • 5 comments

Vlado's pajek site has been moved or otherwise is/was offline, which has produced errors for us on CRAN, because the documentation for read.paj and associated tests makes use of remote files on that site. It's nice to be able to demonstrate reading files by URL, but it does introduce a dependence; including demo files locally would fix that. Alternately, hosting files on our own site would give us control over the matter, but of course there's no guarantee that our site won't also run into issues (as has been known to happen). Some combination of fixes will be needed for the next release, since CRAN is threatening us with removal over it. (So very, very tired of CRAN's obsessions.)

CarterButts avatar Sep 19 '21 22:09 CarterButts

I think the best way to resolve https://github.com/statnet/networkDynamic/issues/10 as well would be to just add the zip-compressed file from http://vlado.fmf.uni-lj.si/pub/networks/data/esna/sampson.htm to network's data directory (since that is where read.paj is implemented). I can make a PR for that.

However, the Pajek site doesn't specify a license for the data, just the reference papers and the UCINET data it was derived from: http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/UciData.htm#sampson

Does anyone know whether the data was directly included in the original Sampson or Breiger papers (and would that mean it is OK to include it with the original citation)? I think this may have been one reason why I didn't include it directly before.

Breiger R., Boorman S. and Arabie P. (1975). An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology, 12, 328-383.
Sampson, S. (1969). Crisis in a cloister. Unpublished doctoral dissertation, Cornell University. 

skyebend avatar Sep 27 '21 00:09 skyebend

IIRC, data sets are not usually subject to copyright in the US, although there are complications (e.g., the expression of data in a map certainly is).  To the extent that some expressive content beyond the raw data is contained in the paj files, it is probably best to assume that they are subject to copyright in some form. The Sampson data is in the original thesis (there are tables, including, interestingly, some that are never used, like relations other than liking at later time points).

If it is a concern, one way to proceed would be for us to make our own .paj files for the relevant data sets, and then include them. I haven't dealt with that format in a long time, though. FWIW, it looks to me like the stuff is back up, so maybe there is some other way to handle the issue.

For the network package itself, I currently think that the main thing to do is to exclude the read.paj tests that rely on external files from the standard CRAN-run set. The external URL-based examples are already turned off (\dontrun{} tags are used) to prevent this sort of issue, but someone then stuck them in the testthat ensemble. This is what set off the CRAN alarms when the site was offline.

In the long run, it's a good idea to migrate to internal files, but in the short run the actual problem is CRAN freaking out every time anything isn't perfect. For that reason, all unit tests need to be designed so that their CRAN versions only alarm if something is really broken. Overzealous testing that picks up either transient issues (someone's web site is down) or other things that are non-error behaviors and/or that require interpretation causes a ton of headaches. An alarm that goes off too often is just as bad as one that doesn't go off when needed....
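One way to keep a URL-based test out of the CRAN-run set is sketched below, using testthat's skip helpers. This is only an illustration, not the package's actual test: `paj_url` is a hypothetical placeholder (the `.invalid` host never resolves, so the test always skips until a real file URL is substituted), and `skip_if_offline()` assumes the curl package is installed.

```r
library(testthat)

# Hypothetical placeholder URL; replace with a real remote Pajek .net file.
paj_url <- "http://example.invalid/path/to/file.net"

test_that("read.paj can read a Pajek file by URL", {
  # Do not run on CRAN, where a transient outage would raise an alarm.
  skip_on_cran()
  # Skip (rather than fail) when the remote host cannot be resolved.
  skip_if_offline("example.invalid")
  # Skip when the network package is not available in this library.
  skip_if_not_installed("network")

  net <- network::read.paj(paj_url)
  expect_true(network::is.network(net))
})
```

With this layout the test still runs for developers who are online, but a site outage shows up as a skip instead of a failure.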

-Carter

CarterButts avatar Sep 27 '21 01:09 CarterButts

  1. I agree with @skyebend that we should include the Sampson .paj data in the network data directory.

  2. On the Pajek website the license is:

Licenses and Citation:
If the source of the data set is not specified otherwise, these data sets are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
When publishing results obtained using this data set the original authors should be cited. In addition this collection should be cited as:
Vladimir Batagelj and Andrej Mrvar (2006): Pajek datasets.
<URL: http://vlado.fmf.uni-lj.si/pub/networks/data/>.

So I believe we're ok with that.

  3. It might be useful to wrap any scripts/code that accesses external websites with a graceful error/exit option.
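A graceful wrapper of the kind suggested in the last point could look roughly like the base R sketch below. `read_remote_gracefully` is a hypothetical helper, not part of the network package; it takes the reader function as an argument, so something like `network::read.paj` could be passed in its place.

```r
# Hypothetical helper: call `reader` on `url` and return NULL with a message
# instead of raising an error when the resource cannot be read.
read_remote_gracefully <- function(url, reader) {
  tryCatch(
    # Warnings from the failed connection are suppressed; the error itself
    # is caught by the handler below.
    suppressWarnings(reader(url)),
    error = function(e) {
      message("Could not read ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Usage: a deliberately nonexistent file exercises the failure path.
result <- read_remote_gracefully("file:///no/such/file.net", readLines)
if (is.null(result)) {
  message("Skipping the example: resource unavailable.")
}
```

Returning `NULL` lets the calling example or script decide whether to stop quietly or continue with a local fallback file.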

martinamorris avatar Sep 29 '21 18:09 martinamorris