
Reproduce graphs from original data

Open krlmlr opened this issue 2 years ago • 14 comments

The graphs are built by inst/getdata.R, but I have trouble recreating yeast and USairports.

  • The Nature supplementary file required for yeast is no longer at its old URL: cannot open URL 'https://www.nature.com/nature/journal/v417/n6887/extref/nature750-s1.doc': HTTP status was '404 Not Found'. The new location is linked at https://www.nature.com/articles/nature750#Sec8
  • USairports relies on a local file, ~/Downloads/1067890998_T_T100D_SEGMENT_ALL_CARRIER.csv

@gaborcsardi: Can you help with the .csv file?

Moving forward, I propose to separate the downloads from the creation of the graphs so that we're less reliant on external data storage here.

krlmlr, Jun 16 '23 04:06

I don't have that CSV any more, sorry.

gaborcsardi, Jun 16 '23 07:06

Thanks. We could create that CSV from the graph, I suppose.

I'm thrilled that the other graphs seem to be recreatable without problems. We'll need to compare the results, though.
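A rough sketch of how that comparison could look, assuming the igraph package is installed. `compare_graphs()` is a hypothetical helper, not something that exists in getdata.R; it relies on igraph's `identical_graphs()` for the strict check and reports sizes and attribute names so that a mismatch is easier to track down.

```r
library(igraph)

# Hypothetical helper: compare a regenerated graph against the shipped one.
compare_graphs <- function(old, new) {
  list(
    # Strict check: same internal structure and (by default) same attributes
    identical = identical_graphs(old, new),
    # Size summaries to narrow down where a difference comes from
    vertices = c(old = vcount(old), new = vcount(new)),
    edges = c(old = ecount(old), new = ecount(new)),
    # Attribute names present in one graph but not in the other
    vertex_attr_diff = union(
      setdiff(vertex_attr_names(old), vertex_attr_names(new)),
      setdiff(vertex_attr_names(new), vertex_attr_names(old))
    ),
    edge_attr_diff = union(
      setdiff(edge_attr_names(old), edge_attr_names(new)),
      setdiff(edge_attr_names(new), edge_attr_names(old))
    )
  )
}
```

The strict check would also flag graphs that differ only in edge order, so a failure here does not necessarily mean the data changed.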

krlmlr, Jun 16 '23 07:06

Just to understand: is it necessary to recreate the networks from the original source, or are you just testing the scripts that do so?

szhorvat, Jun 16 '23 07:06

No, it's not necessary, just good practice.

krlmlr, Jun 17 '23 14:06

@krlmlr wrote: "Moving forward, I propose to separate the downloads from the creation of the graphs so that we're less reliant on external data storage here."

Do you mean the package should contain a copy of the datasets? Or should we store them somewhere else?

maelle, Jun 19 '23 09:06

I don't mind a copy of the raw data on GitHub.

krlmlr, Jun 19 '23 09:06

So the task is to try to locate the yeast and airport data?

How do we compare the obtained graphs?

maelle, Jun 19 '23 09:06

Regarding the yeast data, it is probably in https://www.nature.com/articles/nature750 but I don't have access (and even if we get access, I suppose the data isn't really free to share :sweat_smile:)

maelle, Jun 19 '23 09:06

Oh wait, this is the data: https://static-content.springer.com/esm/art%3A10.1038%2Fnature750/MediaObjects/41586_2002_BFnature750_MOESM2_ESM.doc (via https://www.nature.com/articles/nature750#Sec8)

maelle, Jun 19 '23 09:06

Yes, yeast seems solved. The only remaining challenge is to reverse-engineer the USairports CSV file, but I'm not sure it's worth it.
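For reference, a minimal sketch of what exporting the graph back to a CSV could look like, assuming the igraph package is installed. `graph_to_csv()` is a hypothetical helper. It only writes the edge list with whatever edge attributes the graph carries, so it would not restore the original file's column names, nor any rows or columns that the original script dropped.

```r
library(igraph)

# Hypothetical helper: dump a graph's edge list (with edge attributes) to CSV.
graph_to_csv <- function(g, path) {
  # One row per edge: columns `from`, `to`, then one column per edge attribute
  edges <- igraph::as_data_frame(g, what = "edges")
  utils::write.csv(edges, path, row.names = FALSE)
  invisible(path)
}
```

Vertex attributes would need a second export with `what = "vertices"`.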

krlmlr, Jun 19 '23 09:06

So what are the TODOs?

maelle, Jun 19 '23 09:06

  • Find a place for the raw files in this repo
  • Split the getdata.R script: perhaps one unified download script and one script per dataset?
  • Make sure that the generated data is unchanged
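As a sketch of the download half of that split, using only base R: `fetch_raw()` is a hypothetical helper, and the `data-raw` directory is an assumed location, not one agreed on in this thread. It downloads a file once and reuses the local copy afterwards, so the per-dataset scripts would only ever read from the repo.

```r
# Hypothetical helper: fetch a raw file once and keep it in the repo.
# `dest_dir = "data-raw"` is an assumption, not a decided location.
fetch_raw <- function(url, dest_dir = "data-raw", dest_name = basename(url)) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dest_dir, dest_name)
  if (!file.exists(dest)) {
    # mode = "wb" keeps binary files such as .doc intact on Windows
    utils::download.file(url, dest, mode = "wb", quiet = TRUE)
  }
  invisible(dest)
}
```

For yeast, this could be called with the Springer URL found above; the per-dataset script would then take the returned path as its input instead of a URL.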

krlmlr, Jun 19 '23 09:06

Wait, so we need three new issues? Then this issue is not really closed?

maelle, Jun 19 '23 10:06

One issue is fine.

krlmlr, Jun 19 '23 10:06