Curation work to ensure all entries give "canonical" tool descriptions

Open • joncison opened this issue 6 years ago • 10 comments

One of many issues around GitHub-based content management for bio.tools.

joncison, Feb 01 '19 10:02

Curation work to remove the remaining redundant entries, ensuring a non-redundant set of “canonical” tool descriptions. This is mostly done, but see e.g. https://github.com/bio-tools/biotoolsRegistry/issues/282

joncison, Feb 01 '19 10:02

@hansioan, can we make a definitive list of actions here? To my mind it's this:

  • [x] tools imported from cloudIFB (did these yesterday)
  • [x] tools imported from Galaxy Pasteur (ideally need to speak with @hmenager about this today)
  • [x] 100 other known duplicates? (do you have a list we can work on, Hans?)
  • [x] verification of all currently unverified IDs (see https://github.com/bio-tools/biotoolsRegistry/issues/357)
  • [x] resolving remaining redundancies & issues from systematic ID check
  • [x] checking that no homepage URLs are broken (with tooling to auto-annotate those that are down) (see https://github.com/bio-tools/biotoolsRegistry/issues/207)
  • [x] redundant descriptions/entries of highly prevalent tools, e.g. BLAST, HMMER? (need to check the big names)

joncison, Feb 05 '19 08:02

@bgruening @piotrgithub1 @matuskalas: Hans, Herve and I have been making a major push on content clean-up (mostly ID verification, tool names and redundancy removal) in preparation for the data dump (https://github.com/bio-tools/content/issues/2).

Bearing in mind that the vision for bio.tools is to provide "canonical" descriptions of unique tools, if you have a view on clean-ups that need doing in this regard, please let us know very soon. For example: do we satisfy the requirements for integrating data from Bioconda, etc.?

We hope to get the clean-up complete by end of next week.

joncison, Feb 07 '19 08:02

@joncison what do you need? IMHO we can deal with this after the push. Bioconda will deal with whatever bio.tools drops. Bioconda has already started annotating packages with bio.tools IDs, so ideally those IDs should stay stable, and the content from our side should be YAML. Otherwise, we will know more once we start working on it :)

bgruening, Feb 07 '19 08:02

I was wondering whether any of you already know of content issues that would make the integration hard, duplicates (which I think are now nearly all resolved) being an obvious case. We also need to do this clean-up for a paper soon to be submitted (we're all co-authors), which is the main reason for doing it now. Rest assured, the dump will go ahead ASAP.

joncison, Feb 07 '19 08:02

Thanks @joncison! My take on this: we create the bot and the content-validation scripts, and if things fail because of duplicates or such, we will know and can fix them.

bgruening, Feb 07 '19 08:02

Very good; that would trap any currently unknown issues (and soon we'll have fixed all the known ones). PS: for validation, we already have biotoolsLint (currently just harvesting ideas).

joncison, Feb 07 '19 09:02

Quick update @bgruening and @hmenager: @hansioan and I are making sweeping progress on the above, but it's a huge job... will keep you posted. The (clean) content dump will follow once we're done.

joncison, Feb 18 '19 14:02

Quick update @bgruening @piotrgithub1: @hansioan and I are done with the clean-ups (a huge job). The only thing left is a final verification of IDs (for things added in the last few weeks). Once that's done, I'll close this issue. I'm not claiming all the content is now perfect, but it's a lot better than it was a couple of months ago in terms of redundancy, sensible IDs, ownership, etc. cc @hmenager

joncison, Mar 19 '19 12:03

UPDATE: All the things mooted on Feb 5 have been done, but let's keep this open, because there will no doubt be further improvements to make.

joncison, May 10 '19 09:05