
Change sources: Human, chicken, dog, pig, cow files

Open pgaudet opened this issue 1 year ago • 4 comments

Hello,

@alexsign /GOA is now producing 'combined' files for human, chicken, dog, pig, and cow, containing all Swiss-Prot isoforms (not the TrEMBL isoforms), complexes, and RNAs.

The links are here:

  • Chicken: http://ftp.ebi.ac.uk/pub/databases/GO/goa/CHICKEN/goa_chicken_plus.gaf.gz
  • Cow: http://ftp.ebi.ac.uk/pub/databases/GO/goa/COW/goa_cow_plus.gaf.gz
  • Dog: http://ftp.ebi.ac.uk/pub/databases/GO/goa/DOG/goa_dog_plus.gaf.gz
  • Human: http://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human_plus.gaf.gz
  • Pig: http://ftp.ebi.ac.uk/pub/databases/GO/goa/PIG/goa_pig_plus.gaf.gz
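
The five links above appear to follow a single naming pattern, so a sources update could generate them rather than hard-code each one. A minimal sketch, assuming the pattern holds for exactly these five species; `plus_gaf_url`, `SPECIES`, and `BASE` are hypothetical names for illustration and are not part of go-site:

```python
# Species covered by this issue; the URL pattern is inferred from the five links above.
SPECIES = ["chicken", "cow", "dog", "human", "pig"]
BASE = "http://ftp.ebi.ac.uk/pub/databases/GO/goa"


def plus_gaf_url(species: str) -> str:
    """Build the 'combined' (_plus) GAF URL for one species (hypothetical helper)."""
    # Directory names are upper case, file names are lower case.
    return f"{BASE}/{species.upper()}/goa_{species.lower()}_plus.gaf.gz"


if __name__ == "__main__":
    for name in SPECIES:
        print(plus_gaf_url(name))
```

Generating the URLs keeps the five entries consistent, but it only works while GOA keeps this layout; any other species would need to be checked by hand.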

We need to change where we get this data in our 'sources'.

Thanks, Pascale

pgaudet avatar Dec 21 '23 13:12 pgaudet

Talking to @pgaudet, we'll wait for this next release to pass and then push the change. Possible locations of friction:

  • [ ] neo
  • [x] downloads
  • [x] stats

kltm avatar Jan 08 '24 20:01 kltm

@pgaudet I noticed the existence of goa_pdb (https://ftp.ebi.ac.uk/pub/databases/GO/goa/PDB/goa_pdb.gaf.gz) in the metadata. Is this used for anything? I don't think we use it. The only references I have to it are about it causing problems, going back to 2019.

kltm avatar May 07 '24 01:05 kltm

The files in the first comment are correct. GOA produces various files for various groups; we can ignore these.

pgaudet avatar May 07 '24 14:05 pgaudet

Initial changes have been made and we're waiting on a snapshot run to test.

kltm avatar May 08 '24 00:05 kltm

Talking to @pgaudet, the stats seem to be good. The test downloads page (http://snapshot.geneontology.org/products/pages/downloads.html, ignoring the links) also seems to be good.

The final item to ensure is the NEO build. Building now.

kltm avatar May 15 '24 22:05 kltm

NEO built: 1734706857 golr-index-contents.tgz
on machine: 1738730937 golr_new.tgz

Given how close these are, I think it's reasonable that nothing extreme happened. Allowing snapshot to proceed.
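
The "how close" judgment can be made explicit. A rough sketch, assuming the two numbers are file sizes in bytes (the comment does not say so) and using a 5% tolerance that is purely illustrative; `relative_difference` and `sizes_are_close` are hypothetical helpers:

```python
NEO_BUILT = 1734706857   # golr-index-contents.tgz (assumed to be bytes)
ON_MACHINE = 1738730937  # golr_new.tgz (assumed to be bytes)
TOLERANCE = 0.05         # hypothetical threshold, not a project setting


def relative_difference(a: int, b: int) -> float:
    """Absolute difference as a fraction of the larger value."""
    largest = max(a, b)
    if largest == 0:
        return 0.0
    return abs(a - b) / largest


def sizes_are_close(a: int, b: int, tolerance: float = TOLERANCE) -> bool:
    """True when the two values differ by no more than `tolerance`."""
    return relative_difference(a, b) <= tolerance


if __name__ == "__main__":
    diff = relative_difference(NEO_BUILT, ON_MACHINE)
    print(f"relative difference: {diff:.4%}")
    print("close enough:", sizes_are_close(NEO_BUILT, ON_MACHINE))
```

With these two values the difference is a fraction of one percent, which is in line with the conclusion that nothing extreme happened. A size check like this is only a coarse sanity test and says nothing about the contents of the index.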

kltm avatar May 21 '24 13:05 kltm

Single file for human, dog, cow, chicken and pig: :)

Image

compared to 2024-04-24 release:

Image

pgaudet avatar Jun 13 '24 08:06 pgaudet

I think this is complete? The only concern I see now is that the entity is incorrect: it is currently "protein" when the files contain a mix of protein, various RNAs, "gene_product", etc. But I think the requirements of this actual ticket are complete.
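
One way to see the mix of entity types is to tally the DB Object Type column of a downloaded file. A minimal sketch, assuming the files are tab-separated GAF 2.x with DB Object Type in column 12 and header lines starting with `!`; `count_object_types` and `count_object_types_in_file` are hypothetical helpers, and the local path is whatever the file was saved as:

```python
import gzip
from collections import Counter
from typing import Iterable


def count_object_types(lines: Iterable[str]) -> Counter:
    """Tally column 12 (DB Object Type) over GAF annotation lines."""
    counts: Counter = Counter()
    for line in lines:
        # Skip GAF header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12:
            counts[cols[11]] += 1
    return counts


def count_object_types_in_file(path: str) -> Counter:
    """Same tally, reading a gzip-compressed GAF file from disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_object_types(handle)
```

For example, `count_object_types_in_file("goa_human_plus.gaf.gz")` on a locally saved copy would return a counter keyed by type, which shows whether a single "protein" label fits the file.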

suzialeksander avatar Jun 19 '24 19:06 suzialeksander

Right. Next, we need to fix the downloads page and the documentation.

pgaudet avatar Jun 20 '24 06:06 pgaudet