snaptron Missing Files in Snaptron Server Startup

Hello,

I have followed the singularity steps to get the encode1159 server running locally, as well as a custom snaptron db with data I processed via monorail. While setting up the encode1159 server, I ran into multiple errors: the lucene_full directories were empty and the samples.tsv file was missing. I filled both in using files from here: https://snaptron.cs.jhu.edu/data/encode1159. However, I then ran into this error while querying both the encode1159 server and my own data, and I cannot solve it:

IOError: [Errno 2] No such file or directory: './data/refseq_transcripts_by_hgvs.tsv'
curl: (18) transfer closed with outstanding read data remaining

I think for some reason a bunch of files are not being transferred correctly during the snaptron setup?

Thanks for your help!

Feb 20 '25 16:02 qhauck16

Thanks for the bug report @qhauck16. I suspect this is related to the server move we did a few years ago, when we moved Snaptron from its original home to a new server that has an additional layer of web proxying between it and the public (which I don't control).

It looks like public web access to the annotation directories is locked down for some reason. I'll take a look and see if I can open them up (again) and let you know.

Separately, I'm impressed that you're running your own Snaptron server! As you've likely found out, its setup is not for the faint of heart :)

Mainly out of curiosity on my part, would you be willing to share a bit more about your project and why you're using Monorail/Snaptron for it?

Thanks, Chris

Feb 22 '25 19:02 ChristopherWilks

ok, I've restored access to the annotation files, so you should be able to download them now for both of your local compilations. You'll probably need all 3 of these files (they're the same across compilations):

https://snaptron.cs.jhu.edu/data/encode1159/refseq_transcripts_by_hgvs.tsv
https://snaptron.cs.jhu.edu/data/encode1159/ucsc_known_canonical_transcript.tsv
https://snaptron.cs.jhu.edu/data/gene_annotation_hg38/all_transcripts.gtf.bgz
http://snaptron.cs.jhu.edu/data/srav2/gencode.v25.annotation.gff3.gz
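If it helps, fetching the files listed above could be scripted roughly like this. It is only a sketch, not part of the Snaptron download script. The flat ./data destination is an assumption taken from the path in the error message (only refseq_transcripts_by_hgvs.tsv is confirmed to live there), and the helper names destination_for and fetch_missing are made up for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URLs exactly as listed in this thread.
ANNOTATION_URLS = [
    "https://snaptron.cs.jhu.edu/data/encode1159/refseq_transcripts_by_hgvs.tsv",
    "https://snaptron.cs.jhu.edu/data/encode1159/ucsc_known_canonical_transcript.tsv",
    "https://snaptron.cs.jhu.edu/data/gene_annotation_hg38/all_transcripts.gtf.bgz",
    "http://snaptron.cs.jhu.edu/data/srav2/gencode.v25.annotation.gff3.gz",
]


def destination_for(url, data_dir="./data"):
    """Map a URL to a local path, keeping only the file name.

    Assumption: every annotation file sits directly inside data_dir.
    """
    return os.path.join(data_dir, os.path.basename(urlparse(url).path))


def fetch_missing(urls=ANNOTATION_URLS, data_dir="./data"):
    """Download each file that is absent or empty; return the paths fetched."""
    os.makedirs(data_dir, exist_ok=True)
    fetched = []
    for url in urls:
        dest = destination_for(url, data_dir)
        if os.path.exists(dest) and os.path.getsize(dest) > 0:
            continue  # already present, skip
        tmp = dest + ".part"
        # Download to a temporary name so an interrupted transfer
        # never leaves a partial file under the final name.
        urllib.request.urlretrieve(url, tmp)
        os.replace(tmp, dest)
        fetched.append(dest)
    return fetched


if __name__ == "__main__":
    for path in fetch_missing():
        print("downloaded", path)
```

Run it from the directory that contains the compilation's data directory, once per local compilation.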

Give that a try and let me know.

As for the initial lucene failure, it looks like I had assumed a particular path in the download script ("new_lucene") which none of the normal snaptron compilations have! So that should be fixed now.
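A quick sanity check along these lines could catch the empty lucene directories and the missing samples.tsv before the server is started. Again just a sketch: the assumption that samples.tsv and the lucene_full* directories sit directly under one compilation directory is mine, and find_setup_problems is a hypothetical helper, not something Snaptron ships.

```python
import os


def find_setup_problems(compilation_dir):
    """Return a list of human-readable problems found in a compilation dir.

    Assumed layout (not confirmed by the Snaptron docs): samples.tsv and one
    or more directories whose names start with "lucene_full" live directly
    under compilation_dir.
    """
    problems = []

    samples = os.path.join(compilation_dir, "samples.tsv")
    if not os.path.isfile(samples) or os.path.getsize(samples) == 0:
        problems.append("missing or empty file: " + samples)

    lucene_dirs = [
        os.path.join(compilation_dir, name)
        for name in sorted(os.listdir(compilation_dir))
        if name.startswith("lucene_full")
        and os.path.isdir(os.path.join(compilation_dir, name))
    ]
    if not lucene_dirs:
        problems.append("no lucene_full* directory in: " + compilation_dir)
    for path in lucene_dirs:
        if not os.listdir(path):
            problems.append("empty directory: " + path)

    return problems


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for problem in find_setup_problems(target):
        print(problem)
```

An empty result means the check found nothing wrong. It does not prove the index contents are valid.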

Please let me know if you run into additional issues.

Thanks, Chris

Feb 22 '25 19:02 ChristopherWilks

Thanks for the quick response @ChristopherWilks!

I was able to download those files and get my own snaptron server up and running. To answer your question - I am broadly interested in splicing patterns across some of the datasets available on the snaptron website, and the junction counts do a pretty decent job of summarizing those splicing patterns, so snaptron has proved very useful for me! I would also like to check out some of those patterns (e.g. aberrant splicing in a specific gene in some BRCA samples from TCGA) on some local data. I figured best practice would be to run the same pipeline used to generate the current snaptron compilations on my own data - so actually setting up a server was not 100% necessary, but it seemed like the easiest way to replicate the pipeline.

Thanks again!

Feb 24 '25 19:02 qhauck16

excellent, thanks for the follow-up and the explanation for what you're doing, glad that Snaptron has been useful! I'm going to close this issue, but of course feel free to open new one(s) for any additional issues you run into.

Feb 24 '25 19:02 ChristopherWilks