ValueError: No reads detected. Make sure your dataset names are correct.

Open chalybes opened this issue 4 years ago • 10 comments

Hello! I'm testing out TALON on some Nanopore sequencing reads. After I set up the database and specified the config file, TALON's get_read_annotations.py ran and threw this exception:

ValueError: No reads detected. Make sure your dataset names are correct.

I combed through the GitHub code and double-checked my config file, but I can't seem to pinpoint why I would be having issues with the dataset names.

chalybes avatar Apr 11 '20 21:04 chalybes

Hello! Let's pinpoint what's going wrong for you. Can you tell me what commands you're using starting from initializing the database?

fairliereese avatar Apr 14 '20 19:04 fairliereese

@fairliereese I'm also getting this issue. Do the various sample names specified in the config file have to correspond to RG tag values in the input sam?

oneillkza avatar May 12 '21 18:05 oneillkza

I am running ccbr_talon_5.0_v2.1 from the Docker container (via Singularity).

The commands run were:

#TALON demands a sam file as input, so decompress to sam so it can recompress it all over again:
singularity exec -B /projects ccbr_talon_5.0_v2.1.sif \
	samtools view -h $1 > tmp/$1.sam

#TALON takes half its parameters via a config file, so make one temporarily:
echo dummy,dummy,dummy,tmp/$1.sam > tmp/$1_talon_conf.csv

singularity exec -B /projects ccbr_talon_5.0_v2.1.sif \
	talon \
	--f tmp/$1_talon_conf.csv \
	--db hg38_no_alt_ensembl100_talon.db \
	--build hg38_no_alt \
	-t 48 \
	--tmp_dir=tmp/talon \
	--o colo829_F34519_talon 

This was the output from TALON:

[ 2021-05-11 00:49:26 ] Started TALON run
[ 2021-05-11 12:10:16 ] Merged input SAM/BAM files
[ 2021-05-11 16:19:26 ] Split reads into 104 intervals
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22_KI270738v1_random:4406-90670...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chrUn_GL000195v1:35-179876...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14_GL000009v2_random:1-200815...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270712v1_random:10198-176043...
[ 2021-05-11 16:20:16 ] Annotating reads in interval chrUn_GL000216v2:507-176089...
[ 2021-05-11 16:20:53 ] Annotating reads in interval chrUn_KI270466v1:1-1233...
[ 2021-05-11 16:21:14 ] Annotating reads in interval chrUn_KI270746v1:3325-64242...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr16_KI270728v1_random:10075-1871121...
[ 2021-05-11 16:20:23 ] Annotating reads in interval chrUn_KI270336v1:2-1026...
[ 2021-05-11 16:20:54 ] Annotating reads in interval chrUn_KI270512v1:9765-14717...
[ 2021-05-11 16:21:16 ] Annotating reads in interval chrUn_KI270749v1:29506-157277...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22_KI270735v1_random:1500-38669...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr5_GL000208v1_random:34379-62208...
[ 2021-05-11 16:20:43 ] Annotating reads in interval chrUn_KI270435v1:24088-91080...
[ 2021-05-11 16:21:08 ] Annotating reads in interval chrUn_KI270741v1:79-130840...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270706v1_random:1-175026...
[ 2021-05-11 16:20:56 ] Annotating reads in interval chrUn_KI270516v1:228-718...
[ 2021-05-11 16:21:16 ] Annotating reads in interval chrUn_KI270748v1:7384-85760...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270707v1_random:108-31868...
[ 2021-05-11 16:20:16 ] Annotating reads in interval chrUn_GL000213v1:12618-144984...
[ 2021-05-11 16:20:47 ] Annotating reads in interval chrUn_KI270442v1:114728-392061...
[ 2021-05-11 16:21:13 ] Annotating reads in interval chrUn_KI270745v1:3180-41891...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22_KI270734v1_random:21750-164851...
[ 2021-05-11 16:20:25 ] Annotating reads in interval chrUn_KI270337v1:20-1115...
[ 2021-05-11 16:20:55 ] Annotating reads in interval chrUn_KI270515v1:775-5545...
[ 2021-05-11 16:21:16 ] Annotating reads in interval chrUn_KI270747v1:1530-40703...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr15_KI270727v1_random:292-446254...
[ 2021-05-11 16:21:18 ] Annotating reads in interval chrUn_KI270750v1:1698-144458...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270708v1_random:2468-127612...
[ 2021-05-11 16:20:16 ] Annotating reads in interval chrUn_GL000214v1:12910-137718...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22_KI270732v1_random:8612-41537...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr9_KI270720v1_random:93-38639...
[ 2021-05-11 16:20:57 ] Annotating reads in interval chrUn_KI270519v1:18882-74403...
[ 2021-05-11 16:21:21 ] Annotating reads in interval chrUn_KI270754v1:1288-40191...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr17_GL000205v2_random:23522-185559...
[ 2021-05-11 16:20:21 ] Annotating reads in interval chrUn_KI270333v1:1-2699...
[ 2021-05-11 16:20:53 ] Annotating reads in interval chrUn_KI270467v1:655-3920...
[ 2021-05-11 16:21:25 ] Annotating reads in interval chrUn_KI270755v1:5090-32526...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270714v1_random:1183-16423...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr4_GL000008v2_random:9705-209687...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14_GL000194v1_random:10561-189315...
[ 2021-05-11 16:21:20 ] Annotating reads in interval chrUn_KI270751v1:219-147540...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22_KI270731v1_random:867-136018...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr9_KI270717v1_random:1955-39680...
[ 2021-05-11 16:20:47 ] Annotating reads in interval chrUn_KI270438v1:65522-110592...
[ 2021-05-11 16:21:10 ] Annotating reads in interval chrUn_KI270742v1:1-183868...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14_KI270726v1_random:2541-39548...
[ 2021-05-11 16:20:21 ] Annotating reads in interval chrUn_KI270330v1:1-765...
[ 2021-05-11 16:20:51 ] Annotating reads in interval chrUn_KI270448v1:7767-7888...
[ 2021-05-11 16:21:13 ] Annotating reads in interval chrUn_KI270743v1:1446-210474...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14_GL000225v1_random:1915-200640...
[ 2021-05-11 16:20:17 ] Annotating reads in interval chrUn_GL000218v1:1660-157752...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22_KI270736v1_random:126323-171129...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr9_KI270718v1_random:1-38054...
[ 2021-05-11 16:20:53 ] Annotating reads in interval chrUn_KI270509v1:193-1730...
[ 2021-05-11 16:21:13 ] Annotating reads in interval chrUn_KI270744v1:1-159018...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270710v1_random:6706-38919...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr9_KI270719v1_random:1083-174211...
[ 2021-05-11 16:20:58 ] Annotating reads in interval chrUn_KI270521v1:7283-7404...
[ 2021-05-11 16:21:21 ] Annotating reads in interval chrUn_KI270753v1:1408-62944...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr17_KI270730v1_random:9744-111506...
[ 2021-05-11 16:20:12 ] Annotating reads in interval chr3_GL000221v1_random:1-153774...
[ 2021-05-11 16:21:06 ] Annotating reads in interval chrUn_KI270589v1:14135-14280...
[ 2021-05-11 16:21:36 ] Annotating reads in interval chrY:2844080-56881340...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14_KI270722v1_random:956-194050...
[ 2021-05-11 16:20:18 ] Annotating reads in interval chrUn_GL000220v1:731-161802...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22_KI270733v1_random:3446-179772...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr18:10158-80247460...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr21:5022496-46698047...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr13:17353615-114351750...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr10:15907-133785674...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr16:11282-90226652...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270711v1_random:4606-41730...
[ 2021-05-11 16:20:17 ] Annotating reads in interval chrUn_GL000219v1:40755-177847...
[ 2021-05-11 16:21:32 ] Annotating reads in interval chrUn_KI270756v1:7616-76286...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22:10528666-50805650...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr20:129962-64328864...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270709v1_random:1877-66687...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chrM:1-16569...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14:16026156-106845701...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr17_KI270729v1_random:14-280490...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr8:60001-145069947...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr15:17014781-101980374...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1_KI270713v1_random:5068-34735...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr9:11716-138321337...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr22_KI270737v1_random:2205-95207...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr4:42727-190202972...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr11_KI270721v1_random:2600-53022...
[ 2021-05-11 16:20:18 ] Annotating reads in interval chrUn_GL000224v1:1-134637...
[ 2021-05-11 16:21:05 ] Annotating reads in interval chrUn_KI270538v1:67287-68445...
[ 2021-05-11 16:21:35 ] Annotating reads in interval chrX:11380-156028048...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14_KI270723v1_random:4043-36806...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr5:64619-181478259...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr19:60173-58606123...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14_KI270724v1_random:854-30568...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr6:67349-170745979...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr14_KI270725v1_random:11661-172788...
[ 2021-05-11 16:20:15 ] Annotating reads in interval chr7:10347-159235349...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr12:10624-133211597...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr17:60807-83247136...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr3:51404-198234971...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr2:12436-242181765...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr1:10507-248944156...
[ 2021-05-12 00:05:21 ] Shutting down message queue...
[ 2021-05-11 16:19:40 ] Annotating reads in interval chr11:60185-135054804...
[ 2021-05-11 16:19:40 ] Launching parallel annotation jobs
[ 2021-05-12 00:05:21 ] All jobs complete. Starting database update.
[ 2021-05-12 00:10:01 ] Validating database........
[ 2021-05-12 00:10:04 ] Database update complete.
[ 2021-05-12 00:10:04 ] Creating read-wise annotation file.
Traceback (most recent call last):
  File "/usr/local/bin/talon", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/talon/talon.py", line 2474, in main
    get_read_annotations.make_read_annot_file(database, build,  
  File "/usr/local/lib/python3.8/dist-packages/talon/post/get_read_annotations.py", line 355, in make_read_annot_file
    fetch_reads(database, build, tmp_file = tmp_read_file, datasets = datasets)
  File "/usr/local/lib/python3.8/dist-packages/talon/post/get_read_annotations.py", line 134, in fetch_reads
    raise ValueError(("No reads detected. Make sure your dataset names are " 
ValueError: No reads detected. Make sure your dataset names are correct.
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/managers.py", line 616, in _run_server
    server.serve_forever()
  File "/usr/lib/python3.8/multiprocessing/managers.py", line 182, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/usr/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/usr/lib/python3.8/shutil.py", line 715, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib/python3.8/shutil.py", line 672, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/usr/lib/python3.8/shutil.py", line 670, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs000000008f4d9ad400034c6c'

Note that I'm just passing in "dummy" for the various labels in the config file.
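A quick sanity check of the config file can rule out formatting problems before a long run. The sketch below assumes the four-column layout used in this thread (dataset name, sample description, platform, SAM path); `check_talon_config` is a hypothetical helper, not part of TALON.

```python
import csv
import os


def check_talon_config(config_path):
    """Return a list of problems found in a TALON-style config CSV.

    Assumes four comma-separated fields per line:
    dataset name, sample description, platform, SAM file path.
    """
    problems = []
    with open(config_path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle), start=1):
            if not row:
                continue  # csv.reader yields [] for blank lines
            if len(row) != 4:
                problems.append(
                    f"line {line_no}: expected 4 fields, found {len(row)}"
                )
                continue
            name, _sample, _platform, sam_path = (f.strip() for f in row)
            if not name:
                problems.append(f"line {line_no}: empty dataset name")
            if not os.path.isfile(sam_path):
                problems.append(f"line {line_no}: SAM file not found: {sam_path}")
    return problems
```

An empty list means the file at least has the expected shape. It says nothing about whether TALON will accept the dataset names.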

One other thing: I did not label reads for internal priming. Could that be causing the issue?

oneillkza avatar May 12 '21 18:05 oneillkza

> @fairliereese I'm also getting this issue. Do the various sample names specified in the config file have to correspond to RG tag values in the input sam?

TALON assigns RG tags when it merges all the SAM files together at the beginning of the run, based on the sample names provided in the config file. Since you only have one SAM file in your config file, I imagine this is not the problem.
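To see which RG values a SAM file actually carries, the optional fields can be tallied directly. This is a minimal sketch that relies only on the SAM text format (optional tags start at column 12); `count_read_groups` is a hypothetical helper, and whether TALON keeps its merged file around for inspection is not something this thread confirms.

```python
from collections import Counter


def count_read_groups(sam_path):
    """Count alignment records per RG:Z value in an uncompressed SAM file.

    Records without an RG tag are counted under the key None.
    """
    counts = Counter()
    with open(sam_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("@"):
                continue  # skip blank lines and header lines
            fields = line.rstrip("\n").split("\t")
            # The first 11 columns are mandatory; optional tags follow.
            tag = next(
                (f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None
            )
            counts[tag] += 1
    return counts
```

If every record lands under a single key, read-group labelling is unlikely to be what separates the reads from their dataset name.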

> One other thing: I did not label reads for internal priming. Could that be causing the issue?

This is not a problem; we just suggest doing this before you run TALON for transcript QC purposes.

The problem seems to be in the script that executes at the end of the TALON run, when it attempts to create the read-wise annotation file by querying the database. Specifically, it looks like there was a problem with the query. This part of the code runs separately from populating the database, so the good news is that you have a fully functional TALON database at this point. If you wish to create the read_annot file, I suggest you run the talon_fetch_reads module separately on the created database. I would be interested in getting a minimal non-working example from you, as this appears to be a problem that multiple people are having!
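Since the TALON database is a SQLite file, one way to check whether it was populated at all is to count the rows in each table. The sketch below makes no assumptions about TALON's schema: it reads the table names from the file itself. `table_row_counts` is a hypothetical helper, not a TALON function.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    if not os.path.isfile(db_path):
        # sqlite3.connect would otherwise silently create an empty file.
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # Names come from the database itself; quote them as identifiers.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `table_row_counts("hg38_no_alt_ensembl100_talon.db")` and looking for tables with zero rows would show whether the read-level tables were filled in before the annotation query ran.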

fairliereese avatar May 12 '21 19:05 fairliereese

Thanks!

Here's a minimal dataset that should be OK to share:

uhr_min.sam.gz

I was also trying to run quantification on the database produced by the above step, but that produced an empty file. I notice that in the tmp dir, the abundance_tuples.tsv file is empty:

$ ls tmp/talon -lh
total 1.1G
-rw-r--r-- 1 koneill pog    0 May 11 09:19 abundance_tuples.tsv
-rw-r--r-- 1 koneill pog  52M May 11 17:05 edge_tuples.tsv
-rw-r--r-- 1 koneill pog  38M May 11 17:04 exon_annot_tuples.tsv
-rw-r--r-- 1 koneill pog 4.4M May 11 17:04 gene_annot_tuples.tsv
-rw-r--r-- 1 koneill pog 639K May 11 17:04 gene_tuples.tsv
-rw-r--r-- 1 koneill pog  21M May 11 17:05 location_tuples.tsv
-rw-r--r-- 1 koneill pog 538M May 11 17:04 observed_transcript_tuples.tsv
-rw-r--r-- 1 koneill pog 201M May 11 17:04 transcript_annot_tuples.tsv
-rw-r--r-- 1 koneill pog 162M May 11 17:05 transcript_tuples.tsv
-rw-r--r-- 1 koneill pog  28M May 11 17:05 vertex_2_gene_tuples.tsv

oneillkza avatar May 13 '21 02:05 oneillkza

Hmm, this very much makes me think that there is a problem with the queries produced as part of these steps. I'll take a look at your data and get back to you!

Edit: Or perhaps I was incorrect about the database update finishing correctly.

fairliereese avatar May 13 '21 03:05 fairliereese

I was unable to reproduce your bug with the most recent commit to master.

I ran the following code:

talon_initialize_database \
	--f gencode.v29.annotation.gtf \
	--g hg38 \
	--a gencode_v29 \
	--o talon

echo dummy,dummy,dummy,uhr_min.sam > config.csv

talon \
	--f config.csv \
	--db talon.db \
	--build hg38 \
	--o talon

talon_abundance \
	--db talon.db \
	-a gencode_v29 \
	-b hg38 \
	--o talon 

I would advise trying to pull the latest commits and installing using pip in the cloned directory, and seeing if that fixes your issue. If not, I'd recommend sending me the broken database so I can see if there's anything glaringly wrong there that we can track down.

The empty abundance_tuples.tsv file is a holdover from a previous generation of the code, as far as I'm aware. I have run successful TALON jobs that resulted in an empty version of that file many times!

fairliereese avatar May 13 '21 03:05 fairliereese

Thanks -- that's good to know about the empty abundance_tuples.tsv.

I'll see if I can get the latest version of the code up and running. I was running it through the Docker container. I should be able to set up a Conda environment and do a pip install inside that.

oneillkza avatar May 14 '21 00:05 oneillkza

@fairliereese so it looks like that does run OK.

One other thing is that I was previously running the branch from #75 with the option of specifying the tmp_dir, and that's the one that failed. I'm realising now that #75 actually hasn't been merged (so the functionality is not present in the development branch). I'm also noticing that there isn't any integration testing on TALON, so it's hard to know whether the pull request broke things. I wonder if #75 caused this error?

Given how complex TALON now is, it would make a lot of sense to set up integration testing (e.g. with Travis or GitHub Actions). That hooks into GitHub, and would tell you whether a commit or PR broke functionality. It could be run on the example data bundled with TALON.
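As a rough idea of what such a check could look like, here is a minimal smoke-test sketch: run a command, then confirm it exited cleanly and produced non-empty output files. `run_smoke_test` is a hypothetical helper, and the command and output paths passed to it are placeholders to be filled in for the bundled example data.

```python
import os
import subprocess


def run_smoke_test(command, expected_files):
    """Run a command and return a list of failures (empty list means pass)."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Keep only the tail of stderr so the report stays readable.
        return [f"exit code {result.returncode}: {result.stderr.strip()[-500:]}"]
    failures = []
    for path in expected_files:
        if not os.path.isfile(path):
            failures.append(f"missing output: {path}")
        elif os.path.getsize(path) == 0:
            failures.append(f"empty output: {path}")
    return failures
```

A CI job could call this with the `talon` invocation from this thread as a list of arguments and fail the build whenever the returned list is not empty.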

https://lab.github.com/githubtraining/github-actions:-continuous-integration
https://docs.travis-ci.com/user/tutorial/

(Sorry if this is stuff you already know!)

oneillkza avatar May 14 '21 16:05 oneillkza

Thanks for the tips! I have not had time to review the pull request and therefore haven't merged it in. We used to have Travis CI set up but some dependency broke it at some point in a way we just seemed to be unable to fix. I will look into setting it up again though.

fairliereese avatar May 14 '21 17:05 fairliereese