bam.routed file is empty

Open MaestSi opened this issue 7 years ago • 12 comments

Dear Hydra developers,

After generating the Hydra configuration (make_hydra_config.py) and extracting discordants for sample0 (extract_discordants.py), I ran the command that routes all samples through the Hydra router (hydra-router). This command did not report any error, but I noticed that the output file bam.routed is empty. I then ran the commands for combining the Hydra assembly files (assemble-routed-files.sh) and for merging the results (combine-assembled-files.sh). However, when the forceOneClusterPerPairMem.py script is invoked to start Hydra clustering, it fails with the following error:

call error:
Traceback (most recent call last):
  File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 498, in <module>
    main()
  File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 483, in main
    clusterSupport = computeSupportForEachCluster(opts.master, opts.maxDist)
  File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 120, in computeSupportForEachCluster
    for line in open(clusterFile, 'r'):
IOError: [Errno 2] No such file or directory: '/mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/all.assembled'

I can confirm that file all.assembled has not been created. How could I solve this issue? Thanks in advance.

MaestSi avatar Apr 14 '18 19:04 MaestSi

From your description, I am not sure which set of files is the first to come out empty or unwritten. It looks like the router step, but I suspect the failure is actually at the discordant extraction step. Files won't get routed or assembled properly if the preceding files are missing, empty, truncated, or contain some other catastrophic error. Do all of the discordant BAMs get written? Does routed-files.txt get written? Do any of the discordant cluster files get written?
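A quick way to answer those questions is to check which of the expected outputs exist and which are empty. This is only a minimal sketch: `check_outputs` is a hypothetical helper (not part of Hydra), and it assumes the file names mentioned in this thread all sit in one output directory.

```python
import os
import sys


def check_outputs(out_dir, names):
    """Return {name: status}, where status is 'missing', 'empty', or 'ok'."""
    report = {}
    for name in names:
        path = os.path.join(out_dir, name)
        if not os.path.exists(path):
            report[name] = "missing"
        elif os.path.getsize(path) == 0:
            report[name] = "empty"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    # File names taken from this thread; adjust them to your own run.
    expected = [
        "bam.stub",
        "bam.stub.config",
        "bam.routed",
        "routed-files.txt",
        "all.assembled",
    ]
    out_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, status in check_outputs(out_dir, expected).items():
        print(f"{status:8s} {name}")
```

Run it with the Hydra output directory as the only argument; the first file reported as missing or empty points to the step that needs a closer look.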

Changes to samtools over the past few years could be causing this, depending on which version you have, but that would be a problem at the discordant extraction step. If you can tell me exactly which step is failing and which files fail to be written, I might be able to give you a better answer.

ml4wc avatar Apr 14 '18 19:04 ml4wc

I have the following non-empty files:

  • bam.stub
  • bam.stub.config
  • start_sorted.bam.bedpe
  • a very large set of chr#1.chr#2.(+/-).(+/-) files; I think these are what you mean by the discordant BAMs

That's it; I have no other files. So I don't think either routed-files.txt or any of the discordant cluster files are being written.

MaestSi avatar Apr 15 '18 07:04 MaestSi

Okay thanks. That is helpful. Did you set the ulimit to over 16000 or 16384? This controls the number of file handles that can be open simultaneously. How many of those chr chr +- files are written?

Have you tried running the routing step independently? What happens?

ml4wc avatar Apr 15 '18 14:04 ml4wc

In total, 1020 chr chr +- files are written; some of them are empty, but most are not. Yes, I already set ulimit -f to 16384 and tried running the routing step independently. Here are the command and the messages printed to the screen:

/mnt/cifs01/simone/software/SVE/src/hydra/bin/hydra-router -config /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.stub.config -routedList /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.routed

Parameters:
  Configuration file (-config): /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.stub.config
  Routed file list (-routedList): /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.routed

Processing:
  Routing discordant mappings to master chrom/chrom/strand/strand files.
  Found /mnt/cifs01/simone/NA12878/start_sorted.bam.bedpe
  Routing mappings from: /mnt/cifs01/simone/NA12878/start_sorted.bam.bedpe...Time elapsed: 54 sec

MaestSi avatar Apr 15 '18 22:04 MaestSi

Are you running this on a single dataset? I think there should be many bedpes.

ml4wc avatar Apr 16 '18 00:04 ml4wc

Yes, I'm running Hydra on a single sample. I had only one ~1GB bedpe file.

MaestSi avatar Apr 16 '18 09:04 MaestSi

I don't remember whether hydra-multi works on a single sample. If you are using a single sample, I would recommend Lumpy (https://github.com/arq5x/lumpy-sv) instead.

ml4wc avatar Apr 16 '18 13:04 ml4wc

I am running both of them, potentially together with five other tools, within the SVE framework (https://github.com/TheJacksonLaboratory/SVE). If you tell me that it doesn't work with a single sample, that's fine. Thanks.

MaestSi avatar Apr 16 '18 14:04 MaestSi

My intuition is that it should still work on a single sample, even though that is not the intended use, but I am really not sure. Lumpy is the better choice for a single sample. It might still be a ulimit problem: try -n rather than -f. Sorry I can't be of more help.
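For context, ulimit -n caps the number of file descriptors a process may have open, whereas ulimit -f caps the size of files it may write. Both can be inspected from Python's standard `resource` module (Unix only). This is a sketch: `limit_report` and `can_open` are hypothetical helper names, and the `reserve` of 16 descriptors for stdin, stdout, and other already-open files is an arbitrary assumption.

```python
import resource


def limit_report():
    """Return (soft, hard) limits for open files and for written file size."""
    return {
        # Equivalent of `ulimit -n`: maximum number of open file descriptors.
        "open_files": resource.getrlimit(resource.RLIMIT_NOFILE),
        # Equivalent of `ulimit -f`: maximum size of a file the process may write.
        "file_size": resource.getrlimit(resource.RLIMIT_FSIZE),
    }


def can_open(n_files, reserve=16):
    """Check whether n_files handles fit under the soft open-file limit.

    `reserve` is an assumed allowance for descriptors that are already in use.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return n_files + reserve <= soft


if __name__ == "__main__":
    print(limit_report())
    # 1020 is the number of chr chr +- files reported earlier in this thread.
    print("room for 1020 handles:", can_open(1020))
```

If the second line prints False, raising the limit with ulimit -n in the same shell before launching the router is the thing to try.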

ml4wc avatar Apr 16 '18 14:04 ml4wc

I think it will fail silently if a chr/chr/strand/strand file listed in the routed list didn't get written, which is why I suspect a ulimit problem... not sure.
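One way to test that suspicion is to compare the routed list against what is actually on disk. The sketch below rests on an assumption I have not verified: that each non-blank line of the routed list starts with a file path as its first whitespace-separated field. `missing_routed_files` is a hypothetical helper, not a Hydra script.

```python
import os


def missing_routed_files(routed_list_path):
    """Return paths named in the routed list that are missing or empty on disk."""
    problems = []
    with open(routed_list_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # ASSUMPTION: the first field on each line is a file path.
            path = fields[0]
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                problems.append(path)
    return problems
```

Note that an empty routed list yields an empty result, so check the size of the list itself first. Relative paths are resolved against the current working directory.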

ml4wc avatar Apr 16 '18 15:04 ml4wc

Just one more question. Does Hydra support the hg38 reference? The BAM file I'm using was obtained by mapping FASTQ reads to hg38. Could that be the reason for the issue I am facing? Thank you.

MaestSi avatar Apr 18 '18 07:04 MaestSi

It should; there isn't any reason why it wouldn't, so long as all of the samples are aligned to the same genome.

ml4wc avatar Apr 18 '18 10:04 ml4wc