TOGA

twobit sizes do not match

Status: Open • lpnunez opened this issue 9 months ago • 10 comments

Hello,

I am trying to use TOGA to transfer annotations from my well-annotated reference to several different query genomes. However, when I run TOGA on certain genomes, I get the following error:

    Found 365 sequences in /home/lnunez/mendel-nas1/WGS/Cactus/Outputs/Diss/twobit/Thamnophis_elegans.2bit
    Error! 2bit file: /home/lnunez/mendel-nas1/WGS/Cactus/Outputs/Diss/twobit/Thamnophis_elegans.2bit; chain_file: /home/lnunez/mendel-nas1/WGS/TOGA/Dissertation/Natrix_natrix/CM020096/temp/genome_alignment.chain Chromosome: WNA01000062.1; Sizes don't match! Size in twobit: 71766; size in chain: 71276
    Traceback (most recent call last):
      File "/home/lnunez/mendel-nas1/TOGA/toga.py", line 1600, in <module>
        main()
      File "/home/lnunez/mendel-nas1/TOGA/toga.py", line 1595, in main
        toga_manager = Toga(args)
      File "/home/lnunez/mendel-nas1/TOGA/toga.py", line 261, in __init__
        self.__check_param_files()
      File "/home/lnunez/mendel-nas1/TOGA/toga.py", line 338, in __check_param_files
        TogaSanityChecker.check_2bit_file_completeness(self.t_2bit, t_chrom_to_size, self.chain_file)
      File "/mendel-nas1/lnunez/TOGA/modules/toga_sanity_checks.py", line 105, in check_2bit_file_completeness
        raise ValueError(err)
    ValueError: Error! 2bit file: /home/lnunez/mendel-nas1/WGS/Cactus/Outputs/Diss/twobit/Thamnophis_elegans.2bit; chain_file: /home/lnunez/mendel-nas1/WGS/TOGA/Dissertation/Natrix_natrix/CM020096/temp/genome_alignment.chain Chromosome: WNA01000062.1; Sizes don't match! Size in twobit: 71766; size in chain: 71276
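The failing check (`TogaSanityChecker.check_2bit_file_completeness` in the traceback) appears to compare the size of each target sequence recorded in the chain file against its size in the reference 2bit file. A minimal sketch for listing every such mismatch at once, not just the first one reported, is below. It assumes the 2bit sizes were dumped to a two-column `name<TAB>size` file (e.g. with UCSC `twoBitInfo Thamnophis_elegans.2bit ref.chrom.sizes`) and that the chain file follows the UCSC chain header layout (`chain score tName tSize ...`). All function names are hypothetical helpers, not part of TOGA.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'name<TAB>size' lines, such as twoBitInfo output."""
    sizes: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def chain_target_sizes(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (tName, tSize) from every chain header line.

    UCSC chain header: chain score tName tSize tStrand tStart tEnd qName qSize ...
    """
    for line in lines:
        if line.startswith("chain"):
            fields = line.split()
            yield fields[2], int(fields[3])


def find_size_mismatches(
    twobit_sizes: Dict[str, int], chain_lines: Iterable[str]
) -> List[Tuple[str, int, int]]:
    """Return (name, size_in_2bit, size_in_chain) for each mismatching sequence.

    Each sequence is reported once even if several chains refer to it;
    sequences missing from the 2bit sizes are skipped.
    """
    mismatches: Dict[str, Tuple[str, int, int]] = {}
    for name, chain_size in chain_target_sizes(chain_lines):
        twobit_size = twobit_sizes.get(name)
        if twobit_size is not None and twobit_size != chain_size:
            mismatches.setdefault(name, (name, twobit_size, chain_size))
    return list(mismatches.values())


def report(sizes_path: str, chain_path: str) -> None:
    """Print every size mismatch between a chrom.sizes file and a chain file."""
    with open_text(sizes_path) as f:
        twobit_sizes = read_chrom_sizes(f)
    with open_text(chain_path) as f:
        for name, twobit_size, chain_size in find_size_mismatches(twobit_sizes, f):
            print(f"{name}\t2bit={twobit_size}\tchain={chain_size}")
```

Calling `report("ref.chrom.sizes", "Natrix_natrix.chain.gz")` (hypothetical file names) would show whether only unplaced scaffolds such as WNA01000062.1 are affected or whether the 18 chromosomes disagree too.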

WNA01000062.1 is one of the 347 unplaced scaffolds in the reference. However, I am only interested in the 18 actual reference chromosomes. At first, I used the --limit_to_ref_chrom option to restrict each run to one of these chromosomes, like so:

    ./toga.py "${path_to_chain}"/"${genome}.chain.gz" "${path_to_bed}" "${path_to_2bit}"/"${ref}.2bit" "${path_to_2bit}"/"${genome}.2bit" \
        --limit_to_ref_chrom "${chromosome}" --kt --pn /home/lnunez/mendel-nas1/WGS/TOGA/Dissertation/"${genome}"/"${chromosome}" \
        --nc "${path_to_nextflow_config_dir}" --cb 10,100 --cjn 500

However, I still get the same error, even though I restricted the run to that chromosome. Is there a way to bypass this particular check that I am missing? I am short on time, so I would much prefer not to regenerate the input files from scratch.
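One possible workaround, sketched here as an untested assumption rather than a TOGA feature, is to pre-filter the chain file so it only contains chains whose target (`tName`) is one of the reference chromosomes of interest, and then pass the filtered file to `toga.py`. This assumes the sanity check only inspects sequences that actually occur in the chain file. The function names are hypothetical, and the chromosome names in `keep_targets` must match the chain's `tName` values exactly. Note that this sidesteps the size mismatch rather than explaining it: if the 2bit and the chain disagree on a scaffold's length, they may have been built from different assembly versions, and that could affect the chromosomes as well.

```python
import gzip
from typing import Iterable, Iterator, Set


def filter_chains(lines: Iterable[str], keep_targets: Set[str]) -> Iterator[str]:
    """Yield only the chain blocks whose target name (tName) is in keep_targets.

    A block is a 'chain ...' header, its alignment-data lines, and the blank
    line that ends it. Comment lines starting with '#' are passed through.
    """
    keep = False
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        if line.startswith("chain"):
            # Header fields: chain score tName tSize ...
            keep = line.split()[2] in keep_targets
        if keep:
            yield line


def filter_chain_file(in_path: str, out_path: str, keep_targets: Set[str]) -> None:
    """Write a gzip-compressed copy of in_path restricted to keep_targets."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(filter_chains(src, keep_targets))
```

The filtered file (always written gzip-compressed here) would then replace `"${path_to_chain}"/"${genome}.chain.gz"` as the first positional argument.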

lpnunez, May 17 '24 18:05