TADbit icon indicating copy to clipboard operation
TADbit copied to clipboard

Restriction with Arima kit

Open jeff-godwin opened this issue 10 months ago • 10 comments

Dear team,

I would like to clarify if the --renz parameter in tadbit map supports the restriction site for the ARIMA kit (https://arimagenomics.com/products/genome-wide-hic/) - ^GATC,G^ANTC. If so, what is the string I need to use for the --renz parameter?

Many thanks, Jeff

jeff-godwin avatar Apr 19 '24 09:04 jeff-godwin

Hi Jeff, yes, this would be the enzymes DpnII (or MboI) and HinfI. So you can just input --renz MboI Hinfl. Let us know how this works.

cheers

fransua avatar Apr 19 '24 09:04 fransua

Hi @fransua,

Thank you for the reply.

I ran tadbit map with --renz MboI Hinfl and --renz MboI DpnII HinfI and it produced the following error -

Writing log to /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out/process.log Writing versions of TADbit and dependencies Generating Hi-C QC plot Traceback (most recent call last): File "/home/jgodwin/miniconda3/envs/tadbit_add/bin/tadbit", line 170, in main(sys.argv) File "/home/jgodwin/miniconda3/envs/tadbit_add/bin/tadbit", line 167, in main args.func(args) File "/home/jgodwin/miniconda3/envs/tadbit_add/lib/python3.7/site-packages/pytadbit/tools/tadbit_map.py", line 69, in run savefig=fig_path) File "/home/jgodwin/miniconda3/envs/tadbit_add/lib/python3.7/site-packages/pytadbit/utils/fastq_utils.py", line 314, in quality_plot liges[k][max_seq_len // 2]) IndexError: list index out of range

The mapping runs fine with just --renz DpnII but the addition of HinfI produces the above error. How do I troubleshoot this?

Thanks

jeff-godwin avatar Apr 19 '24 09:04 jeff-godwin

It appears the "HinfI" enzyme (STRING) is the issue. Any other combinations of enzymes proceeds normally. Running with just --renz HinfI produces the same error as above Curiously it does appear in the list of available enzymes yet somehow is not identified by the fastq_utils.

jeff-godwin avatar Apr 19 '24 11:04 jeff-godwin

this is indeed unexpected... I had a look to it, with the current version of TADbit in this repo. Seems to work fine with me (in the mapping stage at least). image

but I am perhaps missing something, can you share your command line?

fransua avatar Apr 19 '24 13:04 fransua

My TADbit version is v1.1

I'm attaching my bash script for your reference -

fastq_dir="/home/jgodwin/Data/Hi-C/RK_data/1_Raw_data" workdir="/home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out" yeast_ref="/home/jgodwin/Data/Hi-C/Yeast_ref"

tadbit map -w ${workdir} --fastq ${fastq_dir}/WT_L1_1.dsrc --read 1 --index ${yeast_ref}/sacCer3.gem --renz DpnII HinfI --genome ${yeast_ref}/sacCer3.fa -C 3 tadbit map -w ${workdir} --fastq ${fastq_dir}/WT_L1_2.dsrc --read 2 --index ${yeast_ref}/sacCer3.gem --renz DpnII HinfI --genome ${yeast_ref}/sacCer3.fa -C 3

I have tried with --renz MboI HinfI as well.

jeff-godwin avatar Apr 19 '24 13:04 jeff-godwin

ok, I have tried with Saccharomyces reference genome and the QC step is generated properly: image (uggly because using mock data)

but it's not the genome either... can you try to reinstall TADbit? Perhaps it is something that has been fixed already.

(Note: MboI and DpnII are the same internally, it should not matter if you use one or the other... but perhaps you could try MboI and HindIII)

fransua avatar Apr 19 '24 14:04 fransua

Unfortunately the problem exists after re-installation too.

Running with MboI and HindIII proceeds without a problem. Attaching the log here -

Writing log to /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out/process.log Writing versions of TADbit and dependencies Generating Hi-C QC plot

  • Dangling-ends (sensu-stricto): 10.191%
  • Dangling-ends (sensu-stricto): 0.138%
  • Ligation sites: 3.069%
  • Ligation sites: 0.011%
  • Ligation sites: 0.006%
  • Ligation sites: 0.000% mapping /home/jgodwin/Data/Hi-C/RK_data/1_Raw_data/WT_L1_1.dsrc read 1 to /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out Preparing FASTQ file
  • conversion to MAP format Mapping reads in window 1-end_5f7b312935... TO GEM 3 /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y /home/jgodwin/miniconda3/envs/tadbit-3d/bin/gem-mapper -I /home/jgodwin/Data/Hi-C/Yeast_ref/sacCer3.gem -t 3 -F SAM -o /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y_full_1-end_5f7b312935.map -i /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y Parsing result... x removing GEM input /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y x removing map /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y_full_1-end_5f7b312935.map
  • splitting into restriction enzyme (RE) fragments using ligation sites
  • ligation sites are replaced by RE sites to match the reference genome
    • enzymes: MboI & MboI, ligation site: GATCGATC, RE site: GATC & GATC
    • enzymes: MboI & HindIII, ligation site: GATCAGCTT, RE site: GATC & AAGCTT
    • enzymes: HindIII & MboI, ligation site: AAGCTGATC, RE site: AAGCTT & GATC
    • enzymes: HindIII & HindIII, ligation site: AAGCTAGCTT, RE site: AAGCTT & AAGCTT Preparing MAP file x removing pre-GEM input /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y_filt_1-end_5f7b312935.map Mapping fragments of remaining reads... TO GEM 3 /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt /home/jgodwin/miniconda3/envs/tadbit-3d/bin/gem-mapper -I /home/jgodwin/Data/Hi-C/Yeast_ref/sacCer3.gem -t 3 -F SAM -o /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt_frag_1-end_5f7b312935.map -i /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt Parsing result... x removing GEM input /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt x removing failed to map /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y_fail_5f7b312935.map x removing tmp mapped /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt_frag_1-end_5f7b312935.map ,---------------. | MAPPED_INPUTs | ,----.--------.------------.------.------.------.--------------.-----------------------------.---------------------------------------------------------------------------------.----------.-----------------.---------. | Id | PATHid | Entries | Trim | Frag | Read | Enzyme | Dangling_Ends | Ligation_Sites | WRKDIRid | MAPPED_OUTPUTid | INDEXid | |----+--------+------------+------+------+------+--------------+-----------------------------+---------------------------------------------------------------------------------+----------+-----------------+---------| | 1 | 2 | 23,709,683 | None | full | 1 | MboI-HindIII | MboI:10.191% HindIII:0.138% | MboI-MboI:3.069% MboI-HindIII:0.011% HindIII-MboI:0.006% HindIII-HindIII:0.000% | 1 | 5 | 3 | | 2 | 2 | 6,487,346 | None | frag | 1 | MboI-HindIII | MboI:10.191% HindIII:0.138% | MboI-MboI:3.069% MboI-HindIII:0.011% HindIII-MboI:0.006% HindIII-HindIII:0.000% | 1 | 6 | 3 | '----^--------^------------^------^------^------^--------------^-----------------------------^---------------------------------------------------------------------------------^----------^-----------------^---------' ,-------. | PATHs | ,----.-------.------------------------------------------------.--------------. | Id | JOBid | Path | Type | |----+-------+------------------------------------------------+--------------| | 1 | 1 | /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out | WORKDIR | | 2 | 1 | ../1_Raw_data/WT_L1_1.dsrc | MAPPED_FASTQ | | 3 | 1 | ../../Yeast_ref/sacCer3.gem | INDEX | | 4 | 1 | WT_L1_1.dsrc_MboI-HindIII_5f7b312935.png | FIGURE | | 5 | 1 | 01_mapped_r1/WT_L1_1_full_1-end_5f7b312935.map | SAM/MAP | | 6 | 1 | 01_mapped_r1/WT_L1_1_frag_1-end_5f7b312935.map | SAM/MAP | '----^-------^------------------------------------------------^--------------' ,------. | JOBs | ,----.------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------.---------------------.---------------------.------.----------------. | Id | Parameters | Launch_time | Finish_time | Type | Parameters_md5 | |----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+------+----------------| | 1 | skip_mapping:0 genome:[/home/jgodwin/Data/Hi-C/Yeast_ref/sacCer3.fa] read:1 chr_name:[] noX:0 fast_fragment:0 cpus:3 mapper:gem mapper_binary:/home/jgodwin/miniconda3/envs/tadbit-3d/bin/gem-mapper mapper_param:{} gem_version:3 | 19/04/2024 17:50:30 | 19/04/2024 18:00:13 | Map | 5f7b312935 | '----^------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^---------------------^---------------------^------^----------------' ,---------------. | MAPPED_INPUTs | ,----.--------.------------.------.------.------.--------------.-----------------------------.---------------------------------------------------------------------------------.----------.-----------------.---------. | Id | PATHid | Entries | Trim | Frag | Read | Enzyme | Dangling_Ends | Ligation_Sites | WRKDIRid | MAPPED_OUTPUTid | INDEXid | |----+--------+------------+------+------+------+--------------+-----------------------------+---------------------------------------------------------------------------------+----------+-----------------+---------| | 1 | 2 | 23,709,683 | None | full | 1 | MboI-HindIII | MboI:10.191% HindIII:0.138% | MboI-MboI:3.069% MboI-HindIII:0.011% HindIII-MboI:0.006% HindIII-HindIII:0.000% | 1 | 5 | 3 | | 2 | 2 | 6,487,346 | None | frag | 1 | MboI-HindIII | MboI:10.191% HindIII:0.138% | MboI-MboI:3.069% MboI-HindIII:0.011% HindIII-MboI:0.006% HindIII-HindIII:0.000% | 1 | 6 | 3 | '----^--------^------------^------^------^------^--------------^-----------------------------^---------------------------------------------------------------------------------^----------^-----------------^---------' ,-------. | PATHs | ,----.-------.------------------------------------------------.--------------. | Id | JOBid | Path | Type | |----+-------+------------------------------------------------+--------------| | 1 | 1 | /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out | WORKDIR | | 2 | 1 | ../1_Raw_data/WT_L1_1.dsrc | MAPPED_FASTQ | | 3 | 1 | ../../Yeast_ref/sacCer3.gem | INDEX | | 4 | 1 | WT_L1_1.dsrc_MboI-HindIII_5f7b312935.png | FIGURE | | 5 | 1 | 01_mapped_r1/WT_L1_1_full_1-end_5f7b312935.map | SAM/MAP | | 6 | 1 | 01_mapped_r1/WT_L1_1_frag_1-end_5f7b312935.map | SAM/MAP | '----^-------^------------------------------------------------^--------------' ,------. | JOBs | ,----.------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------.---------------------.---------------------.------.----------------. | Id | Parameters | Launch_time | Finish_time | Type | Parameters_md5 | |----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+------+----------------| | 1 | skip_mapping:0 genome:[/home/jgodwin/Data/Hi-C/Yeast_ref/sacCer3.fa] read:1 chr_name:[] noX:0 fast_fragment:0 cpus:3 mapper:gem mapper_binary:/home/jgodwin/miniconda3/envs/tadbit-3d/bin/gem-mapper mapper_param:{} gem_version:3 | 19/04/2024 17:50:30 | 19/04/2024 18:00:13 | Map | 5f7b312935 | '----^------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^---------------------^---------------------^------^----------------' cleaning temporary files

jeff-godwin avatar Apr 19 '24 16:04 jeff-godwin

It appears it's a problem with the reads. The mapping proceeds with the demo data used for the 3DARORC tutorial (SRR5077821) without a problem. I'm rechecking the reads although I do believe it was prepared with the ARIMA double digestion kit.

jeff-godwin avatar Apr 19 '24 17:04 jeff-godwin

Hi, yes, it works for me too with other dataset... I suspect it is either the reference genome, can you try with this index: https://drive.google.com/file/d/1vBqQ3CjlEDnVo9p5cIHt81aB5blFNSDD (guessing you are using gem v3).

or an heterogeneous read lengths, can you check that all your reads are the same size?

fransua avatar Apr 19 '24 19:04 fransua

Hi Francois,

The same issue shows up with the new index as well. My read lengths are the same (35bp - wonder if this is tripping something). I tried mapping with bowtie and hisat as well. The reads are from publicly available data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5918358) I was able to process them through the nextflow HiC pipeline (https://nf-co.re/hic/2.1.0/docs/usage) with the --digestion 'arima' parameter. I will let you know if anything else comes up.

Thanks again.

jeff-godwin avatar Apr 23 '24 08:04 jeff-godwin