ALLHiC

Omni-C

Open ardy20 opened this issue 4 years ago • 16 comments

Hello

We have done a phased assembly of a polyploid plant using HiFi data and Improved Phased Assembly (IPA). We then ran Omni-C (Dovetail) on the same plant and obtained its short-read libraries.

We tried to run ALLHiC on the Omni-C data, but I got the following errors at the PreprocessSAMs.pl stage.

Can ALLHiC be used for Omni-C data? Are there any parameters to change in the scripts?

(base) uq@fl002:.../uq/bwa/F-HiC> PreprocessSAMs.pl sample.bwa_aln.sam final.female.p_ctg.fasta MOBI
Sun Feb 21 10:24:32 2021: PreprocessSAMs.pl: samtools view -bS sample.bwa_aln.sam -o sample.bwa_aln.bam
/30days/uq/ALLHiC/scripts/PreprocessSAMs.pl sample.bwa_aln.sam final.female.p_ctg.fasta MOBI

Use of uninitialized value $RE_site in string eq at /30days/uq/ALLHiC/scripts/PreprocessSAMs.pl line 137.
Use of uninitialized value $RE_site in concatenation (.) or string at /30days/uq/ALLHiC/scripts/PreprocessSAMs.pl line 155.
Use of uninitialized value $RE_site in concatenation (.) or string at /30days/uq/ALLHiC/scripts/PreprocessSAMs.pl line 156.
Sun Feb 21 10:28:47 2021: PreprocessSAMs.pl: make_bed_around_RE_site.pl final.female.p_ctg.fasta 500

make_bed_around_RE_site.pl

Find all occurrences of a motif in a genome. Make a 'POS' file listing these occurrences, and also a BED file representing the regions around these occurrences.

SYNTAX: make_bed_around_RE_site.pl
fasta: A fasta file representing a genome (reference or draft assembly.)
motif: A motif, typically a restriction site sequence (e.g., HindIII = AAGCTT, NcoI = CCATGG, Dpn1 = GATC).
range: A number representing how many bp around the sequence to include. Recommend 500 based on Yaffe & Tanay, Nat. Genetics 2011.

OUTPUT FILES: .near_..bed .near_pos_of_.txt

Sun Feb 21 10:28:47 2021: PreprocessSAMs.pl: bedtools intersect -abam sample.bwa_aln.bam -b final.female.p_ctg.fasta.near_.500.bed > sample.bwa_aln.REduced.bam
Error: Unable to open file final.female.p_ctg.fasta.near_.500.bed. Exiting.
Sun Feb 21 10:28:48 2021: PreprocessSAMs.pl: samtools view -F12 sample.bwa_aln.REduced.bam -b -o sample.bwa_aln.REduced.paired_only.bam
[main_samview] fail to read the header from "sample.bwa_aln.REduced.bam".
Sun Feb 21 10:28:48 2021: PreprocessSAMs.pl: samtools flagstat sample.bwa_aln.REduced.paired_only.bam > sample.bwa_aln.REduced.paired_only.flagstat
[E::hts_open_format] Failed to open file "sample.bwa_aln.REduced.paired_only.bam" : No such file or directory
samtools flagstat: Cannot open input file "sample.bwa_aln.REduced.paired_only.bam": No such file or directory

ardy20 avatar Feb 21 '21 00:02 ardy20

Dovetail has detailed instructions for Omni-C FASTQ preprocessing, from FASTQ to the final valid-pairs BAM file. Maybe feeding the valid-pairs BAM into ALLHiC (allhic prune) would work better?

For DNase-type data, Juicer skips the fragment filtering step:

## If DNAse-type experiment, no fragment maps
if [ "$site" == "none" ]
then
    nofrag=1;
fi
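
To make the idea concrete, a wrapper around a restriction-site-based preprocessing step could branch in the same way. The following is only a minimal sketch: the function `plan_steps` and the step names it prints are hypothetical and are not part of Juicer or ALLHiC.

```shell
# Hypothetical step planner (illustration only; not part of Juicer or ALLHiC).
# Prints the steps a wrapper would run for a given restriction site.
plan_steps() {
    site=$1
    nofrag=0
    # Same test as the Juicer snippet: "none" means a DNase-type library.
    if [ "$site" = "none" ]; then
        nofrag=1
    fi
    echo "align"
    # Only filter reads near restriction sites when a site is defined.
    if [ "$nofrag" -eq 0 ]; then
        echo "filter_near_site:$site"
    fi
    echo "keep_paired_reads"
}
```

With `plan_steps none` the site-based filter is left out, while `plan_steps GATC` keeps it.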

SALSA2 has a similar process; see https://github.com/marbl/SALSA/issues/55.

But can ALLHiC_partition skip the -e enzyme_sites option? @tangerzhang

Just out of curiosity, does the IPA assembly have better quality than hifiasm? Does hifiasm have any problems with polyploid plants?

baozg avatar Feb 21 '21 06:02 baozg

Hi @ardy20, the error was caused by a typo in the enzyme name: it should be MBOI, but your input was MOBI. Alternatively, you can simply give the restriction site itself: GATC. We have not tested Omni-C libraries. In theory, ALLHiC can be applied to various types of Hi-C libraries as long as the restriction sites are known. However, if Omni-C libraries do not record restriction sites, I am afraid that the current ALLHiC does not support this kind of library. @baozg, the -e enzyme_sites option is still required at the current stage.
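
A typo like this can be caught before any alignment files are touched. The sketch below is only an illustration under stated assumptions: `resolve_site` is a hypothetical helper, not part of ALLHiC, and its name table contains only the enzymes mentioned in this thread (MboI and Dpn1 share GATC; HindIII and NcoI come from the make_bed_around_RE_site.pl usage text).

```shell
# Hypothetical helper (not part of ALLHiC): turn an enzyme name or a raw
# motif into a restriction site, and fail loudly on anything else.
resolve_site() {
    # Upper-case the input so "MboI" and "MBOI" are treated the same.
    input=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$input" in
        MBOI)    echo "GATC" ;;
        HINDIII) echo "AAGCTT" ;;
        NCOI)    echo "CCATGG" ;;
        # Empty input, or anything containing a non-ACGT character,
        # is neither a known name nor a plain motif.
        ''|*[!ACGT]*)
            echo "unknown enzyme or motif: $1" >&2
            return 1
            ;;
        # A plain ACGT string is passed through as the motif itself.
        *) echo "$input" ;;
    esac
}
```

Here `resolve_site MBOI` and `resolve_site GATC` both print `GATC`, whereas `resolve_site MOBI` returns a non-zero status with an error message, rather than letting an empty motif produce a file name such as `final.female.p_ctg.fasta.near_.500.bed`.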

tangerzhang avatar Feb 22 '21 02:02 tangerzhang

Hi All

Thanks a lot for the guidance, and apologies for the typo. I will try the suggestions and get back to you. Regarding the HiFi assembly, we did not test hifiasm because we were happy with IPA. However, we tested HiCanu and the assembly quality was much better. In particular, IPA creates a phased assembly and provides primary and associated contigs. I am not sure if hifiasm has the same capability. We combined our IPA HiFi assembly with Omni-C and got a very high-quality chromosome-level assembly of the jojoba plant.

ardy20 avatar Feb 22 '21 02:02 ardy20

hifiasm usually gives better quality on plant assemblies and is faster than HiCanu and IPA. It can create phased assemblies in FALCON-Unzip style (primary + alternate) and in trio mode.

baozg avatar Feb 22 '21 02:02 baozg

Thanks for the suggestion. We will test it soon.

ardy20 avatar Feb 22 '21 03:02 ardy20

In which file should the following code be changed?

For DNase-type data, Juicer skips the fragment filtering step:

## If DNAse-type experiment, no fragment maps
if [ "$site" == "none" ]
then
    nofrag=1;
fi

ardy20 avatar Feb 22 '21 05:02 ardy20

Just a quick update: we tested hifiasm and found that IPA creates significantly better assemblies with a higher N50.

ardy20 avatar Mar 03 '21 05:03 ardy20

Dear ALLHiC Team

For Omni-C, the -e option is still required. What should we put for that?

ardy20 avatar Mar 08 '21 03:03 ardy20

Dear Sirs, is there any planned support for OmniC data in allHiC? thanks

diriano avatar Aug 24 '21 12:08 diriano

Hi @diriano I have not had a chance to test OmniC data. Could anyone share with me some sample data so that we can test ALLHiC on OmniC?

tangerzhang avatar Aug 25 '21 01:08 tangerzhang

Hi @tangerzhang , I think I can share some data with you. I have a diploid plant that is highly het. Would it be OK if I share the two versions (haplotypes) of a set of contigs that make a chromosome and the corresponding OmniC reads?

diriano avatar Aug 25 '21 03:08 diriano

Hi @diriano That would be great if you could share with me the contig assembly and OmniC reads. My gmail is tanger.zhang@gmail and google drive works for me. Thanks!

tangerzhang avatar Aug 25 '21 03:08 tangerzhang

@tangerzhang did you have a chance to check the data that I sent? Cheers

diriano avatar Nov 09 '21 05:11 diriano

Hi @tangerzhang, is there any update on supporting DNase-type Hi-C data (like OmniC)? I would also appreciate it very much. I think many people now use it for scaffolding because of its higher and more even coverage.

jolbi avatar Nov 25 '22 16:11 jolbi

OmniC data support would be much appreciated.

markopetek avatar Dec 04 '22 18:12 markopetek

Any news on this topic?

amvarani avatar Jul 19 '23 13:07 amvarani