yahs
Generate Hi-C contact maps error: [E::make_asm_dict_from_agp] sequence not found
Dear yahs developers,
I'm trying to scaffold a de novo assembly following your pipeline.
I'm at the step "Generate Hi-C contact maps" and trying to run the first command:
/software/yahs/juicer pre ST.yash.out.bin ST.yash.out_scaffolds_final.agp STpurged.fa.fai
[E::make_asm_dict_from_agp] sequence not found
Segmentation fault
My file sizes: 1.3G ST.yash.out.bin 32K ST.yash.out_scaffolds_final.agp
STpurged.fa.fai is the index for the original de novo assembly that I want to scaffold.
What could be the issue?
Thank you Alex
I get a similar error when running the first and second step in the "Manual curation with juicebox".
./juicer pre $DIR/ST.yash.out.bin $DIR/ST.yash.out_scaffolds_final.agp $DIR/STpurged.fa.fai
[E::make_asm_dict_from_agp] sequence ? not found
Segmentation fault
I was able to run the first step using an independently installed juicer_tools_2.17.00.jar (from the juicer tools website). This created the five expected output files. But then when I run the second command (as above) I get the following error:
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2022-10-26T13:44:02,788] [Globals.java:138] [main] Development mode is enabled
Using 1 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
Not including fragment map
Start preprocess
Writing header
Writing body
..
Writing footer
nBytesV5: 2484682
masterIndexPosition: 99651623
Finished preprocess
Calculating norms for zoom BP_2500000
java.lang.NullPointerException: Cannot invoke "java.util.Iterator.hasNext()" because "this.currentIterator" is null
at juicebox.data.iterator.ListOfListIterator.hasNext(ListOfListIterator.java:44)
at juicebox.data.iterator.IteratorContainer.getNumberOfContactRecords(IteratorContainer.java:54)
at juicebox.data.iterator.ListOfListIteratorContainer.getIsThereEnoughMemoryForNormCalculation(ListOfListIteratorContainer.java:56)
at juicebox.tools.utils.norm.NormalizationCalculations.
Is there an issue with my juicer installation (in yahs or independent?), or have any of the output files not been created properly? Any insights appreciated!
Thank you Alex
Hello Alex,
It looks like there are some formatting errors in your input AGP or contig index file. Have you manually edited them?
The error says that some sequences present in the AGP file are not found in the contig index file. You can compare the two files (the 6th column of the AGP file and the 1st column of the index file) to check whether this is the case. Also, the ? in "sequence ? not found" makes me suspect the problem is related to encoding.
Best, Chenxi
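The comparison suggested above (6th column of the AGP against the 1st column of the index) can be sketched as a small shell function. This is only a rough sketch: the function name agp_ids_missing_from_fai is made up for illustration, and it assumes a standard tab-separated AGP in which component lines carry "W" in column 5 and gap lines do not.

```shell
#!/bin/sh
# Hypothetical helper (not part of yahs): print every component ID that the
# AGP references but that has no entry in the .fai index.
# Usage: agp_ids_missing_from_fai <scaffolds.agp> <contigs.fa.fai>
agp_ids_missing_from_fai() {
    # awk reads the .fai first and the AGP second.
    awk -F '\t' '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # .fai: remember column 1
        /^#/                { next }                  # AGP: skip comment lines
        $5 == "W" && !($6 in known) { print $6 }      # AGP: component not indexed
    ' "$2" "$1" | sort -u
}
```

Empty output means every component named in the AGP is present in the index. Any ID that is printed is a candidate for the "sequence not found" error.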
If you do not mind, you can post the AGP and index files so that I can take a quick look. Chenxi
Hi Chenxi,
Thanks so much for the quick response! I haven't done any manual editing. The agp was output by yahs. I'm using the contig index file for the original de novo assembly. Is that correct? I've also tried to run with the ST.yash.out_scaffolds_final.fa.fai which is output by yahs.
I attach all the files here. I've added the .txt extension just for the purposes of attaching here. The real extension is .agp, and .fai.
Thank you Alex
STpurged.fa.fai.txt ST.yash.out_scaffolds_final.agp.txt ST.yash.out_scaffolds_final.fa.fai.txt
Hi Alex,
Your AGP file is somehow corrupted. You may have accidentally overwritten it.
Chenxi
If you do not want to rerun scaffolding, you can simply run
yahs -a ST.yash.out_r*_break.agp -r 500000000 STpurged.fa ST.yash.out.bin -o ST.yash.out_rerun
to regenerate the AGP file, where ST.yash.out_r*_break.agp is the output AGP file from the last round (replace * with the largest number). This sets the resolution parameter to a very large value (500 Mb here), which tricks yahs into writing a final AGP without doing any actual scaffolding.
I forgot to mention that I renamed the output to avoid overwriting the existing files. The output should be ST.yash.out_rerun_scaffolds_final.agp.
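Picking the last-round file by hand is easy to get wrong, so the "replace * with the largest number" step can be sketched as follows. The function name latest_break_agp is made up for illustration, and the sketch assumes the round files are named <prefix>_r<number>_break.agp, as in the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of yahs): print the <prefix>_r<N>_break.agp
# file with the largest round number N, or return 1 if none exists.
# Usage: latest_break_agp <yahs output prefix>
latest_break_agp() {
    prefix=$1
    best_round=-1
    best_file=
    for f in "${prefix}"_r*_break.agp; do
        [ -e "$f" ] || continue               # glob matched nothing
        round=${f#"${prefix}_r"}              # drop prefix, e.g. "03_break.agp"
        round=${round%_break.agp}             # drop suffix, e.g. "03"
        case $round in
            ''|*[!0-9]*) continue ;;          # ignore non-numeric rounds
        esac
        if [ "$round" -gt "$best_round" ]; then
            best_round=$round
            best_file=$f
        fi
    done
    [ -n "$best_file" ] || return 1
    printf '%s\n' "$best_file"
}

# Example use with the file names from this thread:
#   agp=$(latest_break_agp ST.yash.out) &&
#   yahs -a "$agp" -r 500000000 STpurged.fa ST.yash.out.bin -o ST.yash.out_rerun
```

Writing to a new output prefix, as in the example, keeps the regenerated AGP separate from the existing files.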
Ah, that explains all the problems. Thanks so much. I'll run this tonight and let you know if it's worked.
All the best Alex
Dear Chenxi,
I've rerun the entire pipeline because I think something went wrong with my initial yahs command. I have managed to work through to the end of "Generate HiC contact maps" without any errors. This creates an out.hic file which is 114M (I'm not sure what size to expect for a genome of ~450Mb?). However, I can't open this in juicebox; it just says "error loading .hic file".
Is there a way I can check if the commands have run correctly? There are no errors in the log files. I've attached the .hic.
Secondly, when I run the "Manual curation with Juicebox (JBAT)" commands, I get the same error as before. The first command runs without any error messages and produces the expected 5 output files. But the command to create the .hic file fails.
If I run it with juicer-tools:
(java -jar /SAN/ugi/StalkieGenomics/software/juicer_tools_2.17.00.jar pre out_JBAT.txt out_JBAT.hic.part <(cat out_JBAT.log | grep PRE_C_SIZE | awk '{print $2" "$3}')) && (mv out_JBAT.hic.part out_JBAT.hic)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2022-10-27T11:41:10,638] [Globals.java:138] [main] Development mode is enabled
Using 1 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
Not including fragment map
Start preprocess
Writing header
Writing body
..
Writing footer
nBytesV5: 2484682
masterIndexPosition: 99651623
Finished preprocess
Calculating norms for zoom BP_2500000
java.lang.NullPointerException: Cannot invoke "java.util.Iterator.hasNext()" because "this.currentIterator" is null
at juicebox.data.iterator.ListOfListIterator.hasNext(ListOfListIterator.java:44)
at juicebox.data.iterator.IteratorContainer.getNumberOfContactRecords(IteratorContainer.java:54)
at juicebox.data.iterator.ListOfListIteratorContainer.getIsThereEnoughMemoryForNormCalculation(ListOfListIteratorContainer.java:56)
at juicebox.tools.utils.norm.NormalizationCalculations.
And if I run it with yahs' juicer pre:
(/SAN/ugi/StalkieGenomics/software/yahs/juicer pre out_JBAT.txt out_JBAT.hic.part <(cat out_JBAT.log | grep PRE_C_SIZE | awk '{print $2" "$3}')) && (mv out_JBAT.hic.part out_JBAT.hic)
[E::make_asm_dict_from_agp] sequence not found
Segmentation fault
Does this mean that the files are still corrupted? Is it the agp file that's the issue? How can I check that this has run correctly before going through all these steps? Attached are all the files you might need.
Thank you for any help Alex
yahs.out_scaffolds_final.fa.fai.txt yahs.out_scaffolds_final.agp.txt
It looks like I can't upload the .hic as it's too big.
All the best Alex
Dear Chenxi,
I can open the .hic file on the web app for juicebox, just not the desktop version. I think I might need the JBAT files to open in the desktop version.
All the best Alex
Hi Alex,
I believe something went wrong; you should be able to open the .hic file without the JBAT files. I have heard that juicer_tools version 2.x seems less stable, and I have seen problems with manual curation when using it. I am using version 1.9.9; you might give it a try.
In this command,
(/SAN/ugi/StalkieGenomics/software/yahs/juicer pre out_JBAT.txt out_JBAT.hic.part <(cat out_JBAT.log | grep PRE_C_SIZE | awk '{print $2" "$3}')) && (mv out_JBAT.hic.part out_JBAT.hic)
you should use juicer_tools_xxx pre instead of yahs's juicer pre (the latter is only used to generate the input file for juicer_tools_xxx). Sorry for the naming confusion.
Chenxi
Hi Chenxi,
Changing the juicer_tools to v. 1.9.9 worked! Thank you for all the help!
All the best Alex
That is great! For your information, for Juicebox, I am using version 1.x as well. Chenxi
I think what can happen when running these contact map steps independently is that if you accidentally run the Aiden Lab juicer_tools.*.jar in place of the yahs-1.*/juicer binary, the AGP initially produced by yahs gets corrupted, because the Aiden Lab juicer_tools.*.jar writes its outfile to the filename in the middle position of the command. Then every other attempt to get things to work goes south because the AGP file is corrupted.
For example, yahs-1.*/juicer takes the arguments:
yahs-1.1/juicer pre <bin> <agp> <contigs fai>
all of which are existing files, while juicer_tools.*.jar expects:
java -jar juicer_tools.*.jar pre <infile> <outfile> <genomeID>
So juicer_tools.*.jar writes its outfile to the argument in the middle position of the program call. If you get the two mixed up, this corrupts the AGP file that was initially produced when running yahs-1.*/yahs to scaffold (it takes on a binary format), and every subsequent attempt to make things work using the correct yahs-1.*/juicer dies with:
[E::make_asm_dict_from_agp] sequence not found
Segmentation fault
I know mostly because I initially made this mistake of using juicer_tools.*.jar in place of yahs-1.*/juicer. Then, once I corrected this to use yahs-1.*/juicer, every attempt to do anything died of a segfault because of a corrupted AGP file.
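Since an overwritten AGP only shows up later as a segfault, a quick sanity check before calling yahs's juicer pre may save some time. The following is a rough sketch, not part of yahs: the function name agp_looks_intact is made up, and it assumes a standard AGP (plain text, nine tab-separated columns on every non-comment line). It uses grep -I, which GNU and BSD grep provide, to reject binary content.

```shell
#!/bin/sh
# Hypothetical check (not part of yahs): succeed only if the file looks like
# a plain-text AGP with nine tab-separated columns on each data line.
# Usage: agp_looks_intact <scaffolds.agp>
agp_looks_intact() {
    [ -s "$1" ] || return 1                  # missing or empty file
    # grep -I treats a binary file as non-matching, so this fails if the AGP
    # was overwritten with binary output.
    grep -Iq . "$1" || return 1
    # Every non-comment, non-blank line must have exactly nine columns.
    awk -F '\t' '
        !/^#/ && NF > 0 && NF != 9 { bad = 1; exit }
        END { exit bad + 0 }
    ' "$1"
}

# Example use with the file names from this thread:
#   agp_looks_intact ST.yash.out_scaffolds_final.agp ||
#       echo "AGP looks damaged; regenerate it before running juicer pre" >&2
```

Keeping a copy of the original AGP before trying either pre command is a cheap extra safeguard against the argument-order mix-up described above.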