Read error correction does not reduce the number of kmers present once, twice or three times

Open chklopp opened this issue 1 month ago • 0 comments

I am trying to assemble HERRO error-corrected reads with hifiasm 0.19.8.

However, the number of k-mers seen only a few times does not decrease as expected during hifiasm's read error correction.

Initial histogram in the log

[M::ha_hist_line]     2: ****************************************************************************************************> 107716610
[M::ha_hist_line]     3: ****************************************************************************************************> 30996780
[M::ha_hist_line]     4: ************************************************************** 15593515
[M::ha_hist_line]     5: ********************************************* 11280672
[M::ha_hist_line]     6: ****************************************** 10378745
[M::ha_hist_line]     7: ******************************************** 10969244
[M::ha_hist_line]     8: ************************************************** 12440832
[M::ha_hist_line]     9: ********************************************************* 14313356
[M::ha_hist_line]    10: ****************************************************************** 16548878
[M::ha_hist_line]    11: *************************************************************************** 18834001
[M::ha_hist_line]    12: ************************************************************************************ 20983530
[M::ha_hist_line]    13: ******************************************************************************************* 22728212
[M::ha_hist_line]    14: ************************************************************************************************ 24067655
[M::ha_hist_line]    15: **************************************************************************************************** 24853609
[M::ha_hist_line]    16: **************************************************************************************************** 24957299
[M::ha_hist_line]    17: *************************************************************************************************** 24619025
[M::ha_hist_line]    18: *********************************************************************************************** 23715443
[M::ha_hist_line]    19: ****************************************************************************************** 22573874
[M::ha_hist_line]    20: ************************************************************************************* 21273382
[M::ha_hist_line]    21: ******************************************************************************** 19908321
[M::ha_hist_line]    22: *************************************************************************** 18789819
[M::ha_hist_line]    23: ************************************************************************ 18034450
[M::ha_hist_line]    24: ********************************************************************** 17581256
[M::ha_hist_line]    25: ********************************************************************** 17553086
[M::ha_hist_line]    26: ************************************************************************ 17887681
[M::ha_hist_line]    27: ************************************************************************** 18591686
[M::ha_hist_line]    28: ****************************************************************************** 19396013
[M::ha_hist_line]    29: ********************************************************************************* 20301932
[M::ha_hist_line]    30: ************************************************************************************ 21087688
[M::ha_hist_line]    31: *************************************************************************************** 21817734
[M::ha_hist_line]    32: ***************************************************************************************** 22298434
[M::ha_hist_line]    33: ****************************************************************************************** 22557746
[M::ha_hist_line]    34: ****************************************************************************************** 22440107
[M::ha_hist_line]    35: ***************************************************************************************** 22130525
[M::ha_hist_line]    36: ************************************************************************************** 21459325
[M::ha_hist_line]    37: ********************************************************************************** 20509144
[M::ha_hist_line]    38: ****************************************************************************** 19399337
[M::ha_hist_line]    39: ************************************************************************ 17962454
[M::ha_hist_line]    40: ***************************************************************** 16334878
[M::ha_hist_line]    41: *********************************************************** 14619679
[M::ha_hist_line]    42: *************************************************** 12793736
[M::ha_hist_line]    43: ******************************************** 11035127
[M::ha_hist_line]    44: ************************************** 9361763
[M::ha_hist_line]    45: ******************************* 7757802
[M::ha_hist_line]    46: ************************** 6387930
[M::ha_hist_line]    47: ********************* 5197173
[M::ha_hist_line]    48: ***************** 4122166

Second histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 79362122
[M::ha_hist_line]     2: ****************************************************************************************************> 5824705
[M::ha_hist_line]     3: ****************************************************************************************************> 1842811
[M::ha_hist_line]     4: **************************************************************************************** 934285
[M::ha_hist_line]     5: ************************************************************* 646881
[M::ha_hist_line]     6: ***************************************************** 561993
[M::ha_hist_line]     7: ***************************************************** 563891
[M::ha_hist_line]     8: ********************************************************** 610696
[M::ha_hist_line]     9: **************************************************************** 67923

Third histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 65809002
[M::ha_hist_line]     2: ****************************************************************************************************> 4505333
[M::ha_hist_line]     3: ****************************************************************************************************> 1459641
[M::ha_hist_line]     4: *************************************************************************** 774497
[M::ha_hist_line]     5: ******************************************************* 566043
[M::ha_hist_line]     6: ************************************************* 508811
[M::ha_hist_line]     7: *************************************************** 523826
[M::ha_hist_line]     8: ******************************************************** 572254

Fourth histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 56509704
[M::ha_hist_line]     2: ****************************************************************************************************> 3725834
[M::ha_hist_line]     3: ****************************************************************************************************> 1243762
[M::ha_hist_line]     4: ******************************************************************** 688236
[M::ha_hist_line]     5: **************************************************** 520982
[M::ha_hist_line]     6: *********************************************** 479553
[M::ha_hist_line]     7: ************************************************** 500632
[M::ha_hist_line]     8: ****************************************************** 549716

Fifth histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 50554387
[M::ha_hist_line]     2: ****************************************************************************************************> 3283946
[M::ha_hist_line]     3: ****************************************************************************************************> 1127797
[M::ha_hist_line]     4: **************************************************************** 642271
[M::ha_hist_line]     5: ************************************************** 496973
[M::ha_hist_line]     6: ********************************************** 463586
[M::ha_hist_line]     7: ************************************************* 486543
[M::ha_hist_line]     8: ****************************************************** 534726
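
To make the comparison between rounds easier to read, here is a minimal sketch of how the low-occurrence counts could be pulled out of the `ha_hist_line` log lines. The helper names (`parse_hist_lines`, `low_count_total`) are hypothetical and not part of hifiasm, the star bars in the sample lines are shortened for readability, and the regular expression assumes the line layout shown in the log excerpts above.

```python
import re

# Matches lines such as "[M::ha_hist_line]     4: ******* 15593515".
# Assumption: layout is "<occurrence>: <bar of '*'><optional '>'> <k-mer count>".
HIST_RE = re.compile(r"\[M::ha_hist_line\]\s+(\d+):\s*\**>?\s*(\d+)")


def parse_hist_lines(lines):
    """Return {occurrence: k-mer count} for every histogram line that matches."""
    hist = {}
    for line in lines:
        match = HIST_RE.search(line)
        if match:
            hist[int(match.group(1))] = int(match.group(2))
    return hist


def low_count_total(hist, max_occ=3):
    """Sum the k-mers seen at most max_occ times."""
    return sum(count for occ, count in hist.items() if occ <= max_occ)


# Counts copied from the second and fifth histograms above (bars shortened).
second = parse_hist_lines("""\
[M::ha_hist_line]     1: ****> 79362122
[M::ha_hist_line]     2: ****> 5824705
[M::ha_hist_line]     3: ****> 1842811
""".splitlines())

fifth = parse_hist_lines("""\
[M::ha_hist_line]     1: ****> 50554387
[M::ha_hist_line]     2: ****> 3283946
[M::ha_hist_line]     3: ****> 1127797
""".splitlines())

ratio = low_count_total(fifth) / low_count_total(second)
print(f"k-mers seen 1-3 times, fifth vs second histogram: {ratio:.2f}")
```

The ratio printed at the end shows how much of the low-occurrence k-mer mass remains after the later correction rounds; the same helpers can be applied to a HiFi run's log for a side-by-side comparison.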

The resulting assembly metrics are poor: the assembly is small and fragmented. The coverage values given in the GFA files are also very low.

With HiFi reads, the last histogram has only very few k-mers seen once left. Which parameter could I tweak to improve this?

chklopp, May 21 '24 11:05