Add sorting by average quality to `seqkit sort`

Open vmikk opened this issue 1 year ago • 1 comment

Hello!

This PR adds a new feature to the seqkit sort command, allowing sequences to be sorted by their average Phred quality. The following options have been introduced:

  • -q or --by-avg-qual
  • --qual-ascii-base

With the implementation of quality estimation, I tried to follow the approach used in the fx2tab command.

However, the code currently won't work because the StringCount type (which was used for storing sequence length) cannot be used for storing average quality. Therefore, a new type (e.g., StringFloat) should probably be added to util/stringutil.

Could you please confirm that this is a useful feature and, if so, review the code?

PS. For the ASCII base option, the short flag -b is already taken by the --by-bases option, so --qual-ascii-base has no short form and is not consistent with -b in fx2tab.

vmikk, Jul 31 '24 13:07

It looks good to me, thank you Vladimir! I'll handle the util package and merge this later.

shenwei356, Jul 31 '24 15:07