
Got negative values on weighted histogram plot

najohink opened this issue on Sep 22, 2023 · 7 comments

Hello,

I am using NanoComp v1.23.1 and got a weird plot after filtering my input fastq files (see the attached screenshot).

When I ran the same command on input fastq files that were not filtered, I got normal plots. But after filtering my fastq files to keep only 1-27 kb reads, I now get negative values in the weighted plots. Is this "normal"?

Can you also explain the difference between weighted and normalized?

best, S

najohink · Sep 22 '23

I forgot to add the photo of the unfiltered fastq output plot:

[screenshot of the unfiltered output plot]

najohink · Sep 22 '23

I am very confused and will need to think about this.

wdecoster · Sep 26 '23

I filtered my dataset with FiltLong before running NanoComp and getting the weird result.

In the meantime, I figured out how to do what I wanted by running this:

import pickle
import numpy
from matplotlib import pyplot as plt

# load the per-read data that NanoComp saved as a pickle
df3 = pickle.load(open('barcode03_1-27kb_NanoComp-data.pickle', 'rb'))

# histogram of read lengths in 500 bp bins
bins = numpy.arange(0, 30000, 500)
h3 = numpy.histogram(df3['lengths'], bins=bins)
plt.bar(h3[1][:-1], height=h3[0], width=450)

# weight each bin by its midpoint length to approximate bases per bin
xdata3 = (h3[1][:-1] + h3[1][1:]) / 2
ydata3 = xdata3 * h3[0]
plt.bar(xdata3, ydata3, width=450)

# fraction of all bases coming from reads longer than 25 kb
ydata3[xdata3 > 25000].sum() / ydata3.sum()

I was interested in knowing what percentage of the total bases came from my full-length sequence. So I wanted to divide the ~26 kb bases by the total number of bases, but I also wanted to keep the weird long stuff out of the dataset, hence the filtering with FiltLong.
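For what it's worth, the same fraction can also be computed from the raw read lengths without binning; a minimal sketch, assuming the same pickle file and 'lengths' field as in the snippet above:

import pickle
import numpy

# load the same NanoComp pickle as above (file name taken from the snippet)
df3 = pickle.load(open('barcode03_1-27kb_NanoComp-data.pickle', 'rb'))
lengths = numpy.asarray(df3['lengths'])

# fraction of all sequenced bases contributed by reads longer than 25 kb,
# computed from the raw lengths rather than from 500 bp histogram bins
fraction = lengths[lengths > 25000].sum() / lengths.sum()
print(fraction)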

najohink · Sep 26 '23

Does the non-weighted plot look normal? I will explain what those options mean later, when I'm at the computer...

wdecoster · Sep 26 '23

Yes, the others look normal. Only the two weighted plots have negative values.

najohink · Sep 26 '23

So normalized means that every dataset in the plot adds up to 1, so that datasets with significant differences in yield can still be compared on read length. Without normalization, just the number of reads is used. Weighted means that instead of the number of reads per bin, the number of bases per bin is used (as is also the case in the MinKNOW interface). As such, a read of 25000 bases in the 24000-26000 bin will increase the count on the y-axis by 25000 rather than by just 1.
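In numpy terms, a minimal sketch of the difference (not NanoComp's actual plotting code; the toy lengths and bin width are made up purely for illustration):

import numpy

lengths = numpy.array([500, 1500, 2500, 25000])   # toy read lengths
bins = numpy.arange(0, 30000, 2000)

counts, edges = numpy.histogram(lengths, bins=bins)                 # plain: reads per bin
weighted, _ = numpy.histogram(lengths, bins=bins, weights=lengths)  # weighted: bases per bin
normalized = counts / counts.sum()                                  # normalized: fractions summing to 1

# the 25000-base read adds 1 to its bin in `counts`, but 25000 in `weighted`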

wdecoster · Sep 27 '23

Do you think it would be possible to share the data that caused this?

wdecoster · Sep 28 '23