
Paired-end FASTQ input files: which to choose (forward or reverse), and why are the scores not consistent?

Open Dtdavidgit opened this issue 5 years ago • 2 comments

Dear author,

First of all, thank you for providing this novel and interesting analysis tool. I do have some questions, however.

  1. Paired-end FASTQ data. I have read the paper and the tutorial, but neither mentions which file to use (or I missed it). I tested both the forward and the reverse reads, and the results are not exactly the same. Since the quality of the forward reads is generally higher, I used the forward FASTQ file. Do you have any other suggestions?

  2. The similarity matrix problem

I tested the software on my own MGS data and found that the output matrix looks like the one below:

For example:

      A    B    C    D
    100   12   15   25
     10  100   20   43
     13   17  100   32
     23   45   31  100

As I understand it, the values are pairwise similarities, so the row names should be A, B, C, D as well. However, the values in a row and in the corresponding column do not match exactly; they are similar but not consistent. For example, in row A the value for A to B is 12, but in column A the value for B to A is 10.
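The asymmetry described above can be listed programmatically. The following is a minimal sketch that uses only the example values from this post; the function name `asymmetric_pairs` and the assumption that the rows are ordered A, B, C, D are illustrative and are not part of the tool.

```python
# Example values copied from the matrix above.
# Assumption: rows follow the same order as the columns (A, B, C, D).
labels = ["A", "B", "C", "D"]
matrix = [
    [100, 12, 15, 25],
    [10, 100, 20, 43],
    [13, 17, 100, 32],
    [23, 45, 31, 100],
]


def asymmetric_pairs(matrix, labels, tol=0.0):
    """Return (row, col, m[row][col], m[col][row]) for every pair whose
    two entries differ by more than `tol`."""
    mismatches = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                mismatches.append(
                    (labels[i], labels[j], matrix[i][j], matrix[j][i])
                )
    return mismatches


for a, b, ab, ba in asymmetric_pairs(matrix, labels):
    print(f"{a}->{b} = {ab}, but {b}->{a} = {ba}")
```

A truly pairwise (symmetric) matrix would produce an empty list here.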

If you could help me with these issues, I would much appreciate it.

Thank you very much

Dtdavidgit • Feb 05 '20 10:02

Sorry for the delay in getting back to you here. I'm trying to find the time to get back to working on this project.

  1. Paired-end information isn't handled by the program, so provide all the reads (forward and reverse) at once.

  2. That does look wrong! Not sure why that has happened as you are right, it is a pairwise matrix. If you could let me know some details (e.g. distance metric) and even some test data, I can try and debug/fix it.
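Regarding point 1, one way to provide all the reads at once is to combine the forward and reverse files into a single input beforehand. The sketch below is an assumption about how that could be done, not a documented workflow for the tool: the function name `combine_reads` and the file names in the usage comment are hypothetical. It copies the files byte for byte, which works for plain FASTQ (provided each file ends with a newline) and for gzip files, since concatenated gzip members form a valid gzip stream.

```python
import shutil
from pathlib import Path


def combine_reads(inputs, output):
    """Append each input file to `output`, byte for byte, in the given order.

    Assumption: all inputs share the same format (all plain FASTQ or all
    gzip-compressed), and plain-text inputs end with a newline.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)


# Hypothetical usage; these file names are placeholders:
# combine_reads(["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
#               "sample_all.fastq.gz")
```

The combined file can then be passed to the program as a single set of reads.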

will-rowe • Nov 03 '20 10:11

I notice that all the distances are integers, which is strange even for weighted Jaccard distance. Could you please check what happened?
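To illustrate why integer values are unexpected, here is a minimal sketch of the textbook weighted Jaccard distance, 1 - sum(min(x_i, y_i)) / sum(max(x_i, y_i)), applied to two made-up count vectors. It is not the tool's implementation, and whether the tool scales or rounds values before writing the matrix is not established in this thread.

```python
def weighted_jaccard_distance(x, y):
    """Textbook weighted Jaccard distance for non-negative vectors of
    equal length: 1 - sum(min(x_i, y_i)) / sum(max(x_i, y_i))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    if denominator == 0:
        # Two all-zero vectors are treated as identical.
        return 0.0
    return 1.0 - numerator / denominator


# Made-up count vectors, for illustration only.
sample_a = [3, 0, 2, 5]
sample_b = [1, 4, 2, 5]

# sum of minima = 8, sum of maxima = 14, so the distance is 1 - 8/14.
print(weighted_jaccard_distance(sample_a, sample_b))
```

The function is symmetric in its two arguments, and for typical count data the result is a fraction between 0 and 1 rather than an integer.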

Jianshu

jianshu93 • Oct 07 '21 13:10