pegas
pegas copied to clipboard
Duplicated Sequence for MJN
Excuse me, Mr. Emmanuel. I am trying to your function mjn() but it always tells me “maybe there are duplicated sequences in your data”. My data is a set of 108 sequences; some of which are identical to each other. Does this mean that I have to make each haplotype have only one sequence? Or is there something I miss? Thank you in advance.
Hello,
Try the function haplotype()
(in pegas too) on your 108 sequences. Once you did it, you can check all sequences are distinct with:
h <- haplotype(<<you sequence data name>>)
all(dist.dna(h, "n") > 0)
If the result is TRUE
, you should be able to use mjn(h)
.
Best,
E.
It came out as “FALSE”. I have already used “strict = TRUE” in the haplotype() function. By the way, some haplotypes are different through only deletions. Is this also why mjn() didn’t work?
I suggest you try:
all(dist.dna(h, "n", pairwise.deletion = TRUE) > 0)
And also:
image(h)
It seems you read the help page ?haplotype
so that you understand that trailing/leading gaps are a problem when identifying haplotypes. The above commands should help you to assess the situation with your data.
I have tried all() and the result came up as “FALSE”, unfortunately. I have also tried image() and there are several gaps and degenerate bases (namely N and Y). I also tried trailingGapAsN = TRUE, and the all() still came up as “FALSE”. Any advice? I’m so sorry for wasting your time.
It seems you have a very difficult data set, so your inferences will be necessarily limited.
Roger that. I have dived into the data and found out that all pairs of haplotypes that have dist.dna = 0 differ only in either deletion or insertion. I really wish that you might try giving us an option to use mjn() despite the deletion, because the haplotype() function can separate them nicely.
Have a look at this function in ape: DNAbin2indel
. With it, you can then create a binary matrix indicating presence/absence of indels. pegas::mjn()
can also analyse binary (0/1) data.
I have tried a subset of my data with only sixteen sequences. I have checked the all(dist.dna()) function. If I used pairwise.deletion = TRUE, the all() function came up as TRUE. If I used pairwise.deletion = FALSE, the all() function came up as FALSE. When I used mjn(), it said “duplicate” again. Is there anything I can do?
By the way, can mjn() consider both base difference and deletion/insertion at the same time? I have tried DNAbin2indel() and the matrix I got is concerned only on deletion/insertion? Can I mix this matrix with base difference matrix and make it go through mjn()?
Two other functions from ape that could help you with your data: latag2n()
and solveAmbiguousBases()
(maybe you already found them in the meantime).
You can try rmst()
(also in pegas): it requires distances (unlike mjn()
). It's not the same algorithm of course but it can sometimes give the same network (see the last example in ?rmst
).
Thank you so much for your response. I am now using rmst(). The only problem I have is that it doesn't generate median vectors. I will try to tackle with more sequences in the future. By the way, I would love to try mjn() that considers both base difference and base deletion/insertion at the same time. That would be revolutionary!
By the way, I would love to try mjn() that considers both base difference and base deletion/insertion at the same time. That would be revolutionary!
There is the difficulty that indels within or on the head/tail of sequences should be treated differently. Another difficulty is that indels may also have substitutions (either before the deletion, or after the insertion). This could affect the time-reversibility of the model but I'm not sure how this is critical to the median-vectors. The revolution has to wait a bit!