Cameron Kroll

Results: 12 comments by Cameron Kroll

I've solved this by having open and closed term blocks. I'll have a rule where there's a nterm `OPEN_BLOCK` with the rules `OPEN_BLOCK(OPEN_BLOCK, GENERIC_TERM)` and `OPEN_BLOCK(OPENING_TERM, GENERIC_TERM)`. In this case,...

Oh yeah, and you'd close `OPEN_BLOCK` by having the nterm `BLOCK` with the rule `BLOCK(OPEN_BLOCK, CLOSING_TERM)`
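A minimal sketch of how those three rules could reduce a flat run of terms into a single `BLOCK`. The rule names come from the comments above; the `RULES` table and the `reduce_symbols` helper are hypothetical and not tied to any particular parser library.

```python
# Binary rules from the comments: (left, right) -> resulting nonterminal.
RULES = {
    ("OPENING_TERM", "GENERIC_TERM"): "OPEN_BLOCK",
    ("OPEN_BLOCK", "GENERIC_TERM"): "OPEN_BLOCK",
    ("OPEN_BLOCK", "CLOSING_TERM"): "BLOCK",
}


def reduce_symbols(symbols):
    """Shift symbols onto a stack, reducing the top pair whenever a rule matches."""
    stack = []
    for sym in symbols:
        stack.append(sym)
        # Keep reducing while the top two symbols form the right-hand side of a rule.
        while len(stack) >= 2 and (stack[-2], stack[-1]) in RULES:
            stack[-2:] = [RULES[(stack[-2], stack[-1])]]
    return stack


print(reduce_symbols(["OPENING_TERM", "GENERIC_TERM", "GENERIC_TERM", "CLOSING_TERM"]))
# → ['BLOCK']
```

With only these three rules, an `OPENING_TERM` followed directly by a `CLOSING_TERM` does not reduce, so an empty block would need its own rule.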

I'm just throwing out ideas here, but what if you design a hash that takes into account the history of the sequence, but with less and less importance the further...

That was my bad - I forgot to include the masking. You'd obviously need to mask it so that the least significant bit disappears. I'm actually testing this right now...

Okay I tested it and it does converge back to the same hash after a certain number of characters. Here's my test script - converged after an average of 32...

`hash_string` is in another file and is just a dummy hash function:

```python
def hash_string(s):
    hashes = [hash(s[0])]
    for n in range(1, len(s)):
        hashes.append((hash(s[n]) + ((hashes[n - 1] >> 1)...
```

After moving to a 64-bit hash, I get an average convergence in 64 chars, maximum in 90. It seems that for a hash of `k` bits, this approach converges back...
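For anyone who wants to reproduce the idea, here is a small self-contained sketch of a history-decaying hash along these lines. It is not the original test script. The function names, the `bits` parameter, the exact mask, and the use of `ord` as the per-character hash are all assumptions. `ord` is used instead of the built-in `hash` so that runs are reproducible, since string hashes are salted per process by default.

```python
def rolling_hashes(s, bits=64, char_hash=ord):
    """Return one hash per position; older characters count for less each step."""
    mask = (1 << bits) - 1  # assumption: keep every hash within `bits` bits
    hashes = []
    for ch in s:
        # Halve the previous hash (its lowest bit drops off), then add the new character.
        prev = hashes[-1] >> 1 if hashes else 0
        hashes.append((char_hash(ch) + prev) & mask)
    return hashes


def first_converged_index(a, b, bits=64, char_hash=ord):
    """Smallest index from which the hashes of a and b agree to the end, else None."""
    ha = rolling_hashes(a, bits, char_hash)
    hb = rolling_hashes(b, bits, char_hash)
    if len(ha) != len(hb):
        raise ValueError("sequences must have the same length")
    i = len(ha)
    # Walk backwards over the trailing positions where both hashes are equal.
    while i > 0 and ha[i - 1] == hb[i - 1]:
        i -= 1
    return i if i < len(ha) else None


print(rolling_hashes("AGTAC", bits=8))                  # → [65, 103, 135, 132, 133]
print(rolling_hashes("CGTAC", bits=8))                  # → [67, 104, 136, 133, 133]
print(first_converged_index("AGTAC", "CGTAC", bits=8))  # → 4
```

In this toy run the two hashes differ by 1 for three positions before merging, because the halving rounds down. Whether a difference of 1 survives a step depends on parity, so the exact convergence point varies from input to input.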

Would that solve the issue of distinguishing between sequences that don't have unique k-mers, though?

Hey - sorry for the delay, I was off on reading week and focusing on seeing family. It sounds like a CUDA implementation of megamash would still be useful, so...

That's actually fair. Note to self: add support for * in an amino sequence