PopPUNK
Lineage model fitting - PopPUNK changes
- Relies upon https://github.com/bacpop/pp-sketchlib/compare/master...flexible_lineages?expand=1
- All lineage models should now work from a single sparse matrix of k nearest neighbours (N.B. I have not checked whether nearest neighbours at the same distance are selected randomly, or whether this is affected by input name order)
- The number of nearest neighbours stored in this matrix is determined by the `max_search_depth` option
- The matrix used for clustering comes from reducing this matrix by counting neighbours/counting unique distances/reciprocal BLAST - it is always regenerated from the main matrix, which is the only aspect updated with querying
- Added a script that generates consistent lineage databases for all strains in a non-lineage database - it would be good to use this as an example workflow for beebop; the views of @johnlees and @muppi1993 on how to store the script/information needed for relating these databases would be appreciated!
- At the moment, GPU analyses crash on exit, as there appears to be a memory leak associated with `cugraph` - see https://github.com/rapidsai/raft/issues/740, https://github.com/rapidsai/rmm/pull/931 - hopefully fixed in `rapids=22.12` (based on https://github.com/rapidsai/raft/commit/2325d2b4cad2faf0ef1bce976cb377eb25b4d81d), but `22.10` is the latest version available on conda (https://anaconda.org/rapidsai/rapids - 16/10/22)
- Tests run on all the lineage clustering options
- Will update documentation if changes are satisfactory
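
To illustrate the three reductions listed above, here is a minimal pure-Python sketch. It is not the pp-sketchlib or PopPUNK implementation: the function name `reduce_knn`, the mode names, the dict-of-rows representation of the sparse matrix, and the tie-breaking by neighbour index are all assumptions made for this example. In particular, how ties are actually resolved is the open question flagged in the N.B. above.

```python
def reduce_knn(main, k, mode="count_neighbours"):
    """Reduce a 'main' kNN matrix to the matrix used for clustering.

    main: dict mapping sample index -> list of (neighbour index, distance),
          holding up to max_search_depth neighbours per sample (hypothetical layout).
    k:    rank of the lineage model being fitted (k <= max_search_depth).
    """
    modes = ("count_neighbours", "count_unique_distances", "reciprocal")
    if mode not in modes:
        raise ValueError(f"mode must be one of {modes}")

    reduced = {}
    for i, row in main.items():
        # Sort by distance; ties broken by neighbour index (an assumption,
        # chosen only to make this sketch deterministic).
        ordered = sorted(row, key=lambda nd: (nd[1], nd[0]))
        if mode == "count_unique_distances":
            # Keep every neighbour at one of the k smallest distinct distances.
            cutoffs = set(sorted({d for _, d in ordered})[:k])
            reduced[i] = [(j, d) for j, d in ordered if d in cutoffs]
        else:
            # Keep exactly the k nearest neighbours.
            reduced[i] = ordered[:k]

    if mode == "reciprocal":
        # Keep i -> j only if j also lists i among its own k nearest.
        edges = {(i, j) for i, row in reduced.items() for j, _ in row}
        reduced = {
            i: [(j, d) for j, d in row if (j, i) in edges]
            for i, row in reduced.items()
        }
    return reduced
```

Because the function never modifies `main`, the same main matrix can be reduced again for each rank or mode, which mirrors the point that only the main matrix is updated when querying.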
Validation on serotype 3 dataset:
- default lineages CPU: https://microreact.org/project/ixyy41yEJLr1HmoJECobxT-s3lineagescpu
- default lineages GPU: https://microreact.org/project/szuvahjuAnc7RQEtEc5mNK-s3lineagesgpu
- count unique distances CPU: https://microreact.org/project/t1wsVwjuBE2fa6Z63FmGvZ-s3countdistancescpu
- count unique distances GPU: https://microreact.org/project/hHGYFUcLfA2fS6fTe1Xio5-s3countdistancesgpu
- reciprocal matches CPU: https://microreact.org/project/7ydpd5v3xr6ewktB8aZ7eQ-s3reciprocalmatchescpu
- reciprocal matches GPU: https://microreact.org/project/213LjA7KhLa1c2MjoHNFot-s3reciprocalmatchesgpu
Just going to update the lineage querying process to give a better idea of how the beebop workflow could work.
Alright, hopefully got that workflow working in a rudimentary way now - done with this for now