
Lineage model fitting - PopPUNK changes

Open nickjcroucher opened this issue 1 year ago • 2 comments

  • Relies upon https://github.com/bacpop/pp-sketchlib/compare/master...flexible_lineages?expand=1
  • All lineage models should now work from a single sparse matrix of k nearest neighbours (N.B. I have not checked whether nearest neighbours at the same distance are selected randomly, or whether this is affected by input name order)
  • The number of nearest neighbours (k) stored in this matrix is set by the max_search_depth option
  • The matrix used for clustering comes from reducing this matrix in one of three ways: counting neighbours, counting unique distances, or keeping reciprocal matches (as in reciprocal BLAST). It is always regenerated from the main matrix, which is the only part updated when querying
  • Added a script that generates consistent lineage databases for all strains in a non-lineage database. It would be good to use this as an example workflow for beebop; views of @johnlees and @muppi1993 on how to store the script/information needed for relating these databases would be appreciated!
  • At the moment, GPU analyses crash on exit, apparently because of a memory leak associated with cugraph (see https://github.com/rapidsai/raft/issues/740 and https://github.com/rapidsai/rmm/pull/931). This is hopefully fixed in rapids=22.12 (based on https://github.com/rapidsai/raft/commit/2325d2b4cad2faf0ef1bce976cb377eb25b4d81d), but 22.10 is the latest version available on conda (https://anaconda.org/rapidsai/rapids, checked 16/10/22)
  • Tests run on all the lineage clustering options
  • Will update documentation if changes are satisfactory
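
To make the matrix reduction described above concrete, here is a rough pure-Python sketch. It is not the PopPUNK or pp-sketchlib implementation: the function names (`build_knn`, `reduce_knn`), the mode strings, the dict-based stand-in for the sparse matrix, and the exact meaning given to each reduction are assumptions made for illustration. Only `max_search_depth` is a name taken from the list above. Ties are broken by sample index here, which is also just an assumption, given the open question about tie handling.

```python
from typing import Dict, List, Tuple

# Stand-in for the sparse kNN matrix: sample -> [(distance, neighbour), ...],
# sorted by increasing distance. (Hypothetical structure, for illustration only.)
Neighbours = Dict[int, List[Tuple[float, int]]]


def build_knn(dist: List[List[float]], max_search_depth: int) -> Neighbours:
    """Keep the max_search_depth closest neighbours of every sample."""
    knn: Neighbours = {}
    for i, row in enumerate(dist):
        # Sorting (distance, index) tuples breaks distance ties by index.
        ranked = sorted((d, j) for j, d in enumerate(row) if j != i)
        knn[i] = ranked[:max_search_depth]
    return knn


def reduce_knn(knn: Neighbours, rank: int, mode: str = "count_neighbours") -> Neighbours:
    """Derive a clustering matrix from the main kNN matrix without modifying it."""
    reduced: Neighbours = {}
    for i, ranked in knn.items():
        if mode == "count_neighbours":
            # Keep the `rank` closest neighbours.
            kept = ranked[:rank]
        elif mode == "count_unique_distances":
            # Keep every neighbour lying at one of the `rank` smallest distinct distances.
            distinct = sorted({d for d, _ in ranked})[:rank]
            kept = [(d, j) for d, j in ranked if d in distinct]
        elif mode == "reciprocal":
            # Keep j only if i is also among the `rank` closest neighbours of j.
            kept = [
                (d, j)
                for d, j in ranked[:rank]
                if i in {n for _, n in knn[j][:rank]}
            ]
        else:
            raise ValueError(f"unknown mode: {mode}")
        reduced[i] = kept
    return reduced


# Toy symmetric distance matrix for four samples.
dist = [
    [0.0, 0.1, 0.1, 0.5],
    [0.1, 0.0, 0.2, 0.4],
    [0.1, 0.2, 0.0, 0.3],
    [0.5, 0.4, 0.3, 0.0],
]
knn = build_knn(dist, max_search_depth=3)
for mode in ("count_neighbours", "count_unique_distances", "reciprocal"):
    print(mode, reduce_knn(knn, rank=1, mode=mode))
```

In this sketch `reduce_knn` never mutates `knn`, mirroring the point that the clustering matrix is always regenerated from the main matrix, so only the main matrix would need updating when queries are added.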

Validation on serotype 3 dataset:

  • default lineages CPU: https://microreact.org/project/ixyy41yEJLr1HmoJECobxT-s3lineagescpu
  • default lineages GPU: https://microreact.org/project/szuvahjuAnc7RQEtEc5mNK-s3lineagesgpu
  • count unique distances CPU: https://microreact.org/project/t1wsVwjuBE2fa6Z63FmGvZ-s3countdistancescpu
  • count unique distances GPU: https://microreact.org/project/hHGYFUcLfA2fS6fTe1Xio5-s3countdistancesgpu
  • reciprocal matches CPU: https://microreact.org/project/7ydpd5v3xr6ewktB8aZ7eQ-s3reciprocalmatchescpu
  • reciprocal matches GPU: https://microreact.org/project/213LjA7KhLa1c2MjoHNFot-s3reciprocalmatchesgpu

nickjcroucher, Oct 16 '22 07:10

Just going to update the lineage querying process to give a better idea of how the beebop workflow could work.

nickjcroucher, Oct 17 '22 08:10

Alright, hopefully got that workflow working in a rudimentary way now - done with this for now

nickjcroucher, Oct 17 '22 10:10