PopPUNK

--fit-model error

Open luciagrami opened this issue 1 year ago • 2 comments

Hello, I am using poppunk 2.7.0 with pp-sketchlib v2.1.4.

I am having issues when running `--fit-model`:

```
poppunk --fit-model dbscan --ref-db TBdb --output TBdb_hdbscan
```

Output:

```
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
    (with backend: sketchlib v2.1.4
     sketchlib: /atlas/apps/miniconda3/envs/pp_env/lib/python3.11/site-packages/pp_sketchlib.cpython-311-x86_64-linux-gnu.so)

Graph-tools OpenMP parallelisation enabled: with 35 threads
Mode: Fitting dbscan model to reference database

Fitting HDBSCAN model using a CPU
Fitting HDBSCAN model using a CPU
Fitting HDBSCAN model using a CPU
Fitting HDBSCAN model using a CPU
Assigning distances with DBSCAN model
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1026/1026 [00:26<00:00, 38.10it/s]
Fit summary:
    Number of clusters      21
    Number of datapoints    100000
    Number of assignments   81096

Scaled component means
    [0.65383488 0.22816747]
    [0.38883853 0.14202529]
    [0.09366661 0.00975697]
    [2.56513596e-01 4.57124497e-06]
    [0.22594947 0. ]
    [4.91949245e-02 4.69420002e-06]
    [7.73994699e-02 2.12680433e-07]
    [1.55685291e-01 3.80115409e-07]
    [8.13707039e-02 9.56804570e-07]
    [0.08370336 0. ]
    [8.88103917e-02 5.63386891e-07]
    [0.0864341 0. ]
    [0.12166263 0. ]
    [1.23963781e-01 1.00648538e-06]
    [0.09071088 0. ]
    [0.09283514 0. ]
    [1.10260807e-01 9.44051010e-07]
    [1.17498189e-01 5.60512490e-08]
    [0.0949481 0. ]
    [9.98577848e-02 8.35967001e-07]
    [1.04917660e-01 6.65403661e-07]

Network summary:
    Components                          1547
    Density                             0.0163
    Transitivity                        0.6403
    Mean betweenness                    0.3620
    Weighted-mean betweenness           0.1349
    Score                               0.6299
    Score (w/ betweenness)              0.4019
    Score (w/ weighted-betweenness)     0.5449
Traceback (most recent call last):
  File "/atlas/apps/miniconda3/envs/pp_env/bin/poppunk", line 11, in <module>
    sys.exit(main())
             ^^^^^^
  File "/atlas/apps/miniconda3/envs/pp_env/lib/python3.11/site-packages/PopPUNK/main.py", line 668, in main
    isolateClustering = {fit_type: printClusters(genomeNetwork,
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/atlas/apps/miniconda3/envs/pp_env/lib/python3.11/site-packages/PopPUNK/network.py", line 1520, in printClusters
    unword = next(unword_generator)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/atlas/apps/miniconda3/envs/pp_env/lib/python3.11/site-packages/PopPUNK/unwords.py", line 31, in gen_unword
    word += "".join(syllable()[0]())
                    ^^^^^^^^^^^^^^^
  File "/atlas/apps/miniconda3/envs/pp_env/lib/python3.11/site-packages/PopPUNK/unwords.py", line 20, in <lambda>
    cv = lambda: consonant() + vowel()
                 ^^^^^^^^^^^
  File "/atlas/apps/miniconda3/envs/pp_env/lib/python3.11/site-packages/PopPUNK/unwords.py", line 19, in <lambda>
    consonant = lambda: random.sample(consonants, 1)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/atlas/apps/miniconda3/envs/pp_env/lib/python3.11/random.py", line 439, in sample
    raise TypeError("Population must be a sequence. "
TypeError: Population must be a sequence. For dicts or sets, use sorted(d).
```

The same error occurs with the other models I have tried. Any suggestions? Thanks!

luciagrami • Aug 16 '24 19:08

Hopefully a fix for this in #322, which should appear in v2.7.1.

I'm not sure why this has suddenly started happening, or why it doesn't happen in the tests. That also makes it hard for me to verify that this fixes the issue!
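One likely explanation, going by the `python3.11` paths in the traceback: from Python 3.11 onwards `random.sample()` raises `TypeError` when given a `set`, whereas 3.9 and 3.10 only emitted a `DeprecationWarning` and earlier versions accepted sets silently. If the tests run on an older Python, they would not hit this. The minimal reproduction below builds `consonants` the same way as the line quoted in this thread; the `vowels` and `trouble` values are assumptions for illustration, not necessarily PopPUNK's.

```python
import random
import string
import sys

# Assumed values for illustration; PopPUNK's actual lists may differ.
vowels = ["a", "e", "i", "o", "u"]
trouble = ["q", "x", "y"]

# Same construction as the failing line: the result is a set, not a sequence.
consonants_set = set(string.ascii_lowercase) - set(vowels) - set(trouble)


def sample_one(population):
    """Return a one-element list drawn from population, or the TypeError raised."""
    try:
        return random.sample(population, 1)
    except TypeError as err:
        return err


result = sample_one(consonants_set)
if sys.version_info >= (3, 11):
    # Python 3.11+: sets are rejected outright.
    print("set ->", type(result).__name__, result)
else:
    # 3.9/3.10 accept the set with a DeprecationWarning; older versions accept it silently.
    print("set ->", result)

# The fix proposed in this thread: pass a sorted list instead of a set.
consonants = sorted(consonants_set)
print("sorted list ->", sample_one(consonants))
```

Sorting (rather than just calling `list()`) also keeps the population order deterministic, so a seeded generator gives the same results across runs.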

Can you change the following line in PopPUNK/unwords.py: https://github.com/bacpop/PopPUNK/pull/322/files

To do this you'll need to clone the repository and run PopPUNK with `python poppunk-runner.py` instead of just `poppunk`.
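Running `python poppunk-runner.py` from the root of the clone picks up the edited code because Python puts the running script's directory first on `sys.path`, so a `PopPUNK/` folder there shadows the copy installed in `site-packages` (this assumes the runner script sits at the repository root). A quick way to confirm which copy an interpreter will import is `importlib.util.find_spec`. The helper name `module_origin` is hypothetical, and the example queries the standard-library `json` module only so that it runs anywhere.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module or package would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # sys.path[0] is the running script's directory ('' for -c or the REPL, meaning
    # the current directory), so a local PopPUNK/ folder there wins over site-packages.
    print("first search path:", sys.path[0] or "(current directory)")
    # Swap "json" for "PopPUNK" and run from the cloned repository root to check.
    print(module_origin("json"))
```

If the printed path points into `site-packages` rather than the clone, the edit to `unwords.py` will not take effect.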

johnlees • Aug 19 '24 08:08

i.e. on line 16, wrap the right-hand side in `sorted()`: `consonants = sorted(set(string.ascii_lowercase) - set(vowels) - set(trouble))`
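To see the fix in context, here is a small pseudo-word generator modelled loosely on the names visible in the traceback (`gen_unword`, `consonant`, `cv`). Everything beyond those names and the `consonants = sorted(...)` line, including `vowels`, `trouble`, the syllable patterns, the syllable count and the `seed` parameter, is an assumption; this is a hypothetical sketch, not PopPUNK's actual `unwords.py`. The substantive change is `sorted()` around the set difference, which hands `random.sample()` a list on any Python 3 version.

```python
import random
import string


def gen_unword(seed=None):
    """Yield unique pronounceable pseudo-words (hypothetical sketch, not PopPUNK's code)."""
    rng = random.Random(seed)
    vowels = ["a", "e", "i", "o", "u"]  # assumed
    trouble = ["q", "x", "y"]  # assumed: letters to avoid
    # The fix from this thread: sorted() turns the set difference into a list,
    # which random.sample() requires from Python 3.11 onwards.
    consonants = sorted(set(string.ascii_lowercase) - set(vowels) - set(trouble))

    # Each helper returns a list of letters, so they can be concatenated with +.
    vowel = lambda: rng.sample(vowels, 1)
    consonant = lambda: rng.sample(consonants, 1)
    cv = lambda: consonant() + vowel()
    cvc = lambda: cv() + consonant()
    patterns = [vowel, cv, cvc]  # assumed syllable shapes

    seen = set()
    while True:
        word = ""
        for _ in range(rng.randint(2, 3)):
            syllable = rng.choice(patterns)
            word += "".join(syllable())
        if word not in seen:  # skip repeats so each word is yielded once
            seen.add(word)
            yield word


if __name__ == "__main__":
    words = gen_unword(seed=1)
    print([next(words) for _ in range(5)])
```

Using a private `random.Random(seed)` instance keeps the sketch reproducible without touching the global random state.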

johnlees • Aug 19 '24 08:08

Fixed in v2.7.1
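To check whether an environment already has the fixed release, the installed version can be compared against 2.7.1. This sketch assumes the installed distribution is named `poppunk` (adjust if `importlib.metadata` reports it under another name) and uses a plain numeric comparison rather than full PEP 440 parsing, so pre-release suffixes such as `rc1` are ignored.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

FIXED_IN = (2, 7, 1)


def parse_release(ver):
    """Turn '2.7.0' into (2, 7, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in ver.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(ver):
    """True if ver is at or above the release reported to fix the random.sample() error."""
    return parse_release(ver) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("poppunk")  # assumed distribution name
    except PackageNotFoundError:
        print("poppunk metadata not found in this environment")
    else:
        print(installed, "includes the fix" if has_fix(installed) else "upgrade to 2.7.1 or later")
```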

johnlees • Nov 07 '24 10:11