Pandora mapping is slow

Placeholder, as @Danderson123 keeps mentioning this and I don't want to lose it. Mapping reads from one sample to a PRG (E. coli plus a few thousand AMR genes) is taking 30 minutes with 64 cores. I realise the performance stats in the pandora paper are a) way out of date and b) for the end-to-end performance of running pandora compare on 20 samples. I don't think mapping can ever be as fast as the subsequent steps @Danderson123 does with amira, so it's not realistic to expect that, but I do want to understand whether this runtime is expected (so I can set my expectations on how fast mapping is) or whether there is something odd about Dan's setup.

Oct 18 '23 10:10 iqbal-lab

Actually, this might be a RAM/speed tradeoff, if we lazy load gene PRGs into RAM only when we see them?

Oct 18 '23 10:10 iqbal-lab
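
For anyone following along, the RAM/speed tradeoff can be pictured with a minimal Python sketch. This is not pandora's actual code: `LazyPRGStore`, `fake_loader` and the gene names are all hypothetical stand-ins. It only shows the general pattern of loading a gene's graph the first time a read hits it and caching it afterwards, so memory grows with the number of genes seen rather than with the size of the whole PRG, at the cost of paying the load time during mapping.

```python
from typing import Callable, Dict


class LazyPRGStore:
    """Hypothetical store: load each gene graph on first access, then cache it."""

    def __init__(self, loader: Callable[[str], dict]):
        self._loader = loader            # function that reads one gene graph from disk
        self._cache: Dict[str, dict] = {}
        self.loads = 0                   # number of times the loader was actually called

    def get(self, gene: str) -> dict:
        if gene not in self._cache:      # pay the load cost only on first sight
            self._cache[gene] = self._loader(gene)
            self.loads += 1
        return self._cache[gene]


def fake_loader(gene: str) -> dict:
    # Stand-in for parsing a gene graph from a file (assumption for illustration).
    return {"gene": gene, "nodes": len(gene)}


store = LazyPRGStore(fake_loader)
for hit in ["blaTEM", "gyrA", "blaTEM"]:  # toy sequence of gene hits
    store.get(hit)
```

In this toy run the loader is called once per distinct gene, not once per hit. Eager loading would instead call the loader for every gene up front, which is faster per lookup but holds everything in RAM.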

(Hope I'm not misrepresenting you, @Danderson123; I can delete this or update it.)

Oct 18 '23 10:10 iqbal-lab

This is correct. I will try to make plots of runtime vs number of reads after I have looked at the gene calling in more detail. Without lazy loading, the RAM usage was far higher than is reasonable for a laptop.

Oct 18 '23 10:10 Danderson123
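
A rough sketch of how the runtime vs number of reads measurements could be collected, in Python. The function `time_vs_reads` and the placeholder `toy_map` are hypothetical; in practice the mapping callable would wrap whatever command is being benchmarked. It only uses `time.perf_counter` from the standard library and returns (read count, seconds) pairs that can be plotted afterwards.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_vs_reads(
    reads: Sequence[str],
    sizes: Sequence[int],
    map_fn: Callable[[Sequence[str]], object],
) -> List[Tuple[int, float]]:
    """Time map_fn on the first n reads for each n in sizes."""
    results = []
    for n in sizes:
        subset = reads[:n]                      # simple prefix subsample
        start = time.perf_counter()
        map_fn(subset)                          # the step being benchmarked
        results.append((len(subset), time.perf_counter() - start))
    return results


def toy_map(subset: Sequence[str]) -> int:
    # Placeholder workload standing in for the real mapping step (assumption).
    return sum(len(r) for r in subset)


toy_reads = ["ACGT" * 10] * 1000
timings = time_vs_reads(toy_reads, [10, 100, 1000], toy_map)
```

Taking a prefix is the simplest subsample; if the reads file is ordered in some meaningful way, a random subsample would be a fairer choice.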

I have to say, @Danderson123, first mapping to an AMR-only PRG and keeping only the reads that map, and then mapping just those reads to the big PRG, would probably make a significant difference.

Oct 18 '23 10:10 iqbal-lab
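
The two-stage idea above can be sketched in Python as follows. Everything here is illustrative: `two_stage_map`, `hits_amr` and `map_big` are hypothetical names, and the toy "mappers" just look for a marker substring rather than doing real mapping. The point is only the control flow: a cheap first pass against the small AMR-only PRG decides which reads are passed to the expensive second pass against the big PRG.

```python
from typing import Callable, Dict, List, Tuple


def two_stage_map(
    reads: Dict[str, str],
    hits_small: Callable[[str], bool],
    map_large: Callable[[str], str],
) -> Tuple[List[str], Dict[str, str]]:
    """Keep reads that hit the small PRG, then map only those to the large PRG."""
    kept = [name for name, seq in reads.items() if hits_small(seq)]
    mapped = {name: map_large(reads[name]) for name in kept}
    return kept, mapped


AMR_MARKER = "GGCCGG"  # toy stand-in for "contains an AMR gene" (assumption)


def hits_amr(seq: str) -> bool:
    return AMR_MARKER in seq


def map_big(seq: str) -> str:
    # Toy stand-in for mapping a read to the full E. coli + AMR PRG.
    return f"mapped:{len(seq)}bp"


toy_reads = {
    "read1": "ACGTGGCCGGACGT",   # carries the marker, kept
    "read2": "ACGTACGTACGT",     # no marker, dropped before stage two
    "read3": "GGCCGGTTTT",       # carries the marker, kept
}
kept, mapped = two_stage_map(toy_reads, hits_amr, map_big)
```

One consequence of this design is that any read without a hit in the first stage never reaches the big PRG, so it only suits analyses where reads lacking AMR genes are not needed.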