pyani
Collating results is slow for large datasets (>1500 genomes)
Currently, the code writes out every result individually and defers processing of that output (calculating ANI, etc.) until the end. This leaves a long, uninformative lag before the results are presented to the user.
It may be possible to collate/summarise intermediate results in a file as we go. The total analysis time will be no shorter, but it might avoid that 'dead time' after the alignments are done.
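As a rough illustration of collating as we go, the sketch below records each pairwise result when it arrives, so a partial matrix and a progress figure are available at any point. The class and method names (`RunningCollator`, `add_result`, `progress`) are hypothetical and not part of pyani, and it assumes every ordered genome pair, including self-comparisons, is expected.

```python
class RunningCollator:
    """Collect pairwise results incrementally instead of at the end of a run.

    Hypothetical helper for illustration only; not part of pyani.
    """

    def __init__(self, genome_ids):
        self.genome_ids = list(genome_ids)
        # Maps (query, subject) -> percentage identity for that comparison
        self.identity = {}

    def add_result(self, query, subject, identity):
        # Called once per finished comparison, as soon as its output is parsed
        self.identity[(query, subject)] = identity

    def progress(self):
        # Assumption: all ordered pairs, including self-comparisons, are expected
        expected = len(self.genome_ids) ** 2
        return len(self.identity) / expected

    def as_matrix(self):
        # Rows are queries, columns are subjects; None marks a pending comparison
        return [
            [self.identity.get((query, subject)) for subject in self.genome_ids]
            for query in self.genome_ids
        ]
```

Calling `add_result` from whatever code parses each alignment output would spread the collation cost across the run rather than leaving it all for the end.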
This could be implemented as cached matrix and/or dataframe results in the pyani database, with one table/matrix type for each run. Then, when pulling down the complete dataset for a run, we need only make one SQL request, rather than one for each result.
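A minimal sketch of the caching idea, using Python's standard `sqlite3` and `json` modules, might look like the following. The table name `run_matrix`, its columns, and the helper functions are assumptions made for illustration; they do not reflect the actual pyani database schema.

```python
import json
import sqlite3


def init_cache(conn):
    # Hypothetical cache table: one row per (run, matrix type)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_matrix ("
        "run_id INTEGER, matrix_type TEXT, payload TEXT, "
        "PRIMARY KEY (run_id, matrix_type))"
    )


def store_matrix(conn, run_id, matrix_type, labels, rows):
    # Serialise the whole matrix so it can be fetched back in one request
    payload = json.dumps({"labels": labels, "rows": rows})
    conn.execute(
        "INSERT OR REPLACE INTO run_matrix VALUES (?, ?, ?)",
        (run_id, matrix_type, payload),
    )
    conn.commit()


def load_matrix(conn, run_id, matrix_type):
    # A single SELECT returns the complete matrix for the run
    row = conn.execute(
        "SELECT payload FROM run_matrix WHERE run_id = ? AND matrix_type = ?",
        (run_id, matrix_type),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

For example, after `store_matrix(conn, 1, "identity", ["A", "B"], [[100.0, 98.5], [98.4, 100.0]])`, a later `load_matrix(conn, 1, "identity")` retrieves the labels and rows with one query. JSON is used here only for simplicity; another serialisation could be swapped in.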