pyani

Collating results is slow for large datasets (>1500 genomes)

Open • widdowquinn opened this issue 9 years ago • 1 comment

Currently, the code writes out each result individually and defers processing of the output (calculation of ANI, etc.) until the end. This leaves a long, uninformative lag before the results are presented to the user.

It may be possible to collate/summarise intermediate results in a file as we go. The total analysis time will be no shorter, but it might avoid the 'dead time' after the alignments are done.

widdowquinn • Nov 09 '15 10:11
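A minimal Python sketch of collating results as they arrive. `IncrementalCollator` and its methods are hypothetical (not part of pyani), and it assumes each finished comparison yields a single identity value. Each result is appended to one summary file and written into an in-memory matrix as soon as the comparison completes, so partial results and progress are available before the run ends.

```python
import csv
import io
from typing import Iterable, TextIO


class IncrementalCollator:
    """Hypothetical helper: collate pairwise results as each comparison finishes.

    Each result is appended to a single tab-separated summary file and stored
    in an in-memory identity matrix straight away, instead of being written to
    its own file and parsed at the end of the run.
    """

    def __init__(self, genomes: Iterable[str], summary: TextIO) -> None:
        self.genomes = list(genomes)
        self.index = {name: i for i, name in enumerate(self.genomes)}
        n = len(self.genomes)
        # Diagonal (self-comparison) is 1.0; other cells stay None until reported.
        self.identity: list = [
            [1.0 if i == j else None for j in range(n)] for i in range(n)
        ]
        self.writer = csv.writer(summary, delimiter="\t")
        self.writer.writerow(["query", "subject", "identity"])
        self.done = 0

    def add(self, query: str, subject: str, identity: float) -> None:
        """Record one finished comparison; assumes each ordered pair is reported once."""
        self.identity[self.index[query]][self.index[subject]] = identity
        self.writer.writerow([query, subject, f"{identity:.4f}"])
        self.done += 1

    def progress(self) -> float:
        """Fraction of off-diagonal (ordered) comparisons collated so far."""
        n = len(self.genomes)
        total = n * (n - 1)
        return self.done / total if total else 1.0


if __name__ == "__main__":
    summary = io.StringIO()  # stands in for an open summary file
    collator = IncrementalCollator(["A", "B", "C"], summary)
    collator.add("A", "B", 0.97)  # called as soon as this comparison finishes
    print(f"{collator.progress():.2f} collated")  # prints "0.17 collated"
```

Because the matrix is filled in place, the final report needs no separate pass over per-comparison output files; the cost is spread across the run instead of arriving all at once at the end.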

This could be implemented by caching matrix and/or dataframe results in the pyani database, with one table per matrix type for each run. Then, when pulling down the complete dataset for a run, we need make only one SQL request, rather than one per result.

widdowquinn • Nov 19 '18 19:11
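A minimal sketch of the caching idea, using Python's standard-library `sqlite3` and `json` modules. The `matrix_cache` table, `cache_matrix()` and `load_run_matrices()` are hypothetical and do not reflect pyani's actual database schema; matrices are assumed to be stored as JSON text, one row per (run, matrix type).

```python
import json
import sqlite3

# Hypothetical schema: one cached row per (run, matrix type).
SCHEMA = """
CREATE TABLE IF NOT EXISTS matrix_cache (
    run_id      INTEGER NOT NULL,
    matrix_type TEXT    NOT NULL,
    labels      TEXT    NOT NULL,  -- JSON list of genome labels
    matrix      TEXT    NOT NULL,  -- JSON list-of-lists of values
    PRIMARY KEY (run_id, matrix_type)
)
"""


def cache_matrix(conn: sqlite3.Connection, run_id: int, matrix_type: str,
                 labels: list, matrix: list) -> None:
    """Store (or replace) the collated matrix of one type for one run."""
    conn.execute(
        "INSERT OR REPLACE INTO matrix_cache VALUES (?, ?, ?, ?)",
        (run_id, matrix_type, json.dumps(labels), json.dumps(matrix)),
    )
    conn.commit()


def load_run_matrices(conn: sqlite3.Connection, run_id: int) -> dict:
    """Fetch every cached matrix for a run with a single SQL query."""
    rows = conn.execute(
        "SELECT matrix_type, labels, matrix FROM matrix_cache WHERE run_id = ?",
        (run_id,),
    ).fetchall()
    return {mtype: (json.loads(lbls), json.loads(mat)) for mtype, lbls, mat in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    labels = ["A", "B"]
    cache_matrix(conn, 1, "identity", labels, [[1.0, 0.97], [0.96, 1.0]])
    cache_matrix(conn, 1, "coverage", labels, [[1.0, 0.85], [0.88, 1.0]])
    print(sorted(load_run_matrices(conn, 1)))  # prints ['coverage', 'identity']
```

Storing one pre-collated row per matrix type trades some extra storage, and the need to rebuild the cache if a run's comparisons change, for a single query at reporting time.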