chewBBACA

PROPOSAL: diffs

Open lskatz opened this issue 2 years ago • 6 comments

Hi, I find one aspect of chewBBACA problematic: it adds alleles to the database in the same command that performs the analysis. This leads to several problems, including:

  • Automatic errors if the database is on a read-only drive. It will err as soon as it tries to write. This has happened if I mount read-only with Singularity, for example. Or if there is a central read-only MLST database on our high performance computer (HPC) that everyone uses.
  • Pollution of the database. I queried with some bad assemblies and now the database is ruined. The only way to backtrack is to delete and recreate the database. If there is a central MLST database on our HPC, then it is problematic if one user's mistakes lead to the pollution of the database which affects all users.

I would like to propose that the AlleleCall step produces something like diff or patch files. I would also like to propose an additional step that can accept a patch file to update the database. The most efficient way to accept a patch might be through git commands but that is just a suggestion.
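To make the proposal concrete, here is a minimal sketch of what emitting a patch instead of writing to the database could look like. It is not chewBBACA code: the function name `allele_patch`, the one-FASTA-file-per-locus layout, and the allele naming are all assumptions for illustration. It only relies on Python's standard `difflib.unified_diff`.

```python
import difflib


def allele_patch(locus_file, current_fasta, new_alleles):
    """Return a unified diff that would append new alleles to a locus FASTA.

    Hypothetical helper, not part of chewBBACA.
    locus_file    -- relative path used in the diff headers (assumed layout)
    current_fasta -- current file contents as a string ending in a newline
    new_alleles   -- list of (allele_id, sequence) tuples to add
    """
    before = current_fasta.splitlines(keepends=True)
    after = list(before)
    for allele_id, seq in new_alleles:
        after.append(f">{allele_id}\n")
        after.append(f"{seq}\n")
    # The database itself is never touched; only the diff text is produced.
    return "".join(
        difflib.unified_diff(
            before,
            after,
            fromfile=f"a/{locus_file}",
            tofile=f"b/{locus_file}",
        )
    )


# Toy data: two known alleles, one newly inferred allele.
current = ">locus1_1\nATGAAATAA\n>locus1_2\nATGAAGTAA\n"
patch = allele_patch("locus1.fasta", current, [("locus1_3", "ATGAACTAA")])
print(patch)
```

Because the headers use the `a/` and `b/` prefixes, a file written this way could in principle be reviewed and then applied by a database maintainer with `git apply` or `patch -p1`, which keeps the read-only copy untouched until someone with write access accepts the change.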

Having patch files might also be helpful for compatibility with any current or future MLST callers like STing, if they decide to accept patches. It would also help in communicating between labs using ChewBBACA. For example, if I discover a new allele, it would be a standardized approach to communicating it to chewbbaca.online.

Thank you for your consideration on this topic.

lskatz Aug 24 '21 17:08

The standard patch format: https://www.oreilly.com/library/view/git-pocket-guide/9781449327507/ch11.html

lskatz Aug 24 '21 17:08

Thanks for the suggestions, @lskatz. Some of the points you raised have been under discussion in the group for some time, so your comments are an excellent starting point to think more seriously about this. I see @rfm-targa has already self-assigned this. I would just like to highlight that communication with the chewie name server at chewbbaca.online is already automated in chewBBACA, including the submission of new alleles identified for the first time locally. You can see more on this at https://chewie-ns.readthedocs.io/en/latest/user/synchronize_api.html.

ramirma Aug 25 '21 14:08

Thank you @ramirma and @rfm-targa for having already thought about this! Thank you for considering this topic!

lskatz Aug 25 '21 16:08