proteinfold
Convert Stockholm format MSAs to more efficient a3m format before emitting to publishdir
Description of feature
Stockholm-format MSAs are very bulky files that leave a large disk footprint, particularly the uniprot.sto files generated for pairing sequences in AlphaFold2 multimer. It might be nice to convert MSA files to a3m format before emitting them to publishDir.
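To make the proposed conversion concrete, here is a minimal Python sketch of a Stockholm-to-a3m converter. It is an illustration only, not code from proteinfold or AlphaFold2: the function name `stockholm_to_a3m` is hypothetical, and the sketch assumes that the first sequence in the file is the query and that both `-` and `.` denote gaps. Columns where the query has a gap are treated as insertions, so residues there are written in lowercase and gaps there are dropped, which is where most of the size saving comes from.

```python
def stockholm_to_a3m(sto_text: str) -> str:
    """Convert Stockholm-formatted alignment text to a3m text.

    Assumption: the first sequence in the file is the query.
    """
    seqs = {}  # name -> aligned sequence; dicts keep insertion order
    for line in sto_text.splitlines():
        line = line.strip()
        # Skip blank lines, annotations (#...), and the terminator (//).
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        name, chunk = line.split(maxsplit=1)
        # Interleaved files repeat names across blocks, so concatenate chunks.
        # Assumption: '.' is treated the same as '-' (a gap).
        seqs[name] = seqs.get(name, "") + chunk.replace(".", "-")

    if not seqs:
        return ""

    query = next(iter(seqs.values()))
    # A column is a match column when the query has a residue there.
    is_match = [c != "-" for c in query]

    records = []
    for name, aligned in seqs.items():
        out = []
        for c, match in zip(aligned, is_match):
            if match:
                out.append(c.upper())   # residue or '-' in a match column
            elif c != "-":
                out.append(c.lower())   # inserted residue relative to the query
            # gaps in insert columns are dropped entirely
        records.append(f">{name}\n{''.join(out)}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    example = "# STOCKHOLM 1.0\nquery AC-DE\nhit1  ACGD-\nhit2  -C-DE\n//\n"
    print(stockholm_to_a3m(example), end="")
```

Note that this sketch discards Stockholm annotation lines such as `#=GS`, so the conversion is not lossless with respect to the original file.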
Maybe a compressed archive is a better solution to preserve the original data format.
A quick fix we used is to add a compression step via afterScript. A similar step could be added to the script in the module's main.nf.
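// Goes inside the process { } scope of a Nextflow config file
withName: 'RUN_ALPHAFOLD2_MSA|RUN_ALPHAFOLD2' {
    // Compress each Stockholm MSA found under the current directory and delete the original (--rm)
    afterScript = """
    find . -type f -name '*.sto' -exec zstd -19 --rm {} \\;
    """
}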