Augustus
Difference in usage of join_aug_pred.pl in parallelized run
Dear all,
I have found that join_aug_pred.pl gives different results depending on how the split outputs of a parallelized run are passed to it. This was discussed in #73, but that thread did not give me a clear explanation. There are basically three ways to use this Perl script to combine the parallelized results:
1) Hoff and Stanke 2019 (doi: 10.1002/cpbi.57) suggest the following, which does not work:

    for ((i=1; i<=30; i++)); do
        cat $augDir/augustus.$i.out | join_aug_pred.pl > augustus.gff
    done

2) Katharina Hoff suggested in #73:

    for ((i=1; i<=30; i++)); do
        cat $augDir/augustus.$i.out | join_aug_pred.pl >> augustus.gff
    done

3) Another user suggested in #73 concatenating all outputs first and then running the script once:

    for i in {1..30}; do cat augustus.$i.out; done > concatenated_augustus.gff
    join_aug_pred.pl < concatenated_augustus.gff > joined_augustus.gff
I tried 2) and 3), and both ran without errors. However, the output files from 2) and 3) differ in size by about 5%. Am I supposed to use 2)? If so, could you kindly explain how join_aug_pred.pl treats split results?
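Since file size alone is a rough measure, the two outputs could also be compared by the number of predicted genes. Below is a minimal sketch, assuming that gene lines in the output carry `gene` in the third tab-separated column and that comment lines start with `#`. `count_genes` is a hypothetical helper written for this post, not part of AUGUSTUS, and the file names are the ones used in 2) and 3).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AUGUSTUS): count lines whose third
# tab-separated column is "gene", skipping comment lines that start with '#'.
count_genes() {
    awk -F'\t' '!/^#/ && $3 == "gene" { n++ } END { print n + 0 }' "$1"
}

# File names as in options 2) and 3); adjust them to your own run.
# Files that do not exist are silently skipped.
for f in augustus.gff joined_augustus.gff; do
    if [ -f "$f" ]; then
        printf '%s\t%s genes\n' "$f" "$(count_genes "$f")"
    fi
done
```

If the gene counts differ as well, the ~5% size difference reflects different predictions being kept, not just different comment or header lines.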
Thanks,