Default parameters in MaltExtract leading to missing output

Open ardasevkar opened this issue 1 year ago • 2 comments

Hi, when comparing my previous mapping results (which seem to be authentic) with the aMETA output, I noticed that aMETA reported significantly fewer reads in each case. Looking further into the aMETA outputs, I found that MaltExtract's default behaviour may lead to missing data in the final output. The relevant defaults, and the flags that switch them off, are as follows:

--destackingOff: Turns off destacking. By default, MaltExtract removes reads that overlap with other reads. Leaving this filter on significantly decreases the total number of reads assigned to the specified node: it results in a lower number of reads but achieves high genome coverage.

--downSampOff: Turns off downsampling. I think the default downsampling was added to speed up the process: it restricts the maximum number of reads assigned to a specified node to 10,000.

--dupRemOff: Turns off PCR duplicate removal, which is on by default. While removing duplicates is a good option, sometimes obtaining raw, unfiltered data is preferable. I use my in-house script to remove PCR duplicates.

From my understanding, the aMETA pipeline runs MaltExtract with only the parameters below:

-i -f -o --reads --threads --matches --minPI 85.0 --maxReadLength 0 --minComp 0.0 --meganSummary -t

The three parameters I mentioned above are missing from this call, so the aMETA pipeline may need to be modified to include them. Detailed information on the MaltExtract parameters can be found on the HOPS GitHub page.
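To make the suggestion concrete, here is a minimal sketch of the kind of change I mean, written in Python only for illustration. It is not taken from the aMETA code: the function name, the `current_args` list, and the `<...>` placeholder values are all hypothetical, and only the flag names come from the command and options quoted above.

```python
# Flags quoted in this issue; each one switches off a default MaltExtract filter.
KEEP_ALL_READS_FLAGS = ["--destackingOff", "--downSampOff", "--dupRemOff"]


def with_filters_disabled(args):
    """Return a copy of args with each of the three *Off flags present exactly once."""
    result = list(args)
    for flag in KEEP_ALL_READS_FLAGS:
        if flag not in result:
            result.append(flag)
    return result


# Hypothetical stand-in for the argument list the pipeline builds.
# The <...> values are placeholders, not real paths or settings.
current_args = [
    "-i", "<input>", "-f", "<filter>", "-o", "<output>",
    "--reads", "--threads", "<n>", "--matches",
    "--minPI", "85.0", "--maxReadLength", "0", "--minComp", "0.0",
    "--meganSummary", "-t", "<taxa>",
]

if __name__ == "__main__":
    # Print the extended argument string for inspection.
    print(" ".join(with_filters_disabled(current_args)))
```

Whether all three flags should always be set, or exposed as options in the pipeline config, is a design choice for the maintainers; the sketch only shows that appending them leaves the existing parameters untouched.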

ardasevkar, Jan 23 '24 14:01