Issues with gprofiler2/gost
Have you checked the docs?
Description of the bug
Several issues were found in the module `gprofiler2/gost`:
1. Inconsistency between using `--organism mmusculus` or not: running with `--organism mmusculus` vs. running with `--organism null --gmt_file gprofiler_full_mmusculus.ENSG.gmt` (GMT obtained by downloading it from their website) returns different results.
2. Fixing the database using `set_base_url` and then running `gost` on an archived database works locally, but somehow gives an error in the CI (https://github.com/nf-core/modules/actions/runs/13199316156/job/36850361811).
3. By default, the module will not use the `gmt_file` given by the user if `organism` is specified. Maybe it should be the other way around?
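To make the precedence question concrete, here is a minimal sketch of the reversed behaviour, where an explicitly provided `gmt_file` wins over `organism`. It is written in Python purely for illustration and is not the module's actual code; `resolve_gost_target` is a hypothetical helper, and failing when neither input is set is an assumption.

```python
from typing import Optional, Tuple


def resolve_gost_target(organism: Optional[str], gmt_file: Optional[str]) -> Tuple[str, str]:
    """Hypothetical helper: pick what gost should query against.

    Assumed precedence (the reverse of the current default):
    a user-supplied GMT file wins over the organism name.
    """
    if gmt_file:  # an explicit GMT from the user takes priority
        return ("gmt", gmt_file)
    if organism:  # otherwise fall back to g:Profiler's organism database
        return ("organism", organism)
    # Assumption: having neither input is treated as a configuration error.
    raise ValueError("either organism or gmt_file must be set")


# Both set: the GMT file is used.
print(resolve_gost_target("mmusculus", "gprofiler_full_mmusculus.ENSG.gmt"))
# Only organism set: falls back to the organism database.
print(resolve_gost_target("mmusculus", None))
```

With this ordering, a user who passes both parameters gets the GMT they asked for instead of having it silently ignored.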
Other issues that make testing the module with nf-test difficult:
4. Fixing the database using `set_base_url` and then running `gost` on an archived database is extremely slow, even for the test dataset (about 8 min vs. 30 s with the current database). I'm not sure why, but it could be related to how `gost` interacts with old archives.
Are you planning to work on fixing these issues yourself? :)
@famosab Hello! I am currently busy with other issues, so anybody who wants to work on this is welcome :)
@suzannejin One comment about your first issue:
> 1. Inconsistency between using `--organism mmusculus` or not: running with `--organism mmusculus` vs. running with `--organism null --gmt_file gprofiler_full_mmusculus.ENSG.gmt` (GMT obtained by downloading it from their website) returns different results.
When you download the GMT file, KEGG and Transfac (TF) are removed for licensing reasons. Are those the differences you are referring to? If you remove KEGG and TF from the list of sources, is the result the same as with the downloaded GMT file?
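One way to check this is to drop KEGG and TF terms from both runs and compare what is left. The Python sketch below assumes you have the significant `term_id` values from each run (for example from the `term_id` column of the gost results) and that each ID carries its source as a prefix such as `KEGG:` or `TF:`. The function names and the toy IDs are hypothetical, not real results.

```python
from typing import Dict, Iterable, Set

# Assumption: the downloadable g:Profiler GMT omits these sources for licensing reasons.
EXCLUDED_PREFIXES = {"KEGG", "TF"}


def term_source(term_id: str) -> str:
    """Return the source prefix of a term ID (assumes IDs look like 'KEGG:04110')."""
    return term_id.split(":", 1)[0]


def comparable_terms(term_ids: Iterable[str]) -> Set[str]:
    """Drop terms from sources that cannot appear in the downloaded GMT."""
    return {t for t in term_ids if term_source(t) not in EXCLUDED_PREFIXES}


def compare_runs(organism_terms: Iterable[str], gmt_terms: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare significant term IDs from an --organism run and a --gmt_file run."""
    a = comparable_terms(organism_terms)
    b = comparable_terms(gmt_terms)
    return {"only_organism": a - b, "only_gmt": b - a, "shared": a & b}


if __name__ == "__main__":
    # Toy term lists for illustration only.
    organism_run = ["GO:0008150", "KEGG:04110", "TF:M00001_0", "REAC:R-MMU-1640170"]
    gmt_run = ["GO:0008150", "REAC:R-MMU-1640170"]
    for key, terms in compare_runs(organism_run, gmt_run).items():
        print(key, sorted(terms))
```

If both `only_organism` and `only_gmt` come out empty, the KEGG/TF exclusion accounts for the mismatch; anything left over points to some other difference between the two runs.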