fix: gprofiler2 output files missing gene names in intersection columns #497
The gprofiler2 module outputs were missing the actual gene names/IDs in the expected columns, making it impossible to identify which specific genes contribute to pathway enrichment.
Expected behavior:
- `*.gprofiler2.all_enriched_pathways.tsv` should contain an `intersection` column with gene names/IDs.
- `*.gprofiler2.[source].sub_enriched_pathways.tsv` should contain actual gene names in the `DE_genes_names` column.
Actual behavior:
- The `all_enriched_pathways.tsv` file lacks the `intersection` column entirely.
- The `sub_enriched_pathways.tsv` files have a `DE_genes_names` column containing numeric values (same as `DE_genes`) instead of gene names.
With this fix:
Enable g:Profiler evidence codes so the intersection column is emitted.
Populate sub-tables with both Ensembl IDs and symbols:
- `DE_genes_ids` = original intersection IDs
- `DE_genes_names` = gene symbols (from the DE table where available, else `gprofiler2::gconvert`), falling back to IDs if unmapped
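
The lookup order described above (DE table first, then a converter, then the raw ID) can be sketched as follows. This is an illustrative Python sketch of the logic only, not the module's code: the function name `resolve_symbols`, the `convert` callback standing in for a `gprofiler2::gconvert`-style lookup, and the placeholder IDs and symbols are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional


def resolve_symbols(
    intersection_ids: List[str],
    de_table_symbols: Dict[str, str],
    convert: Optional[Callable[[str], Optional[str]]] = None,
) -> List[str]:
    """Return one display name per ID: DE-table symbol, else converter result, else the ID."""
    names = []
    for gene_id in intersection_ids:
        # 1. Prefer the symbol already present in the DE table.
        symbol = de_table_symbols.get(gene_id)
        # 2. Otherwise ask the (hypothetical) converter, if one was supplied.
        if not symbol and convert is not None:
            symbol = convert(gene_id)
        # 3. Fall back to the ID itself when nothing mapped.
        names.append(symbol if symbol else gene_id)
    return names


# Placeholder data, invented for illustration.
ids = ["gene_id_1", "gene_id_2", "gene_id_3"]
de_symbols = {"gene_id_1": "SymA"}        # symbol known from the DE table
converter_table = {"gene_id_2": "SymB"}   # symbol only known to the converter

names = resolve_symbols(ids, de_symbols, converter_table.get)
print("DE_genes_ids:  ", ",".join(ids))
print("DE_genes_names:", ",".join(names))
```

Keeping the IDs and the resolved names as two parallel lists mirrors the two columns: `DE_genes_ids` always holds the original identifiers, while `DE_genes_names` only degrades to an ID for entries that could not be mapped.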
nextflow run . -profile test,docker --gprofiler2_run true --gprofiler2_organism mmusculus --gprofiler2_evcodes true --outdir test_gprofile_symbols
- `*all_enriched_pathways.tsv` now contains `intersection`.
- `*sub_enriched_pathways.tsv` now has `DE_genes_ids` and `DE_genes_names` (symbols present; IDs used as fallback).
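
A quick way to confirm the columns listed above is to inspect the header row of each output file. The snippet below is a minimal sketch of such a check; the helper name `missing_columns` and the sample header text are hypothetical, and in practice the text would be read from the real TSV files in the output directory.

```python
import csv
import io
from typing import List


def missing_columns(tsv_text: str, required: List[str]) -> List[str]:
    """Return the required column names that are absent from the TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    return [col for col in required if col not in header]


# Hypothetical header lines standing in for the real output files.
all_pathways_sample = "term_id\tterm_name\tintersection\n"
sub_pathways_sample = "term_name\tDE_genes_ids\tDE_genes_names\n"

print(missing_columns(all_pathways_sample, ["intersection"]))
print(missing_columns(sub_pathways_sample, ["DE_genes_ids", "DE_genes_names"]))
```

An empty list means every expected column is present; any names returned are the ones still missing.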
Notes
- No changes to output file names besides adding `DE_genes_ids` in sub tables.