
fix: gprofiler2 output files missing gene names in intersection columns #497

Open mohe1linux opened this issue 2 months ago • 0 comments

The gprofiler2 module outputs were missing the actual gene names/IDs in the expected columns, making it impossible to identify which specific genes contribute to pathway enrichment.

Expected behavior:

  • *.gprofiler2.all_enriched_pathways.tsv should contain an intersection column with gene names/IDs.
  • *.gprofiler2.[source].sub_enriched_pathways.tsv should contain actual gene names in the DE_genes_names column.

Actual behavior:

  • The all_enriched_pathways.tsv file lacks the intersection column entirely.
  • The sub_enriched_pathways.tsv files have a DE_genes_names column containing numeric values (same as DE_genes) instead of gene names.

Now with the fix

Enable g:Profiler evidence codes so the intersection column is emitted. Populate sub-tables with both Ensembl IDs and symbols:

  • DE_genes_ids = original intersection IDs
  • DE_genes_names = gene symbols (from the DE table where available, else gprofiler2::gconvert), falling back to IDs if unmapped
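The symbol lookup order described above (DE table first, then converted symbols, then the raw ID) can be sketched as follows. This is a minimal Python illustration of the fallback logic only, not the module's R code: the function name, the two dictionaries, and the placeholder IDs are hypothetical, and the comma separator for the intersection string is an assumption.

```python
from typing import Mapping, Tuple


def resolve_gene_names(
    intersection: str,
    de_symbols: Mapping[str, str],
    converted_symbols: Mapping[str, str],
    sep: str = ",",  # assumption: intersection IDs are comma-separated
) -> Tuple[str, str]:
    """Return (DE_genes_ids, DE_genes_names) for one enriched term.

    de_symbols: ID -> symbol taken from the DE table (hypothetical input).
    converted_symbols: ID -> symbol from an ID-conversion step, standing in
        for a gprofiler2::gconvert result (hypothetical input).
    """
    ids = [gene.strip() for gene in intersection.split(sep) if gene.strip()]
    names = []
    for gene_id in ids:
        # Lookup order: DE table, then converted symbol, then the ID itself.
        # Empty or missing symbols fall through to the next source.
        name = de_symbols.get(gene_id) or converted_symbols.get(gene_id) or gene_id
        names.append(name)
    return sep.join(ids), sep.join(names)


# Placeholder IDs and symbols, for illustration only.
ids, names = resolve_gene_names(
    "ID_A,ID_B,ID_C",
    de_symbols={"ID_A": "SymA"},
    converted_symbols={"ID_B": "SymB"},
)
```

In this example `ID_C` has no symbol in either source, so it is carried through unchanged into the names column, which keeps both columns the same length and in the same order.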

nextflow run . -profile test,docker --gprofiler2_run true --gprofiler2_organism mmusculus --gprofiler2_evcodes true --outdir test_gprofile_symbols

  • *all_enriched_pathways.tsv now contains intersection.
  • *sub_enriched_pathways.tsv now has DE_genes_ids and DE_genes_names (symbols present; IDs used as fallback).
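One way to confirm the columns listed above after a test run is to inspect the header row of each TSV. The helper below is a small sketch with a hypothetical function name; it takes the file contents as text, so the caller decides which output files to read.

```python
import csv
import io
from typing import List, Sequence


def missing_columns(tsv_text: str, required: Sequence[str]) -> List[str]:
    """Return the required column names absent from the TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    return [column for column in required if column not in header]


# Tiny in-memory example with made-up rows.
example = "term_id\tDE_genes_ids\tDE_genes_names\nT1\tID_A\tSymA\n"
absent = missing_columns(example, ["DE_genes_ids", "DE_genes_names"])
```

For real outputs, the text could be read with `pathlib.Path(path).read_text()` and checked against `["intersection"]` for the all_enriched_pathways.tsv file, or `["DE_genes_ids", "DE_genes_names"]` for the sub_enriched_pathways.tsv files.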

Notes

  • Output file names are unchanged; the sub tables gain a DE_genes_ids column.

mohe1linux, Oct 29 '25 15:10