
Clarification about the contents of `gene_to_gene_family.tsv` from projection

Open szhan opened this issue 9 months ago • 4 comments

I have been running projection on a reconstructed pangenome and a set of assembly FastA files for input genomes, in order to assign each gene to a gene family in the pangenome for each input genome.

I tried consulting the documentation about the output of projection, but the link doesn't seem to go anywhere (https://github.com/labgem/PPanGGOLiN/blob/f3ba6a1f33256f19175b570c4b711bb8970d0365/docs/user/projection.md).

The documentation states that gene_to_gene_family.tsv "provides the mapping of genes to gene families of the pangenome." I was expecting to see one line per gene for an input genome, with each line indicating which gene family in the reconstructed pangenome that gene is assigned to. But this isn't what I got. Instead, I got files with hundreds of thousands of lines, even though an input genome contains 2.5k to 2.9k genes.

Any clarifications would be much appreciated. Thank you in advance.

szhan avatar May 13 '24 15:05 szhan

Hi,

The "projection" documentation about its output files is here: https://ppanggolin.readthedocs.io/en/latest/user/projection.html#output-files

However, you are right that the current behavior is not the intended one. I see where the bug is: currently, the "gene_to_gene_family.tsv" file contains this information for ALL of the given input genomes, not just the single input genome. The file is likely identical across the different "input genome" output directories. We'll get a fix for this in the upcoming version.

Thank you very much for the bug report.

Adelme

axbazin avatar May 14 '24 09:05 axbazin

Thank you for the explanation. I checked whether "The file is likely equal between the different "input genome" output directories" for a few input genomes. But it didn't seem to be the case. I look forward to the updated version. Thank you.
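For anyone who wants to repeat this check, here is a minimal sketch of how the comparison could be done. It assumes the projection output directory contains one subdirectory per input genome, each holding its own `gene_to_gene_family.tsv`; that layout, and the helper names `summarize_mapping_files` and `files_are_identical`, are illustrative assumptions and not part of PPanGGOLiN.

```python
import hashlib
from pathlib import Path


def summarize_mapping_files(projection_dir, filename="gene_to_gene_family.tsv"):
    """Return {genome_dir_name: (line_count, sha256_hex)} for each copy of the file.

    Assumes (hypothetically) one subdirectory per input genome under
    projection_dir, each containing its own copy of `filename`.
    """
    summary = {}
    for path in sorted(Path(projection_dir).rglob(filename)):
        data = path.read_bytes()
        line_count = len(data.splitlines())
        digest = hashlib.sha256(data).hexdigest()
        # Key on the name of the directory that holds the file.
        summary[path.parent.name] = (line_count, digest)
    return summary


def files_are_identical(summary):
    """True when every file found has the same content hash."""
    return len({digest for _, digest in summary.values()}) <= 1
```

Calling `summarize_mapping_files` on the projection output directory gives a line count and content hash per genome directory. Line counts far above the genome's gene count point to the behavior described above, and `files_are_identical` shows whether the copies really are the same.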

szhan avatar May 14 '24 10:05 szhan

Also, I was referring to https://github.com/labgem/PPanGGOLiN/blob/f3ba6a1f33256f19175b570c4b711bb8970d0365/docs/user/Outputs.md#gene-families-and-genes, which doesn't seem to exist anymore, in https://github.com/labgem/PPanGGOLiN/blob/f3ba6a1f33256f19175b570c4b711bb8970d0365/docs/user/projection.md

szhan avatar May 14 '24 10:05 szhan

Alright, thank you for the additional input. Indeed, I misunderstood what you meant; I see the broken link now! Will fix this as well.

axbazin avatar May 14 '24 11:05 axbazin

The fix for this issue has been released in v2.1.0.

JeanMainguy avatar Jul 11 '24 07:07 JeanMainguy