Trying to generate the projection directory without success

Open cmonat opened this issue 4 months ago • 3 comments

Hello,

I'm using PPanGGOLiN v2.2.3 and would like to produce a projection folder as in v1. I tried the following command:

ppanggolin projection -p pangenome.h5 --fasta ../Bacillus_A_ombysepticus_ICSA_fasta_29072025_modified.list --anno ../Bacillus_A_ombysepticus_ICSA_gff_29072025_modified.list --verbose 2
2025-07-31 08:34:24 utils.py:l239 INFO  Command: /SD5/people/s1060627/miniconda3/bin/ppanggolin projection -p pangenome.h5 --fasta ../Bacillus_A_ombysepticus_ICSA_fasta_29072025_modified.list --anno ../Bacillus_A_ombysepticus_ICSA_gff_29072025_modified.list --verbose 2
2025-07-31 08:34:24 utils.py:l242 INFO  PPanGGOLiN version: 2.2.3
2025-07-31 08:34:24 utils.py:l710 DEBUG The parameter "--anno: ../Bacillus_A_ombysepticus_ICSA_gff_29072025_modified.list" has been specified in the command line with a non-default value. Its value overwrites the default value (None).
2025-07-31 08:34:24 utils.py:l710 DEBUG The parameter "--fasta: ../Bacillus_A_ombysepticus_ICSA_fasta_29072025_modified.list" has been specified in the command line with a non-default value. Its value overwrites the default value (None).
2025-07-31 08:34:24 utils.py:l710 DEBUG The parameter "--pangenome: pangenome.h5" has been specified in the command line with a non-default value. Its value overwrites the default value (None).
2025-07-31 08:34:24 utils.py:l710 DEBUG The parameter "--verbose: 2" has been specified in the command line with a non-default value. Its value overwrites the default value (1).
2025-07-31 08:34:24 utils.py:l891 DEBUG 1 projection parameters have non-default value: verbose=2
2025-07-31 08:34:24 utils.py:l977 INFO  1 parameters have a non-default value.
2025-07-31 08:34:24 projection.py:l1473 DEBUG   The provided file (../Bacillus_A_ombysepticus_ICSA_fasta_29072025_modified.list) is detected as a TSV file.
2025-07-31 08:34:24 projection.py:l1473 DEBUG   The provided file (../Bacillus_A_ombysepticus_ICSA_gff_29072025_modified.list) is detected as a TSV file.
2025-07-31 08:34:24 projection.py:l1521 DEBUG
2025-07-31 08:34:26 utils.py:l387 DEBUG Create output directory /biostress/pangenomics/ICSA/Bacillus_A_bombysepticus_ICSA/Bacillus_A_bombysepticus_ICSA_PPGGv2/ppanggolin_projection_DATE2025-07-31_HOUR08.34.24_PID1716937
/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/tables/attributeset.py:363: FiltersWarning:

Failed parsing FILTERS key

2025-07-31 08:34:26 readBinaries.py:l123 INFO   Getting the current pangenome status
2025-07-31 08:34:26 readBinaries.py:l1503 INFO  Reading pangenome annotations...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2325/2325 [00:00<00:00, 79107.00genome/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 397286/397286 [00:06<00:00, 65175.65contig/s]
Traceback (most recent call last):
  File "/SD5/people/s1060627/miniconda3/bin/ppanggolin", line 8, in <module>
    sys.exit(main())
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/ppanggolin/main.py", line 269, in main
    ppanggolin.projection.projection.launch(args)
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/ppanggolin/projection/projection.py", line 1572, in launch
    check_pangenome_info(
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/ppanggolin/formats/readBinaries.py", line 1799, in check_pangenome_info
    read_pangenome(pangenome, disable_bar=disable_bar, **need_info)
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/ppanggolin/formats/readBinaries.py", line 1504, in read_pangenome
    read_annotation(pangenome, h5f, disable_bar=disable_bar)
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/ppanggolin/formats/readBinaries.py", line 1244, in read_annotation
    genedata_dict = read_genedata(h5f)
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/ppanggolin/formats/readBinaries.py", line 200, in read_genedata
    for row in read_chunks(table, chunk=20000):
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/ppanggolin/formats/readBinaries.py", line 182, in read_chunks
    yield from table.read(start=i, stop=i + chunk, field=column)
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/tables/table.py", line 1900, in read
    arr = self._read(start, stop, step, field, out)
  File "/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/tables/table.py", line 1814, in _read
    self._read_records(start, stop - start, result)
  File "tables/tableextension.pyx", line 645, in tables.tableextension.Table._read_records
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5D.c", line 1061, in H5Dread
    can't synchronously read data
  File "H5D.c", line 1008, in H5D__read_api_common
    can't read data
  File "H5VLcallback.c", line 2092, in H5VL_dataset_read_direct
    dataset read failed
  File "H5VLcallback.c", line 2048, in H5VL__dataset_read
    dataset read failed
  File "H5VLnative_dataset.c", line 363, in H5VL__native_dataset_read
    can't read data
  File "H5Dio.c", line 383, in H5D__read
    can't read data
  File "H5Dchunk.c", line 2856, in H5D__chunk_read
    unable to read raw data chunk
  File "H5Dchunk.c", line 4468, in H5D__chunk_lock
    data pipeline read failed
  File "H5Z.c", line 1391, in H5Z_pipeline
    filter returned failure during read
  File "hdf5-blosc2/src/blosc2_filter.c", line 458, in blosc2_filter
    Cannot get super-chunk from buffer

End of HDF5 error back trace

Problems reading records.
/SD5/people/s1060627/miniconda3/lib/python3.9/site-packages/tables/file.py:113: UnclosedFileWarning:

Closing remaining open file: /biostress/pangenomics/ICSA/Bacillus_A_bombysepticus_ICSA/Bacillus_A_bombysepticus_ICSA_PPGGv2/pangenome.h5

Can you help me solve this? Thanks for your help, and have a great day. C.

cmonat avatar Jul 31 '25 12:07 cmonat

Hi Cécile,

Was the pangenome.h5 file generated with version 2 of PPanGGOLiN?

Best,

David

dvallenet avatar Jul 31 '25 13:07 dvallenet

Hi David,

Yes, it was produced with v2.2.3.

cmonat avatar Jul 31 '25 13:07 cmonat

Hello,

From the error message, the issue seems to come from reading the pangenome.h5 file: the traceback ends in the Blosc2 HDF5 filter ("Cannot get super-chunk from buffer"). The file may be corrupted, or your current environment may not have the correct libraries installed to read it.
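
To look at the environment side, a short Python sketch like the following can list the versions of the libraries involved in reading the file. The distribution names (`ppanggolin`, `tables`, `blosc2`, `numpy`) are assumptions on my part and may differ in your installation; a package reported as missing may simply be registered under another name.

```python
from importlib import metadata

# Distribution names are assumptions; adjust them to match your environment.
PACKAGES = ("ppanggolin", "tables", "blosc2", "numpy")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not found}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not found'}")
```

Running it once in the environment that wrote the pangenome and once in the one that fails to read it would show whether the two differ.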

Could you please check if the file can be read using another PPanGGOLiN command, such as:

ppanggolin write_pangenome -p pangenome.h5 --stats -o output
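
Before that, a very coarse sanity check is possible with the Python standard library alone. The sketch below only looks for the 8-byte HDF5 signature, which the HDF5 format places at offset 0 or at 512, 1024, 2048, and so on. It can reveal a file that is empty or not HDF5 at all, but it cannot detect a damaged compressed chunk, so a positive result does not mean the file is intact. The function name `looks_like_hdf5` is hypothetical.

```python
import os

# 8-byte signature that starts the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if an HDF5 signature is found at a valid superblock offset."""
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as handle:
        while offset + len(HDF5_SIGNATURE) <= size:
            handle.seek(offset)
            if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate offsets are 0, 512, 1024, 2048, ...
            offset = 512 if offset == 0 else offset * 2
    return False


# Example: looks_like_hdf5("pangenome.h5")
```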

Also, note that in version 2 of PPanGGOLiN, the meaning of "projection" has changed. In version 1, it referred to exporting pangenome annotations for the input genomes in a tabular format. In version 2, however, a command called projection has been added to annotate new genomes (not used to build the original pangenome) with an existing pangenome. This projects information such as partitions, spots, modules, and RGPs onto the new genomes. You can find more details in the documentation: https://ppanggolin.readthedocs.io/en/latest/user/projection.html

If you want to retrieve pangenome-based annotations for the original genomes (like in version 1), you can now use the write_genomes command (doc: https://ppanggolin.readthedocs.io/en/latest/user/writeGenomes.html).

The annotation can be exported in several formats:

  • --table → TSV format
  • --gff → GFF format
  • --proksee → JSON format compatible with Proksee

For example:

ppanggolin write_genomes -p pangenome.h5 --table -o output_folder
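
If you need to script the export, a minimal Python sketch could build the same command for one or more formats. The helper `build_write_genomes_cmd` and the format labels are hypothetical; the flags are the ones used in the example command and the format list above, and I am assuming they can be combined in a single call.

```python
import shlex

# Flags as in the example command above; the keys are labels chosen for this sketch.
FORMAT_FLAGS = {"table": "--table", "gff": "--gff", "proksee": "--proksee"}


def build_write_genomes_cmd(pangenome, output, formats):
    """Return the argument list for a `ppanggolin write_genomes` call."""
    unknown = [fmt for fmt in formats if fmt not in FORMAT_FLAGS]
    if unknown:
        raise ValueError(f"Unknown format(s): {', '.join(unknown)}")
    cmd = ["ppanggolin", "write_genomes", "-p", pangenome]
    cmd += [FORMAT_FLAGS[fmt] for fmt in formats]
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    cmd = build_write_genomes_cmd("pangenome.h5", "output_folder", ["table", "gff"])
    # Print the command; to run it, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```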

Best regards, Jean

JeanMainguy avatar Aug 04 '25 14:08 JeanMainguy