AssertionError: Integrity check failed
Hello. Half of my FCS-GX runs still end in crashes, even after assigning FCS-GX more than 470 GB of memory in every run (as suggested in issue #69). The crashes seem to happen randomly: a resubmitted run with exactly the same input files and settings may or may not crash again on the second try. The error messages differ from those I reported in issue #69. One of them is AssertionError: Integrity check failed.
A log of a run with this error is below:
===============================================================================
Source: /mft-volume
Destination: /app/db/gxdb
Resuming failed transfer in /app/db/gxdb...
Space check: Available:1.14TiB; Existing:0B; Incoming:464.34GiB; Delta:464.34GiB
Requires transfer: 59B all.meta.jsonl
Copying /mft-volume/all.meta.jsonl to /app/db/gxdb/all.meta.jsonl.part...
Requires transfer: 187B all.README.txt
Copying /mft-volume/all.README.txt to /app/db/gxdb/all.README.txt.part...
Requires transfer: 6.09MiB all.taxa.tsv
Copying /mft-volume/all.taxa.tsv to /app/db/gxdb/all.taxa.tsv.part...
Requires transfer: 7.86MiB all.blast_div.tsv.gz
Copying /mft-volume/all.blast_div.tsv.gz to /app/db/gxdb/all.blast_div.tsv.gz.part...
Requires transfer: 8.48MiB all.assemblies.tsv
Copying /mft-volume/all.assemblies.tsv to /app/db/gxdb/all.assemblies.tsv.part...
Requires transfer: 21.51MiB all.seq_info.tsv.gz
Copying /mft-volume/all.seq_info.tsv.gz to /app/db/gxdb/all.seq_info.tsv.gz.part...
Requires transfer: 165.14GiB all.gxs
Copying /mft-volume/all.gxs to /app/db/gxdb/all.gxs.part...
/app/db/gxdb/all.gxs.part - file-size changed.
Traceback (most recent call last):
File "/tmp/Bazel.runfiles_26ygs8hq/runfiles/cgr_fcs/apps/fcs_genome/public/sync_files/sync_files.py", line 724, in <module>
main()
File "/tmp/Bazel.runfiles_26ygs8hq/runfiles/cgr_fcs/apps/fcs_genome/public/sync_files/sync_files.py", line 700, in main
transfer_file(mi, src_mft_dir, work_dir)
File "/tmp/Bazel.runfiles_26ygs8hq/runfiles/cgr_fcs/apps/fcs_genome/public/sync_files/sync_files.py", line 572, in transfer_file
assert file_integrity_ok(mi, tmp_file_path, verify_hashes=(not hash_ok), verbose=True), "Integrity check failed."
AssertionError: Integrity check failed.
-----------------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/Bazel.runfiles_nwibowqx/runfiles/cgr_fcs/apps/fcs_genome/public/run_gx/run_gx.py", line 1037, in <module>
main()
File "/tmp/Bazel.runfiles_nwibowqx/runfiles/cgr_fcs/apps/fcs_genome/public/run_gx/run_gx.py", line 1004, in main
assert len(paths) == 1, f"Cannot resolve path to *.gxi file from {args.gx_db}: {paths}"
AssertionError: Cannot resolve path to *.gxi file from /app/db/gxdb/gx_mapper_1530334: []
What has happened here, and how can I prevent this crash?
These are the software versions used for this run:
OS: Ubuntu 22.04.4 LTS
Singularity: v3.10.0
FCS image: 0.5.0
Python: 3.8.12
Platform: LSF
AssertionError: Integrity check failed.
This error indicates that the database files copied to the destination directory are corrupted. What did you do for batch processing in #78?
Please verify that the database files you downloaded from the source are correct (see the db check command).
If the problem persists, you may want to copy the files to the destination directory with your preferred method instead of fcs.py db get, and then verify the integrity of the transfer with fcs.py db check.
https://github.com/ncbi/fcs/wiki/FCS-GX-input
AssertionError: Cannot resolve path to *.gxi file from /app/db/gxdb/gx_mapper_1530334: []
This error indicates that the screen genome command was invoked with a --gx-db= path containing an incomplete or corrupted gx-database.
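A pre-flight check along these lines can catch an incomplete database directory before the screen genome command is launched. This is only a sketch: the function name find_gxi is hypothetical, and the two conditions it tests are inferred from the log above (the run_gx.py assertion expects exactly one *.gxi file, and the transfer writes temporary files with a .part suffix). It is not part of fcs.py.

```python
from pathlib import Path


def find_gxi(gx_db_dir):
    """Return the single *.gxi file in gx_db_dir, or raise ValueError.

    Hypothetical helper: mirrors the 'exactly one *.gxi file' condition
    seen in the traceback and also rejects leftover *.part files, which
    (as an assumption based on the log) indicate an interrupted transfer.
    """
    db_dir = Path(gx_db_dir)

    # Leftover temporary files suggest the copy never finished.
    leftovers = sorted(p.name for p in db_dir.glob("*.part"))
    if leftovers:
        raise ValueError(f"Unfinished transfer in {db_dir}: {leftovers}")

    # The database directory should contain exactly one index file.
    paths = sorted(db_dir.glob("*.gxi"))
    if len(paths) != 1:
        found = [p.name for p in paths]
        raise ValueError(f"Expected one *.gxi file in {db_dir}, found {found}")
    return paths[0]
```

Calling find_gxi on the database directory right after the copy step and before screening turns the late failure into an early, clearer one. It does not verify file contents, so it complements fcs.py db check rather than replacing it.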
Hello. I haven't implemented batch processing so far, although I still plan to try it. In the runs I'm describing here, I'm using FCS-GX with only one assembly file per run, and the database gets copied to a subdirectory of /tmp for the run using fcs.py db get.
I assume the database downloaded from https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/latest is fine. The AssertionError: Integrity check failed. error message appears intermittently, but if it were caused by a faulty database download from the FTP site, it would probably appear every time I run FCS-GX.
The output of the python3 fcs.py db check --mft "$SOURCE_DB_MANIFEST" --dir "$LOCAL_DB" command is:
===============================================================================
/app/db/gxdb is up-to-date with https://ncbi-fcs-gx.s3.amazonaws.com/gxdb/latest.
So I think fcs.py occasionally fails to fully copy the database over to /tmp when the fcs.py db get command is run. I guess I could indeed try to work around it by using some other method of copying files instead of relying on fcs.py db get. I'm not sure what method it should be, though.
Did you manage to resolve this issue? The various db retrieval methods are described here:
https://github.com/ncbi/fcs/wiki/FCS-GX-input#fcs-gx-database-location
I think so. About a month ago I replaced the code for copying the database files with my own Python script that copies the files, checks the copied files with md5sum, and retries the copy if it failed. It then also runs the fcs.py db check command. I haven't seen crashes with the AssertionError: Integrity check failed. error or any other error in the database copying stage since then.
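For readers who want to try the same workaround, the copy-verify-retry approach could look roughly like the sketch below. It is not the actual script from this thread. The function names, the retry count, the delay, and the .part temporary suffix are all illustrative assumptions, and the checksum is computed with Python's hashlib instead of calling the md5sum program.

```python
import hashlib
import shutil
import time
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst, retries=3, delay=5.0):
    """Copy src to dst and retry until the copy's MD5 matches the source's.

    Returns the number of attempts used; raises RuntimeError if every
    attempt fails. The retries and delay defaults are arbitrary choices.
    """
    src, dst = Path(src), Path(dst)
    expected = md5sum(src)
    # Write to a temporary name first so dst only ever holds a verified copy.
    part = dst.with_name(dst.name + ".part")
    for attempt in range(1, retries + 1):
        try:
            shutil.copyfile(src, part)
            if md5sum(part) == expected:
                part.replace(dst)
                return attempt
        except OSError as err:
            print(f"Attempt {attempt} for {src.name} failed: {err}")
        part.unlink(missing_ok=True)  # requires Python 3.8 or later
        if attempt < retries:
            time.sleep(delay)
    raise RuntimeError(f"Could not copy {src} intact after {retries} attempts")


def copy_database(src_dir, dst_dir, retries=3, delay=5.0):
    """Copy every regular file in src_dir to dst_dir with verification."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).iterdir()):
        if src.is_file():
            copy_verified(src, dst_dir / src.name, retries, delay)
            copied.append(dst_dir / src.name)
    return copied
```

Comparing the copy against the source only detects errors introduced during the transfer. A source directory that is itself damaged would pass this check, which is why running fcs.py db check against the manifest afterwards remains useful.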
Thanks. Hope things continue to run smoothly. I know you have another open reported issue so let me know if you get the latest release to run.