
`s0864/ct.nii.gz` cannot be unarchived properly

Open nai62 opened this issue 2 years ago • 3 comments

TL; DR

After unarchiving Totalsegmentator_dataset.zip, accessing the very last element of s0864/ct.nii.gz raises a CRC check error.

Minimal Python code to reproduce:

import nibabel as nib

# Reading the last voxel forces the whole gzip stream to be decompressed.
a = nib.load('s0864/ct.nii.gz')
print(a.dataobj[-1, -1, -1])

Error:

BadGzipFile: CRC check failed 0x5ab25329 != 0x84f031a9
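
A likely reason the error only shows up when the last element is read: a gzip stream keeps its CRC-32 and length in an 8-byte trailer, so the check can only run once the stream has been decompressed to the end. The sketch below illustrates this with a synthetic file, not the real ct.nii.gz. The helper names `make_gzip_with_bad_crc` and `read_head_then_all` are made up for this demonstration.

```python
import gzip
import os


def make_gzip_with_bad_crc(path, payload):
    """Write `payload` as gzip, then flip one byte of the CRC-32 in the trailer."""
    with gzip.open(path, "wb") as f:
        f.write(payload)
    with open(path, "r+b") as f:
        f.seek(-8, os.SEEK_END)      # trailer = 4-byte CRC-32 + 4-byte length
        first_crc_byte = f.read(1)
        f.seek(-8, os.SEEK_END)
        f.write(bytes([first_crc_byte[0] ^ 0xFF]))


def read_head_then_all(path, head_size=16):
    """Return (head, error): the first bytes, and any error hit at end of stream."""
    with gzip.open(path, "rb") as f:
        head = f.read(head_size)     # early data decompresses without complaint
        try:
            while f.read(1 << 20):   # keep reading until the end of the stream
                pass
        except gzip.BadGzipFile as exc:
            return head, exc
    return head, None
```

With a payload of a few megabytes, `read_head_then_all` should return the first bytes intact together with a `BadGzipFile` error, which mirrors why the reproduction indexes `[-1, -1, -1]` rather than the first voxel.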

Details

  • I downloaded Totalsegmentator_dataset.zip from Zenodo, and I confirmed that the MD5 hash matches the correct one.
    $ md5sum Totalsegmentator_dataset.zip
    f0fbb3f91d9d99f2169863545f577b30  Totalsegmentator_dataset.zip
    
  • When I unarchived it using "The Unarchiver" on Mac, it showed an error message like Unable to unarchive s0864/ct.nii.gz: archive broken (the original message was in Japanese), although the file was generated.
  • After unarchiving the whole ZIP archive, I found the error described above with the python environment on Mac.
  • I also unarchived it with the unzip command on Linux and reproduced the error in a Python environment on Linux. The error occurred in both cases.
  • I also confirmed that all other ct.nii.gz files can be loaded properly.
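
The checks in the list above can be sketched in Python as well. This is a minimal sketch using only the standard library; the function names `md5_of_file` and `find_bad_zip_members` are hypothetical, and whether the ZIP-level CRC actually fails for s0864/ct.nii.gz is an assumption based on the unarchiver message, not something verified here.

```python
import hashlib
import zipfile
import zlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks (like `md5sum`)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_zip_members(zip_path, chunk_size=1 << 20):
    """Return names of members that fail to decompress or fail the ZIP CRC-32."""
    bad = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            try:
                # Reading a member to its end triggers zipfile's CRC check.
                with archive.open(info) as member:
                    while member.read(chunk_size):
                        pass
            except (zipfile.BadZipFile, zlib.error, EOFError):
                bad.append(info.filename)
    return bad
```

A matching MD5 only shows that the download equals what was uploaded. It says nothing about whether each member was intact when the archive was built, which is why a per-member check is a useful second step.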

Versions:

  • Python 3.9.12
  • nibabel==4.0.2
  • numpy==1.23.3

Could you confirm if the error occurs in your environment? Thank you in advance!

nai62 avatar Nov 01 '22 10:11 nai62

Sorry, there's no need to use Python to reproduce this error. The gunzip command yields similar errors.

$ gunzip s0864/ct.nii.gz

gzip: s0864/ct.nii.gz: invalid compressed data--crc error

gzip: s0864/ct.nii.gz: invalid compressed data--length error
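
Since the failure is at the gzip level, the whole dataset can be scanned without nibabel. The following is a rough sketch using only the standard library; `gzip_error` and `scan_dataset` are hypothetical helper names, and the `*.nii.gz` pattern is an assumption about which files to check.

```python
import gzip
import zlib
from pathlib import Path


def gzip_error(path, chunk_size=1 << 20):
    """Decompress `path` to the end; return None if clean, else the error text."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
    except (gzip.BadGzipFile, EOFError, zlib.error) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def scan_dataset(root, pattern="*.nii.gz"):
    """Map each failing file under `root` (searched recursively) to its error text."""
    failures = {}
    for path in sorted(Path(root).rglob(pattern)):
        message = gzip_error(path)
        if message is not None:
            failures[str(path)] = message
    return failures
```

Calling `scan_dataset` on the unarchived dataset directory returns an empty dict when every file decompresses cleanly. Each file is streamed in chunks, so memory use stays small even for large volumes.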

nai62 avatar Nov 01 '22 10:11 nai62

Hi, I also have the same problem. Unfortunately this file somehow got corrupted during the creation of the zip archive. For now the easiest solution is to exclude this file from the dataset.
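
One way to apply this workaround is to filter the subject directories before building a file list. This is only a sketch: it assumes the `<root>/<subject>/ct.nii.gz` layout seen in this thread, and `EXCLUDED_SUBJECTS` and `usable_ct_paths` are made-up names.

```python
from pathlib import Path

# Assumption: subject IDs to skip; "s0864" is the file reported in this issue.
EXCLUDED_SUBJECTS = frozenset({"s0864"})


def usable_ct_paths(dataset_root, excluded=EXCLUDED_SUBJECTS):
    """Return sorted ct.nii.gz paths, skipping subjects listed in `excluded`."""
    paths = []
    for subject_dir in sorted(Path(dataset_root).iterdir()):
        if not subject_dir.is_dir() or subject_dir.name in excluded:
            continue
        ct_path = subject_dir / "ct.nii.gz"
        if ct_path.is_file():
            paths.append(ct_path)
    return paths
```

Keeping the excluded IDs in one place makes it easy to add further subjects if more corrupted files turn up.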

wasserth avatar Nov 01 '22 11:11 wasserth

Thank you for your confirmation! I tried to visualize it and found that, as you mentioned, this file seems to be corrupted except for the first few slices. For now, I think I'm going to exclude it in my experiments as a workaround. Thank you for your suggestion.

nai62 avatar Nov 01 '22 13:11 nai62

Hello, is this the only file you found to be corrupted? I'm running some trainings with nnUNet and so far everything seems fine. I also tried to train a model called Swin UNETR using the code from their BTCV tutorial, but that training ran into many corrupted files, so I dropped the experiment.

naayem avatar Jan 16 '23 12:01 naayem

Hi, as far as I could confirm, all the other files opened successfully without errors (using the method I described in the "TL; DR" above). I haven't run any training with them, though.

nai62 avatar Jan 16 '23 14:01 nai62

code:

import os
import nibabel as nib

dir_path = "/scratch/izar/naayem/TEST/nnUNet_raw_data_base/nnUNet_raw_data/Task601_Totalsegmentator"
corrupted_files = []

# Walk one level of subdirectories and try to read every file in each of them.
for subdir in os.listdir(dir_path):
    subdir_path = os.path.join(dir_path, subdir)
    if os.path.isdir(subdir_path):
        for file in os.listdir(subdir_path):
            file_path = os.path.join(subdir_path, file)
            try:
                a = nib.load(file_path)
                # Reading the last voxel forces the whole file to be decompressed.
                print(a.dataobj[-1, -1, -1])
                print(f"{file_path} is loaded successfully")
            except Exception as e:
                print(f"Error occurred in file {file_path}: {e}")
                corrupted_files.append(file_path)

# Summarize the files that could not be read.
if corrupted_files:
    print("Corrupted files:")
    for file in corrupted_files:
        print(file)
else:
    print("No corrupted files found.")

I ran this code and also got:

Corrupted files:
/scratch/izar/naayem/TEST/nnUNet_raw_data_base/nnUNet_raw_data/Task601_Totalsegmentator/imagesTr/s0864_0000.nii.gz

So I guess that confirms it on my side.

naayem avatar Jan 17 '23 06:01 naayem

Hi, I don't know whether this is the right place to report my error. I downloaded the dataset and unzipped it on my MacBook. I dragged and dropped a file (e.g., Totalsegmentator_dataset/s0000/ct.nii.gz) into 3D Slicer v5.2.1, but I always get the error "Error: Loading Totalsegmentator_dataset/s0000/ct.nii.gz - load failed". Did I open the file correctly?

Khoa-NT avatar Feb 06 '23 09:02 Khoa-NT

I uploaded a new version of the dataset. Now unzipping should work without errors.

wasserth avatar Oct 04 '23 11:10 wasserth

@wasserth I am using the new V2 version, with an MD5 hash of fd65f71cf3ef78c67a3740909ecef674. However, I'm also getting gzip CRC errors when reading with nibabel. I also tested gunzip, which fails as well.

Files I'm having issues with are:

  • s0045/ct.nii.gz
  • s0830/ct.nii.gz

michaelmyc avatar Oct 13 '23 08:10 michaelmyc