ITK icon indicating copy to clipboard operation
ITK copied to clipboard

itk(GDCM|DCMTK)SeriesFileNames: memory leaks, bad behavior

Open PHLF opened this issue 3 years ago • 7 comments

Hello,

I may have not searched thoroughly but I am surprised not to have found an issue about this:

  • itkGDCMSeriesFileNames seems to suffer from a huge memory leak: it relies on the gdcm::SerieHelper class which... has some issues like this commented call to delete And looking at GCDM API doc, its authors seem well aware of the situation:

image

But you know... who read the doc anyway? (Please don't take this too seriously ;) )

So out of luck and running out of time, instead of patching ITK's GDCM I tried to use itkDCMTKSeriesFileNames which does not suffer from memory leaks, but... when you are out of luck you should stop trying:

  • itkDCMTKSeriesFileNames stores SeriesUID in a vector... which means that when you are querying Series UIDs you end up with duplicates (n files of the same series => n duplicated seriesUID): they should be stored in a set/unordered_set (or maybe in a hash map for linking series with their files). Not to mention that it lags behind itkGDCMSeriesFileName in terms of features:
    • The recursive flag is not used
    • The AddSeriesRestriction function is a NOOP
    • The series uids and filenames are recomputed at each call to GetInputFileNames/GetFileNames("seriesUID")/GetSeriesUID but are stored as member variables: they could be used as cache values

Thank you very much for your time and for your work on this library.

PHLF avatar Sep 14 '21 06:09 PHLF

I don't think that memory leaks are known. @malaterre @issakomi correct me if I am wrong. @PHLF do you have a simple repro case for a memory leak?

dzenanz avatar Sep 14 '21 14:09 dzenanz

Something like:

#include <itkGDCMImageIO.h>
#include <itkGDCMSeriesFileNames.h>

#include <filesystem>
#include <vector>

int main()
{
  using SeriesIdContainer = std::vector<std::string>;
  SeriesIdContainer seriesUIDs;

  auto nameGenerator = itk::GDCMSeriesFileNames ::New();

  nameGenerator->SetUseSeriesDetails(true);
  nameGenerator->AddSeriesRestriction("0020|000d");
  nameGenerator->SetGlobalWarningDisplay(false);
  nameGenerator->SetRecursive(false);
  nameGenerator->SetLoadSequences(false);

  namespace fs = std::filesystem;

  fs::path inDcmRootDir = ".";

  // inDcmRootDir is a directory containing DICOM data subfolders
  for (auto const& dirEntry : fs::recursive_directory_iterator{inDcmRootDir})
  {
    if (!dirEntry.is_directory())
    {
      continue;
    }
    // I am pretty sure the leak happen during the following call as GDCMSeriesFileNames call "m_SerieHelper->Clear()"
    nameGenerator->SetInputDirectory(dirEntry.path());

    seriesUIDs = nameGenerator->GetSeriesUIDs();

    for(const auto& seriesUID : seriesUIDs)
    {
      nameGenerator->GetFileNames(seriesUID);
    }
  }

  return EXIT_SUCCESS;
}

PHLF avatar Sep 14 '21 15:09 PHLF

But maybe I did not use the API properly and there is no leak at all.

I forgot to mention: I am using ITK 5.2.0 on a x86_64 debian bullseye system

PHLF avatar Sep 14 '21 15:09 PHLF

I have tried to run the above test with Valgrind on Debian 11, ITK-5.2.1 release. No leaks or other errors were detected. I don't use the classes in my apps, maybe i am missing something. Could you reproduce leaks with Valgrind? And yes, probably it were not bad to review gdcmSeriesHelper.cxx.

issakomi avatar Sep 21 '21 14:09 issakomi

Could you reproduce leaks with Valgrind?

When running my application with massif (valgrind heap analyzer) the allocated memory reaches a limit value for an unknown reason and never goes OOM. But not using valgrind, the same application goes OOM when I use the path of a folder containing several dicoms as a parameter to the itk::GDCMSeriesFileNames::SetInputDirectory: I clearly see an amount of memory being allocated corresponding to the size of my folder's content. And this memory is never freed when I call itk::GDCMSeriesFileNames::SetInputDirectory with a different folder or simpler when I instantiate an itk::GDCMSeriesFileNames owned by a smartpointer in a loop: destroying the itk::GDCMSeriesFileNames does not deallocate the memory corresponding to the DICOM contents.

Looking further in gdcm::SerieHelper used by itk::GDCMSeriesFileNames, on can see that the whole file content of each iterated .dcm file is copied into a FileList's node and these nodes are never destroyed as the call to the delete operator is commented in the gdcm::SerieHelper::Clear function (that's my two cents hypothesis).

I may be missing something as the FileList is making use of SmartPointers and they should free the memory when the FileList is cleared, despite the commented call to "delete", but that is definitely not what I observe.

PHLF avatar Sep 21 '21 14:09 PHLF

I clearly see an amount of memory being allocated corresponding to the size of my folder's content. And this memory is never freed

OK, i have looked at it again. Have tried folder with DICOM sub-folders, 1GB. In fact, the scan seems to stress the memory, it grows and grows, after the scan is done with one sub-folder, it start another one and memory is not freed. Valgrind still doesn't detect a leak, because memory is freed at exit.

issakomi avatar Sep 21 '21 15:09 issakomi

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

stale[bot] avatar Apr 16 '22 10:04 stale[bot]