Stress-testing DROID with extremely large collections (1M+ files) and looking at potential scaling optimisations
Following on from #107: we should stress-test DROID at very large scale, covering both processing and outputs.
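A stress test at this scale needs a large collection to point DROID at. The Python sketch below is not part of DROID; the function name `make_test_collection` and its parameters are hypothetical. It generates a synthetic tree of small files and spreads them across subdirectories so that no single directory holds a million entries. Because every file has identical content, it only exercises file-count scaling; a realistic test would also need a varied mix of formats.

```python
from pathlib import Path


def make_test_collection(root, n_files, files_per_dir=1000, size_bytes=64):
    """Create n_files small files under root (hypothetical helper).

    Files are grouped into subdirectories of files_per_dir entries each,
    because many filesystems handle a single huge directory poorly.
    """
    root = Path(root)
    payload = b"x" * size_bytes  # identical content in every file
    for i in range(n_files):
        subdir = root / f"d{i // files_per_dir:05d}"
        if i % files_per_dir == 0:
            # First file for this subdirectory: create it.
            subdir.mkdir(parents=True, exist_ok=True)
        (subdir / f"f{i:07d}.txt").write_bytes(payload)
    return root
```

For example, `make_test_collection("/tmp/droid-stress", 1_000_000)` would create 1,000 subdirectories of 1,000 files each, which could then be added to a DROID profile (the path is illustrative).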
When DROID 6 was originally written, some large-scale tests were run on several million files. These worked, so it should still be possible unless there has been a regression.
Reporting, however, has definitely regressed. The comprehensive breakdown report fails to complete on only a few thousand files, and this definitely used to work, even if it was slow. That said, the reporting technology was always bad at dealing with very large collections.
I think all the other aspects of DROID (database, profiling, filtering, exporting, etc.) are OK with large collections, though they should probably be retested. Reporting is definitely something that could be enhanced or fixed.
Dear DROID team,
Do you know whether DROID has a limit on the number of files or directories, or on the total size, that it can identify?
One of my colleagues had to identify files one by one, otherwise DROID (v6.5 on Windows 10) would refuse to work.
The submission consisted of 7 files totalling 71 GB; the largest file weighed 11 GB.
Another submission of 20 GB, with every file under 1 GB, worked perfectly.
Best regards,
Samuel, for Conseil Départemental de l'Hérault
I'm not a member of the DROID team, so this isn't an official response. I have worked on DROID for many years.
DROID has been tested on millions of files, so the number of individual files is unlikely to be a problem. I regularly test it with thousands of files.
Files should also not have any real size limitation: 64-bit addressing is used for I/O. However, if the files are archives of some sort, the compression libraries used to decode them may impose size limits, or there may simply be a bug somewhere.
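To see why 64-bit addressing matters for a file like the 11 GB one reported above, consider a byte offset stored in a signed 32-bit integer. The arithmetic below is a generic Python illustration, not DROID's code, and the helper names are hypothetical: any offset past 2**31 - 1 bytes (about 2.1 GB) wraps around to a negative number, whereas a signed 64-bit offset covers far larger files.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value (about 2.1 GB as a byte offset)
INT64_MAX = 2**63 - 1  # largest signed 64-bit value (about 9.2 EB as a byte offset)


def as_int32(n):
    """Return n as it would appear after truncation to a signed 32-bit integer."""
    return (n + 2**31) % 2**32 - 2**31


def fits_signed(n, max_value):
    """True if byte offset n is representable up to max_value."""
    return 0 <= n <= max_value


GB = 10**9
offset = 11 * GB  # end of an 11 GB file

print(fits_signed(offset, INT32_MAX))  # False
print(fits_signed(offset, INT64_MAX))  # True
print(as_int32(offset))                # -1884901888
```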
It seems odd that the files failed together but worked individually. The only explanation I can think of is some kind of out-of-memory error caused by trying to process too much data at once. Such errors occurred in earlier versions of DROID, but have not been seen for a while thanks to better memory management.
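If the failure was indeed memory-related, the usual defence is to process each file as a stream of fixed-size chunks rather than loading it whole, so memory use stays roughly constant whether a file is 1 KB or 11 GB. The Python sketch below illustrates the idea with an MD5 checksum. It is a generic example under that assumption, not DROID's implementation; the function name `md5_streaming` is hypothetical and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an assumed value, not a DROID setting


def md5_streaming(path, chunk_size=CHUNK_SIZE):
    """Hash a file of any size while holding at most one chunk in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

The same pattern applies to any whole-file pass: because only one chunk of each file is held at a time, processing several very large files one after another does not require memory proportional to their total size.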
It would be helpful to have the DROID log for the failing runs to see what's going on.
Dear DROID team,
We could not reproduce the issue today, so it was most likely a transient problem.
Best regards,
Samuel