browsertrix-crawler
Parameter diskUtilization is ignoring input
Hello,
I'm not 100% sure that the default diskUtilization=90 is the best choice, though I guess it works as a failsafe. It would have helped to also show it in the example, because when I ran the example it aborted my fetch due to disk utilization, and I would not call 100 GB of free space "not enough" to run a crawler on a small site.
I also think that if you pass --diskUtilization 100, the value is silently ignored, with no error telling me that my input is out of range. I suspect it has something to do with this: https://github.com/webrecorder/browsertrix-crawler/blob/c3b98e5047ea219336883b0b1969da425fc43456/util/argParser.js#L551
What I got in the log: {"timestamp":"2023-11-21T11:31:32.891Z","logLevel":"info","context":"general","message":"Disk utilization threshold reached 99% > 90%, stopping","details":{}}
my hdd I used for this:
So I would propose adjusting the validation so that it reports 100 as out of range, and also adjusting the code so that --diskUtilization 99 actually works, because right now it still stops at 99%: {"timestamp":"2023-11-21T11:51:20.378Z","logLevel":"info","context":"general","message":"Disk utilization threshold reached 99% > 99%, stopping","details":{}}
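To illustrate what I mean, here is a rough sketch of the two changes. The helper names `parseDiskUtilization` and `shouldStop` are made up for this example and are not the crawler's actual functions, and the 0 to 99 range is my assumption about what the valid range should be:

```javascript
// Hypothetical helpers for illustration only, not browsertrix-crawler code.

// Reject out-of-range input loudly instead of silently falling back to a default.
function parseDiskUtilization(raw) {
  const text = String(raw).trim();
  const value = Number(text);
  // Assumed valid range: whole percentages from 0 to 99.
  if (text === "" || !Number.isInteger(value) || value < 0 || value > 99) {
    throw new RangeError(
      `--diskUtilization must be an integer from 0 to 99, got "${raw}"`
    );
  }
  return value;
}

// Stop only when usage is strictly above the threshold, so that the
// comparison matches the ">" printed in the log message.
function shouldStop(usedPercent, threshold) {
  return usedPercent > threshold;
}
```

With a strict comparison like this, a disk at 99% with --diskUtilization 99 would keep crawling, and --diskUtilization 100 would fail at startup with a clear message.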
Alternatively, change "diskUtilization" to something like minimumFreeSpace in actual units, with a default value such as 10 GB, if this check really needs to be turned on by default.
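A minimal sketch of how such a free-space check could look. The `minimumFreeSpace` option, the `parseSize` and `hasEnoughFreeSpace` helpers, and the accepted unit suffixes are all hypothetical and do not exist in the crawler today:

```javascript
// Hypothetical sketch of a free-space threshold, not an existing crawler option.

// Binary units (1 kb = 1024 bytes) are an assumption made for this example.
const UNITS = { b: 1, kb: 1024, mb: 1024 ** 2, gb: 1024 ** 3, tb: 1024 ** 4 };

// Convert a size string such as "10gb" or "512 MB" into a number of bytes.
function parseSize(text) {
  const match = /^(\d+(?:\.\d+)?)\s*(b|kb|mb|gb|tb)$/i.exec(String(text).trim());
  if (!match) {
    throw new RangeError(`Invalid size "${text}", expected something like "10gb"`);
  }
  return Math.round(Number(match[1]) * UNITS[match[2].toLowerCase()]);
}

// The crawl may continue while the free space stays at or above the minimum.
function hasEnoughFreeSpace(freeBytes, minimumFreeSpace = "10gb") {
  return freeBytes >= parseSize(minimumFreeSpace);
}
```

With a check like this, my disk with 100 GB free would pass the default, no matter how full the disk is in percentage terms.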
Thanks for reading