browsertrix-crawler

Parameter diskUtilization is ignoring input

Open ydyote opened this issue 1 year ago • 2 comments

Hello,

I'm not 100% sure that the default diskUtilization=90 is the best choice, though I guess it works as a failsafe. It would have been better to also include it in the example, because when I ran the example it aborted my crawl due to disk utilization. I would not call 100 GB of free space "not enough" to run a crawler on a small site.

I also think that if you pass --diskUtilization 100, the value is silently ignored, with no error saying that my input is out of range. I suspect it has something to do with this: https://github.com/webrecorder/browsertrix-crawler/blob/c3b98e5047ea219336883b0b1969da425fc43456/util/argParser.js#L551

This is what I got in the log: {"timestamp":"2023-11-21T11:31:32.891Z","logLevel":"info","context":"general","message":"Disk utilization threshold reached 99% > 90%, stopping","details":{}}

The HDD I used for this: [image]

So I would propose adjusting the validation so that it reports 100 as out of range, and also adjusting the code so that --diskUtilization 99 actually works, because currently I get: {"timestamp":"2023-11-21T11:51:20.378Z","logLevel":"info","context":"general","message":"Disk utilization threshold reached 99% > 99%, stopping","details":{}}
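
As a rough sketch of what this proposal could look like (this is not the crawler's actual code; `parseDiskUtilization`, `shouldStop`, and the 1 to 99 range are hypothetical names and assumptions chosen for illustration), the validation would reject out-of-range input with an explicit error instead of silently falling back, and the threshold check would use a strict comparison so that 99% usage does not trip a threshold of 99:

```javascript
// Hypothetical helpers for illustration only; not taken from browsertrix-crawler.

// Validate the raw CLI value and fail loudly instead of ignoring it.
// The allowed range (1..99) is an assumption based on the proposal above.
function parseDiskUtilization(raw, { min = 1, max = 99 } = {}) {
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(
      `--diskUtilization must be an integer between ${min} and ${max}, got "${raw}"`
    );
  }
  return value;
}

// Stop only when usage strictly exceeds the threshold, so that a
// threshold of 99 still allows crawling at exactly 99% usage.
function shouldStop(usedPercent, threshold) {
  return usedPercent > threshold;
}
```

With these assumptions, `parseDiskUtilization("100")` throws a `RangeError`, and `shouldStop(99, 99)` returns `false`, while `shouldStop(99, 90)` returns `true`.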

Alternatively, change diskUtilization to something like minimumFreeSpace, expressed in actual units (e.g. a default of 10 GB), if this check really needs to be turned on by default.
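
To illustrate the minimumFreeSpace idea, here is a minimal sketch. Everything in it is an assumption: the option does not exist in the crawler, `parseSize` and `hasEnoughFreeSpace` are hypothetical helpers, and the units are treated as binary multiples (1 GB = 1024^3 bytes).

```javascript
// Hypothetical sketch of a "minimumFreeSpace" check; not an existing crawler option.

// Binary multipliers are an assumption; decimal (1000-based) units would also be reasonable.
const UNITS = { b: 1, kb: 1024, mb: 1024 ** 2, gb: 1024 ** 3, tb: 1024 ** 4 };

// Convert a string such as "10gb" or "512 MB" into a number of bytes.
function parseSize(text) {
  const match = /^(\d+(?:\.\d+)?)\s*(b|kb|mb|gb|tb)$/i.exec(String(text).trim());
  if (!match) {
    throw new RangeError(`Invalid size "${text}", expected e.g. "10gb"`);
  }
  return Number(match[1]) * UNITS[match[2].toLowerCase()];
}

// Return true when the free space is at least the configured minimum.
function hasEnoughFreeSpace(freeBytes, minimumFreeSpace = "10gb") {
  return freeBytes >= parseSize(minimumFreeSpace);
}
```

With a check like this, a disk that is 99% full but still has 100 GB free would pass, which matches the situation described above. The free byte count itself would have to come from the operating system, for example via Node's `fs.statfs` in recent Node versions.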

Thanks for reading

ydyote, Nov 21 '23 11:11