Jakub Beránek comments

Results: 761 comments by Jakub Beránek

Create overhead benchmark

Measured overhead of HQ so far:

- Barbora/Karolina: 1 ms
- Notebook: 0.1 ms

The overhead on IT4I clusters is caused by an old GLIBC and slow `fork` (https://github.com/rust-lang/rust/issues/87764).
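To get a rough feeling for how much time plain process spawning costs on a given machine, one can time the launch of a trivial child process. This is only a sketch and not the HQ benchmark itself; the helper name `mean_spawn_time` and the choice of child command are assumptions made for illustration.

```python
import subprocess
import sys
import time


def mean_spawn_time(command, repeats=20):
    """Return the mean wall-clock time (in seconds) to spawn `command` and wait for it."""
    start = time.perf_counter()
    for _ in range(repeats):
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Hypothetical trivial child: the current Python interpreter doing nothing.
    # Interpreter start-up dominates this number; a tiny native binary would
    # isolate the spawn cost better.
    seconds = mean_spawn_time([sys.executable, "-c", "pass"])
    print(f"mean spawn + wait time: {seconds * 1000:.2f} ms")
```

Running the same script on a cluster login node and on a notebook gives a comparable pair of numbers, although it measures spawn plus wait time of the child, not HQ's per-task overhead.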

Storing worker monitoring into binary log

I think that this is now superseded by the HQ server event log. @spirali?

Automatic allocation behavior clarification / feature request

Hi :) So there are a few things to unpack here.

> Possible that I might have miss-configured, I'm a bit uncertain about the backlog flag, and e.g setting that...

Automatic allocation behavior clarification / feature request

Ok, this definitely sounds suspicious and unintended. I will try to simulate this situation and see if I can fix it.

Automatic allocation behavior clarification / feature request

I implemented a change that should fix the behaviour in this situation. However, we will need to make larger changes to the autoallocator to make it more robust and "smarter",...

Incorrect number of autodetected GPUs

Interesting. Is there some proper defined way of finding out which GPUs are actually enabled for the user? Trying to open `/dev/nvidia` seems a bit "fishy". In general, we tried...

Incorrect number of autodetected GPUs

Could you please provide some more context? On which cluster does this happen, and what Nvidia/CUDA configuration is used there? Is there e.g. some environment variable that could be used...

Incorrect number of autodetected GPUs

Thanks for the details. Using `CUDA_VISIBLE_DEVICES` makes sense to us, since it is also what users are used to, so they could expect that `CUDA_VISIBLE_DEVICES=1,2 hq worker start` will set...
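As an illustration of the idea, the GPU list could be derived from `CUDA_VISIBLE_DEVICES` roughly as follows. This is a minimal sketch, not HQ's actual detection logic; the helper `visible_gpu_ids` is hypothetical, and it treats the entries as opaque strings because the variable may contain either indices or device UUIDs.

```python
import os


def visible_gpu_ids(environ=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable not set: the caller has to fall back to another detection method.
        return None
    # Entries are comma-separated; an empty value yields an empty list (no GPUs).
    return [item.strip() for item in value.split(",") if item.strip()]


print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "1,2"}))  # prints ['1', '2']
```

Returning `None` for an unset variable keeps "nothing was configured" distinct from "the user explicitly hid all GPUs".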

Alloc crash + new allocs have (deleted) in jobscript

Hi, that looks really weird. Basically, what the autoallocator does is that it first tries to discover the path to the `hq` binary from `/proc/self/exe` so that it knows how...

Alloc crash + new allocs have (deleted) in jobscript

We will try to detect `(deleted)` in the executable path and provide a better warning to the user. I'll close this issue once it's implemented.
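Such a check could look roughly like the sketch below. It relies on the Linux behaviour that the `/proc/self/exe` link target gets ` (deleted)` appended once the running binary has been unlinked. The function names and example paths are hypothetical and do not come from the HQ code base.

```python
import os

DELETED_SUFFIX = " (deleted)"


def is_deleted_exe_path(path):
    """Return True if a /proc/self/exe target indicates an unlinked binary."""
    return path.endswith(DELETED_SUFFIX)


def current_exe_path():
    """Resolve the path of the running executable (Linux only)."""
    return os.readlink("/proc/self/exe")


if __name__ == "__main__":
    path = current_exe_path()
    if is_deleted_exe_path(path):
        print(f"warning: the running binary was deleted or replaced: {path}")
    else:
        print(f"executable path: {path}")
```

A suffix check can give a false positive for a binary whose real file name ends in ` (deleted)`, so it is better suited to producing a warning than a hard error.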