Vadim Kantorov
> I'll double check that this is the case.

Yeah, I got this in the log:

```
INFO 08-01 12:18:31 [config.py:1869] Defaulting to use mp for distributed inference
INFO...
```
One last thing: in DP mode, are the forward methods actually supposed to be invoked simultaneously and block/wait for the other ranks? Or are the workers completely independent and require no sync...
> DP is usually used for MoE models where the expert layers use TP and/or EP

> This synchronization isn't necessary for non-MoE models (where the DP would be completely...
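A minimal sketch (not vLLM's actual code) of why MoE-style DP ranks have to stay in lock-step: collective ops block until every rank calls them, so a rank that happens to have no request must still run a dummy forward. The rank/batch setup below is invented for illustration.

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                            rank=rank, world_size=world_size)
    has_real_batch = rank == 0                # pretend only rank 0 got a request
    x = torch.randn(4) if has_real_batch else torch.zeros(4)  # dummy input on idle rank
    # Every rank must enter the collective; if rank 1 skipped it because it
    # had no work, rank 0 would hang here forever.
    dist.all_reduce(x)
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```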
@njhill So maybe some sort of non-synchronized DP is needed, or a built-in simple proxy/load-balancer, so that `vllm serve` could use all GPUs and all GPUs...
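Absent such a built-in balancer, here is a minimal client-side sketch of the idea, assuming one independent `vllm serve` instance was started per GPU (e.g. `CUDA_VISIBLE_DEVICES=0 vllm serve MODEL --port 8000`, `CUDA_VISIBLE_DEVICES=1 vllm serve MODEL --port 8001`) and that the OpenAI-compatible `/v1/completions` endpoint is used; `MODEL` and the port list are placeholders:

```python
import itertools
import requests

# Round-robin over the per-GPU server instances (ports are assumptions).
backends = itertools.cycle(["http://localhost:8000", "http://localhost:8001"])

def generate(prompt: str) -> str:
    base = next(backends)
    resp = requests.post(f"{base}/v1/completions",
                         json={"model": "MODEL", "prompt": prompt, "max_tokens": 64})
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```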
@njhill It also appears that in `vllm serve` multiple DP ranks download safetensors simultaneously, which clobbers the command-line output. Also, tqdm's `\r` screws everything up...
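A minimal sketch of the usual fix for interleaved progress bars: only rank 0 renders tqdm output, the other ranks run silently. It assumes the launcher exports a `RANK` environment variable (the torchrun convention); `shards` and `download` are placeholder stand-ins:

```python
import os
import time
from tqdm import tqdm

rank = int(os.environ.get("RANK", "0"))                        # rank set by the launcher (assumption)
shards = [f"model-{i:05d}.safetensors" for i in range(4)]      # placeholder shard names

def download(shard: str) -> None:                              # stand-in for the real fetch
    time.sleep(0.1)

# Only rank 0 draws the bar, so concurrent ranks can't interleave `\r` output.
for shard in tqdm(shards, disable=rank != 0, desc="downloading safetensors"):
    download(shard)
```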
@janeyx99 Does this then fix this?
- https://github.com/pytorch/pytorch/issues/57947

Or does it only fix `weight` type promotion, without `start`/`end` input type promotion? Could it somehow reuse the kernel / type promotion...
> as they're not scalar tensors

Sometimes supporting a constant python scalar as `start`/`end` is also useful.

> I'd imagine would not have such a binary weight.

True, but given the...
For python scalars I mean usage like so:

```python
foreground_mask = torch.rand(16, 16)
image = torch.randint(0, 256, (16, 16), dtype=torch.uint8)
torch.lerp(image, 255, foreground_mask)  # TypeError: lerp() received an...
```
@janeyx99 I also pasted these examples into:
- https://github.com/pytorch/pytorch/issues/57947#issuecomment-2832006122

Currently a python scalar as `input` / `end` is not supported, as the only overloads are:

```
* (Tensor input, Tensor end, Tensor...
```
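For reference, a sketch of the workarounds that the current overloads force on the `uint8` + scalar-`end` example above, using only the existing `torch.lerp` signatures (the round-trip back to `uint8` is my addition):

```python
import torch

foreground_mask = torch.rand(16, 16)
image = torch.randint(0, 256, (16, 16), dtype=torch.uint8)

# Workaround 1: wrap the scalar `end` in a tensor and promote the input to float,
# since there is no (Tensor, Number, Tensor) overload and no integer kernels.
blended = torch.lerp(image.float(), torch.tensor(255.0), foreground_mask)

# Workaround 2: spell out the lerp formula, which accepts python scalars freely.
blended2 = image.float() + foreground_mask * (255.0 - image.float())

assert torch.allclose(blended, blended2)
out = blended.round().to(torch.uint8)  # back to uint8 if needed
```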
It's hard to give a small example, since it's the last step of my TexLive build pipeline, which takes > 1h to build. I think that somehow gcc's ld decides...