NeedsMoar
Which GPU is it?
Ditch `--force-fp32 --use-split-cross-attention`. AMD cards since at least Vega run fp16 at double the speed and half the space via packed vector instructions, and LCM models are fine running without fp32...
It sounds exactly like the mmap issue someone hit last week; let me find it: https://github.com/comfyanonymous/ComfyUI/issues/1992#issuecomment-1817797912 That might help. I'd suggest leaving this bug open even if changing the file open...
> I think you should underclock your GPU or increase your fan speed if you have temperature issues instead of doing this. I'd be strongly looking into this if I...
You can directly call C / C++ code from Rust (and dynamically or statically link to it), and aria2 can be built as a static library or a DLL. You're...
No, because the optimized kernels are built for specific sizes, and the maximum size anything is built for in sm_80 (Ampere) is 32768 because of the number of possible...
Sorry, the max bounds from the source are 65536, 128, 32, but it looks like you should be able to fit within them by reshaping the tensor. The...
Removing hwupload / hwdownload worked on Vega too. I don't know what an implicit hwdownload does to the ability to chain Vulkan filters together, or whether ffmpeg handles this internally. I...
Try --gpu-only if you're not already using it. Sometimes that message shows up when nothing is swapped off of or onto the GPU... based on your first vs. second run times (assuming those...
@lw > Fused sequence parallel by itself might work on Windows (I'm not aware of any blockers?) but: > > * its fused Triton kernels won't be available, as @danthe3rd...