Daniele

63 comments by Daniele

> I tested ComfyUI by creating a venv from the rocm sdk python, then installing the whl files that rocm sdk builder built, and then finally the comfyui...

> * we can now build and launch the following extra tools with the latest rocm_sdk_builder. What is missing is better documentation for this so that people know about it. Not...

That's nice to know, thanks. I'll look into where it makes a difference because, as I mentioned, it actually slows down im2col.

I've been trying to use the `VK_EXT_subgroup_size_control` extension in GLSL to set the `requiredSubgroupSize`, but I can't get it to work. Is the right approach to enable the "GL_EXT_subgroup_size_control"...

Thanks a lot for the information. I've been trying to implement it directly in the shader, not knowing it was already implemented in the main ggml-vulkan.cpp code. I did some...

By setting some pipelines to subgroup 64 and some to subgroup 32, I can get some good performance gains on Stable Diffusion XL (1024x1024, 20 steps) with tiled VAE decode: stock...

With the latest changes, subgroup 32 is used when an RX 5700 is detected, and subgroup 64 is forced on IM2COL (other operations, such as softmax, may be faster on subgroup 64, but since...

I've pushed the latest changes, in which I removed the awful code duplication and instead introduced a helper function that checks if there's an override for the subgroup size from...
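As a rough illustration of the kind of helper described above, here is a minimal sketch. It assumes a per-architecture table that maps a pipeline name to a forced subgroup size. The names `GpuArch`, `overrides_for`, and `pick_subgroup_size` are hypothetical and are not taken from ggml-vulkan.cpp; only the idea of forcing subgroup 64 on IM2COL for RDNA1 comes from the comments here.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical architecture tag (assumption, not the real ggml-vulkan type).
enum class GpuArch { Other, AmdRdna1 };

// Hypothetical override table: pipeline name -> forced subgroup size.
// The "im2col" -> 64 entry mirrors the RDNA1 behaviour mentioned in the
// comments; the table layout itself is only an illustration.
static const std::unordered_map<std::string, uint32_t> & overrides_for(GpuArch arch) {
    static const std::unordered_map<std::string, uint32_t> rdna1 = { { "im2col", 64 } };
    static const std::unordered_map<std::string, uint32_t> none  = {};
    return arch == GpuArch::AmdRdna1 ? rdna1 : none;
}

// Returns the override for this pipeline if one exists,
// otherwise the default subgroup size passed in by the caller.
uint32_t pick_subgroup_size(GpuArch arch, const std::string & pipeline, uint32_t default_size) {
    const auto & table = overrides_for(arch);
    const auto it = table.find(pipeline);
    return it != table.end() ? it->second : default_size;
}
```

Keeping the lookup in one function means each pipeline creation site only calls `pick_subgroup_size(arch, name, default_size)` instead of repeating the same per-device checks.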

I've just created two stable-diffusion.cpp branches in my fork with all the required changes. You can just check out `sync` as the baseline and `wave_test` with this PR's changes and build...

Here are my findings:

1. ```
   vk::PhysicalDeviceProperties2 props2;
   device->physical_device.getProperties2(&props2);
   std::string device_name = props2.properties.deviceName.data();
   ```
   Declaring these three lines is needed to effectively enable wave32, at least on RDNA1 (otherwise if...