Steward Garcia comments

Results: 92 comments of Steward Garcia

WIP - Web server + conv2d fused + k-quants + dynamic gpu offloading

@Green-Sky It seemed easier to me to implement it this way (a streaming endpoint, like chat), since otherwise it would have required websockets or a loop calling an 'http:127.0.0.0:7680/progress' endpoint,...
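
To illustrate the trade-off described in the comment above (one streamed response carrying progress updates, rather than a client loop calling a separate progress endpoint), here is a minimal Python sketch. It is not the code from this PR: the event format, the field names (`step`, `total`, `done`), and both function names are hypothetical, and the network layer is replaced by a plain generator so the example stays self-contained.

```python
import json
from typing import Dict, Iterator, List


def stream_progress(total_steps: int) -> Iterator[str]:
    """Server side (hypothetical): yield one 'data:' event per sampling step."""
    for step in range(1, total_steps + 1):
        payload = {"step": step, "total": total_steps, "done": step == total_steps}
        # Each event is a single 'data: <json>' line followed by a blank line.
        yield f"data: {json.dumps(payload)}\n\n"


def read_progress(chunks: Iterator[str]) -> List[Dict]:
    """Client side: parse events as they arrive, with no polling loop."""
    events = []
    for chunk in chunks:
        if chunk.startswith("data: "):
            events.append(json.loads(chunk[len("data: "):]))
    return events


# The client consumes a single response and sees every intermediate step.
events = read_progress(stream_progress(3))
```

With a streamed response the client opens one request and reads updates until the final event, whereas polling would need a timer and repeated requests against the progress endpoint.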

WIP - Web server + conv2d fused + k-quants + dynamic gpu offloading

@Green-Sky ~~Unfortunately, I cannot run tests on CUDA Toolkit 11.8; I have no means to conduct the tests. I tried using Google Colab, but they already use the latest version...

WIP - Web server + conv2d fused + k-quants + dynamic gpu offloading

@Green-Sky try now

WIP - Web server + conv2d fused + k-quants + dynamic gpu offloading

Try enabling SD_CONV2D_MEMORY_EFFICIENT, which reduces the VAE memory usage, or enable VAE tiling manually in the UI.

WIP - Web server + conv2d fused + k-quants + dynamic gpu offloading

@Green-Sky Thank you for fixing the error, and yes, I have been inactive because I've been feeling a bit demotivated.

Be careful posting anime pictures!

Good advice! We should take care of it.

"CUDA error" when set resolution higher than 1280 x 1280

It seems to be an error in the way matrix multiplications are performed in ggml. Does it work if you do it only with CPU?

"CUDA error" when set resolution higher than 1280 x 1280

@XienXX cmake .. -DSD_CUBLAS=OFF

export some api to free the image memory

I understand. For now the project is a mishmash of things and still doesn't have a specific format to follow; the latest refactorings have made the code very verbose and...

any plan to merge upstream libggml?

I have a branch with the latest changes from ggml #221