AlphaAtlas


Hmmm, does CLBlast reduce generation speed on IGPs now? I would think the transfers would be fine over a single PCIe bus to a single IGP.

@ggerganov llama-cpp-python (which text-generation-webui uses) implements the additional caching: https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama.py#L865
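
For reference, a minimal sketch of enabling that cache from the llama-cpp-python side, as I understand the API (class and method names may differ between versions, and the model path is a placeholder):

```python
from llama_cpp import Llama, LlamaCache

llm = Llama(model_path="./models/7B/model.gguf")  # placeholder path

# Keep evaluated KV states in RAM so a follow-up prompt that shares a
# prefix with an earlier one skips re-evaluating the shared tokens.
llm.set_cache(LlamaCache())

first = llm("USER: Hello!\nASSISTANT:", max_tokens=64)
# This call should reuse the cached state for the shared prefix instead
# of reprocessing the whole conversation from scratch.
second = llm(
    "USER: Hello!\nASSISTANT: Hi there!\nUSER: How are you?\nASSISTANT:",
    max_tokens=64,
)
```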

If y'all progressively get rid of BLAS libraries, cuBLAS is probably lowest on the totem pole? AFAIK users still need the huge CUDA toolkit to run CUDA inference anyway, so...

This is normal for chats. Processing gets slower and VRAM usage goes up as the context size (aka your chat history) grows. Also, the CPU/GPU tend to run at turbo...
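
To put rough numbers on the VRAM growth: the KV cache scales linearly with context length. A back-of-envelope sketch, assuming a LLaMA-7B-shaped model (32 layers, 4096-wide embeddings) with an fp16 cache; the constants are illustrative, not measured from anyone's setup:

```python
# Why VRAM climbs with chat history: each token in context keeps a
# key and a value vector per layer in the KV cache.

def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_embd: int = 4096,
                   bytes_per_elem: int = 2) -> int:
    """Size of the K and V tensors for n_ctx tokens (fp16 assumed)."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem  # 2 = K + V

for n_ctx in (512, 1024, 2048):
    print(f"{n_ctx:>5} tokens -> {kv_cache_bytes(n_ctx) / 2**30:.2f} GiB")
# 512 tokens -> 0.25 GiB
# 1024 tokens -> 0.50 GiB
# 2048 tokens -> 1.00 GiB
```

That linear memory growth, plus attention work that grows with the number of cached tokens, is why a long chat feels heavier than a fresh one.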