Update device_capabilities.py: Add GTX 1070, 1080; main.py: timeout 90->900

Open FFAMax opened this issue 1 year ago • 2 comments

  1. Added a few GPUs.
  2. Tuned the timeout. On slow setups (~1 token per second), an average response of ~600-1000 tokens takes roughly 600-1000 seconds to generate. In most cases this leads to a timeout, which is reported as a network error even though it is not one. The change is meant to reduce these exceptions. Anyone looking for better performance who knows what they are doing can adjust the value, understanding how it will affect slow setups. The new default should work for most cases.
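The arithmetic behind the timeout change can be sketched as follows. This is a minimal illustration, assuming a steady generation rate and ignoring prompt processing and network overhead; the function names are hypothetical and are not part of exo.

```python
def estimated_response_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady tokens_per_second rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def exceeds_timeout(num_tokens: int, tokens_per_second: float, timeout_seconds: float) -> bool:
    """True if the estimated generation time is longer than the timeout."""
    return estimated_response_seconds(num_tokens, tokens_per_second) > timeout_seconds


if __name__ == "__main__":
    # Compare the old (90 s) and proposed (900 s) timeouts at ~1 token/s.
    for tokens in (600, 1000):
        for timeout in (90, 900):
            print(tokens, timeout, exceeds_timeout(tokens, 1.0, timeout))
```

Under these assumptions, a 600-token response at 1 token per second fits within 900 seconds but not within 90, while a 1000-token response exceeds both.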

FFAMax avatar Oct 28 '24 04:10 FFAMax

Can you double check the FP16 numbers here? Those look a little too low. They are usually halfway between the 8 and 32.

dtnewman avatar Nov 03 '24 03:11 dtnewman

Can you double check the FP16 numbers here? Those look a little too low. They are usually halfway between the 8 and 32.

For example, take the GTX 1080 Ti.

According to https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

FP16 and INT8 are listed as N/A.

Based on https://www.techpowerup.com/gpu-specs/geforce-gtx-1080-ti.c2877, FP16 (half) is 177.2 GFLOPS (1:64), which is the 0.177 TFLOPS value mentioned.

So the numbers are low, probably due to a lack of hardware support or something similar.

By contrast, for https://www.techpowerup.com/gpu-specs/geforce-gtx-1660-ti.c3364 we can see the 2:1 ratio you mentioned, while for the 1080 it is 1:64.
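The unit conversion and the effect of the FP16:FP32 ratio can be checked with a short sketch. It uses only the 177.2 GFLOPS figure and the 1:64 and 2:1 ratios quoted above; the helper names are hypothetical, and the FP32 value is the one implied by those numbers rather than a separately sourced spec.

```python
def gflops_to_tflops(gflops: float) -> float:
    """Convert GFLOPS to TFLOPS (1 TFLOPS = 1000 GFLOPS)."""
    return gflops / 1000.0


def fp16_from_fp32(fp32_gflops: float, fp16_parts: int, fp32_parts: int) -> float:
    """FP16 throughput for a given FP16:FP32 ratio, e.g. 1:64 or 2:1."""
    return fp32_gflops * fp16_parts / fp32_parts


if __name__ == "__main__":
    fp16_gflops = 177.2  # GTX 1080 Ti FP16 figure quoted in this thread
    print(gflops_to_tflops(fp16_gflops))  # roughly 0.177 TFLOPS

    # FP32 throughput implied by the quoted 1:64 ratio (derived, not sourced).
    implied_fp32_gflops = fp16_gflops * 64
    print(fp16_from_fp32(implied_fp32_gflops, 1, 64))  # back to the 1:64 FP16 figure
    print(fp16_from_fp32(implied_fp32_gflops, 2, 1))   # what a 2:1 ratio would give
```

The gap between the 1:64 and 2:1 results is a factor of 128, which would explain why the FP16 numbers look far lower than a "double the FP32 value" rule of thumb suggests.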

FFAMax avatar Nov 03 '24 05:11 FFAMax

I'm concerned with increasing the timeout this much. If a request would take this long, I'd say it should be treated differently. Request handling generally needs to be reworked with a new scheduler that has better control of the request flow. Right now it's pretty much fire-and-forget and hope we get some response back from the cluster.

I changed the timeout back to 90. The rest looks good to me.

AlexCheema avatar Nov 23 '24 19:11 AlexCheema

It was changed to 900 because of failures on old hardware like the GTX 1080. As far as I can see, the project is mostly focused on Apple devices, so for most people this may not matter, while for others it is a problem. It has already been confirmed by another user, so it would be better to have a dynamic timeout based on the hardware. In my case, the default of 90 will always lead to failure.
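One possible shape for such a dynamic timeout is sketched below. This is only an illustration of the idea, not exo's code: the function name, the measured tokens-per-second input, the 1000-token budget, and the 1.5x safety factor are all assumptions made for the example, with 90 seconds kept as the floor.

```python
def dynamic_timeout_seconds(
    tokens_per_second: float,
    expected_tokens: int = 1000,   # assumed upper bound on response length
    safety_factor: float = 1.5,    # assumed headroom for slow or bursty generation
    floor_seconds: float = 90.0,   # current default, used as the minimum
) -> float:
    """Scale the response timeout to the measured generation speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    estimated = expected_tokens / tokens_per_second * safety_factor
    # Fast hardware keeps the existing default; slow hardware gets more time.
    return max(floor_seconds, estimated)


if __name__ == "__main__":
    print(dynamic_timeout_seconds(1.0))   # slow setup, ~1 token/s
    print(dynamic_timeout_seconds(50.0))  # fast setup stays at the floor
```

With these assumed parameters, fast devices keep the 90-second default, while a device generating about 1 token per second gets a timeout long enough for a full response.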

FFAMax avatar Nov 25 '24 07:11 FFAMax