FFAMax

Results 75 comments of FFAMax

Hello, Team. Has anybody found a solution to avoid `CUDA Error 2, out of memory`? ``` loaded weights in 4041.00 ms, 8.03 GB loaded at 1.99 GB/s Error processing tensor for shard...

In my case the GPUs were not defined, so it was unable to proceed properly. Once FLOPs were defined, it was able to split according to the VRAM available on all GPUs. Example: https://github.com/exo-explore/exo/pull/393/files
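The idea of splitting a model across GPUs in proportion to each card's capacity can be sketched as below. This is a minimal illustration, not exo's actual code; the function name and the per-GPU VRAM figures are made up for the example.

```python
# Hypothetical sketch: divide num_layers across GPUs in proportion to
# each GPU's VRAM (the same idea works with FLOPs as the weight).
def split_layers(num_layers, vram_gb):
    total = sum(vram_gb)
    # Provisional share per GPU, rounded down
    counts = [int(num_layers * v / total) for v in vram_gb]
    # Hand any remaining layers to the largest GPUs first
    rest = num_layers - sum(counts)
    for i in sorted(range(len(vram_gb)), key=lambda i: -vram_gb[i])[:rest]:
        counts[i] += 1
    return counts

# e.g. a 24 GB card and two 11 GB cards sharing a 32-layer model
print(split_layers(32, [24, 11, 11]))  # [17, 8, 7]
```

Without per-device capability numbers, a scheduler has nothing to weight the split by, which matches the failure described above.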

With `DEBUG=8 TINYGRAD_DEBUG=8 DEBUG_DISCOVERY=8 exo` I got some info: ``` Broadcasting presence at (127.0.0.1) Broadcasting presence at (10.1.3.177): {"type": "discovery", "node_id": "ce0c3546-20d9-4a2c-9e96-16c6894259fa", "grpc_port": 49868, "device_capabilities": {"model": "Linux Box (NVIDIA GEFORCE GTX...

> It might be worth trying the patch in #7376 Thanks, it helped! The full command to run in my case: `SUPPORT_BF16=0 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8,9 python3 examples/llama3.py --download_model --shard 10 --size 8B`
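As a side note on `CUDA_VISIBLE_DEVICES`: it restricts which physical GPUs a process can see and renumbers the visible ones from 0. A small sketch of that remapping (no real GPUs needed, the device IDs here are illustrative):

```python
import os

# Pretend we only expose physical GPUs 2 and 5 to this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,5"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
# Inside the process, logical device 0 maps to physical GPU 2,
# logical device 1 maps to physical GPU 5.
mapping = {logical: int(physical) for logical, physical in enumerate(visible)}
print(mapping)  # {0: 2, 1: 5}
```

So `CUDA_VISIBLE_DEVICES=0,1,...,9` in the command above simply exposes all ten GPUs in their natural order.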

Now it's failing on another issue, but that's another story :D ``` ptxas fatal : SM version specified by .target is higher than default SM version assumed Failed to generate...

Should it just translate the config file into the equivalent CLI options like --listen-port --broadcast-port --discovery-module --discovery-timeout --wait-for-peers, or is the goal to add more options, like on what interface...
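The first interpretation (config file as a thin layer over the existing flags) could look like the sketch below. This is a hypothetical illustration only: the config key names and file format are assumptions, and only the flags listed above are used.

```python
# Hypothetical sketch: map config keys to the existing CLI options.
FLAG_MAP = {
    "listen_port": "--listen-port",
    "broadcast_port": "--broadcast-port",
    "discovery_module": "--discovery-module",
    "discovery_timeout": "--discovery-timeout",
    "wait_for_peers": "--wait-for-peers",
}

def config_to_args(config):
    """Translate a parsed config dict into an argv-style flag list."""
    args = []
    for key, flag in FLAG_MAP.items():
        if key in config:
            args += [flag, str(config[key])]
    return args

print(config_to_args({"listen_port": 49868, "discovery_timeout": 30}))
# ['--listen-port', '49868', '--discovery-timeout', '30']
```

The second interpretation (config as a superset, e.g. binding to a specific interface) would need new options that have no CLI equivalent yet.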

@lipere123 would you mind pushing your changes to a cloned repo, so I can clone them and try/contribute?

> Can you double check the FP16 numbers here? Those look a little too low. They are usually halfway between the 8 and 32. For example take GTX 1080 Ti...
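The "halfway" heuristic quoted above can be stated as a one-liner: estimate the FP16 throughput as the midpooint of the INT8 and FP32 figures. A sketch with purely illustrative TFLOPS numbers (not real GTX 1080 Ti specs):

```python
# Sketch of the quoted rule of thumb: FP16 throughput estimated as the
# midpoint of the INT8 and FP32 figures. Numbers below are illustrative.
def estimate_fp16(int8_tflops, fp32_tflops):
    return (int8_tflops + fp32_tflops) / 2

print(estimate_fp16(44.0, 11.0))  # 27.5
```

Note this rule does not hold for Pascal-era GeForce cards, whose FP16 rate is a small fraction of FP32, which is what the reply below is getting at.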

It was changed to 900 due to failures on old HW like the GTX 1080. As I see it, the project is mostly focused on Apple devices, so for most people it may have no...

> Are you using tinygrad? Yes. That's a Linux machine, therefore TinygradDynamicShardInferenceEngine is picked up.