FFAMax

Results: 25 issues opened by FFAMax

Scenario: the model requires 3 nodes to meet the minimum VRAM. Once one of the nodes experiences a timeout, the task falls back to the host and crashes due to insufficient memory. In...

For some reason the browser may disconnect from the node; the task keeps running, but the web UI is unable to recover the last conversation.

1. Added a few GPUs. 2. Tuned the timeout. On slow setups (~1 token per second) an average response may run ~600-1000 tokens. In most cases this leads to a timeout (network error...
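The arithmetic behind that timeout complaint can be sketched directly (the token counts and rate are the ones quoted in the report, not measured values): at ~1 token/s, a full response needs 10-17 minutes, so any request timeout shorter than that fires before generation finishes.

```python
# Back-of-envelope check of the reported numbers: how long a full response
# takes at a given generation rate, and whether a timeout would trip first.
def response_time_seconds(tokens: int, tokens_per_second: float = 1.0) -> float:
    """Time needed to stream a complete response at the given rate."""
    return tokens / tokens_per_second

for tokens in (600, 1000):
    seconds = response_time_seconds(tokens)
    print(f"{tokens} tokens at 1 tok/s -> {seconds / 60:.1f} minutes")
```

So a default timeout on the order of a minute or two is guaranteed to fail on such hardware; it would have to exceed roughly 1000 s to be safe here.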

Found instability:
```
Error processing tensor for shard Shard(model_id='mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated', start_layer=0, end_layer=15, n_layers=32): SQLite objects created in a thread can only be used in that same thread. The object was created...
```
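That error is Python's `sqlite3` thread-affinity check: a connection created in one thread was reused from another. A common workaround (a sketch of the general pattern, not exo's actual fix) is to give each thread its own connection via `threading.local()`:

```python
import sqlite3
import threading

# Each thread gets its own sqlite3 connection, created lazily on first use.
# This avoids "SQLite objects created in a thread can only be used in that
# same thread" without resorting to check_same_thread=False plus manual locks.
_local = threading.local()

def get_connection(path: str = ":memory:") -> sqlite3.Connection:
    """Return a per-thread sqlite3 connection, creating it on first use."""
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(path)
    return _local.conn

def worker() -> None:
    # Safe: this connection belongs to the calling thread.
    conn = get_connection()
    conn.execute("CREATE TABLE IF NOT EXISTS shards (id TEXT)")
    conn.execute("INSERT INTO shards VALUES ('llama-3.2-1b')")
    conn.commit()

t = threading.Thread(target=worker)
t.start()
t.join()
```

Note that with `:memory:` each thread sees a separate database; a shared on-disk path would also need write serialization, which SQLite handles via its own locking.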

Example: `CUDA_VISIBLE_DEVICES=0,1,2 python3 examples/llama3.py --download_model --shard 3`. So exo needs to switch from a model where each node has only one device to a model where a node can have multiple devices...
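The suggested change can be sketched as a data-model tweak: a node owns a list of devices and advertises their combined memory. All names here (`Device`, `Node`, `total_vram_gb`) are illustrative assumptions, not exo's actual API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Device:
    index: int       # e.g. a CUDA device index from CUDA_VISIBLE_DEVICES
    vram_gb: float   # memory this single device contributes

@dataclass
class Node:
    node_id: str
    devices: List[Device] = field(default_factory=list)

    def total_vram_gb(self) -> float:
        """Aggregate memory a scheduler could count for this node."""
        return sum(d.vram_gb for d in self.devices)

# One host exposing three GPUs as a single node, instead of three nodes:
node = Node("node1", [Device(0, 24.0), Device(1, 24.0), Device(2, 24.0)])
print(node.total_vram_gb())  # 72.0
```

With this shape, shard placement can consider a host's aggregate VRAM rather than forcing one exo process per GPU.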

The last meaningful output:
```
ram used: 5.67 GB, tok_embeddings.weight : 99%|▉| 290/292 [00:25
```

Hello, Team. I have 2 nodes on the same host, but only one is taking load. Any ideas? Running with: `CUDA_VISIBLE_DEVICES=0 exo --node-id=node1 --node-port=65001 --discovery-module manual --discovery-config-path n2.cfg --inference-engine=tinygrad` and `DEBUG=6 CUDA_VISIBLE_DEVICES=1...`

It is trying to load and never completes:
```
Removing download task for Shard(model_id='llama-3.2-1b', start_layer=0, end_layer=15, n_layers=16): True 0%| | 0/148 [00:00
```

Like:
```
exo --run-model /home/user/.cache/tinygrad/downloads/llama3-8b-sfr/tokenizer.model
```

Example:
```
SUPPORT_BF16=0 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 python3 examples/llama3.py --download_model --shard 7 --size 8B
seed = 1730782018
0%| | 0/292 [00:00
```