distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
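To make the tagline concrete, here is a minimal sketch of column-wise tensor parallelism — illustrative Python/NumPy only, not the repo's actual C++ implementation, and all dimensions are made up. Each worker stores just one vertical slice of a weight matrix (which divides RAM usage), and the slices are multiplied in parallel (which is where the speedup comes from):

```python
# Illustrative sketch, NOT distributed-llama's code: column-wise tensor
# parallelism for a single linear layer.
import numpy as np

n_workers = 4
d_in, d_out = 4096, 4096                    # hypothetical layer dimensions
x = np.random.randn(d_in).astype(np.float32)
W = np.random.randn(d_in, d_out).astype(np.float32)

# Split W into column slices; worker i would store only W_slices[i].
W_slices = np.split(W, n_workers, axis=1)

# Each worker computes its partial output independently...
partial = [x @ w for w in W_slices]

# ...and the root node concatenates the slices into the full activation.
y = np.concatenate(partial)
assert np.allclose(y, x @ W, atol=1e-3)     # matches the unsplit layer
```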
Hi @b4rtaz, I was tinkering a bit over the weekend and figured it might be possible to create a version of worker/main that accelerates inference by offloading some work...
@b4rtaz Hey, thank you for your wonderful work. Could you please offer some details about how to add a supported model? For example, how to split the network according to its structure...
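For readers puzzling over the same question, here is a rough sketch of what "splitting by structure" can mean: multi-head attention partitions naturally across nodes because the heads are independent. This is an assumption-laden illustration (hypothetical dimensions, NumPy rather than the repo's C++), not distributed-llama's actual slicing code:

```python
# Illustrative sketch only: with n_heads divisible by n_workers, worker i
# can own a contiguous block of heads, i.e. the matching columns of the
# query projection (and analogously of Wk/Wv, plus rows of Wo).
import numpy as np

n_heads, head_dim, n_workers = 32, 128, 4
dim = n_heads * head_dim
heads_per_worker = n_heads // n_workers     # must divide evenly

Wq = np.random.randn(dim, dim).astype(np.float32)

def worker_slice(W, i):
    # Columns belonging to worker i's heads.
    cols = slice(i * heads_per_worker * head_dim,
                 (i + 1) * heads_per_worker * head_dim)
    return W[:, cols]

slices = [worker_slice(Wq, i) for i in range(n_workers)]
assert sum(s.shape[1] for s in slices) == dim   # slices cover all heads
```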
I haven't updated the other model conversion scripts yet, but this allows you to convert any Llama model that uses safetensors.
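For context, a safetensors checkpoint can be enumerated tensor-by-tensor, which is the first step any converter needs. A minimal sketch using the `safetensors` Python package — the file name is a placeholder, and this is not the repo's convert script:

```python
# Requires: pip install safetensors numpy
from safetensors import safe_open

# Placeholder file name; shard names vary per checkpoint.
with safe_open("model-00001-of-00004.safetensors", framework="numpy") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)     # loads a single tensor on demand
        print(name, tensor.shape, tensor.dtype)
```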
Dear Author, your contribution is critical for the open-source community. The distributed-llama repo has implemented tensor parallelism from scratch, and the results are remarkable. However, there are still improvements...
This pull request introduces API functionality to the distributed-llama project. The main addition is the implementation of the chat completion endpoint, following the specification outlined by OpenAI for chat...
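Since the endpoint follows the OpenAI chat completion spec, any OpenAI-style client should be able to talk to it. A hedged usage sketch with the Python standard library — the host, port, path, and model id below are assumptions, so check the server's actual flags:

```python
import json
import urllib.request

payload = {
    "model": "llama3-8b",               # hypothetical model id
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    "http://127.0.0.1:9990/v1/chat/completions",   # assumed address/port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # Standard OpenAI-style response shape: choices[0].message.content
    print(json.load(resp)["choices"][0]["message"]["content"])
```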
The nodes connect, but crash after roughly 3 seconds. Server:
```
sudo main simple-server --weights-float-type q40 --buffer-float-type q40 --nthreads 4 --model ~/dllama_meta-llama-3-8b_q40.bin --tokenizer ~/dllama-llama3-tokenizer.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998...
```
```
ubuntu@ubuntu:~/llama3/Meta-Llama-3-8B-Instruct$ python3 ../../distributed-llama/converter/convert-llama.py ./ q40
Model name:
Target float type: q40
Target file: dllama__q40.bin
Traceback (most recent call last):
  File "/home/ubuntu/llama3/Meta-Llama-3-8B-Instruct/../../distributed-llama/converter/convert-llama.py", line 119, in <module>
    convert(modelPath, outputFileName, targetFloatType)
  File "/home/ubuntu/llama3/Meta-Llama-3-8B-Instruct/../../distributed-llama/converter/convert-llama.py",...
```
Hi there! Amazing project, by the way; it has given me hope of being able to run really big models. Specifically, I'm very excited about the upcoming 400B Llama model...
```sh
# 1 worker + inference
make docker-1-worker-inference
# 3 workers + inference like this:
make docker-3-worker-inference WORKERS="172.18.0.2:9997 172.18.0.3:9997 172.18.0.4:9997"
```
My local test on Docker containers (use default checkpoint:...
Hi there, I'm busy converting Llama 3 70B to the distributed format, but I get the following output:
```
Target float type: q40
Target file: D:\Meta-Llama-3-70B-Instruct-Distributed\dllama_original_q40.bin
💿 Chunking model 1/16...
Unknown...
```