Mahmoud Shehata
This can be split up into different issues, especially the ones that aren't currently actionable
> There are two different ways to do multi-GPU: multi-device and multi-host. We'll need to do both to truly reach LLM-scale training, but we should start with multi-device. > >...
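For concreteness, here is a minimal, dependency-free sketch of the data-parallel pattern both variants boil down to: every rank computes gradients on its own data shard, then an all-reduce averages them so the replicas stay in sync. Ranks are simulated with threads and a shared buffer; in a real setup each rank is one GPU (multi-device) or a GPU on another machine (multi-host), and the reduction is done by an NCCL collective rather than by hand.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

fn main() {
    let world_size: usize = 4;
    // Shared accumulator standing in for the collective's reduction buffer.
    let sum = Arc::new(Mutex::new(vec![0f32; 3]));
    let barrier = Arc::new(Barrier::new(world_size));

    let handles: Vec<_> = (0..world_size)
        .map(|rank| {
            let sum = Arc::clone(&sum);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Each rank "computes" a local gradient on its own shard of data.
                let local_grad = vec![rank as f32 + 1.0; 3];
                // Reduce phase: add the local gradient into the shared buffer.
                {
                    let mut s = sum.lock().unwrap();
                    for (acc, g) in s.iter_mut().zip(&local_grad) {
                        *acc += *g;
                    }
                }
                // Wait until every rank has contributed (the "all" in all-reduce).
                barrier.wait();
                // Broadcast phase: every rank reads back the averaged gradient.
                let avg: Vec<f32> = sum
                    .lock()
                    .unwrap()
                    .iter()
                    .map(|v| v / world_size as f32)
                    .collect();
                println!("rank {rank} sees averaged gradient {avg:?}");
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```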
> jorgeantonio21 Determinism is essential. For the project I am working on, cross-platform consistency and verifiable communication (moot) across nodes are paramount. Based on this paper, [Agatha](https://dl.acm.org/doi/10.1145/3340531.3412684), here are some conclusions...
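To ground the verification point, here is a minimal sketch of one way ranks could check bit-exact agreement after a collective: hash the tensor's raw bytes and compare the digests out of band. It assumes the `candle-core` crate; the FNV-1a hash and the function name are illustrative, not what Agatha or candle actually use. Note this only detects divergence (e.g., from non-associative floating-point reductions done in different orders); it does not make the reduction itself deterministic.

```rust
use candle_core::{Device, Result, Tensor};

/// FNV-1a over the f32 contents of a tensor, in row-major order.
fn tensor_fingerprint(t: &Tensor) -> Result<u64> {
    let values: Vec<f32> = t.flatten_all()?.to_vec1::<f32>()?;
    let mut hash: u64 = 0xcbf29ce484222325;
    for v in values {
        for b in v.to_le_bytes() {
            hash ^= b as u64;
            hash = hash.wrapping_mul(0x100000001b3);
        }
    }
    Ok(hash)
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Stand-in for the tensor a rank holds after an all-reduce.
    let t = Tensor::arange(0f32, 12f32, &dev)?.reshape((3, 4))?;
    println!("rank-local fingerprint: {:016x}", tensor_fingerprint(&t)?);
    // Each rank would send this 8-byte digest to its peers (or a verifier)
    // and flag a mismatch instead of shipping whole tensors around.
    Ok(())
}
```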
> > * I am blocked on `flash_attn` on CUDA > `12.X`, so I haven't been able to finish this > > Are you sure it's blocked on these versions?...
> Oh nice, and why not use NCCL for multi-node? > > I haven't checked, but the bindings to NCCL are pretty agnostic; it should be easy to...
> > Can I use NCCL for cross-node communication? > > It's one of the big selling points! https://developer.nvidia.com/blog/massively-scale-deep-learning-training-nccl-2-4/ I'm not a big user of multi-node setups, but...
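For reference, the bootstrap NCCL expects for cross-node communication looks roughly like the sketch below. The `NcclId`/`NcclComm` wrappers are hypothetical stand-ins (not candle's or cudarc's actual API), but the handshake shape is NCCL's: rank 0 creates a unique id (`ncclGetUniqueId`), the id is shared out of band, and every rank calls `ncclCommInitRank` with the same id, its rank, and the world size. Single-node vs multi-node only changes how the id gets shared and which network transport NCCL picks; the calling code is identical.

```rust
use std::env;

/// Hypothetical stand-in for NCCL's 128-byte unique id.
struct NcclId([u8; 128]);

/// Hypothetical stand-in for an initialized NCCL communicator.
struct NcclComm {
    rank: usize,
    world_size: usize,
}

impl NcclComm {
    /// Would wrap ncclCommInitRank in a real binding.
    fn init(_id: &NcclId, rank: usize, world_size: usize) -> Self {
        NcclComm { rank, world_size }
    }
}

fn main() {
    // Typical launcher contract: rank and world size arrive via env vars.
    let rank: usize = env::var("RANK").unwrap_or_else(|_| "0".into()).parse().unwrap();
    let world_size: usize = env::var("WORLD_SIZE").unwrap_or_else(|_| "1".into()).parse().unwrap();

    // Rank 0 would create the id and publish it (shared filesystem, TCP store,
    // MPI broadcast, ...); the other ranks would read it from the same place.
    let id = NcclId([0u8; 128]);

    let comm = NcclComm::init(&id, rank, world_size);
    println!("rank {}/{} ready for collectives", comm.rank, comm.world_size);
    // From here each process binds one GPU (conventionally rank % gpus_per_node)
    // and issues all-reduce / all-gather calls on the communicator.
}
```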
> @b0xtch Did you manage to get llama_multiprocess running on a multi-node setup with NCCL? I started it a while back, but I have been blocked by the flash...
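On the launching side, the usual pattern is one OS process per GPU, each handed its rank and the world size through environment variables so the bootstrap above can pick them up; across machines the same launcher runs on every node (ssh, SLURM, mpirun) with a different node rank. The sketch below is generic and hypothetical: the worker binary name and env-var names are made up, and this is not llama_multiprocess's actual CLI.

```rust
use std::env;
use std::process::Command;

fn main() {
    let gpus_per_node = 2; // assumption for the example
    let node_rank: usize = env::var("NODE_RANK").unwrap_or_else(|_| "0".into()).parse().unwrap();
    let num_nodes: usize = env::var("NUM_NODES").unwrap_or_else(|_| "1".into()).parse().unwrap();
    let world_size = gpus_per_node * num_nodes;

    let mut children = Vec::new();
    for local_rank in 0..gpus_per_node {
        let global_rank = node_rank * gpus_per_node + local_rank;
        let child = Command::new("./target/release/train_worker") // hypothetical worker binary
            .env("RANK", global_rank.to_string())
            .env("WORLD_SIZE", world_size.to_string())
            // Pin each worker to one device so the rank <-> GPU mapping is unambiguous.
            .env("CUDA_VISIBLE_DEVICES", local_rank.to_string())
            .spawn()
            .expect("failed to spawn worker");
        children.push(child);
    }
    for mut child in children {
        let status = child.wait().expect("worker did not exit cleanly");
        assert!(status.success(), "a worker exited with an error");
    }
}
```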
Related issue: https://github.com/huggingface/candle/issues/2007
There was an attempt to do tensor parallelism: https://github.com/EricLBuehler/mistral.rs/pull/72
Amazing stuff! The tensor parallelism, I am guessing, will be in the core candle repo? Or do you plan to abstract it in some way under this repo? I have...
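As a concrete reference point for the tensor-parallelism discussion, here is a minimal column-parallel linear sketch written against `candle-core`. Two devices are simulated with `Device::Cpu` so it runs anywhere; a real version would put the shards on `Device::new_cuda(0)` / `Device::new_cuda(1)` and replace the `to_device` + `cat` gather with an NCCL all-gather. This is an illustration, not how the mistral.rs PR implements it.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev0 = Device::Cpu; // stand-in for GPU 0
    let dev1 = Device::Cpu; // stand-in for GPU 1

    let (batch, d_in, d_out) = (2, 8, 6);
    let x = Tensor::randn(0f32, 1f32, (batch, d_in), &dev0)?;
    let w = Tensor::randn(0f32, 1f32, (d_in, d_out), &dev0)?;

    // Shard the weight by output columns: each device owns half the columns.
    let w0 = w.narrow(1, 0, d_out / 2)?.to_device(&dev0)?;
    let w1 = w.narrow(1, d_out / 2, d_out / 2)?.to_device(&dev1)?;

    // Each device multiplies the (replicated) activations by its own shard.
    let y0 = x.matmul(&w0)?;
    let y1 = x.to_device(&dev1)?.matmul(&w1)?;

    // Gather the partial outputs back on one device and stitch them together.
    let y_sharded = Tensor::cat(&[&y0, &y1.to_device(&dev0)?], 1)?;

    // Check against the unsharded computation.
    let y_full = x.matmul(&w)?;
    let err = (&y_full - &y_sharded)?.abs()?.sum_all()?.to_scalar::<f32>()?;
    println!("sharded output shape {:?}, total abs error {err:e}", y_sharded.dims());
    Ok(())
}
```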
Yeah, I noticed the same with the Groq Mistral model as well.