Alexander Borzunov

80 comments by Alexander Borzunov

Hi @scenaristeur, How much GPU memory does this model take in total? At first glance, it seems that this model requires less than 8-10 GB and fits on many consumer GPUs, so...

Hi @mryab, can you take a look at this?

Hi @Mathnerd314, Your suggestions sound reasonable. We'll start with an option to slice an inference session (`reuse_inference(old[start:end])`) - I hope to add it in one of the upcoming releases.
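For context, a sketch of how this could look from the client side. This is purely hypothetical: `reuse_inference` and session slicing are the proposal discussed above, not an existing Petals API, and the model class name may differ between Petals versions:

```python
# Hypothetical usage sketch: `reuse_inference` and `old[start:end]`
# are the *proposed* additions; they do not exist in Petals yet.
from petals import AutoDistributedModelForCausalLM

model = AutoDistributedModelForCausalLM.from_pretrained("bigscience/bloom-560m")

with model.inference_session(max_length=512) as old:
    ...  # run some generation steps, filling the attention caches

# Proposed: start a new session that reuses the cached attention
# states for positions [0, 100) instead of recomputing them.
new_session = model.reuse_inference(old[0:100])  # hypothetical API
```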

Hi @LouSparfell, As far as we know, homomorphic encryption and ZK methods are too slow to be applied to LLMs, since they are designed for integer computations and are not...

Hi @fadenb, What you're saying is 100% reasonable, we just didn't have time to do that, since it would require additional complexity on the server side. If you can help with...

Hi @iateadonut, Yes, a server should host a set of sequential blocks. Regarding mock CPU servers, you can create a [private swarm](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm) with a really small model like `bigscience/bloom-560m` and...
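For a quick sanity check of such a private swarm from the client side, a minimal sketch (assuming the servers are already running per the wiki page above; the multiaddr is a placeholder for whatever your bootstrap peer prints, and `AutoDistributedModelForCausalLM` assumes Petals 2.x):

```python
# Minimal client sketch for a private swarm hosting bigscience/bloom-560m.
# Replace INITIAL_PEERS with the multiaddr printed by your bootstrap peer.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

INITIAL_PEERS = ["/ip4/127.0.0.1/tcp/31337/p2p/QmPlaceholderPeerId"]

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoDistributedModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", initial_peers=INITIAL_PEERS
)

inputs = tokenizer("A quick test:", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```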

Hi @iateadonut, `dht_utils.get_remote_module_infos()` returns information about all servers (both remote ones and your own). Note that:

- You need to be connected to the **public swarm** to see servers hosted by...
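A rough sketch of calling it (module paths and the exact signature vary between Petals versions, and the DHT prefix below is a placeholder; treat this as an illustration rather than a stable API):

```python
# Sketch: list which servers host each block of a model in the public swarm.
# The UID prefix is a placeholder; Petals derives it from the model name.
from hivemind import DHT
from petals.constants import PUBLIC_INITIAL_PEERS
from petals.dht_utils import get_remote_module_infos

dht = DHT(initial_peers=PUBLIC_INITIAL_PEERS, client_mode=True, start=True)

uids = [f"bigscience/bloom-560m-petals.{i}" for i in range(4)]
infos = get_remote_module_infos(dht, uids)  # one entry per block UID

for uid, info in zip(uids, infos):
    servers = list(info.servers.keys()) if info is not None else []
    print(uid, "->", servers)
```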

@iateadonut No, but you can filter out your local peer_id to keep only remote infos, like we do in `should_choose_other_blocks()`.
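Continuing the sketch above, a minimal version of that filtering (assuming `RemoteModuleInfo.servers` maps peer IDs to server entries, as in recent Petals versions):

```python
# Sketch: keep only remote servers, dropping our own peer, similar in
# spirit to the filtering done in should_choose_other_blocks().
local_peer_id = dht.peer_id  # the DHT instance from the previous sketch

remote_only = [
    {pid: srv for pid, srv in info.servers.items() if pid != local_peer_id}
    if info is not None
    else None
    for info in infos
]
```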

@fadenb @iateadonut For the record, another reason why downloading blocks is slow is that StableBeluga2 weights are distributed in float32 and Llama weights are distributed in float16, while we host...
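For scale, some back-of-the-envelope arithmetic on checkpoint sizes (parameter count approximate; StableBeluga2 is a 70B-parameter Llama 2 finetune):

```python
# Rough download-size arithmetic: a float32 checkpoint is twice as large
# as a float16 one, so servers pull ~2x the bytes for the same model.
params = 70e9  # ~70B parameters, approximate

print(f"float32: ~{params * 4 / 1e9:.0f} GB")  # ~280 GB
print(f"float16: ~{params * 2 / 1e9:.0f} GB")  # ~140 GB
```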

@iateadonut Yes, you can extract it into a separate function if it's useful.