Mahmoud Shehata

Results: 32 comments by Mahmoud Shehata

@abcoathup I started on a new implementation that builds on top of the OpenZeppelin ERC721 implementation. 1. I added the `event ConsecutiveTransfer()` declaration to IERC721.sol. 2. Added a new function named...

@Soren74 Recently I got started with ERC-2309, so if you want to share ideas, lmk!

@Soren74 Sounds good! I am sure they have their reasons. Sean also created Cargo, which is a platform for batch minting, so I guess it's fair to be competitive and...

Can we get this in? Thanks!

This would be really helpful.

> Please see [the llama multiprocess](https://github.com/huggingface/candle/blob/f48c07e2428a6d777ffdea57a2d1ac6a7d58a8ee/candle-examples/examples/llama_multiprocess/main.rs#L18) example. The multi-GPU inference is used to create parallelized linear layers:
>
> https://github.com/huggingface/candle/blob/f48c07e2428a6d777ffdea57a2d1ac6a7d58a8ee/candle-examples/examples/llama_multiprocess/model.rs#L293-L308

That example is for a single node. How about...
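For anyone skimming, what the linked model.rs lines build is tensor-parallel (sharded) linear layers. Here is a minimal sketch of that idea, written in PyTorch purely for illustration (candle's actual implementation is in Rust, and every name below is mine, not candle's):

```python
import torch
import torch.nn as nn
import torch.distributed as dist

# Illustrative column-parallel linear layer: each rank owns a vertical slice
# of the weight matrix, and the full output is recovered with an all-gather.
# Assumes dist.init_process_group(...) has already run on every rank.
class ColumnParallelLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0
        self.world_size = world_size
        # This rank holds only out_features / world_size output columns.
        self.shard = nn.Linear(in_features, out_features // world_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_out = self.shard(x)  # this rank's slice of the output
        gathered = [torch.empty_like(local_out) for _ in range(self.world_size)]
        dist.all_gather(gathered, local_out)
        return torch.cat(gathered, dim=-1)
```

Single-node and multi-node NCCL differ mainly in how the process group is initialized, which is exactly the gap the question above is about.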

I started a draft here for splitting a model across multiple GPUs on different nodes. There is a mapping feature, as I linked above, in the `mistral.rs` repo - https://github.com/huggingface/candle/issues/1936
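To make the mapping idea concrete, here is a hypothetical PyTorch sketch of per-layer device placement (my names, not mistral.rs's actual API, and single-node for brevity; a multi-node version would replace the `.to(dev)` hops with send/recv between ranks):

```python
import torch
import torch.nn as nn

# Hypothetical mapping for an illustrative 32-layer model: the first half of
# the layers live on one GPU, the second half on another.
DEVICE_MAP = [(range(0, 16), "cuda:0"), (range(16, 32), "cuda:1")]

def device_for(i: int) -> str:
    return next(dev for layer_range, dev in DEVICE_MAP if i in layer_range)

def place_layers(layers: nn.ModuleList) -> None:
    # Move each layer to the device its index maps to.
    for i, layer in enumerate(layers):
        layer.to(device_for(i))

def forward_mapped(layers: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    # Run sequentially, hopping devices at mapping boundaries.
    for i, layer in enumerate(layers):
        x = x.to(device_for(i))
        x = layer(x)
    return x
```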

Getting the same problem on a fresh AWS instance as well! @hugoabonizio any luck with this problem? Do I just upgrade CUDA? Repro (even with nvidia/cuda:12.1.1-devel-ubuntu20.04, same problem): ```...

I am facing the same issue here when building a Docker image on a non-CUDA host to run on a CUDA device:

```dockerfile
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/root/workspace/target \
    CUDA_COMPUTE_CAP=87 cargo build --features...
```

You can also add this part:

```python
import torch

# Only enable flash attention on Ampere (compute capability 8.x) or newer GPUs.
if torch.cuda.get_device_capability()[0] >= 8:
    from utils.llama_patch import replace_attn_with_flash_attn
    replace_attn_with_flash_attn()
```
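A slightly more defensive variant of the same check (my wrapper name; it assumes the same `utils.llama_patch` helper from the repo) also guards against CPU-only environments:

```python
import torch

def maybe_enable_flash_attn() -> bool:
    # Flash attention kernels need an Ampere-or-newer GPU (compute
    # capability >= 8.0), so skip the patch on CPU-only machines and
    # on older GPUs.
    if not torch.cuda.is_available():
        return False
    if torch.cuda.get_device_capability()[0] < 8:
        return False
    from utils.llama_patch import replace_attn_with_flash_attn
    replace_attn_with_flash_attn()
    return True
```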