Alyssa Vance issues

Results: 16 issues of Alyssa Vance

Cannot compile v0.7.0, undefined reference to "hd_drive_map"

Hi! I tried to compile v0.7.0 on my Ubuntu 22.04 laptop, following the instructions here: https://github.com/ghaerr/elks/blob/master/BUILD.md When I ran "make -j 16 all", it ran for a few minutes, then...

bug

Accelerate integration with Transformer Engine crashes when using FlashAttention

### System Info ```Shell Accelerate: 0.24.1 OS: Ubuntu 22.04 Python: 3.10 NumPy: 1.26.0 Torch: 2.1.0 Accelerate configuration: compute_environment: LOCAL_MACHINE debug: false deepspeed_config: gradient_accumulation_steps: 4 gradient_clipping: 1.0 offload_optimizer_device: none offload_param_device: none...

enhancement

feature request

8-bit optimizer crashes when fine-tuning gpt2-large

Using the bnb.optim.Adam8bit optimizer in place of torch.optim.Adam causes a crash after a handful of batches: ```12it [00:22, 1.82s/it]Error an illegal memory access was encountered at line 198 in file...

Does the compiler support 8087 hardware floating point instructions?

Am interested in using this, but haven't been able to find documentation. If I run `ia16-elf-as --help` on my machine, 8087 is listed under supported extensions: ``` -march=CPU[,+EXTENSION...] generate code...

DoRA uses lots of GPU VRAM due to fp32 upcasting

### System Info peft 0.10.0, transformers 4.40.1, Python 3.10 on Ubuntu 22.04 ### Who can help? _No response_ ### Information - [ ] The official example scripts - [X] My...

IndexError: map::at when doing torch.ops.matmul on fp32 matrices

fp16 works fine, but fp32 crashes. I'm using the latest nightly package from PyPI with Torch 2.3.0 on Ubuntu 22.04. Here's the code and stack trace: ``` >>> import torch...

Matmul errors out when one tensor is batched and another isn't

Cool idea! Proud to submit a first bug report :) This PyTorch code (Ubuntu, CUDA 12.1, Torch 2.2.2, Nvidia 4090): ``` >>> import cublas_ops >>> import torch >>> x =...

Feature request: Add Llama-style MLP with three linear layers

Llama and several other popular open-source models use an MLP design with three linear layers: ``` class LlamaMLP(nn.Module): def __init__(self, config): super().__init__() self.config = config self.hidden_size = config.hidden_size self.intermediate_size =...
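
For readers unfamiliar with the three-layer design mentioned above, here is a minimal dependency-free sketch. It assumes the commonly used gated form, where a "gate" projection passes through SiLU, is multiplied elementwise by an "up" projection, and the product goes through a "down" projection. The names `gated_mlp`, `gate_w`, `up_w`, and `down_w` are illustrative and are not taken from the issue or from any library; biases are omitted as an assumption.

```python
import math


def silu(v):
    # SiLU (swish) activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def linear(weight, x):
    # Bias-free linear layer; weight is a list of rows (out_features x in_features)
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def gated_mlp(x, gate_w, up_w, down_w):
    # Assumed gated form: down(silu(gate(x)) * up(x))
    gate = [silu(g) for g in linear(gate_w, x)]
    up = linear(up_w, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return linear(down_w, hidden)


# Toy example: hidden size 2, intermediate size 3
x = [1.0, 2.0]
gate_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
up_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
down_w = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
out = gated_mlp(x, gate_w, up_w, down_w)
```

The three weight matrices correspond to the three linear layers: two map from the hidden size to the intermediate size, and one maps back.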

General code refactoring/cleanup done to prepare adding CAV-MAE to HuggingFace

This is some general code refactoring and cleanup I've already done locally to prepare for a PR to add CAV-MAE to HuggingFace (https://github.com/huggingface/transformers/pull/28246). This uses the standard Python formatting tools...

MS-AMP crashes with DeepSpeed ZeRO 3

I am fine-tuning Facebook's OPT-1.3B on 2x 4090 GPUs, using Ubuntu 22.04, PyTorch 2.1.0, CUDA 12.1, and HuggingFace Accelerate, using this code from the HuggingFace examples repo: https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm_no_trainer.py When using...