Hangliang Ding
### Question (ll)

```
root@platform:/workspace/dhl/LLaVA# bash /workspace/dhl/LLaVA/scripts/v1_5/finetune.sh
[2023-12-06 16:23:00,608] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-06 16:23:07,002] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local...
```
Hello everyone, I am looking for some minimal test code to help me understand the implementation and functionality of this model. Does anyone know if there is already existing test...
```
python3 -m flexgen.flex_opt --model facebook/opt-30b --percent 0 100 100 0 100 0 \
  --offload-dir /scratch/bcjw/ding3/flexgen_offload_dir --path /scratch/bcjw/ding3/opt_weights

args.model: facebook/opt-30b
model size: 55.803 GB, cache size: 2.789 GB, hidden size...
```
For the matmul example code, I want to print the offsets when BLOCK_SIZE = 16 and M = 30, meaning the last block doesn't have enough data. I use `tl.device_print`, but...
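A minimal sketch of the situation, assuming a 1-D kernel rather than the full matmul tutorial (the kernel name and input tensor here are made up for illustration): with M = 30 and BLOCK_SIZE = 16, the second block's offsets run past the end of the data, so the load needs a mask, while `tl.device_print` still shows the raw offsets for every lane.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def print_offsets_kernel(x_ptr, M, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)  # block 1 -> offsets 16..31
    mask = offs < M                                     # lanes 30 and 31 are out of range
    tl.device_print("offs", offs)                       # prints offsets for all lanes, masked or not
    tl.load(x_ptr + offs, mask=mask, other=0.0)         # mask keeps the load in bounds

x = torch.arange(30, device="cuda", dtype=torch.float32)
grid = (triton.cdiv(30, 16),)                           # 2 blocks for M = 30, BLOCK_SIZE = 16
print_offsets_kernel[grid](x, 30, BLOCK_SIZE=16)
```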
The .ipynb file has the wrong type and can only be opened as raw JSON.
```python
>>> import flash_attn
>>> flash_attn.__version__
'2.5.0'
>>> from flash_attn import (flash_attn_func, flash_attn_varlen_func,
...                         flash_attn_varlen_func_with_kvcache, flash_attn_with_kvcache)
ImportError: cannot import name 'flash_attn_varlen_func_with_kvcache' from 'flash_attn'
```
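As a quick sketch, the names exported by `flash_attn` vary between releases, so listing the module's attributes shows which entry points the installed build actually provides before trying to import them:

```python
import flash_attn

print(flash_attn.__version__)
# List the public flash_attn_* entry points this particular build exposes.
print([name for name in dir(flash_attn) if name.startswith("flash_attn")])
```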
RuntimeError: The detected CUDA version (12.2) mismatches the version that was used to compile PyTorch (11.7). Please make sure to use the same CUDA versions.
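A minimal sketch for confirming the mismatch before rebuilding anything, assuming `nvcc` is on PATH: compare the CUDA toolkit PyTorch was compiled against with the toolkit the system compiler reports.

```python
import subprocess

import torch

print(torch.__version__)          # installed PyTorch build
print(torch.version.cuda)         # CUDA version PyTorch was compiled with (11.7 here)
print(torch.cuda.is_available())

# CUDA toolkit actually on the machine (12.2 here), used when compiling extensions.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```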
With the following script:

```
nvcc -O3 --use_fast_math -lcublas -lcublasLt layernorm_backward.cu -o layernorm_backward
./layernorm_backward 2
```

output:

```
Using kernel 2
Checking correctness...
dinp:
-1.182338 -1.187500
0.236102 0.236328
0.667884 0.667969
-1.111703 -1.117188...
```