stoperro

Results 13 comments of stoperro

> so if you e.g. pass --threshold 60 then it will only keep allocations which lived at least for 60 seconds. I presume many (most?) of allocations are such temporary...
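
To illustrate the threshold idea in the quote above, here is a minimal sketch of filtering allocation records by lifetime. It is not the tool's actual implementation: the `Allocation` record, the `keep_long_lived` helper, and the choice to treat never-freed allocations as living until `end_time` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Allocation:
    """Hypothetical allocation record; timestamps are in seconds."""
    size: int
    allocated_at: float
    freed_at: Optional[float]  # None means the allocation was never freed


def keep_long_lived(allocations: List[Allocation],
                    threshold: float,
                    end_time: float) -> List[Allocation]:
    """Keep only allocations that lived for at least `threshold` seconds.

    Assumption: an allocation that was never freed is treated as alive
    until `end_time` (for example, the end of the profiling run).
    """
    kept = []
    for alloc in allocations:
        end = alloc.freed_at if alloc.freed_at is not None else end_time
        if end - alloc.allocated_at >= threshold:
            kept.append(alloc)
    return kept
```

With a threshold of 60, a temporary allocation that was freed after 5 seconds would be dropped, while one that lived for 70 seconds would be kept.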

I don't understand why IntelliSense tries to convert paths to ALL CAPS; it could store the actual information internally. It gives me similar problems with Visual Studio C++. Code compiles well with...

Happens to me on Windows too, but it looks the same as #3, so it's likely not Windows-specific.

Hmm, #3 seemed to be caused by a too-old transformers version (without the PRs). I double-checked and I do have the newest transformers with the PRs, yet the issue still happens.

Ok, this might be Windows-specific. The problem is in `cudaMemPrefetchAsync()`, and [Stack Overflow](https://stackoverflow.com/a/43430831/950131) suggests the GPU may not support this feature. I wrote this code to check if my GPU...
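
For readers who want to run a similar check, here is a minimal sketch. It is not the code from the comment above. It assumes the relevant capability is the `cudaDevAttrConcurrentManagedAccess` device attribute, which the CUDA runtime documentation requires to be non-zero when prefetching managed memory to a GPU. The helper names and the library lookup are illustrative assumptions; on Windows the CUDA runtime DLL has a versioned name, so the lookup may need adjusting.

```python
import ctypes
import ctypes.util

# Value of cudaDevAttrConcurrentManagedAccess in the runtime's cudaDeviceAttr enum.
CUDA_DEV_ATTR_CONCURRENT_MANAGED_ACCESS = 89


def load_cudart():
    """Try to load the CUDA runtime library; return None if it is not found."""
    found = ctypes.util.find_library("cudart")
    candidates = [found] if found else []
    candidates.append("libcudart.so")  # common Linux name (assumption)
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def supports_concurrent_managed_access(device=0):
    """Return True or False for the attribute, or None if it cannot be queried."""
    cudart = load_cudart()
    if cudart is None:
        return None
    value = ctypes.c_int(0)
    # cudaError_t cudaDeviceGetAttribute(int* value, cudaDeviceAttr attr, int device)
    status = cudart.cudaDeviceGetAttribute(
        ctypes.byref(value),
        ctypes.c_int(CUDA_DEV_ATTR_CONCURRENT_MANAGED_ACCESS),
        ctypes.c_int(device),
    )
    if status != 0:  # 0 is cudaSuccess
        return None
    return bool(value.value)


if __name__ == "__main__":
    print("concurrentManagedAccess:", supports_concurrent_managed_access(0))
```

A result of `False` would be consistent with the prefetch call failing on that device; `None` only means the runtime could not be loaded or queried.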

The good news is that this `cudaMemPrefetchAsync()` call may not be required for the code to work - https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge8dc9199943d421bc8bc7f473df12e42:

> Note that this API is not required for functionality and only serves...

I've created an issue for this in TimDettmers/bitsandbytes#453. The bad news is that this Paged Optimizer (to avoid OoM due to memory spikes) likely won't work as advertised...

@johnny0213 this is my latest compiled build, but I made it around 1 month ago - https://github.com/stoperro/bitsandbytes_windows/releases/tag/pre-v0.39.0-win0 , so it's not based on the very latest version of bitsandbytes. It was working...

Same here. As a workaround, commenting out `m = m.merge_and_unload()` seems to work for me. 13b inference worked with ~11GB VRAM, so this 4bit option did indeed do something positive.

I also wonder about this part from README.md:

```python
quantization_config=BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4'
),
```

Seems the code works without it, but maybe quality is affected? Dunno. Edit:...