lorax

marlin

Open flozi00 opened this issue 1 year ago • 3 comments

What does this PR do?

Fixes # (issue)

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Was this discussed/approved via a GitHub issue or the Discord / Slack channel? Please add a link to it if that's the case.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

flozi00, Jan 18 '24 08:01

Looks like I need help debugging, @tgaddair. The kernel is incompatible with the flash attention kernels: an illegal memory access error occurs every time.

docker run --pull always -v ./data:/data --gpus all -d --shm-size 1g -p 8080:80 ghcr.io/predibase/lorax:marlin --model-id TheBloke/dolphin-2.6-mistral-7B-dpo-GPTQ --quantize marlin

flozi00, Jan 19 '24 14:01

@tgaddair On the Disco Research server I read a comment about the incompatibility with fused attention. I don't know whether they plan to support it in the future.

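I think this feature would not make much sense without flash attention, because non-fused attention has much higher memory requirements for longer sequences (the attention-score matrix it materializes grows quadratically with sequence length).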
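To put rough numbers on the memory argument, here is a back-of-the-envelope sketch. It assumes a non-fused ("eager") attention path that materializes a full (batch, heads, seq_len, seq_len) score tensor for one layer during prefill, and compares it with a flash-style kernel that only keeps per-row softmax statistics. The head count (32, as in Mistral-7B), fp16 scores, fp32 statistics and batch size 1 are illustrative assumptions, not measurements from this PR, and the function names are hypothetical.

```python
# Rough estimate of extra attention memory for ONE layer during prefill.
# Assumptions (illustrative only): 32 query heads as in Mistral-7B,
# batch size 1, fp16 scores (2 bytes) for eager attention and one fp32
# log-sum-exp value per query row (4 bytes) for a flash-style kernel.


def eager_score_bytes(seq_len: int, num_heads: int = 32,
                      batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes for a (batch, heads, seq_len, seq_len) score tensor.

    This is a lower bound: the softmax output usually needs a buffer too.
    """
    return batch * num_heads * seq_len * seq_len * bytes_per_elem


def flash_extra_bytes(seq_len: int, num_heads: int = 32,
                      batch: int = 1, bytes_per_elem: int = 4) -> int:
    """Flash-style kernels tile Q/K/V and never store the full score
    matrix; the extra buffer (per-row softmax statistics) grows linearly."""
    return batch * num_heads * seq_len * bytes_per_elem


if __name__ == "__main__":
    for n in (1024, 4096, 16384, 32768):
        eager_gib = eager_score_bytes(n) / 2**30
        flash_mib = flash_extra_bytes(n) / 2**20
        print(f"seq_len={n:6d}  eager scores: {eager_gib:8.2f} GiB  "
              f"flash extra: {flash_mib:6.3f} MiB")
```

Under these assumptions the eager score tensor is already 1 GiB per layer at 4096 tokens and 64 GiB at 32768 tokens, while the flash-style statistics stay in the MiB range, which is why losing flash attention compatibility hurts most for long sequences.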

I will keep this PR as a draft until it's compatible, but I won't be actively working on it.

flozi00, Jan 23 '24 20:01

Thanks @flozi00, we can hold off until that's supported, then.

tgaddair, Jan 23 '24 21:01