lorax icon indicating copy to clipboard operation
lorax copied to clipboard

feat: support lazy loading the lora module for reducing the loading p…

Open thincal opened this issue 1 year ago • 3 comments

What does this PR do?

Fixes #433

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Was this discussed/approved via a Github issue or the discord / slack channel? Please add a link to it if that's the case.
  • [ ] Did you write any new necessary tests?

Who can review?

@tgaddair

thincal avatar Apr 23 '24 07:04 thincal

It seems that caching the handle from safe_open might be a better solution, but need to consider the file handle reference management that used by multiple layers, I will refine it later.

thincal avatar Apr 24 '24 02:04 thincal

It seems that caching the handle from safe_open might be a better solution, but need to consider the file handle reference management that used by multiple layers, I will refine it later.

Still keep cache the filenames instead of filehandles, since that 1) safe_open needs the device info which differs during loading the lora modules, 2) safe_open is lazy loading until specific tensor loaded by get_tensor invoked, which is already the optimized behavior for our case.

thincal avatar May 19 '24 02:05 thincal

@tgaddair could you help review this change ?

thincal avatar May 19 '24 02:05 thincal

Looks great @thincal, thanks for the PR, and apologies for the slow review!

I had one question about the file handle, but happy to land this and iterate on it to see if there's any room to further optimize.

It is fine to land it firstly, since the safe_open is already lazy behavior and main overhead is about reading out the specific tensor.

thincal avatar May 25 '24 12:05 thincal

@thincal I noticed there's a failing test:

FAILED server/tests/adapters/test_medusa.py::test_batched_medusa_weights - safetensors_rust.SafetensorError: device cpu is invalid

Would you be able to take a look before we merge? We should be good to go once that's resolved.

tgaddair avatar May 25 '24 20:05 tgaddair

@thincal I noticed there's a failing test:

FAILED server/tests/adapters/test_medusa.py::test_batched_medusa_weights - safetensors_rust.SafetensorError: device cpu is invalid

Would you be able to take a look before we merge? We should be good to go once that's resolved.

OK, I will finish it today, thanks.

thincal avatar Jun 11 '24 01:06 thincal

@tgaddair fix passed for server/tests/adapters/test_medusa.py::test_batched_medusa_weights, the remained errors seem related with repo access failure, could you help have a check ?

thincal avatar Jun 11 '24 01:06 thincal

@tgaddair ping, sorry for my late response. Could you help review the revised commit? Thanks.

thincal avatar Jun 18 '24 09:06 thincal

Hey @thincal, very sorry for the delay here. Let me take a look now.

tgaddair avatar Aug 02 '24 22:08 tgaddair

@tgaddair it's glad to see this change be merged, and thanks for your support :)

thincal avatar Aug 04 '24 02:08 thincal