add tool to encrypt & decrypt weights

Open iurimatias opened this issue 2 years ago • 7 comments

This script creates an XOR diff between the fine-tuned weights and the original weights. The LLaMA weights fine-tuned on OA are HF-friendly and sharded in 10GB chunks, so the original LLaMA weights must first be loaded and saved using python model_enc.py convert_llama_to_hf... An XOR diff can then be created between the original and the fine-tuned weights.

However, after encrypting and then decrypting, the resulting weights file is not exactly the same as the input and won't load. I'm opening this as a draft in the hope of getting a second pair of eyes on what the issue might be.
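For readers following along, here is a minimal sketch of the XOR diff idea on small in-memory byte strings. It is not the code from model_enc.py: the helper name `xor_bytes` and the sample inputs are made up for illustration, and it assumes both inputs have exactly the same length.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (hypothetical helper, not from the PR)."""
    if len(data) != len(key):
        raise ValueError("inputs must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))


# Stand-ins for the real weight files; both are 16 bytes long.
original = b"original weights"
finetuned = b"finetuned weight"

# "Encrypt": the diff reveals nothing useful without the original weights.
diff = xor_bytes(finetuned, original)

# "Decrypt": XOR-ing the diff with the original gives back the fine-tuned bytes.
restored = xor_bytes(diff, original)
assert restored == finetuned
```

A byte-by-byte loop like this would be far too slow for multi-gigabyte shards; it is only meant to show the round trip that should hold.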

iurimatias avatar Mar 20 '23 21:03 iurimatias

:x: pre-commit failed. Please run pre-commit run --all-files locally and commit the changes. Find more information in the repository's CONTRIBUTING.md

github-actions[bot] avatar Mar 20 '23 21:03 github-actions[bot]

I don't know much about the llama stuff, but-

If you take random files (memoryfile or dd from urand) and xor them together in different ways, do you get the original output?

Are there range constraints on the things you're xoring? Like, negative numbers etc? I'm guessing binary format is important here but I've not run the code.

I highly recommend throwing in some pytest unit tests, trying hard to make them fail, then fixing the failures. This usually works for me and lets me write systems that are far more complex than I can understand.
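As a rough illustration of what such tests could look like, here is a sketch that pytest would collect as written. The `xor_bytes` helper is a hypothetical stand-in for the script's real logic, and the random inputs come from `os.urandom` rather than real weight files.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Naive XOR; zip() stops at the shorter input (hypothetical stand-in)."""
    return bytes(x ^ y for x, y in zip(a, b))


def test_roundtrip_recovers_input():
    a = os.urandom(1024)
    b = os.urandom(1024)
    # (A XOR B) XOR A should give back B.
    assert xor_bytes(xor_bytes(a, b), a) == b


def test_xor_with_self_is_all_zero():
    a = os.urandom(256)
    assert xor_bytes(a, a) == bytes(256)


def test_size_mismatch_silently_truncates():
    # Trying to make it fail: with unequal sizes the naive version
    # drops the trailing bytes without raising an error.
    short = os.urandom(16)
    long_ = os.urandom(32)
    assert len(xor_bytes(short, long_)) == 16
```

The last test documents a failure mode of this naive helper only; whether the real script behaves the same way is not something the sketch can tell you.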

bitplane avatar Mar 20 '23 23:03 bitplane

@bitplane yes, as in (A XOR B) XOR A == B; the files are supposed to be in the same format.

However, I believe I found the issue: the files don't necessarily have the same size, so the code needs to take this into account. I'll try to fix it tomorrow if no one gets to it first.
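One possible way to cope with unequal sizes, sketched here as an assumption rather than as the fix that ended up in the script, is to repeat the key bytes until the payload is covered. The function name `xor_with_cycled_key` is invented for this example.

```python
from itertools import cycle


def xor_with_cycled_key(payload: bytes, key: bytes) -> bytes:
    """XOR payload against key, repeating key as needed (hypothetical helper).

    The output always has the same length as payload, so applying the
    function twice with the same key returns the original payload.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(p ^ k for p, k in zip(payload, cycle(key)))


payload = bytes(range(10))  # stands in for a fine-tuned weights file
key = b"abc"                # stands in for a shorter original weights file

encrypted = xor_with_cycled_key(payload, key)
decrypted = xor_with_cycled_key(encrypted, key)
assert decrypted == payload
```

The same call also works when the key is longer than the payload, since only the first `len(payload)` key bytes are used.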

iurimatias avatar Mar 20 '23 23:03 iurimatias

Yep I get xor, I just wanted to point out that the test driven approach is a great way to force you to understand and communicate each unit's intent.

> However, I believe I found the issue: the files don't necessarily have the same size, so the code needs to take this into account. I'll try to fix it tomorrow if no one gets to it first.

Cool. ~~Can you do hmac from a seed? That works with arbitrary sized files, and you can zip a sequence generator rather than a file. Chaining SHA1 is reasonably fast and resistant enough to collisions.~~

Edit: I should have read the topic! Of course you can't use hmac, it's weights.

bitplane avatar Mar 21 '23 00:03 bitplane

:x: pre-commit failed. Please run pre-commit run --all-files locally and commit the changes. Find more information in the repository's CONTRIBUTING.md

github-actions[bot] avatar Mar 23 '23 20:03 github-actions[bot]

Sorry to be a bit late to this. Did you fix the issue?

yk avatar Apr 14 '23 11:04 yk

Yes, it should all be working.

iurimatias avatar Apr 14 '23 12:04 iurimatias

@iurimatias thanks a lot for your work on this.

andreaskoepf avatar Apr 28 '23 20:04 andreaskoepf