Open-Assistant
add tool to encrypt & decrypt weights
This script creates an XOR diff between the fine-tuned weights and the original weights.
The LLaMA weights fine-tuned on OA are in the HF-friendly format and sharded into 10GB files, so the original LLaMA weights should first be loaded and saved in the same format using python model_enc.py convert_llama_to_hf... An XOR diff can then be created between the original and the fine-tuned weights.
However, after encrypting and then decrypting, the resulting weights file is not exactly the same as the input and won't load. I'm opening this as a draft in the hope of getting a second pair of eyes on what the issue might be.
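For readers following along, here is a minimal sketch of the XOR-diff idea described above. It is not the code in model_enc.py; the helper name `xor_bytes` and the sample byte strings are made up for illustration, and it assumes both inputs have exactly the same length.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings of equal length (hypothetical helper)."""
    if len(data) != len(key):
        # Refuse to silently truncate: zip() alone would drop trailing bytes.
        raise ValueError("inputs must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))


# Stand-ins for the raw bytes of the two weight files (16 bytes each).
original = b"original weights"
finetuned = b"finetuned bytes!"

# "Encrypt": the diff is useless without the original weights.
diff = xor_bytes(finetuned, original)

# "Decrypt": XOR-ing the diff with the original restores the fine-tuned bytes.
restored = xor_bytes(diff, original)
assert restored == finetuned
```

Because XOR is its own inverse, the same function serves for both directions, as long as the byte streams line up exactly.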
:x: pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md
I don't know much about the llama stuff, but:
If you take random files (in-memory files or dd from /dev/urandom) and xor them together in different ways, do you get the original output?
Are there range constraints on the things you're xoring? Like, negative numbers etc? I'm guessing binary format is important here but I've not run the code.
I highly recommend throwing in some pytest unit tests, trying hard to make them fail, then fixing the failures. This usually works for me and lets me write systems that are far more complex than I can understand.
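As a rough illustration of the suggestion above, the tests could look something like this. The `xor_bytes` helper is a hypothetical stand-in for whatever the script actually uses; the test functions are written as plain functions with bare asserts so pytest would collect them without any extra setup.

```python
import os
import struct


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for the script's XOR routine."""
    if len(data) != len(key):
        raise ValueError("inputs must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))


def test_roundtrip_random_bytes():
    # Random data, as if read from /dev/urandom.
    a = os.urandom(4096)
    b = os.urandom(4096)
    assert xor_bytes(xor_bytes(a, b), a) == b


def test_negative_floats_roundtrip():
    # XOR works on raw bytes, so the sign or range of the encoded values
    # should not matter as long as the binary layout is untouched.
    a = struct.pack("<4f", -1.5, 0.0, 2.25, -1e-3)
    b = struct.pack("<4f", 3.0, -7.5, -0.0, 1e10)
    assert xor_bytes(xor_bytes(a, b), a) == b


def test_length_mismatch_is_detected():
    # A size mismatch should fail loudly rather than truncate silently.
    try:
        xor_bytes(b"abcd", b"abc")
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched lengths")
```

The third test is the kind that tries hard to break things: it pins down what should happen when the two inputs are not the same size.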
@bitplane yes, as in (A XOR B) XOR A == B; the files are supposed to be in the same format.
However, I believe I found the issue: the files don't necessarily have the same size, so the code needs to take this into account. I'll try to fix it tomorrow if no one gets to it first.
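One possible way to cope with inputs of different sizes is to repeat (or truncate) the key so that it always covers the payload. This is only a sketch under that assumption; the thread does not say how the script was actually fixed, and `xor_with_key` is a made-up name.

```python
from itertools import cycle


def xor_with_key(payload: bytes, key: bytes) -> bytes:
    """XOR payload with key, repeating the key if it is shorter (hypothetical)."""
    if not key:
        # cycle() over an empty key yields nothing, which would drop the payload.
        raise ValueError("key must not be empty")
    # zip() stops at the end of payload, so the output always has its length:
    # a longer key is truncated, a shorter key wraps around.
    return bytes(p ^ k for p, k in zip(payload, cycle(key)))


# Payload longer than the key.
payload = b"fine-tuned shard that is longer than the key"
key = b"base"
encoded = xor_with_key(payload, key)
assert len(encoded) == len(payload)
assert xor_with_key(encoded, key) == payload

# Payload shorter than the key.
short = b"tiny"
long_key = b"a much longer original file"
assert xor_with_key(xor_with_key(short, long_key), long_key) == short
```

The output length always equals the payload length, so the decoded file has the same size as the one that was encoded.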
Yep, I get xor. I just wanted to point out that the test-driven approach is a great way to force you to understand and communicate each unit's intent.
> However, I believe I found the issue: the files don't necessarily have the same size, so the code needs to take this into account. I'll try to fix it tomorrow if no one gets to it first.
Cool. ~~Can you do hmac from a seed? That works with arbitrary sized files, and you can zip a sequence generator rather than a file. Chaining SHA1 is reasonably fast and resistant enough to collisions.~~
Edit: I should have read the topic! Of course you can't use hmac, it's weights.
Sorry to be a bit late to this. Did you fix the issue?
Yes, it should all be working.
On Apr 14, 2023 at 7:48 AM -0400, Yannic Kilcher wrote:
> sorry to get a bit late to this. did you fix the issue?
@iurimatias thanks a lot for your work on this.