OpenAdapt
Implementing RWKV-LLM (#37)
This pull request implements RWKV's RNN model using Modal. (Issue #37) (Reference)
Summary: Created RWKV.py (will move to openadapt/strategies/mixins and restructure to accept inputs).
Run using the following commands (will be changed once implemented as a mixin):

```
modal token new
modal run .\openadapt\RWKV\RWKV.py
```
Implementation:
- Starts a Modal application; the GPU must be set to `a100` in order to run Raven-14B (the largest model)
- Downloads the weights for the desired model from Hugging Face to the Modal server
- Computes and returns the output
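For context, here is a minimal sketch of what the Modal side of this could look like, assuming Modal's `Stub`/`@stub.function` API and the `rwkv` pip package. The app name, image contents, weight repo/filename, and tokenizer path below are illustrative assumptions, not the exact code in this PR:

```python
# Illustrative sketch only -- names, weight paths, and tokenizer are assumptions.
import modal

# Image with the RWKV runtime and the Hugging Face Hub client installed.
image = modal.Image.debian_slim().pip_install("rwkv", "huggingface_hub", "tokenizers")
stub = modal.Stub("openadapt-rwkv", image=image)


@stub.function(gpu="a100", timeout=600)
def generate(prompt: str) -> str:
    """Download the selected weights to the Modal container and run inference."""
    from huggingface_hub import hf_hub_download
    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE

    # Hypothetical repo/filename for a Raven checkpoint.
    weights = hf_hub_download(repo_id="BlinkDL/rwkv-4-raven", filename="RWKV-4-Raven-14B.pth")
    # The rwkv loader examples pass the checkpoint path without the .pth suffix.
    model = RWKV(model=weights.removesuffix(".pth"), strategy="cuda fp16")
    # Assumes the 20B tokenizer json has also been fetched into the container.
    pipeline = PIPELINE(model, "20B_tokenizer.json")
    return pipeline.generate(prompt, token_count=200)


@stub.local_entrypoint()
def main():
    print(generate.remote("Hello, RWKV!"))
```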
TODO:
- Change the structure to function as a mixin (take inputs such as prompts or task descriptions)
- Allow modification of parameters via the config file (temperature, top-p, presence penalty, etc.)
- Deploy the Modal application permanently to avoid startup times (on startup the weights must be downloaded; to avoid this we can deploy the Modal app externally and make requests to it instead)
Update: Run using the following commands.

First time using Modal:

```
modal token new
```

Then:

```
modal deploy .\openadapt\RWKV\RWKV.py
python .\openadapt\RWKV\test_RWKV.py
```
The Modal server only needs to be deployed if the user wants to use Modal (set the `USE_MODAL` variable in `config.py` to `False` to run locally).
test_RWKV.py makes a simple call and prints the output to test functionality. To use RWKV as a mixin, refer to `openadapt/strategies/mixins/rwkv.py`.
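As a rough sketch of the flow test_RWKV.py exercises (the app/function names and the local helper below are hypothetical, used only to illustrate the Modal-vs-local dispatch):

```python
# Illustrative sketch only -- see openadapt/RWKV/test_RWKV.py for the real call.
from openadapt import config


def run_rwkv(prompt: str) -> str:
    """Send the prompt to the deployed Modal app, or run locally, based on USE_MODAL."""
    if config.USE_MODAL:
        import modal

        # Call the function deployed via `modal deploy` instead of starting a new app.
        generate = modal.Function.lookup("openadapt-rwkv", "generate")  # hypothetical names
        return generate.remote(prompt)
    # Local path: weights are downloaded and cached by Hugging Face on first use.
    from openadapt.RWKV.RWKV import run_model_local  # hypothetical helper name

    return run_model_local(prompt, model_index=config.RWKV_MODEL)


if __name__ == "__main__":
    print(run_rwkv("Hello, RWKV!"))
```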
Note:
- Startup takes a long time with Modal since the weights need to be downloaded to the Modal server; repeated calls are slightly faster, but still slow
- Models can be specified as follows in openadapt/strategies/mixins/rwkv.py:
```python
# Choose a model:
MODEL = config.RWKV_MODEL
# model 0: RWKV-4-Raven-14B
# model 1: RWKV-4-Raven-7B
# model 2: RWKV-4-Raven-1B5  <-- smallest model
# model 3: RWKV-4-Pile-14B
# model 4: RWKV-4-World-1.5B
```
Model 0 is the most effective model in most cases.
Please let me know if there are any improvements or changes to be made!
Thanks @AvidEslami ! What's the status on being able to run this locally? Can we add a setup method that automatically runs the required setup steps?
Also please fix merge conflicts so we can get this merged 🙏
> What's the status on being able to run this locally? Can we add a setup method that automatically runs the required setup steps?
Thanks for the comment @abrichr ! Commit 702e065 added a `use_cuda` parameter to the function call, which makes it possible to run locally. I've tested it with the smallest model (model 2) and was able to run it locally.
To test this, simply set `USE_MODAL = False` in `config.py`, then select the desired model and run the following command:

```
python .\openadapt\RWKV\test_RWKV.py
```
When running the model locally for the first time, Hugging Face will download and cache the model weights; no additional setup is required.
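For reference, toggling CPU vs. GPU with the `rwkv` package mostly comes down to the loading strategy string. A minimal sketch (the function name and tokenizer path are assumptions):

```python
# Sketch of a use_cuda toggle around the `rwkv` pip package.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE


def load_pipeline(weights_path: str, use_cuda: bool = False) -> PIPELINE:
    """Load RWKV weights with a CUDA fp16 strategy, or fall back to CPU fp32."""
    strategy = "cuda fp16" if use_cuda else "cpu fp32"
    model = RWKV(model=weights_path, strategy=strategy)
    # Assumes the Raven/Pile tokenizer json is available locally.
    return PIPELINE(model, "20B_tokenizer.json")
```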
Unfortunately, I don't think it would be possible to create a setup method for the Modal use case, since it requires the user to also create an account on the Modal website.
I will resolve the merge conflicts, but please let me know if any other changes are desired or suggested!
Do you think I should add the following model descriptions to config.py? Currently they are located in openadapt/strategies/mixins/rwkv.py
```python
# Choose a model:
# model 0: RWKV-4-Raven-14B  <-- largest model, finetuned version of model 3
# model 1: RWKV-4-Raven-7B
# model 2: RWKV-4-Raven-1B5  <-- smallest model
# model 3: RWKV-4-Pile-14B
# model 4: RWKV-4-World-1.5B
```
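If they do move, the entries in config.py might look roughly like this (a sketch, not merged code):

```python
# Possible config.py additions (illustrative).
# model 0: RWKV-4-Raven-14B  <-- largest model, finetuned version of model 3
# model 1: RWKV-4-Raven-7B
# model 2: RWKV-4-Raven-1B5  <-- smallest model
# model 3: RWKV-4-Pile-14B
# model 4: RWKV-4-World-1.5B
RWKV_MODEL = 0

# Set to False to run locally instead of on Modal.
USE_MODAL = True
```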
@AvidEslami it would be nice to be able to run this with a CPU only. What do you think about https://github.com/saharNooby/rwkv.cpp ?
> Do you think I should add the following model descriptions to config.py? Currently they are located in openadapt/strategies/mixins/rwkv.py
Yes please!
@abrichr Here are some test results I've gathered from various parameter setups. Unfortunately the accuracy for most setups is fairly low, but please take a look and let me know if you have any suggestions for improving the tests or creating new ones, thanks! https://docs.google.com/document/d/13RJC1V_noQX08ZWpg10Ze8Yd1nhoCceGAWojUz2cBfI/edit?usp=sharing
Thanks @AvidEslami ! I wonder if it might be more useful if we compress the prompt as much as possible, and remove any extraneous information. We should also try to remove any ambiguity (e.g. use "id" instead of "number", "descriptor" instead of "address", 0-based indexing). For example:
```
# Instruction:
You are given a task, and a list of signals in JSON format. Please provide only the id of the signal that is most relevant to the task.
# Input:
Task: creating a website
Signals:
[{'id': 0, 'type': 'file', 'descriptor': 'restaurant_menu_data.txt'}
,{'id': 1, 'type': 'url', 'descriptor': 'https://en.wikipedia.org/wiki/Web_development'}
,{'id': 2, 'type': 'function', 'descriptor': 'math.sqrt'}]
# Response:
```
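A small helper along these lines could assemble that compressed prompt from a task and a signal list (a sketch for illustration, not code in the PR):

```python
import json


def build_prompt(task: str, signals: list[dict]) -> str:
    """Build the compressed instruction/input/response prompt for RWKV."""
    signal_lines = "\n,".join(json.dumps(s) for s in signals)
    return (
        "# Instruction:\n"
        "You are given a task, and a list of signals in JSON format. "
        "Please provide only the id of the signal that is most relevant to the task.\n"
        "# Input:\n"
        f"Task: {task}\n"
        f"Signals:\n[{signal_lines}]\n"
        "# Response:\n"
    )


print(build_prompt(
    "creating a website",
    [
        {"id": 0, "type": "file", "descriptor": "restaurant_menu_data.txt"},
        {"id": 1, "type": "url", "descriptor": "https://en.wikipedia.org/wiki/Web_development"},
        {"id": 2, "type": "function", "descriptor": "math.sqrt"},
    ],
))
```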
> Thanks @AvidEslami ! I wonder if it might be more useful if we compress the prompt as much as possible, and remove any extraneous information.
Thanks for the advice @abrichr ! I modified the setup, which led to more concise results, but after running Raven-14B and World-7B the results were unfortunately still inaccurate: https://docs.google.com/document/d/1xH1P2IiaKPME9gPvbueLv-PU66hkP3kLDtcE9gK6ws0/edit?usp=sharing
Please let me know if you have any other suggestions here!
As mentioned in Slack, I think it's time to start fine tuning 🤓
Signals Finetune Update:
- The following sheet contains the outputs generated by the finetuned models: link
- Any advice on the approach is appreciated; there are still several things to try:
  - Using a random number of signals per example, which could perhaps reduce the model's chance of memorizing results (see the sketch after this list)
  - Changing the prompt structure, which could improve understanding
  - Creating a more varied dataset (more signals / more tasks) to help the model catch relations
  - Finetuning Pile-14B for several epochs (will try); Pile-14B is the non-finetuned version of Raven-14B
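As a sketch of the first idea, a generator like the following could emit finetuning rows with a random number of signals per example; the task/signal pools and the row format are entirely hypothetical:

```python
import json
import random

# Hypothetical pools of tasks and signals for generating finetuning examples.
TASKS = {
    "creating a website": 1,   # task -> index of the most relevant signal below
    "computing a square root": 2,
}
SIGNALS = [
    {"type": "file", "descriptor": "restaurant_menu_data.txt"},
    {"type": "url", "descriptor": "https://en.wikipedia.org/wiki/Web_development"},
    {"type": "function", "descriptor": "math.sqrt"},
]


def make_example() -> dict:
    """Build one instruction/input/response row with a random number of signals."""
    task, relevant = random.choice(list(TASKS.items()))
    k = random.randint(2, len(SIGNALS))
    chosen = random.sample(range(len(SIGNALS)), k)
    if relevant not in chosen:          # make sure the correct signal is present
        chosen[0] = relevant
    random.shuffle(chosen)
    signals = [{"id": i, **SIGNALS[idx]} for i, idx in enumerate(chosen)]
    answer = next(i for i, idx in enumerate(chosen) if idx == relevant)
    return {
        "instruction": "Provide only the id of the signal most relevant to the task.",
        "input": f"Task: {task}\nSignals: {json.dumps(signals)}",
        "response": str(answer),
    }


print(json.dumps(make_example(), indent=2))
```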
As usual please let me know if you have any suggestions!