OpenAdapt
Implementing RWKV-LLM (#37)
This pull request implements RWKV's RNN model using Modal. (Issue #37) (Reference)
Summary: Created RWKV.py (will move to openadapt/strategies/mixins and restructure to accept inputs).
Run using the following commands (will be changed once implemented as a mixin):

```
modal token new
modal run .\openadapt\RWKV\RWKV.py
```
Implementation:
- Starts a Modal application; the GPU must be set to `a100` in order to run Raven-14B (the largest model)
- Downloads the weights for the desired model from Hugging Face to the Modal server
- Computes and returns the output
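For context, here is a minimal sketch of what the Modal side of this could look like, assuming Modal's `Stub`/`@stub.function` API and the `rwkv` pip package. The app name, image contents, weight repo/filename, and tokenizer path below are illustrative assumptions, not the exact code in this PR:

```python
# Illustrative sketch only -- names, weight paths, and tokenizer are assumptions.
import modal

# Image with the RWKV runtime and the Hugging Face Hub client installed.
image = modal.Image.debian_slim().pip_install("rwkv", "huggingface_hub", "tokenizers")
stub = modal.Stub("openadapt-rwkv", image=image)


@stub.function(gpu="a100", timeout=600)
def generate(prompt: str) -> str:
    """Download the selected weights to the Modal container and run inference."""
    from huggingface_hub import hf_hub_download
    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE

    # Hypothetical repo/filename for a Raven checkpoint.
    weights = hf_hub_download(repo_id="BlinkDL/rwkv-4-raven", filename="RWKV-4-Raven-14B.pth")
    # The rwkv loader examples pass the checkpoint path without the .pth suffix.
    model = RWKV(model=weights.removesuffix(".pth"), strategy="cuda fp16")
    # Assumes the 20B tokenizer json has also been fetched into the container.
    pipeline = PIPELINE(model, "20B_tokenizer.json")
    return pipeline.generate(prompt, token_count=200)


@stub.local_entrypoint()
def main():
    print(generate.remote("Hello, RWKV!"))
```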
TODO:
- Change the structure to function as a mixin (take inputs such as prompts or task descriptions)
- Allow modification of parameters via the config file (temperature, top-p, presence penalty, etc.)
- Deploy the Modal application permanently to avoid startup times (on startup the weights must be downloaded; to avoid this we can deploy the Modal app externally and make requests to it instead)
Update: Run using the following commands.

First time using Modal:

```
modal token new
```

Then:

```
modal deploy .\openadapt\RWKV\RWKV.py
python .\openadapt\RWKV\test_RWKV.py
```
The Modal server only needs to be deployed if the user wants to use Modal (set the `USE_MODAL` variable in `config.py` to `False` to run locally).
test_RWKV.py makes a simple call and prints the output to test functionality. To use RWKV as a mixin, refer to `openadapt/strategies/mixins/rwkv.py`.
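As a rough sketch of the flow test_RWKV.py exercises (the app/function names and the local helper below are hypothetical, used only to illustrate the Modal-vs-local dispatch):

```python
# Illustrative sketch only -- see openadapt/RWKV/test_RWKV.py for the real call.
from openadapt import config


def run_rwkv(prompt: str) -> str:
    """Send the prompt to the deployed Modal app, or run locally, based on USE_MODAL."""
    if config.USE_MODAL:
        import modal

        # Call the function deployed via `modal deploy` instead of starting a new app.
        generate = modal.Function.lookup("openadapt-rwkv", "generate")  # hypothetical names
        return generate.remote(prompt)
    # Local path: weights are downloaded and cached by Hugging Face on first use.
    from openadapt.RWKV.RWKV import run_model_local  # hypothetical helper name

    return run_model_local(prompt, model_index=config.RWKV_MODEL)


if __name__ == "__main__":
    print(run_rwkv("Hello, RWKV!"))
```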
Note:
- Startup takes a long time with Modal since the weights need to be downloaded to the Modal server; repeated calls are slightly faster, but still slow
- Models can be specified as follows in openadapt/strategies/mixins/rwkv.py:
```python
# Choose a model:
MODEL = config.RWKV_MODEL
# model 0: RWKV-4-Raven-14B
# model 1: RWKV-4-Raven-7B
# model 2: RWKV-4-Raven-1B5  <-- smallest model
# model 3: RWKV-4-Pile-14B
# model 4: RWKV-4-World-1.5B
```
Model 0 is the most effective model in most cases.
Please let me know if there are any improvements or changes to be made!
Thanks @AvidEslami ! What's the status on being able to run this locally? Can we add a setup method that automatically runs the required setup steps?
Also please fix merge conflicts so we can get this merged 🙏
> What's the status on being able to run this locally? Can we add a setup method that automatically runs the required setup steps?
Thanks for the comment @abrichr ! Commit 702e065 added a `use_cuda` parameter to the function call, which makes it possible to run locally. I've tested it with the smallest model (model 2) and was able to run it locally.
To test this, simply set `USE_MODAL = False` in `config.py`, then select the desired model and run the following command:

```
python .\openadapt\RWKV\test_RWKV.py
```
When running the model locally for the first time, Hugging Face will download and cache the model weights; no additional setup is required.
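For reference, toggling CPU vs. GPU with the `rwkv` package mostly comes down to the loading strategy string. A minimal sketch (the function name and tokenizer path are assumptions):

```python
# Sketch of a use_cuda toggle around the `rwkv` pip package.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE


def load_pipeline(weights_path: str, use_cuda: bool = False) -> PIPELINE:
    """Load RWKV weights with a CUDA fp16 strategy, or fall back to CPU fp32."""
    strategy = "cuda fp16" if use_cuda else "cpu fp32"
    model = RWKV(model=weights_path, strategy=strategy)
    # Assumes the Raven/Pile tokenizer json is available locally.
    return PIPELINE(model, "20B_tokenizer.json")
```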
Unfortunately, I don't think it would be possible to create a setup method for the Modal use case, since it requires the user to also create an account on the Modal website.
I will resolve the merge conflicts, but please let me know if any other changes are desired or suggested!
Do you think I should add the following model descriptions to config.py? Currently they are located in openadapt/strategies/mixins/rwkv.py
```python
# Choose a model:
# model 0: RWKV-4-Raven-14B  <-- largest model, finetuned version of model 3
# model 1: RWKV-4-Raven-7B
# model 2: RWKV-4-Raven-1B5  <-- smallest model
# model 3: RWKV-4-Pile-14B
# model 4: RWKV-4-World-1.5B
```
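If they do move, the entries in config.py might look roughly like this (a sketch, not merged code):

```python
# Possible config.py additions (illustrative).
# model 0: RWKV-4-Raven-14B  <-- largest model, finetuned version of model 3
# model 1: RWKV-4-Raven-7B
# model 2: RWKV-4-Raven-1B5  <-- smallest model
# model 3: RWKV-4-Pile-14B
# model 4: RWKV-4-World-1.5B
RWKV_MODEL = 0

# Set to False to run locally instead of on Modal.
USE_MODAL = True
```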
@AvidEslami it would be nice to be able to run this with a CPU only. What do you think about https://github.com/saharNooby/rwkv.cpp ?
> Do you think I should add the following model descriptions to config.py? Currently they are located in openadapt/strategies/mixins/rwkv.py
Yes please!
@abrichr Here are some test results I've gathered from various parameter setups. Unfortunately the accuracy for most setups is fairly low, but please take a look and let me know if you have any suggestions for improving the tests or creating new ones, thanks! https://docs.google.com/document/d/13RJC1V_noQX08ZWpg10Ze8Yd1nhoCceGAWojUz2cBfI/edit?usp=sharing
Thanks @AvidEslami ! I wonder if it might be more useful if we compress the prompt as much as possible, and remove any extraneous information. We should also try to remove any ambiguity (e.g. use "id" instead of "number", "descriptor" instead of "address", 0-based indexing). For example:
```
# Instruction:
You are given a task, and a list of signals in JSON format. Please provide only the id of the signal that is most relevant to the task.
# Input:
Task: creating a website
Signals:
[{'id': 0, 'type': 'file', 'descriptor': 'restaurant_menu_data.txt'}
,{'id': 1, 'type': 'url', 'descriptor': 'https://en.wikipedia.org/wiki/Web_development'}
,{'id': 2, 'type': 'function', 'descriptor': 'math.sqrt'}]
# Response:
```
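A small helper along these lines could assemble that compressed prompt from a task and a signal list (a sketch for illustration, not code in the PR):

```python
import json


def build_prompt(task: str, signals: list[dict]) -> str:
    """Build the compressed instruction/input/response prompt for RWKV."""
    signal_lines = "\n,".join(json.dumps(s) for s in signals)
    return (
        "# Instruction:\n"
        "You are given a task, and a list of signals in JSON format. "
        "Please provide only the id of the signal that is most relevant to the task.\n"
        "# Input:\n"
        f"Task: {task}\n"
        f"Signals:\n[{signal_lines}]\n"
        "# Response:\n"
    )


print(build_prompt(
    "creating a website",
    [
        {"id": 0, "type": "file", "descriptor": "restaurant_menu_data.txt"},
        {"id": 1, "type": "url", "descriptor": "https://en.wikipedia.org/wiki/Web_development"},
        {"id": 2, "type": "function", "descriptor": "math.sqrt"},
    ],
))
```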
> Thanks @AvidEslami ! I wonder if it might be more useful if we compress the prompt as much as possible, and remove any extraneous information.
Thanks for the advice @abrichr ! I modified the setup, which led to more concise results, but after running Raven-14B and World-7B the results were unfortunately still inaccurate: https://docs.google.com/document/d/1xH1P2IiaKPME9gPvbueLv-PU66hkP3kLDtcE9gK6ws0/edit?usp=sharing
Please let me know if you have any other suggestions here!
As mentioned in Slack, I think it's time to start fine tuning 🤓
Signals Finetune Update:
- The following sheet contains the outputs generated by the finetuned models: link
- Any advice on the approach is appreciated; there are still several things to try:
  - Using a random number of signals per example, which could perhaps reduce the model's chance of memorizing results (see the sketch after this list)
  - Changing the prompt structure, which could improve understanding
  - Creating a more varied dataset (more signals / more tasks) to help the model catch relations
  - Finetuning Pile-14B for several epochs (will try); Pile-14B is the non-finetuned version of Raven-14B
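As a sketch of the first idea, a generator like the following could emit finetuning rows with a random number of signals per example; the task/signal pools and the row format are entirely hypothetical:

```python
import json
import random

# Hypothetical pools of tasks and signals for generating finetuning examples.
TASKS = {
    "creating a website": 1,   # task -> index of the most relevant signal below
    "computing a square root": 2,
}
SIGNALS = [
    {"type": "file", "descriptor": "restaurant_menu_data.txt"},
    {"type": "url", "descriptor": "https://en.wikipedia.org/wiki/Web_development"},
    {"type": "function", "descriptor": "math.sqrt"},
]


def make_example() -> dict:
    """Build one instruction/input/response row with a random number of signals."""
    task, relevant = random.choice(list(TASKS.items()))
    k = random.randint(2, len(SIGNALS))
    chosen = random.sample(range(len(SIGNALS)), k)
    if relevant not in chosen:          # make sure the correct signal is present
        chosen[0] = relevant
    random.shuffle(chosen)
    signals = [{"id": i, **SIGNALS[idx]} for i, idx in enumerate(chosen)]
    answer = next(i for i, idx in enumerate(chosen) if idx == relevant)
    return {
        "instruction": "Provide only the id of the signal most relevant to the task.",
        "input": f"Task: {task}\nSignals: {json.dumps(signals)}",
        "response": str(answer),
    }


print(json.dumps(make_example(), indent=2))
```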
As usual please let me know if you have any suggestions!