Add Expert Gate to Avalanche
Hi, this is my initial draft of the ExpertGate Plugin. It deals with issue #1023
A few points:
- The ExpertGate model is implemented as a wrapper which holds two dictionaries, one for containing Autoencoders, another for holding "Experts" (Alexnets)
- Using the ExpertGate model requires using the ExpertGateStrategy, which relies on the ExpertGatePlugin.
- Almost all of the action happens in the `before_training_exp` and `before_eval_iteration` methods extended by the plugin
- Before training a new "expert" for each task (except the initial task):
  - An autoencoder is trained
  - The most related previous expert is selected using the task_relatedness metric
  - The feature module is extracted from that most related expert and plugged into the "new expert" model
  - Based on the task_relatedness result, LwF is used or not for training
- Before each evaluation iteration:
  - Each autoencoder reconstructs the input and a reconstruction error is produced
  - The reconstruction errors are fed through a softmax layer to determine the most relevant autoencoder
  - The expert associated with that autoencoder is selected for evaluation
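The gating described above can be sketched roughly like this (a minimal illustration with assumed names — `autoencoder_dict` and `expert_dict` stand in for the wrapper's two dictionaries; this is not the actual plugin code):

```python
import torch
import torch.nn.functional as F


def select_expert(x, autoencoder_dict, expert_dict, temperature=1.0):
    """Pick the expert whose autoencoder reconstructs x best.

    A lower reconstruction error should mean a more relevant task, so the
    errors are negated before the softmax: the smallest error gets the
    largest gate probability.
    """
    keys = list(autoencoder_dict.keys())
    errors = torch.stack(
        [F.mse_loss(autoencoder_dict[k](x), x) for k in keys]
    )
    probs = torch.softmax(-errors / temperature, dim=0)
    best_key = keys[int(torch.argmax(probs))]
    return expert_dict[best_key]
```

The `temperature` parameter is an assumption for illustration; the paper's gate applies a temperature-scaled softmax over the negated reconstruction errors in the same spirit.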
I am still in the process of running the model with real scenarios to squash any silly bugs. I wanted to open this draft in case somebody wants to check it out and offer any pointers.
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
avalanche/training/supervised/__init__.py:13:44: W292 no newline at end of file
avalanche/training/supervised/expert_gate.py:59:81: E501 line too long (82 > 80 characters)
avalanche/training/supervised/expert_gate.py:88:81: E501 line too long (83 > 80 characters)
avalanche/training/supervised/expert_gate.py:118:81: E501 line too long (85 > 80 characters)
avalanche/training/supervised/expert_gate.py:173:81: E501 line too long (95 > 80 characters)
avalanche/training/supervised/expert_gate.py:198:81: E501 line too long (88 > 80 characters)
avalanche/training/supervised/expert_gate.py:211:81: E501 line too long (92 > 80 characters)
avalanche/training/supervised/expert_gate.py:241:81: E501 line too long (91 > 80 characters)
avalanche/models/__init__.py:25:56: W292 no newline at end of file
avalanche/models/expert_gate.py:54:81: E501 line too long (90 > 80 characters)
8 E501 line too long (82 > 80 characters)
2 W292 no newline at end of file
Pull Request Test Coverage Report for Build 5071596552
- 201 of 217 (92.63%) changed or added relevant lines in 6 files are covered.
- 5 unchanged lines in 3 files lost coverage.
- Overall coverage increased (+0.2%) to 72.322%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| avalanche/training/supervised/expert_gate.py | 98 | 101 | 97.03% |
| avalanche/models/expert_gate.py | 77 | 90 | 85.56% |
| Total: | 201 | 217 | 92.63% |
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| avalanche/benchmarks/scenarios/new_classes/nc_scenario.py | 1 | 94.48% |
| avalanche/benchmarks/scenarios/online_scenario.py | 2 | 95.96% |
| avalanche/benchmarks/utils/flat_data.py | 2 | 90.55% |
| Total: | 5 | |
| Totals | |
|---|---|
| Change from base Build 5001854992: | +0.2% |
| Covered Lines: | 15853 |
| Relevant Lines: | 21920 |
💛 - Coveralls
Thanks! Let me know when you are ready for a full code review. From a quick glance it looks quite good.
As a small suggestion: if you move the code from `before_eval_iteration` inside the `forward` method (only during eval mode), I think it would be better. This way, you can call the model's forward outside the Avalanche strategy, which is always a good thing.
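A rough sketch of that shape (purely illustrative; `_select_expert`, the stand-in expert, and the class name are made up, not the real Avalanche model):

```python
import torch.nn as nn


class ExpertGateSketch(nn.Module):
    """Illustrative wrapper: gate in eval mode, train the current expert."""

    def __init__(self):
        super().__init__()
        self.current_expert = nn.Linear(8, 2)  # stand-in expert
        self.experts = nn.ModuleDict({"t0": self.current_expert})

    def _select_expert(self, x):
        # stand-in for the autoencoder-based gate
        return self.experts["t0"]

    def forward(self, x):
        if self.training:
            # training: use the expert currently being fit
            return self.current_expert(x)
        # eval: route through the gate so a plain model(x) call works
        # outside any Avalanche strategy
        expert = self._select_expert(x)
        return expert(x)
```

The design benefit is exactly as described: `model(x)` behaves sensibly on its own, because the gating lives in the model rather than in a plugin callback.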
I'm trying to update my expert-gate branch to incorporate changes from avalanche/main, how do you suggest I go about this?
I want to do this on my end so I can test with the latest updates
You can do a `git pull upstream master` on your local branch. The recent changes should not break anything in your code.
Fair, I was trying to rebase to avoid the unnecessary merge commit but that didn't go very well. Maybe in another lifetime I'll understand how to do it :D
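For the record, the rebase route can be demoed end to end on two throwaway local repos (everything below is a self-contained toy; in the real setting the remote would be `upstream` pointing at the avalanche repo and the branch would be `expert-gate`):

```shell
set -e
work=$(mktemp -d)
cd "$work"

# stand-in for the upstream repo
git init -q -b master upstream_repo
git -C upstream_repo config user.email dev@example.com
git -C upstream_repo config user.name dev
echo base > upstream_repo/base.txt
git -C upstream_repo add base.txt
git -C upstream_repo commit -qm "base"

# stand-in for the fork with the feature branch
git clone -q upstream_repo fork
git -C fork config user.email dev@example.com
git -C fork config user.name dev
git -C fork checkout -qb expert-gate
echo feature > fork/feature.txt
git -C fork add feature.txt
git -C fork commit -qm "expert gate work"

# upstream moves ahead while the branch is in flight
echo update > upstream_repo/update.txt
git -C upstream_repo add update.txt
git -C upstream_repo commit -qm "upstream change"

# the actual recipe: fetch, then replay your commits on top
git -C fork fetch -q origin
git -C fork rebase -q origin/master
# a rebase rewrites history, so pushing afterwards needs:
#   git push --force-with-lease origin expert-gate
ls fork
```

After the rebase, the branch contains both the upstream change and the feature commit, with no merge commit in between.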
Thanks, I'll move stuff out of `before_eval_iteration` into the `forward` method (during eval mode). I'll drop another message here when I'm more comfortable with a code review!
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
examples/expert_gate_example.py:33:81: E501 line too long (92 > 80 characters)
examples/expert_gate_example.py:36:81: E501 line too long (86 > 80 characters)
examples/expert_gate_example.py:63:81: E501 line too long (130 > 80 characters)
3 E501 line too long (92 > 80 characters)
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
avalanche/training/supervised/expert_gate.py:286:81: E501 line too long (87 > 80 characters)
examples/expert_gate_example.py:33:81: E501 line too long (92 > 80 characters)
examples/expert_gate_example.py:37:81: E501 line too long (86 > 80 characters)
examples/expert_gate_example.py:65:81: E501 line too long (130 > 80 characters)
4 E501 line too long (87 > 80 characters)
Hi,
I have arrived at a point where I think I would greatly benefit from an additional set of eyes.
My current concerns are:
- The loss for the "expert" model doesn't go down. Things I thought were culprits but have ruled out:
  - The optimizer wasn't getting the right parameters. After checking that the model's parameters are the same objects as the expert's parameters, I don't think this is the case.
  - The loss was wrong because the output vector of the classifier was of size = number of classes and the target was a single value. I thought I had to one-hot the target but, it seems, PyTorch's cross-entropy loss handles this gracefully and I don't have to one-hot anything.
  - The learning rate was badly picked, but I've experimented with a few.
- I think I am not using the `MultiTaskModule` correctly. I was initially using the `forward_single_task` method, but it kept giving me errors about broadcasting the output of the classifier (e.g. 10 classes for MNIST) into the highest target value of that task (maybe task 0 has digit 3, so it wants to broadcast [mb_size, 10] to [mb_size, 3]).
- (Kind of a minor concern) There is a ton of confusing output because so many things are getting trained and evaluated in the strategy. I guess this is more of a "how do you guys handle this" stylistic question.
P.S. I'm running the ExpertGate model with examples/expert_gate_example.py. I realize it's redundantly named and not a very elegant file; I plan on fixing this at the end :)
EDIT:
I realize you also suggested moving execution from `before_eval_iteration` into the `forward` method in eval mode. I haven't made that move yet because of point 2: I think I have misunderstood which forward method I should be extending (implementing?) in the overall model class.
Sorry for the late answer, I was at CVPR and I had a bit of backlog to work on after that.
> The loss for the "expert" model doesn't go down
Is the problem in the autoencoder or the final model?
> The loss was wrong because the output vector of the classifier was of size = number of classes and the target was a single value.
This is correct. If you do one-hot encoding it doesn't work.
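To make the target-shape point concrete (a standalone PyTorch snippet, not PR code): `nn.CrossEntropyLoss` takes raw logits of shape `[mb_size, num_classes]` and integer class indices of shape `[mb_size]`, so one-hot encoding is unnecessary:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 10)           # [mb_size, num_classes], raw scores
targets = torch.tensor([3, 0, 7, 1])  # plain class indices, not one-hot

loss = criterion(logits, targets)     # scalar loss, no reshaping needed
```

Recent PyTorch versions also accept class probabilities as targets, but integer indices are the standard path and the one relevant here.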
> The learning rate was badly picked, but I've experimented with a few.
Are you following the original paper? Do you have the same hyperparameters?
> I think I am not using the MultiTaskModule correctly.
ExpertGate doesn't need task labels, so you don't need a `MultiTaskModule`.
> (Kind of a minor concern) There is a ton of confusing output because so many things are getting trained and evaluated in the strategy. I guess this is a more "how do you guys handle this"-stylistic question
Yes, this is a problem with nested training, but I don't have a solution right now. The best thing you can do is reduce the amount of logging for the internal training.
Thank you so much for taking the time, your code review is really appreciated. I will double check the things you have pointed out and get back.
> Is the problem in the autoencoder or the final model?
The loss for the autoencoder goes down; it is the AlexNet "expert" that doesn't seem to train. Consequently, the final model performs poorly. Although, I will double-check this claim after addressing some of the things you pointed out in your code review.
> This is correct. If you do one-hot encoding it doesn't work.
Yep, got it. I haven't done one-hot encoding here. Thankfully shot that idea down by reading some of the documentation :D
> Are you following the original paper? Do you have the same hyperparameters?
Yes, I'm following the original paper. What I meant to say was: I thought this was an issue, but I've ruled it out by experimenting with a few values.
> ExpertGate doesn't need task labels so you don't need a MultiTaskModule.
I see. I think I understand and it makes sense with my intuition. I will circle back if I realize I didn't understand.
Thanks again for this! I understand that you must be busy and these code reviews take time, but it was incredibly helpful to have you take a look! Hopefully, I can get this all cleared up soon!
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
avalanche/training/supervised/expert_gate.py:137:81: E501 line too long (84 > 80 characters)
avalanche/training/supervised/expert_gate.py:142:81: E501 line too long (82 > 80 characters)
avalanche/training/supervised/expert_gate.py:299:81: E501 line too long (87 > 80 characters)
examples/expert_gate_example.py:33:81: E501 line too long (92 > 80 characters)
examples/expert_gate_example.py:37:81: E501 line too long (86 > 80 characters)
examples/expert_gate_example.py:69:81: E501 line too long (131 > 80 characters)
6 E501 line too long (84 > 80 characters)
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
avalanche/models/expert_gate.py:20:81: E501 line too long (110 > 80 characters)
avalanche/models/expert_gate.py:28:81: E501 line too long (102 > 80 characters)
avalanche/models/expert_gate.py:84:81: E501 line too long (88 > 80 characters)
avalanche/models/expert_gate.py:133:81: E501 line too long (101 > 80 characters)
avalanche/training/supervised/expert_gate.py:9:81: E501 line too long (83 > 80 characters)
avalanche/training/supervised/expert_gate.py:21:81: E501 line too long (113 > 80 characters)
avalanche/training/supervised/expert_gate.py:62:81: E501 line too long (86 > 80 characters)
avalanche/training/supervised/expert_gate.py:63:81: E501 line too long (103 > 80 characters)
avalanche/training/supervised/expert_gate.py:99:81: E501 line too long (496 > 80 characters)
avalanche/training/supervised/expert_gate.py:128:81: E501 line too long (106 > 80 characters)
avalanche/training/supervised/expert_gate.py:159:81: E501 line too long (85 > 80 characters)
avalanche/training/supervised/expert_gate.py:164:81: E501 line too long (107 > 80 characters)
avalanche/training/supervised/expert_gate.py:169:81: E501 line too long (140 > 80 characters)
avalanche/training/supervised/expert_gate.py:225:81: E501 line too long (152 > 80 characters)
avalanche/training/supervised/expert_gate.py:242:81: E501 line too long (124 > 80 characters)
avalanche/training/supervised/expert_gate.py:277:81: E501 line too long (120 > 80 characters)
avalanche/training/supervised/expert_gate.py:291:81: E501 line too long (106 > 80 characters)
avalanche/training/supervised/expert_gate.py:301:81: E501 line too long (87 > 80 characters)
examples/expert_gate_example.py:30:81: E501 line too long (104 > 80 characters)
examples/expert_gate_example.py:66:81: E501 line too long (131 > 80 characters)
20 E501 line too long (110 > 80 characters)
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
examples/expert_gate.py:68:1: E128 continuation line under-indented for visual indent
1 E128 continuation line under-indented for visual indent
Hello,
I have made several notable changes:
- I figured out why the expert wasn't training: the optimizer wasn't updating the model's parameters as a new expert was being selected. I now use the `update_optimizer` method to make sure this happens. Training happens now.
- As recommended, I moved the expert selection logic from `before_eval_iteration` to the `forward` method when not in training mode, so that a typical inference call can make use of that logic as well.
- I used the `functional` methods where I could.
- I stopped using the `MultiTaskModule` for the `ExpertGate` class.
- I modified how `LwF` is called to make it slightly more readable, and also fixed an issue where it was reusing the same plugin between experts. It now uses a "fresh" plugin to avoid cross-contamination.
- I added more documentation and cleaned up the example file.
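The cross-contamination point can be illustrated with a trivial stand-in (this is not Avalanche's actual `LwFPlugin`; the real one keeps a frozen copy of the previous model as its distillation teacher, which is exactly the kind of per-expert state that must not be shared):

```python
class LwFPluginSketch:
    """Stand-in plugin with per-expert state."""

    def __init__(self, alpha=0.5, temperature=2.0):
        self.alpha = alpha
        self.temperature = temperature
        self.prev_model = None  # teacher state; must start empty per expert


def make_lwf_plugin():
    # A new instance per expert: no teacher state leaks between experts.
    return LwFPluginSketch()
```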
I am a lot more confident about this submission and believe I have largely managed to get ExpertGate working in Avalanche!
Any suggestions are appreciated :)
Thanks, this looks much cleaner. You also fixed all the previous issues.
Did you check the final performance? How does it compare to the original paper?
Hello,
Thank you for your patience. I spent some time doing aggressive testing (and breaking of things) to eventually fix the following concerns:
- Using deep copies to prevent irrelevant weights from changing (major bug)
- Resetting the optimizer instead of updating it; the latter was adding parameters instead of replacing them (major bug)
- Fixing where tensors live when a GPU is available. It's handled more strictly now, with the device argument being passed around to every object.
- Updated how pretrained weights for AlexNet are grabbed; the old method is going to be deprecated (minor)
- General code cleanup
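The optimizer fix above boils down to rebuilding the optimizer from scratch around the newly selected expert, rather than updating the existing one (an illustrative sketch; `reset_optimizer` is a made-up name, and the SGD settings match the hyperparameters used in the experiments in this thread):

```python
import torch
import torch.nn as nn


def reset_optimizer(model, lr=1e-3, momentum=0.9, weight_decay=5e-4):
    """Return a fresh SGD optimizer tracking ONLY this model's parameters.

    Re-creating the optimizer replaces the parameter groups entirely;
    updating an existing optimizer (e.g. via add_param_group) would keep
    accumulating groups from previously trained experts.
    """
    return torch.optim.SGD(
        model.parameters(),
        lr=lr,
        momentum=momentum,
        weight_decay=weight_decay,
    )
```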
Also, as suggested, I set up the scenario from the original paper. The simplest experiment from the paper uses MIT Indoor Scenes 67, Caltech Birds 200, and Oxford Flowers 102, and treats each of these datasets as a separate task. Please find the results below.
| Source | Method | Scenes | Birds | Flowers | avg |
|---|---|---|---|---|---|
| Original Paper | Fine-tuned AlexNet on Scenes | 63.4 | - | - | - |
| Original Paper | Fine-tuned AlexNet on Scenes -> Birds | 50.3 | 57.3 | - | - |
| Original Paper | Fine-tuned AlexNet on Scenes -> Birds -> Flowers | 46.0 | 43.9 | 84.9 | 58.2 |
| This Pull Request | Fine-tuned AlexNet on Scenes | 56.9 | - | - | - |
| This Pull Request | Fine-tuned AlexNet on Scenes -> Birds | 36.3 | 45.1 | - | - |
| This Pull Request | Fine-tuned AlexNet on Scenes -> Birds -> Flowers | 32.2 | 35.4 | 65.2 | 44.3 |
| Source | Method | Scenes | Birds | Flowers | avg |
|---|---|---|---|---|---|
| Original Paper | ExpertGate | 63.5 | 57.6 | 84.8 | 68.6 |
| This Pull Request | ExpertGate | 58.9 | 45.3 | 65.6 | 56.6 |
Notes on the results:
- The original paper does not describe many hyperparameters aside from the latent dimensions of the Autoencoder (100 neurons).
- For all experiments, I trained with
- 50 epochs for AlexNets
- 10 epochs for autoencoders (relevant only in ExpertGate)
- SGD with momentum=0.9 and weight_decay=0.0005
- AlexNet learning rate 1e-3 with StepLR, stepsize 20 and gamma 1e-1
- AutoEncoder learning rate 5e-4 (relevant only in ExpertGate)
Although the accuracies in my experiments are lower than those reported in the paper, I claim that the ExpertGate implementation here works as expected.
Considering only the first task (Scenes): naive fine-tuning of AlexNet results in catastrophic forgetting of MIT Scenes by the time it fine-tunes on the final task (Oxford Flowers). On the other hand, after training on all three tasks, the ExpertGate model achieves similar accuracy to an AlexNet fine-tuned only on the first task. Simply put, the fine-tuned AlexNet largely forgets about Scenes by the end, but ExpertGate doesn't forget at all (unsurprising, because it has a separate "expert" for each task).
I attribute the lower accuracies in my results to my hyperparameter selection; I don't have access to the configuration the authors used for training. I understand that there is a GitHub repo by one of the authors, but it's not easy to parse. From some experimentation and reading around, I came up with these sensible, typical hyperparameters. Also, my results are consistently ~10% worse across the board, which is why I don't think there is anything lacking in the implementation of ExpertGate itself.
Hopefully, this is a meaningful contribution to Avalanche. Thank you for reviewing this for me.
Hey @niniack, thanks for all your work. I agree with you that the performance gap seems to be due to a problem in the base performance of the model. I think the code is ready to be merged into Avalanche in its current state.
To warn users that there may still be some minor issues, I suggest adding a warning inside the constructor:
warnings.warn("This strategy is currently in the alpha stage and we are still working to reproduce the original paper's results. You can find the code to reproduce the experiments at github.com/continualAI/continual-learning-baselines")
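For instance, the constructor could carry it like this (a minimal stand-in class, not the actual `ExpertGateStrategy` signature):

```python
import warnings


class ExpertGateStrategySketch:
    """Minimal stand-in showing where the alpha-stage warning would go."""

    def __init__(self):
        # emitted once per construction, at import-free runtime cost
        warnings.warn(
            "This strategy is currently in the alpha stage and we are "
            "still working to reproduce the original paper's results. "
            "You can find the code to reproduce the experiments at "
            "github.com/continualAI/continual-learning-baselines"
        )
```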
So, I would ask you to do a few last things before merging the PR:
- Add the warning.
- Add a test under `tests/training/test_strategies.py`. You can copy/paste the one for the other strategies.
- (After the PR) Can you push your scripts to the continual-learning-baselines repository? This will help us reproduce the results and possibly match the paper's performance in the future.
Thanks again for all your work.
Hi, thank you for taking the time to review the work.
I will add the warning message and a test in `tests/training/test_strategies.py`.
Regarding the baselines repository: I can publish the script I used, but I wrote a few custom data-loading functions for the datasets. I wouldn't be worried if I were using PyTorch datasets, but in this case I had to download these datasets myself since they aren't available through PyTorch. So, is it okay for me to include an "auto-download" the way PyTorch does?
All the datasets that we provide in Avalanche have automatic download where possible. You can put them in `avalanche.benchmarks.datasets`; check Tiny ImageNet for a simple example of the API we follow. Also, in the script for continual-learning-baselines you can put any custom code you need to reproduce the experiments; that's not a problem.
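A minimal sketch of that auto-download pattern (class name and URL are hypothetical, purely for illustration; the real reference is the Tiny ImageNet dataset in `avalanche.benchmarks.datasets`):

```python
import os
import urllib.request
from pathlib import Path


class IndoorScenesSketch:
    """Illustrative dataset wrapper that fetches its data on demand."""

    # hypothetical archive location, for illustration only
    URL = "https://example.com/indoor_scenes.tar.gz"

    def __init__(self, root="~/.avalanche/data/indoor_scenes", download=True):
        self.root = Path(os.path.expanduser(root))
        archive = self.root / "indoor_scenes.tar.gz"
        if download and not archive.exists():
            self.root.mkdir(parents=True, exist_ok=True)
            urllib.request.urlretrieve(self.URL, archive)  # fetch once
        if not archive.exists():
            raise RuntimeError(
                "Dataset not found; set download=True to fetch it."
            )
```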
Hi @niniack, do you have any updates?
Hi, thanks for your patience. I had to take a pause but incidentally noticed a not-instantly-fixable bug while preparing the final fixes for this PR. Now that things are picking up for me again, I expect to be able to wrap it up within a week.
Thanks for checking in
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
tests/training/test_strategies.py:786:41: E225 missing whitespace around operator
tests/training/test_strategies.py:789:38: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:30:1: E302 expected 2 blank lines, found 1
examples/expert_gate.py:76:37: E225 missing whitespace around operator
examples/expert_gate.py:79:34: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:89:37: E225 missing whitespace around operator
examples/expert_gate.py:92:34: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:96:1: E302 expected 2 blank lines, found 1
examples/expert_gate.py:98:7: E275 missing whitespace after keyword
examples/expert_gate.py:99:5: E115 expected an indented block (comment)
examples/expert_gate.py:100:5: E115 expected an indented block (comment)
examples/expert_gate.py:101:5: E115 expected an indented block (comment)
examples/expert_gate.py:102:5: E115 expected an indented block (comment)
examples/expert_gate.py:121:29: E128 continuation line under-indented for visual indent
examples/expert_gate.py:122:29: E128 continuation line under-indented for visual indent
examples/expert_gate.py:123:29: E128 continuation line under-indented for visual indent
examples/expert_gate.py:127:1: E302 expected 2 blank lines, found 1
examples/expert_gate.py:162:1: E305 expected 2 blank lines after class or function definition, found 1
examples/expert_gate.py:172:81: E501 line too long (84 > 80 characters)
4 E115 expected an indented block (comment)
3 E128 continuation line under-indented for visual indent
3 E225 missing whitespace around operator
3 E251 unexpected spaces around keyword / parameter equals
1 E275 missing whitespace after keyword
3 E302 expected 2 blank lines, found 1
1 E305 expected 2 blank lines after class or function definition, found 1
1 E501 line too long (84 > 80 characters)
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
tests/training/test_strategies.py:789:37: E251 unexpected spaces around keyword / parameter equals
tests/training/test_strategies.py:789:39: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:80:33: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:80:35: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:90:37: E225 missing whitespace around operator
examples/expert_gate.py:93:33: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:93:35: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:100:7: E275 missing whitespace after keyword
examples/expert_gate.py:169:9: E128 continuation line under-indented for visual indent
examples/expert_gate.py:177:81: E501 line too long (84 > 80 characters)
1 E128 continuation line under-indented for visual indent
1 E225 missing whitespace around operator
6 E251 unexpected spaces around keyword / parameter equals
1 E275 missing whitespace after keyword
1 E501 line too long (84 > 80 characters)
Oh no! It seems there are some PEP8 errors! 😕 Don't worry, you can fix them! 💪 Here's a report about the errors and where you can find them:
tests/training/test_strategies.py:789:37: E251 unexpected spaces around keyword / parameter equals
tests/training/test_strategies.py:789:39: E251 unexpected spaces around keyword / parameter equals
examples/expert_gate.py:100:7: E275 missing whitespace after keyword
examples/expert_gate.py:177:81: E501 line too long (84 > 80 characters)
2 E251 unexpected spaces around keyword / parameter equals
1 E275 missing whitespace after keyword
1 E501 line too long (84 > 80 characters)
Hi @AntonioCarta,
Updates:
- Significantly cleaned up the example file
- Bug has been identified and I added a note at the top in the example file as well as fixed the example
- Added alpha warning
- Added test
Once the PR is merged, I will add the benchmark script to the baselines repository
Thanks! Everything looks in order now. Can you fix the merge issues? We removed the `suppress_warning` argument from the logger (we removed the warnings).
I will merge the PR as soon as the CI is green.
Hi niniack, I'm still getting a bunch of errors from your strategy:
Error
Traceback (most recent call last):
File "D:\OneDrive - University of Pisa\Uni\code_repo\avalanche\tests\training\test_strategies.py", line 834, in test_expertgate
model = ExpertGate(shape=(3, 227, 227), device=self.device)
File "D:\OneDrive - University of Pisa\Uni\code_repo\avalanche\avalanche\models\expert_gate.py", line 181, in __init__
models.__dict__[arch](
File "C:\Users\w-32\Anaconda3\envs\avalanche-env\lib\site-packages\torchvision\models\alexnet.py", line 62, in alexnet
model = AlexNet(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'weights'
Hi, apologies, it has been a very busy few months. I'll take a look at what I need to wrap up this PR soon!