How to turn off the task-specific component in the initial layer
Hi Ayman,
I am trying to implement one of the architectures you mention in the paper that this repository relates to. Specifically, I am interested in MTL-DGP*, where the task-specific component is turned off.
I am not sure how to do this from reading the code. I understand that in order to switch off the shared components I set the MTL variable to false. Is there a similar way to do this for the task-specific components?
Thanks
Hi Pavlos,
Apologies for the lack of good documentation. It is currently being worked on.
The multitask attribute in Layer objects propagates task labels through the cascade. It is not related to sharing or task-specific components.
To create a model resembling MTL-DGP*, you have to do this in the kernels. More specifically, you should use a kernel that acts on all of your data regardless of task label. The SwitchedKernel object uses different kernels for different tasks, so to have a completely shared middle layer you should refrain from using this kernel in that Layer. You can still use the MultiKernelLayer object if you want your latent space to contain multiple types of processes, even if they are all shared between the tasks.
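Roughly, a fully shared layer would look something like the sketch below. Treat it as illustrative only: the MultiKernelLayer constructor arguments and import path here are assumptions and may differ from the actual code, and GPflow 1.x-style kernel construction is assumed.

```python
import gpflow

# Sketch only: MultiKernelLayer comes from this repository; the exact import
# path and constructor arguments may differ from what is shown here.

# Two shared kernels: they act on all data points regardless of task label,
# so the resulting layer is completely shared between tasks (as in MTL-DGP*).
shared_kernels = [
    gpflow.kernels.RBF(input_dim=2),
    gpflow.kernels.Matern32(input_dim=2),
]

# No SwitchedKernel anywhere, so nothing in this layer is task-specific.
shared_layer = MultiKernelLayer(kernels=shared_kernels, output_dim=2)
```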
I hope this clarifies things. Please let me know if you have any follow up questions.
Ayman
Hello Ayman,
First of all, congratulations on your amazing work and thanks for sharing your code. I am reading your multi-task notebook and trying to relate it to the MTL-DGP models from the paper. I have a few questions which I think are related to Pavlos's.
The notebook apparently creates a 2-layer model, with an input layer composed of 2 shared kernels and an output layer with task-specific kernels. Are my assumptions correct?
If so, how would you create something similar to MTL-DGP, with an input layer that has both shared and task-specific components, and an output layer that combines those two?
Lastly, why does the multi-kernel layer have the restriction that the output dimension must be a multiple of the number of kernels? Thinking about shared kernels and using the paper's notation (equation 1), you could have GP priors i = 1, ..., I, with I independent of the number of tasks, right?
Thanks, Rafael
Hi Rafael,
Thanks for your interest and kind words about my work.
Your assumption about the model in the notebook is correct. If you want to create something similar to mMDGP (MTL-DGP in the old version of the paper), you need to use the SwitchedKernel object. In my code, task-specific processes are handled at the kernel level: a SwitchedKernel object wrapped around a list of kernels takes care of this. The output layer should stay the same as in the notebook.
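In rough pseudocode, the input layer would look something like the following. Again, this is only a sketch: the SwitchedKernel and MultiKernelLayer arguments and import paths are illustrative and may differ from the actual signatures in the repository.

```python
import gpflow

# Sketch only: SwitchedKernel and MultiKernelLayer come from this repository;
# the import path and constructor arguments below are illustrative.
num_tasks = 2

# Shared component: a single kernel acting on all tasks.
shared_kernel = gpflow.kernels.RBF(input_dim=2)

# Task-specific component: one kernel per task, wrapped in a SwitchedKernel
# so each data point is routed to its own task's kernel via its task label.
task_specific_kernel = SwitchedKernel(
    [gpflow.kernels.RBF(input_dim=2) for _ in range(num_tasks)]
)

# The input layer mixes the shared and task-specific processes;
# the output layer stays the same as in the notebook.
input_layer = MultiKernelLayer(
    kernels=[shared_kernel, task_specific_kernel], output_dim=2
)
```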
The reason for restricting the output dimension to be a multiple of the number of kernels is purely for ease of implementation. In theory, you can use an arbitrary number of kernels and outputs associated with their processes; however, this was difficult to implement so I opted for this restriction.
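Concretely, the bookkeeping behind the restriction is just:

```python
num_kernels = 3
output_dim = 6                                   # must be a multiple of num_kernels
assert output_dim % num_kernels == 0
outputs_per_kernel = output_dim // num_kernels   # each kernel drives 2 outputs
```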
For a more up-to-date version of the code, compatible with the alpha version of GPflow 2.0, check out the gpflow2.0-port branch. The multitask notebook in this branch contains an example of how to construct mMDGP. This version depends on my own fork of GPflow, which is listed in the requirements.txt file. To install the requirements you can create a virtual environment and run pip install -r requirements.txt. There are plans to port this to the stable version of GPflow 2.0, but I haven't had the time yet.
Let me know if you have any further questions and good luck with your work!
Thanks a lot for the quick reply. What version of TensorFlow do you run with the gpflow2.0-port branch?
This is compatible with TensorFlow 2.0 (not 2.1). The exact version is 2.0.0-dev20190916. I know this is quite brittle, so I'm happy to help get this working. I will be porting this to the stable version of GPflow in the next few months.
I got it working with tf 2.0, thanks!