Cross-stitch-Networks-for-Multi-task-Learning

Parameters in cross-stitch units

Open ZhaoyangLiu-Leo opened this issue 6 years ago • 3 comments

Hi, it's lucky to find your implementation of cross-stitch units. Good job. In the original paper, I think the cross-stitch matrix is shared across an entire channel or layer. However, in your code implementation the units look more like pixel-wise cross-stitch units, especially for the convolution layers. This may introduce too many extra parameters.
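
Just to illustrate the concern with a rough count (the feature map size below is an arbitrary example, not taken from this repo):

```python
# Rough parameter count of one cross-stitch unit between two tasks,
# assuming an arbitrary 56x56x64 feature map (numbers are only an example).
H, W, C, T = 56, 56, 64, 2
shared_alpha = T * T                 # one alpha shared by the whole layer: 4 parameters
pixelwise_alpha = H * W * C * T * T  # a separate alpha per position and channel: 802,816 parameters
print(shared_alpha, pixelwise_alpha)
```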

ZhaoyangLiu-Leo avatar Nov 21 '18 09:11 ZhaoyangLiu-Leo

Sorry for the late reply, and thank you for the hint. I could be wrong here. Feel free to update the code if you like. Thanks

helloyide avatar Dec 25 '18 09:12 helloyide

Hi Helloyide, hi TowardSun,

I also stumbled across your code while trying to implement cross-stitch units in tf.keras, and I agree with TowardSun's comment. As far as I can tell, Misra et al. intended the cross-stitch matrix to be of size [num_tasks, num_tasks], so for two tasks it would be alpha = [[alpha_aa, alpha_ab], [alpha_ba, alpha_bb]]. In the cross-stitch unit, for every pair of feature "pixels" x at position ij, x_ij = [x1_ij, x2_ij], the unit computes hat{x}_ij = alpha * x_ij, which is [alpha_aa * x1_ij + alpha_ab * x2_ij, alpha_ba * x1_ij + alpha_bb * x2_ij].

This has the effect that after the cross-stitch operation, every feature pixel hat{x}_ij in task t is a linear combination of the corresponding feature pixels of all tasks. So alpha_aa weights how much information from task a should be included in the output for task a, while alpha_ab weights how much information to add from task b.

What your code does is essentially insert a fully-connected unit (without biases) between the two task layers, which is of course a very valid (and very interesting, as I haven't come across any such publications!) approach, but it gives very different results for the application at hand.
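
For what it's worth, here is a minimal sketch of what I have in mind in tf.keras (the layer name CrossStitch and the near-identity initialisation are my own choices, not taken from the paper or from this repo):

```python
import tensorflow as tf


class CrossStitch(tf.keras.layers.Layer):
    """Cross-stitch unit with a single [num_tasks, num_tasks] alpha matrix
    that is shared across all feature positions."""

    def __init__(self, num_tasks=2, **kwargs):
        super().__init__(**kwargs)
        self.num_tasks = num_tasks

    def build(self, input_shape):
        # Initialise alpha close to the identity so every task starts out
        # relying mostly on its own features.
        self.alpha = self.add_weight(
            name="alpha",
            shape=(self.num_tasks, self.num_tasks),
            initializer=tf.keras.initializers.Identity(gain=0.9),
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs):
        # inputs: list of num_tasks tensors with identical shape, e.g. [B, H, W, C].
        x = tf.stack(inputs, axis=-1)          # [B, H, W, C, num_tasks]
        out = tf.linalg.matvec(self.alpha, x)  # same alpha applied at every position
        return [out[..., t] for t in range(self.num_tasks)]
```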

When I have finished my implementation in keras, I will share it here so you can take a look.

jpwiedekopf avatar Oct 04 '19 14:10 jpwiedekopf

Hello,

as promised, I have shared my implementation at https://gist.github.com/LtSurgekopf/659231ce03eed4579f203577eed99c6c. I don't have any results so far, but I am fairly confident that the implementation is valid. Please note that I had to transform the matrix-vector multiplication into scalar multiplications and additions; otherwise my machine could not carry out the required computations due to the number of hidden features I am working with. This formulation seems to be easier on the hardware.
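
For reference, the scalar formulation I mention boils down to something like this (a simplified sketch for two tasks, not a copy of the gist):

```python
import tensorflow as tf


def cross_stitch_two_tasks(x_a, x_b, alpha):
    """Cross-stitch for two tasks written as scalar multiplications and
    additions instead of a stacked matrix-vector product.

    alpha is a trainable 2x2 variable [[a_aa, a_ab], [a_ba, a_bb]];
    x_a and x_b are same-shaped feature tensors of tasks a and b."""
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b
```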

jpwiedekopf avatar Oct 05 '19 21:10 jpwiedekopf