
Could I get the node embedding?

Open zhiqiangzhongddu opened this issue 6 years ago • 7 comments

Hi,

Can I get node embeddings of a chosen length from this code? The output of which step should I extract?

Thanks,

zhiqiangzhongddu avatar Nov 04 '18 22:11 zhiqiangzhongddu

You can extract and examine any hidden layer activation and check whether it is useful as some form of graph embedding. If you train in a supervised way, then these embeddings will be very specialized for the task that you trained the model for, of course. If you want unsupervised embeddings, have a look at my code for graph auto-encoders: https://github.com/tkipf/gae


tkipf avatar Nov 05 '18 08:11 tkipf

Excuse me, I'm training in a supervised way. It seems the GCN can only produce embeddings with the same length as the number of classes if I take the output of the last GCN layer? An arbitrarily chosen embedding size is not available.

zhiqiangzhongddu avatar Nov 05 '18 09:11 zhiqiangzhongddu

In this case it’s best to simply take the embeddings just before doing the last linear projection to the softmax logits. In other words, if the last layer is softmax(AHW), take either the embedding H directly or A*H.
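To make the suggestion concrete, here is a minimal NumPy sketch (not the pygcn code itself; graph, feature sizes, and weights are made-up toy values) of a two-layer GCN forward pass, showing where the hidden embedding H sits relative to the final softmax projection:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy graph: 4 nodes on a path.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
d = A_hat.sum(axis=1)
A_hat = A_hat / np.sqrt(np.outer(d, d))     # symmetric normalization D^-1/2 (A+I) D^-1/2

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))             # node features (4 nodes, 8 features)
W1 = rng.standard_normal((8, 16))           # first-layer weights (hidden size 16)
W2 = rng.standard_normal((16, 3))           # final projection to 3 classes

# Two-layer GCN: softmax(A_hat * relu(A_hat X W1) * W2)
H = np.maximum(A_hat @ X @ W1, 0)           # hidden embedding H (ReLU)
probs = softmax(A_hat @ H @ W2)             # class probabilities, shape (4, 3)

# Node embeddings of hidden size 16, independent of the number of classes:
embeddings = A_hat @ H                      # or use H directly
print(embeddings.shape)
```

The key point: `H` (or `A_hat @ H`) has the hidden-layer width, so its dimensionality is a free hyperparameter, while the layer after `W2` is pinned to the number of classes.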


tkipf avatar Nov 05 '18 09:11 tkipf

It's clear now. Thanks.

zhiqiangzhongddu avatar Nov 05 '18 09:11 zhiqiangzhongddu

I used the intermediate result H, but the embeddings don't seem usable for the link prediction task, i.e. there is little difference between the dot products of node pairs with and without a connection. That is understandable, since the GCN here is trained for node classification rather than link prediction. Does anyone have an idea how to apply the embeddings to link prediction?

zfchen95 avatar Dec 02 '18 03:12 zfchen95

The dot product will not be a good scoring function on embeddings trained solely for classification. You can either use the embeddings from github.com/tkipf/gae, which are optimized for dot-product scoring (link prediction), or you can train a bilinear scoring function on top of the fixed embeddings (taken from the supervised GCN model). A bilinear scoring function looks like this: \sigma(h_i^T W h_j), where h_i and h_j are the embeddings of the two nodes, W is a matrix that you train via gradient descent on some link-prediction training data, and \sigma is the sigmoid activation function.
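A toy NumPy sketch of such a bilinear scorer (the embeddings, training pairs, and learning rate below are all made up for illustration; the update is the binary cross-entropy gradient written out by hand):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_nodes, dim = 20, 16
H = rng.standard_normal((n_nodes, dim))     # fixed embeddings from the GCN

# Hypothetical supervision: (i, j, label), label = 1.0 for an observed edge.
pairs = [(0, 1, 1.0), (2, 3, 1.0), (0, 5, 0.0), (4, 7, 0.0)]

W = np.zeros((dim, dim))                    # bilinear weights, trained below
lr = 0.1
for _ in range(200):
    for i, j, y in pairs:
        s = H[i] @ W @ H[j]                 # bilinear score h_i^T W h_j
        p = sigmoid(s)                      # predicted edge probability
        # BCE gradient w.r.t. W is (p - y) * outer(h_i, h_j)
        W -= lr * (p - y) * np.outer(H[i], H[j])

# Trained edge pairs should now score higher than trained non-edge pairs.
print(sigmoid(H[0] @ W @ H[1]), sigmoid(H[0] @ W @ H[5]))
```

Note that only W is updated; the embeddings H stay frozen, which is what makes this a lightweight scorer on top of a pre-trained classification model.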


tkipf avatar Dec 02 '18 12:12 tkipf


Could you please clarify how to train the W matrix for link prediction tasks? To my understanding, we need the adjacency matrix at the last layer to check the links, but in the code `adj` does not change.

Thanks.

IreneZihuiLi avatar Apr 07 '19 00:04 IreneZihuiLi