
Do you have plans to implement an unsupervised version of GraphSAGE?

Open chenwgen opened this issue 7 years ago • 12 comments

Hi, do you have plans to implement an unsupervised version of GraphSAGE? Thanks.

chenwgen avatar Jan 04 '18 06:01 chenwgen

You just need to change the loss function to Eq. 1 from the original paper.
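For reference, Eq. 1 is the negative-sampling objective J(z_u) = -log σ(z_u·z_v) - Q·E_{vn~Pn(v)}[log σ(-z_u·z_vn)], where v co-occurs with u on a short random walk and Pn is a negative-sampling distribution. A minimal PyTorch sketch of that loss (the function and tensor names here are illustrative, not from this repo):

```python
import torch
import torch.nn.functional as F

def unsup_loss(z_u, z_pos, z_neg):
    """Negative-sampling loss of Eq. 1 (sketch).

    z_u:   (batch, dim)    anchor embeddings
    z_pos: (batch, dim)    embeddings of nodes co-occurring with the anchors
    z_neg: (batch, Q, dim) embeddings of Q negative samples per anchor
    """
    pos_score = (z_u * z_pos).sum(dim=-1)                        # z_u . z_v
    neg_score = torch.bmm(z_neg, z_u.unsqueeze(-1)).squeeze(-1)  # z_u . z_vn
    pos_loss = -F.logsigmoid(pos_score)                # -log sigma(z_u . z_v)
    neg_loss = -F.logsigmoid(-neg_score).sum(dim=-1)   # -sum log sigma(-z_u . z_vn)
    return (pos_loss + neg_loss).mean()
```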

unsuthee avatar Apr 05 '18 22:04 unsuthee

@unsuthee have you implemented it? I changed the loss function to Eq. 1, but the embeddings do not make sense: connected nodes do not end up with close embeddings. The results differ from the TensorFlow version.

chithangduong avatar May 09 '18 08:05 chithangduong

I've implemented the unsupervised version by training the model on random walks or network edges. However, the convergence is weird. Discussion is welcome.
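A sketch of how the positive pairs can be drawn from short uniform random walks over the repo's adj_lists (the function name and walk parameters here are my own choices):

```python
import random

def random_walk_pairs(adj_lists, nodes, walk_len=5, num_walks=10):
    """Collect (anchor, context) positive pairs by co-occurrence on short
    random walks; using the network edges directly is the simpler variant."""
    pairs = []
    for node in nodes:
        for _ in range(num_walks):
            curr = node
            for _ in range(walk_len):
                neighbors = list(adj_lists[curr])
                if not neighbors:
                    break
                curr = random.choice(neighbors)
                if curr != node:
                    pairs.append((node, curr))
    return pairs
```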

HongxuChenUQ avatar Jun 06 '18 01:06 HongxuChenUQ

One thing that is necessary is to constrain the embeddings to be unit length. This is mentioned in the appendix of the paper, I think. For instance, you can use cosine similarity instead of the dot product to achieve this. Though this is a minor thing, it can have a big impact on convergence.
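A minimal sketch of that scoring, assuming two batches of embeddings z_u and z_v of shape (batch, dim):

```python
import torch.nn.functional as F

def pair_score(z_u, z_v):
    # L2-normalise both embeddings so the dot product equals cosine similarity
    # and the scores fed into the sigmoid stay in [-1, 1].
    z_u = F.normalize(z_u, p=2, dim=-1)
    z_v = F.normalize(z_v, p=2, dim=-1)
    return (z_u * z_v).sum(dim=-1)  # same as F.cosine_similarity(z_u, z_v, dim=-1)
```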

williamleif avatar Jun 24 '18 13:06 williamleif

Sadly, I don't plan on implementing the unsupervised version any time soon, but pull requests are welcome! :)

williamleif avatar Jun 24 '18 13:06 williamleif

@williamleif That makes sense now! Thanks for pointing out!

HongxuChenUQ avatar Jun 24 '18 14:06 HongxuChenUQ

Hi @HongxuChenUQ, is it possible for you to share the loss code you wrote for the unsupervised version?

Actually, I tried combining loss_label and loss_network and found the F1 score improves from 0.88 to 0.93. But when I use loss_network alone, no gradient flows to the model's weights. Since I am new to PyTorch, this is really frustrating; I can't figure out the problem.

Below is my loss code, where nodes and negtive_samples are node lists.

def loss(self, nodes, negtive_samples, num_neighs, labels):
        loss_list = []
        z_negtive_samples = self.enc(negtive_samples).t()
        z_querys = self.enc(nodes).t()
        for i,query in enumerate(nodes):
            z_query = z_querys[i]
            neighbors = list(self.adj_lists[int(query)])[:num_neighs]
            z_neighbors = self.enc(neighbors).t()
            pos = torch.min(torch.sigmoid(torch.tensor([torch.dot(z_query,z_neighbor) for z_neighbor in z_neighbors]))).requires_grad_()
            neg = torch.max(torch.sigmoid(torch.tensor([torch.dot(z_query,z_ns) for z_ns in z_negtive_samples]))).requires_grad_()
            loss_list.append(torch.max(Variable(torch.tensor(0.0)),neg-pos+self.margin))
        loss_net = Variable(torch.mean(torch.tensor(loss_list)),requires_grad=True)
        scores = self.forward(nodes)
        loss_sup = self.xent(scores, labels.squeeze())
        return loss_sup+loss_net

fs302 avatar Jul 03 '18 03:07 fs302

@HongxuChenUQ Really appreciate that! What F1 score did you achieve?

fs302 avatar Jul 03 '18 06:07 fs302

@fs302 I've tested it on AUC and the performance is good. You will have to train a downstream classifier if you want to evaluate it with an F1 score.
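For example, one way to do that on the frozen embeddings (everything below is a placeholder sketch, not code from this thread):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# embeds: frozen unsupervised node embeddings, labels: node labels,
# train_idx / test_idx: some train/test split of the node ids.
clf = LogisticRegression(max_iter=1000).fit(embeds[train_idx], labels[train_idx])
pred = clf.predict(embeds[test_idx])
print("F1:", f1_score(labels[test_idx], pred, average="micro"))
```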

HongxuChenUQ avatar Jul 03 '18 07:07 HongxuChenUQ

@HongxuChenUQ Yes, I use a 2-layer NN as the downstream classifier, but it only achieves F1 = 0.31, which is much lower than the end-to-end supervised version (F1 = 0.84) with the same embedding settings.

I wonder if the difference comes from how the positive and negative pairs are sampled.

fs302 avatar Jul 03 '18 08:07 fs302

Thanks to @HongxuChenUQ: after tuning the learning rate of the downstream classifier and generating more robust negative samples, the best F1 for the unsupervised version reached 0.76.

fs302 avatar Jul 03 '18 13:07 fs302

@fs302 @HongxuChenUQ is it possible to share the code for the unsupervised version?

ghost avatar Jan 31 '19 17:01 ghost