text_gcn: What do names = ['x','y','tx','ty','allx','ally','adj'] each represent?

Open dongcy-AHU opened this issue 5 years ago • 1 comments

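I read your source code. What does each of the matrices in names = ['x','y','tx','ty','allx','ally','adj'] represent? For example, allx => the feature vectors of both labeled and unlabeled training docs/words. Isn't all of your experimental data labeled? Why would there be unlabeled training docs? In the paper you say the nodes are initialized as one-hot vectors, but in the code I see that you use the average of the word embeddings as the input doc embedding. Why is that? And what does "the one-hot labels of the labeled training docs" represent? I find x, y, tx, ty and the rest hard to understand. I would be grateful if you could take the time to explain. Thank you very much.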

Sep 18 '19 12:09 dongcy-AHU

@dongcy-AHU

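Hello. In these datasets the unlabeled nodes are the words, but you can also have unlabeled documents by setting the corresponding rows of ally to all zeros. The comments in my code should explain what the other names mean.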
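To illustrate the point about all-zero rows, here is a minimal sketch, not code from the repository. The toy `ally` matrix and the name `labeled_mask` are assumptions made for illustration: the first three rows stand for labeled training docs and the last two for unlabeled nodes.

```python
import numpy as np

# Hypothetical toy label matrix with 4 classes.
# Rows 0-2: labeled training docs (one-hot). Rows 3-4: unlabeled nodes (all zeros).
ally = np.array([
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])

# A row counts as labeled if it contains a 1 anywhere.
labeled_mask = ally.sum(axis=1) > 0

# Class index of each labeled row only.
labeled_classes = ally[labeled_mask].argmax(axis=1)
```

Under this convention, a node is treated as unlabeled simply because its label row carries no 1, so no separate flag is needed.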

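In build_graph.py I did use the average of the word vectors, but I later found that it performed poorly. So in train.py, the line features = sp.identity(features.shape[0]) # featureless turns all the feature vectors into one-hot vectors. To keep build_graph.py extensible and to change as little code as possible, I did not remove the word-vector averaging code, but it is not actually used.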
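The effect of that line can be seen in a small sketch. The 5x3 random matrix is only a stand-in (an assumption) for whatever features build_graph.py produced. The `sp.identity` call is the one quoted above.

```python
import numpy as np
import scipy.sparse as sp

# Stand-in for the averaged word-vector features: 5 nodes, 3 dimensions.
# The values are arbitrary because they are discarded below.
features = sp.csr_matrix(np.random.rand(5, 3))

# The line quoted from train.py: only the number of nodes is kept.
features = sp.identity(features.shape[0])  # featureless

# Row i is now the one-hot vector with a 1 at position i.
dense = features.toarray()
```

After this step the feature matrix is square, with one column per node, so the original embedding values have no influence on training.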

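"The one-hot labels of the labeled training docs" is the matrix y. Suppose there are three training docs and four classes. It would look like this:

[[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]

Each row corresponds to one training doc, and the position holding a 1 indicates that doc's label.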
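As a minimal sketch of how such a matrix can be built, assuming the labels are available as integer class indices (the names `labels` and `num_classes` are hypothetical, not taken from the repository):

```python
import numpy as np

# Hypothetical class indices for three training docs, four classes in total.
labels = [2, 0, 3]
num_classes = 4

# Picking rows of the identity matrix gives one one-hot row per doc.
y = np.eye(num_classes, dtype=int)[labels]

# The class index can be recovered from the position of the 1 in each row.
recovered = y.argmax(axis=1)
```

Here `y` reproduces the 3x4 example above, and `recovered` matches the original `labels`.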

Sep 18 '19 13:09 yao8839836
