Lei Yang comments

79 comments by Lei Yang

over cluster of GCN-V

Hi @zhaoxin111 @yjhuasheng @jquesadap, thanks for the discussion. I think solving the problem mentioned above is key to increasing the recall of the clustering results. We may need to...

over cluster of GCN-V

Hi @Youskrpig, thanks for trying. I suspect the cause may be the PyTorch version, since the GCN_V results in your test, `F_P=33.59` and `F_P=59.41`, are close to...

the relationship between model extracting features and clustering result

Hi @qiudi0127, thanks for the question. I think it is similar to https://github.com/yl-1993/learn-to-cluster/issues/50. Basically, a better feature extractor is very likely to lead to better clustering results. However,...

the relationship between model extracting features and clustering result

Hi @qiudi0127, it seems that `k` is set to 15 instead of 80 for testing?

make gcn-v to very large scale dataset

Hi @tranorrepository, sorry for the late reply. For settings as large as 5.2M, we run GCN inference on the CPU instead of the GPU, as the memory demand will exceed the...

About performance

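Hi, LGCN here is a reimplementation of the original paper; you could try the authors' own codebase [gcn_clustering](https://github.com/Zhongdao/gcn_clustering). If the original codebase does not have this problem, the issue lies in this repo's reimplementation; if the original codebase has it too, you may need to ask the original authors.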

the public share nodes

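@ltm920716 It is not only "containment"; there are also proposals that "intersect without containment". When proposals are generated iteratively, they are nested (containment); when they are generated with different thresholds or different kNN settings, most of them are subgraphs that "intersect without containment".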
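The relations above can be checked mechanically. A minimal sketch, assuming each proposal is represented as a Python set of node ids, so the relation is judged on nodes only; `relation` and `relation_counts` are hypothetical helper names, not part of the repo.

```python
def relation(a, b):
    """Classify two proposals (sets of node ids) by their node sets."""
    if a.isdisjoint(b):
        return "disjoint"
    if a <= b or b <= a:
        return "contained"  # one proposal's nodes are a subset of the other's
    return "overlap"  # they intersect, but neither contains the other


def relation_counts(props_a, props_b):
    """Count relation types over all pairs drawn from two proposal sets."""
    counts = {"disjoint": 0, "contained": 0, "overlap": 0}
    for a in props_a:
        for b in props_b:
            counts[relation(a, b)] += 1
    return counts


# Toy proposals: a coarse proposal and a finer split of it.
coarse = [{1, 2, 3, 4}]
fine = [{1, 2}, {3, 4}, {5}]
print(relation_counts(coarse, fine))
# {'disjoint': 1, 'contained': 2, 'overlap': 0}
print(relation({1, 2, 3}, {3, 4}))  # overlap
```

The all-pairs loop is quadratic in the number of proposals, which is fine for spot checks; for large proposal sets an inverted index from node id to proposal ids would avoid comparing disjoint pairs.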

the public share nodes

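@ltm920716 Regarding the three questions:
- This is a dynamic-threshold split. In short, the procedure is: take a base threshold on edge weights and split the graph into connected components; if a component exceeds the size threshold, raise the edge-weight threshold until every component satisfies the size threshold. So the proposals at threshold 0.6 do not necessarily contain all the proposals at 0.7.
- Verification is simple: compute the containment relations between the proposals produced by the algorithm.
- "Different kNN" means using the same kNN algorithm with different values of k.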
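A minimal sketch of the dynamic-threshold split described in the first point. It assumes edges are `(u, v, weight)` triples, that only the components that are still too large get re-split, and that the threshold is raised by a fixed `step` up to `max_th`. The names `components_above` and `dynamic_split` and the `step`/`max_th` parameters are hypothetical; this is an illustration, not the repo's implementation.

```python
from collections import defaultdict


def components_above(nodes, edges, th):
    """Connected components of `nodes`, keeping only edges with weight > th."""
    parent = {x: x for x in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, w in edges:
        # Ignore weak edges and edges that leave the current node subset.
        if w > th and u in parent and v in parent:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for x in nodes:
        groups[find(x)].add(x)
    return list(groups.values())


def dynamic_split(nodes, edges, base_th, max_size, step=0.1, max_th=1.0):
    """Split at base_th; re-split each oversized component at a higher threshold."""
    proposals = []
    stack = [(set(nodes), base_th)]
    while stack:
        subset, th = stack.pop()
        for comp in components_above(subset, edges, th):
            if len(comp) <= max_size or th + step > max_th:
                # Small enough, or the threshold cannot be raised any further.
                proposals.append(comp)
            else:
                stack.append((comp, th + step))
    return proposals


# A chain of 6 nodes: strong edges (0.9) joined by weak links (0.65).
edges = [(0, 1, 0.9), (1, 2, 0.65), (2, 3, 0.9), (3, 4, 0.65), (4, 5, 0.9)]
print(sorted(sorted(p) for p in dynamic_split(range(6), edges, 0.6, max_size=3)))
# [[0, 1], [2, 3], [4, 5]]
```

At the base threshold 0.6 the whole chain is one component of size 6, which exceeds `max_size=3`, so it is re-split at about 0.7, where only the strong edges survive.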

the public share nodes

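@ltm920716 Thanks for trying it. Sorry that my earlier reply was hasty; here is a correction:
- When proposals are generated iteratively, or with the same k and different thresholds, they are in a containment relation; when proposals are generated with different K, some subgraphs "intersect without containment". (When K changes, the structure of the graph changes, so proposals that "intersect without containment" appear. For example, you can try the two proposal sets k=80, th=0.7 and k=30, th=0.7; I checked with a program and found proposals among them that intersect without containment.)
- The "containment" and "intersection without containment" relations above are defined on nodes. If "containment" is defined as "both the nodes and the edges are subsets", far fewer proposals satisfy containment when K differs.
- The first part of the ablation study in the paper analyzes how the number of proposals and the number of iterations relate to performance. In general, more proposals give better training results; once adding proposals no longer improves training, proposal iterations can be added. If you want to train a model yourself, I suggest first trying a small set of proposal strategies to see whether they help, and then adding computation as needed for better accuracy. For the threshold-change case in the first point, here is a short proof: consider two intersecting proposals, proposal_1 and proposal_2, obtained under thresholds th_1 and th_2; without loss of generality, assume th_1 > th_2. (1) By definition, every edge in proposal_1 has weight greater than th_1, and every edge in proposal_2 has weight greater than th_2; (2) from (1), every edge in proposal_1 also satisfies the condition for proposal_2; (3) since th_1 > th_2, |proposal_1|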
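The nesting argument for thresholds on the same graph can be checked numerically under simplifying assumptions: one fixed weighted graph (same k), a plain threshold split with no size constraint, and random edge weights standing in for real similarities. `components` and `is_nested` are hypothetical helpers written only for this illustration.

```python
import random


def components(n, edges, th):
    """Connected components of nodes 0..n-1 using only edges with weight > th."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, w in edges:
        if w > th:
            parent[find(u)] = find(v)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def is_nested(fine, coarse):
    """True if every proposal in `fine` lies entirely inside some proposal in `coarse`."""
    return all(any(p <= q for q in coarse) for p in fine)


# Random weighted graph standing in for a kNN graph built with a fixed k.
rng = random.Random(0)
n = 50
edges = [(rng.randrange(n), rng.randrange(n), rng.random()) for _ in range(120)]

th_1, th_2 = 0.7, 0.6  # th_1 > th_2
print(is_nested(components(n, edges, th_1), components(n, edges, th_2)))  # True
```

The check always passes here because every edge kept at th_1 is also kept at th_2, so each th_1 component stays connected at th_2 and falls inside a single th_2 component. Once k changes, the edge set itself changes and this argument no longer applies.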

the public share nodes

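@ltm920716 Thanks for the reply.
- About BCubed: put simply, the difference between BCubed and Pairwise is that BCubed weights by cluster size. [Face Clustering: Representation and Pairwise Constraints](https://arxiv.org/pdf/1706.05067.pdf) explains both evaluation methods in detail; for a more thorough treatment, see [A comparison of extrinsic clustering evaluation metrics based on formal constraints](https://link.springer.com/article/10.1007/s10791-008-9066-8), which argues for BCubed based on formal constraints.
- About mmcv: that's a good question; we don't have mmcv-related documentation. Since mmcv is a general-purpose library with broad coverage, we may consider documenting in detail the interfaces in this repo that overlap with mmcv.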
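To make the weighting difference concrete, here is a small sketch that computes both metrics from ground-truth and predicted labels using the standard definitions: Pairwise counts unordered pairs of items that share a cluster, while BCubed averages per-item precision and recall. The function names are hypothetical, and the repo's own evaluation code may differ in details such as ordered vs. unordered pairs.

```python
from collections import Counter


def _safe_div(a, b):
    # Return 0.0 when the denominator is empty (e.g. no predicted pairs).
    return a / b if b else 0.0


def _f1(p, r):
    return _safe_div(2 * p * r, p + r)


def pairwise_prf(gt, pred):
    """Pairwise precision/recall/F over unordered item pairs."""
    def pairs(n):
        return n * (n - 1) // 2

    tp = sum(pairs(c) for c in Counter(zip(gt, pred)).values())
    pred_pairs = sum(pairs(c) for c in Counter(pred).values())
    gt_pairs = sum(pairs(c) for c in Counter(gt).values())
    p, r = _safe_div(tp, pred_pairs), _safe_div(tp, gt_pairs)
    return p, r, _f1(p, r)


def bcubed_prf(gt, pred):
    """BCubed precision/recall/F: per-item scores averaged over all items."""
    n = len(gt)
    cells = Counter(zip(gt, pred))  # (gt label, pred label) -> overlap size
    gt_size, pred_size = Counter(gt), Counter(pred)
    # Each of the c items in cell (g, k) shares c items with both its clusters.
    p = sum(c * c / pred_size[k] for (g, k), c in cells.items()) / n
    r = sum(c * c / gt_size[g] for (g, k), c in cells.items()) / n
    return p, r, _f1(p, r)


gt = [0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 0, 0, 0]  # both identities merged into one cluster
print(pairwise_prf(gt, pred))
print(bcubed_prf(gt, pred))
```

Merging a 4-item and a 2-item identity into one predicted cluster gives Pairwise precision 7/15 ≈ 0.47 but BCubed precision 5/9 ≈ 0.56, with recall 1.0 under both.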