robust_EM_for_gmm

Problem with duplicated data points when updating the hidden variable

Open · xiaojin-hu opened this issue 4 years ago · 5 comments

The data I am using were collected from a real process and contain many repeated values. In contrast, the data in the provided code examples are all generated by sampling, so every data point is distinct. In my case I initialize as follows. First, I use `np.unique` to remove duplicate values from the data set, and the remaining unique points are used as the initial means. The number of clusters is initialized to the number of these means. Each mixing coefficient is initialized as the frequency of the corresponding unique value divided by the total number of data points. While the program runs, a problem appears when updating the hidden variable z: `min(self.z_.sum(axis=1)) = 0`. That is, some data points in the data set do not belong to any of the Gaussian components. I look forward to your help with this problem. Thank you! Best regards.
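The initialization described above can be sketched as follows. This is a minimal illustration, not code from robust_EM_for_gmm: `init_from_unique` is a hypothetical helper name, and covariance initialization is omitted because the post does not describe it.

```python
import numpy as np


def init_from_unique(X):
    """Hypothetical helper sketching the initialization described in this issue.

    X: data array of shape (n_samples,) or (n_samples, n_features).
    Returns (means, n_clusters, alphas).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat 1-D data as a single feature
    # Deduplicate rows; counts[k] is how many times the k-th unique row occurs.
    means, counts = np.unique(X, axis=0, return_counts=True)
    n_clusters = means.shape[0]  # one cluster per unique value
    # Mixing coefficient: frequency of each unique value / total number of points.
    alphas = counts / X.shape[0]
    return means, n_clusters, alphas


X = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 3.0])
means, n_clusters, alphas = init_from_unique(X)
print(n_clusters)  # 3
```

For this toy input the unique means are 1, 2 and 3, and the mixing coefficients are 2/6, 1/6 and 3/6, which sum to 1.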

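One possible cause of a zero row sum in the z update, assuming the weighted densities alpha_k N(x_i | mu_k, Sigma_k) are evaluated in linear space: when components have very small covariances (which heavily duplicated values may encourage), these terms can underflow to exactly 0.0 for every k. The sketch below is not the repository's implementation; all function names are hypothetical. It contrasts a linear-space row sum with responsibilities z[i, k] = alpha_k N(x_i | mu_k, Sigma_k) / sum_j alpha_j N(x_i | mu_j, Sigma_j) normalized using the log-sum-exp trick.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Solve cov @ y = diff.T instead of inverting cov explicitly.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def weighted_log_densities(X, means, covs, alphas):
    """log(alpha_k) + log N(x_i | mu_k, Sigma_k), shape (n_samples, n_clusters)."""
    return np.stack([np.log(a) + log_gaussian(X, m, c)
                     for m, c, a in zip(means, covs, alphas)], axis=1)


def naive_row_sums(X, means, covs, alphas):
    """sum_k alpha_k N(x_i | ...) in linear space; can underflow to exactly 0."""
    return np.exp(weighted_log_densities(X, means, covs, alphas)).sum(axis=1)


def responsibilities(X, means, covs, alphas):
    """z[i, k] normalized with the log-sum-exp trick."""
    log_w = weighted_log_densities(X, means, covs, alphas)
    # Subtract each row's maximum so its largest term becomes exp(0) = 1.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)  # every row sum is >= 1 here


X = np.array([[0.0], [10.0]])
means = np.array([[0.0], [1.0]])
covs = np.array([[[1e-4]], [[1e-4]]])  # very narrow components
alphas = np.array([0.5, 0.5])
print(naive_row_sums(X, means, covs, alphas))  # second entry is exactly 0.0
print(responsibilities(X, means, covs, alphas).sum(axis=1))  # each row sums to 1
```

In this toy example the point x = 10 is far from both narrow components, so the linear-space row sum underflows to 0, while the log-space version still assigns it to the nearer component.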

xiaojin-hu · May 17 '20 07:05