
Is the mutual information estimate wrong when changing the data dimension?

Open zhuhaozh opened this issue 7 years ago • 8 comments

Hi, I changed the data dimension from 1 to another shape, for example 3×28×28, and modified the model to be similar to the one offered in the MINE appendix.
However, I found that the estimated mutual information cannot converge to the real MI; the estimate always stays near 0.6. Do you know the reason for this?

zhuhaozh commented Oct 24 '18

@zhuhaozh Hi, Fig. 1 in the paper tests the effectiveness of MINE in higher dimensions. MINE is better than the traditional method in higher dimensions, but it still has a small error.

> However, I found that the estimated mutual information cannot converge to the real MI; the estimate always stays near 0.6.

It is a little strange that MINE does not work well in higher dimensions. I will explain using my note's notation: https://github.com/MasanoriYamada/Mine_pytorch/blob/master/note.pdf

The key point of MINE is the Donsker-Varadhan bound

KL(P || Q) >= E_P[T] - log E_Q[e^T]

where T is a neural network. MINE optimizes T so that the gap Δ between the two sides goes to 0, and the KL divergence is a scalar, independent of the dimensions of P and Q. I think deep learning is good at estimating a scalar from high-dimensional inputs.
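
Concretely, here is a minimal sketch of that bound in this repo's PyTorch setting, assuming T is a network that maps a batch of (x, y) pairs to one scalar per pair (like the networks discussed below):

import torch

def mine_lower_bound(T, x, y):
    # Donsker-Varadhan bound: KL(P || Q) >= E_P[T] - log E_Q[e^T]
    # joint samples: the paired (x, y); marginal samples: shuffle y
    # along the batch dimension to break the pairing
    y_marg = y[torch.randperm(y.size(0))]
    t_joint = T(x, y)      # T on samples from P (the joint)
    t_marg = T(x, y_marg)  # T on samples from Q (product of marginals)
    return t_joint.mean() - torch.log(torch.exp(t_marg).mean())

Training then maximizes this scalar bound by gradient ascent on T's parameters, regardless of the input dimension.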

MasanoriYamada commented Oct 24 '18

Hi @MasanoriYamada, I'm not familiar with MI, and I'm very confused by the dimension change and the slight code change. Could you please kindly answer two questions for me?

1. The following picture shows the plot when I only change the code as below (changing x's and y's dimension from 1 to 10; see the true-MI sketch after question 2):

import numpy as np
import torch.nn as nn
import torch.nn.functional as F

# data_size and H are defined in the original script
def gen_x():
    # x is now 10-dimensional instead of 1-dimensional
    return np.sign(np.random.normal(0., 1., [data_size, 10]))

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, H)  # in_features changed from 1 to 10
        self.fc2 = nn.Linear(10, H)
        self.fc3 = nn.Linear(H, 1)

    def forward(self, x, y):
        h1 = F.relu(self.fc1(x) + self.fc2(y))
        h2 = self.fc3(h1)
        return h2

[plot: estimated MI over training for the 10-dimensional case; it does not converge to the real MI]

2. When I changed the code slightly (dimension still 1):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, H)
        self.fc2 = nn.Linear(1, H)
        self.fc3 = nn.Linear(H * 2, 1)  # in_features doubled to match the concatenation below

    def forward(self, x, y):
        # previously: h1 = F.relu(self.fc1(x) + self.fc2(y))
        h1 = F.relu(torch.cat((self.fc1(x), self.fc2(y)), dim=1))
        h2 = self.fc3(h1)
        return h2

The plot changed to this (estimated MI almost zero): [plot: estimated MI over training for the concatenation variant]
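
For reference on the "real MI" in question 1: assuming each of the 10 coordinates is generated independently exactly as in the 1-D script (i.e. y = x + Gaussian noise with standard deviation sigma, which is an assumption about gen_y here), the true MI factorizes across coordinates, so a sketch of the per-coordinate value is:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def true_mi_per_dim(sigma):
    # x is +1 or -1 with probability 0.5; y | x ~ N(x, sigma^2)
    lim = 1 + 10 * sigma  # integrate only where the density has mass
    p_y = lambda y: 0.5 * (norm.pdf(y, 1, sigma) + norm.pdf(y, -1, sigma))
    h_y = -quad(lambda y: p_y(y) * np.log(p_y(y)), -lim, lim)[0]  # H(Y)
    h_y_given_x = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)     # H(Y|X)
    return h_y - h_y_given_x  # I(X;Y) = H(Y) - H(Y|X), in nats

# with d independent coordinate pairs, the total true MI is d * true_mi_per_dim(sigma)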

zhuhaozh commented Oct 25 '18

@zhuhaozh Sorry for my late reply. I confirmed the same situation. I do not know the cause yet and will investigate. (^_^)

Sorry, I'm busy, so please wait a few weeks.

MasanoriYamada commented Oct 30 '18

@zhuhaozh Hi, is there any progress on this? I am also working on this, and I think the traditional method cannot compute the correct MI for variables with more than 2 dimensions; you may need to consider a different equation for that. So the "real MI" you mentioned may be wrong, and maybe the neural MI estimator is right... but I am not sure. If you have already computed mutual information for multi-dimensional variables, please let me know. I am still figuring out how to compute it.

jeong-tae commented Jan 03 '19

@jeong-tae I am still not sure how to calculate MI the traditional way. But I found code implemented by MINE's authors, and I reimplemented it for my project. As their paper reports, it can estimate MI for multi-dimensional variables...

zhuhaozh commented Jan 07 '19

@zhuhaozh Sure, MINE can handle the multi-dimensional case as well. For the traditional way, you have to use a multivariate normal distribution pdf instead of a univariate normal distribution pdf. I am not 100% sure, since I am not a mathematics expert, but it seems reasonable.
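
For what it's worth, when (x, y) really are jointly Gaussian, the traditional MI has a closed form in any dimension. A minimal sketch (cov_joint and dx are names I made up: the covariance of the stacked vector [x; y] and the dimension of x):

import numpy as np

def gaussian_mi(cov_joint, dx):
    # closed-form MI for jointly Gaussian (x, y):
    # I(X;Y) = 0.5 * (log det Cov(X) + log det Cov(Y) - log det Cov(X,Y))
    cov_x = cov_joint[:dx, :dx]
    cov_y = cov_joint[dx:, dx:]
    _, logdet_x = np.linalg.slogdet(cov_x)
    _, logdet_y = np.linalg.slogdet(cov_y)
    _, logdet_xy = np.linalg.slogdet(cov_joint)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)  # in nats

Note that the sign/Gaussian toy data in this thread is not jointly Gaussian, so this formula would only be an approximation there.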

jeong-tae commented Jan 07 '19

> @jeong-tae I am still not sure how to calculate MI the traditional way. But I found code implemented by MINE's authors, and I reimplemented it for my project. As their paper reports, it can estimate MI for multi-dimensional variables...

Hi! So, using the code from MINE's authors based on the JS-divergence, you were able to estimate MI for multi-dimensional variables? I haven't been able to correctly approximate even the one-dimensional case with their implementation. Do you have a repository where I can check your code? Thank you
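
I am not sure which objective the authors' code uses exactly, but one common JS-based form (from f-GAN, also used in related estimators) looks like this sketch, where t_joint and t_marg are the statistics network outputs on paired and shuffled samples:

import torch.nn.functional as F

def jsd_objective(t_joint, t_marg):
    # JS-style lower bound: E_P[-softplus(-T)] - E_Q[softplus(T)]
    # note: this bounds the Jensen-Shannon divergence, not the KL-based MI,
    # so its value is not the MI in nats
    return (-F.softplus(-t_joint)).mean() - F.softplus(t_marg).mean()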

tiagoCuervo commented May 03 '19

> @jeong-tae I am still not sure how to calculate MI the traditional way. But I found code implemented by MINE's authors, and I reimplemented it for my project. As their paper reports, it can estimate MI for multi-dimensional variables...

Hi @jeong-tae, it is good news that it is working for you. But I noticed that when we applied it to high-dimensional data, it was hard to converge and heavily affected by the network structure and the learning rate, even across different runs. So I am wondering whether I did anything wrong. Would you mind sharing how you did it, or giving a quick reply about whether you have met a similar situation? Thanks very much.
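
One thing that may help with the instability: the MINE paper proposes correcting the biased minibatch gradient of the log term with an exponential moving average of E_Q[e^T]. A sketch of that idea as I understand it (the function name and alpha are mine; ema must be initialized by the caller, e.g. to the first batch's value of exp_t):

import torch

def ema_corrected_step(t_joint, t_marg, ema, alpha=0.01):
    exp_t = torch.exp(t_marg).mean()
    ema = (1 - alpha) * ema + alpha * exp_t.detach()
    # exp_t / ema has gradient approx. grad E_Q[e^T] / E_Q[e^T] = grad log E_Q[e^T],
    # with less minibatch bias; the loss value itself is not the DV bound
    loss = -(t_joint.mean() - exp_t / ema)
    mi_estimate = (t_joint.mean() - torch.log(exp_t)).item()
    return loss, ema, mi_estimate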

DorisxinDU commented Jul 06 '20