Replicating the spheres experiments from TopoAE
Hi,
Thanks for the great work and for developing this library. I am trying to replicate the experiments from the Topological Autoencoders paper using the example script, but I cannot reproduce the results. I used the hyperparameters from the best-run file in the original repository together with the DeepAE architecture, but I still cannot replicate them. Is there something I am missing?
Thanks!
Hey there!
This is a new implementation, so maybe there are some slight differences. Can you let me know what exactly doesn't work?
Hi Bastian,
Thanks for the quick response. I have attached the results and the code I am using:
import torch.optim as optim

from torch.utils.data import DataLoader
from tqdm import tqdm

# `Spheres`, `MLPAutoencoder_Spheres`, and `TopologicalAutoencoder` are
# defined elsewhere: the `Spheres` data set from below, the architecture
# ported from the original TopoAE code, and the wrapper from the example
# script, respectively.

n_spheres = 11
r = 5
d = 100

dataset_train = Spheres(n_spheres=n_spheres, r=r, d=d)

batch_size = 28
train_loader = DataLoader(
    dataset_train,
    batch_size=batch_size,
    shuffle=True,
    drop_last=True,
)

n_epochs = 100
lam = 0.43357738971723536            # regularisation strength (best-run file)
optimizer_lr = 0.000272683866289072  # learning rate (best-run file)

model = MLPAutoencoder_Spheres()     # taken from the original TopoAE code
topo_model = TopologicalAutoencoder(model, lam=lam)

optimizer = optim.Adam(
    topo_model.parameters(), lr=optimizer_lr, weight_decay=1e-5)

for epoch in tqdm(range(n_epochs)):
    topo_model.train()
    for batch, (x, y) in enumerate(train_loader):
        loss = topo_model(x)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
And what is weird is that I get something like this for the validation dataset
I am tracking the loss, and it is decreasing very slowly. Might this be a problem with the architecture of the MLP? Because I tried easier data sets (like a circle in 2D), and in those cases it sometimes works with hidden_dim = 32, but I needed a higher hidden dimension in the MLP to get consistent results; with a hidden dimension of 32, I sometimes got
I also tried the LinearAE, but the problem remains.
I think something missing in the code above is the normalization of the topological loss by the batch size. I added that and it works better, but I am still not able to reproduce the same results.
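For reference, a minimal sketch of the change I mean (my own modification; if I read the original TopoAE code correctly, the topological term there is divided by the batch size):

loss = topo_model(x)       # roughly: reconstruction loss + lam * topological loss
loss = loss / x.shape[0]   # crude variant: rescale the combined loss by the batch size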
Interesting! Can you check what happens with a slightly different optimiser (as in the example code)? There are also some minor differences in the way we normalise features (see here for the original data set). These influence the choice of learning rate quite a lot.
Mmm, no, I still don't get the results. They start to look similar, but they are not the same.
I ran enough epochs (the loss was no longer improving). I will keep trying to debug.
OK, that's interesting: it does not seem to be any better than the regular AE, which makes me wonder about the strength of the topological regularisation. Did you try the point cloud normalisation as in the original code, i.e. here?
Hi, sorry for the late response. Indeed, it seems that the topological regularization is not as strong as in the original paper. I tried it with the normalization and I still see the same behaviour. Very weird.
Is it possible that the data set has been generated differently? I thought I had ported over that code faithfully, but maybe there are some differences that I did not notice. Does it work if you change the strength parameter?
Here is the dataset generation (a quick sanity check of the resulting shapes follows the code):
n_spheres = 11
r = 5
d = 100

dataset_train = Spheres(
    n_spheres=n_spheres,
    n_samples=500,
    r=r,
    d=d,
)
def create_sphere_dataset(n_samples=500, n_spheres=11, d=100, r=5, seed=None):
    """Create data set of high-dimensional spheres.

    Create `SPHERES` data set described in Moor et al. [Moor20a]_. The
    data set consists of `n` spheres, enclosed by a single sphere. It
    is a perfect example of simple manifolds, arranged in a simple
    pattern, that is nevertheless challenging for embedding algorithms.

    Parameters
    ----------
    n_samples : int
        Number of points to sample per sphere.

    n_spheres : int
        Total number of spheres to create. The algorithm will always
        create the *last* sphere to enclose the previous ones. Hence,
        if `n_spheres = 3`, two spheres will be enclosed by a larger
        one.

    d : int
        Dimension of spheres to sample from. A `d`-sphere will be
        embedded in `d+1` dimensions.

    r : float
        Radius of smaller spheres. The radius of the larger enclosing
        sphere will be `5 * r`.

    seed : int, instance of `np.random.Generator`, or `None`
        Seed for the random number generator, or an instance of such
        a generator. If set to `None`, the default random number
        generator will be used.

    Returns
    -------
    Tuple of `np.array`, `np.array`
        Array containing the coordinates of the spheres. The second
        array contains the respective labels, ranging from `0` to
        `n_spheres - 1`. This array can be used for visualisation
        purposes.

    Notes
    -----
    The original version of this code was authored by Michael Moor.

    References
    ----------
    .. [Moor20a] M. Moor et al., "Topological Autoencoders",
       *Proceedings of the 37th International Conference on Machine
       Learning*, PMLR 119, pp. 7045--7054, 2020.
    """
    rng = np.random.default_rng(seed)

    variance = (n_spheres - 1) / np.sqrt(d)
    shift_matrix = rng.normal(0, variance, [n_spheres, d + 1])

    spheres = []
    n_datapoints = 0

    for i in np.arange(n_spheres - 1):
        sphere = sample_from_sphere(n=n_samples, d=d, r=r)
        spheres.append(sphere + shift_matrix[i, :])
        n_datapoints += n_samples

    # Build additional large surrounding sphere:
    n_samples_big = 10 * n_samples
    big = sample_from_sphere(n=n_samples_big, d=d, r=r * 5)
    spheres.append(big)
    n_datapoints += n_samples_big

    X = np.concatenate(spheres, axis=0)
    y = np.zeros(n_datapoints)

    label_index = 0
    for index, data in enumerate(spheres):
        n_sphere_samples = data.shape[0]
        y[label_index:label_index + n_sphere_samples] = index
        label_index += n_sphere_samples

    return X, y
def normalize_features(data_train, data_test):
    """Normalize features to zero mean and unit variance.

    Statistics are estimated on the training data only and applied to
    both splits, so no information leaks from the test data.

    Args:
        data_train: Training data array.
        data_test: Test data array.

    Returns:
        (transformed_data_train, transformed_data_test)
    """
    mean = np.mean(data_train, axis=0, keepdims=True)
    std = np.std(data_train, axis=0, keepdims=True)

    transformed_train = (data_train - mean) / std

    # Deliberately commented out: the test data are transformed with
    # the *training* statistics.
    # mean = np.mean(data_test, axis=0, keepdims=True)
    # std = np.std(data_test, axis=0, keepdims=True)
    transformed_test = (data_test - mean) / std

    return transformed_train, transformed_test
class ManifoldDataset(Dataset):
    def __init__(self, data, position, train, test_fraction, random_seed):
        train_data, test_data, train_pos, test_pos = train_test_split(
            data, position, test_size=test_fraction,
            random_state=random_seed)

        self.train_data, self.test_data = normalize_features(
            train_data, test_data)
        self.train_pos, self.test_pos = train_pos, test_pos

        self.data = self.train_data if train else self.test_data
        self.pos = self.train_pos if train else self.test_pos

    def __getitem__(self, index):
        return self.data[index], self.pos[index]

    def __len__(self):
        return len(self.data)
class Spheres(ManifoldDataset):
    def __init__(self, train=True, n_samples=500, d=100, n_spheres=11, r=5,
                 test_fraction=0.1, seed=42):
        # Here, `pos` are actually class labels; this merely conforms
        # with the parent class!
        data, labels = create_sphere_dataset(
            n_samples=n_samples, d=d, n_spheres=n_spheres, r=r, seed=seed)
        pos = labels

        data = data.astype(np.float32)
        pos = pos.astype(np.float32)

        _rnd = np.random.RandomState(seed)

        self.dimension = data.shape[1]
        super().__init__(data, pos, train, test_fraction, _rnd)
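As mentioned above, a quick sanity check of the resulting shapes (my own snippet; it assumes `sample_from_sphere` is in scope, e.g. imported from `torch_topological.data`):

X, y = create_sphere_dataset(n_samples=500, n_spheres=11, d=100, r=5, seed=42)

# 10 small spheres with 500 points each, plus one enclosing sphere with
# 10 * 500 points, all embedded in d + 1 = 101 dimensions:
print(X.shape)       # (10000, 101)
print(np.unique(y))  # labels 0, 1, ..., 10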
Compared to the original function, it looks the same to me. I tried different strength values, but the result still looks the same. I don't know whether someone else has tried and reproduced the results; maybe I am missing something.
Hmm :thinking: I don't see the normalisation in the torch-topological code. I think the best thing would be to compare the outputs of the two functions. Sorry if this is somewhat tedious!
No, I meant the function from the original TopoAE repository. I agree that there is no normalization in this library.
No problem at all; in fact, thanks for the quick responses. I will keep debugging.
I think the normalisation makes a big difference since it changes all persistence diagram features. I think we also further normalise during the autoencoding process...let me check again!
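To illustrate why (a small self-contained sketch, not taken from either code base): rescaling a point cloud rescales all pairwise distances, and therefore every birth and death value in a Vietoris-Rips filtration, by the same factor, so any normalisation of the data directly rescales the persistence-based loss.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

# Pairwise distances before and after scaling the point cloud by 2:
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
D_scaled = np.linalg.norm(2.0 * X[:, None] - 2.0 * X[None, :], axis=-1)

# Vietoris-Rips persistence values are functions of these distances,
# so they scale by the same factor as the data:
assert np.allclose(D_scaled, 2.0 * D)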