
Replicate the experiments of spheres from TopoAE

Open · nzilberstein opened this issue 1 year ago · 12 comments

Hi,

Thanks for the great work and for developing this library. I am trying to replicate the spheres experiments from the Topological Autoencoders paper using the example script, but I can't reproduce the results. I used the hyperparameters from the best-run file in the original repo, together with the DeepAE architecture, but I still can't replicate them. Is there something I am missing?

Thanks!

nzilberstein avatar Jun 26 '24 18:06 nzilberstein

Hey there!

This is a new implementation, so maybe there are some slight differences. Can you let me know what exactly doesn't work?

Pseudomanifold avatar Jun 27 '24 09:06 Pseudomanifold

Hi Bastian,

Thanks for the quick response. I am attaching the results and the code I am using:

# Imports for the snippet. `Spheres`, `TopologicalAutoencoder` and `MLPAutoencoder_Spheres`
# are defined elsewhere: the dataset class is pasted further down in this thread, the wrapper
# is assumed to come from the example script, and the autoencoder from the original TopoAE code.
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm

n_spheres = 11
r = 5
d = 100

dataset_train = Spheres(n_spheres=n_spheres,
                        r=r,
                        d=d)

train_loader = DataLoader(
    dataset_train,
    batch_size=28,
    shuffle=True,
    drop_last=True,
)

# Hyperparameters taken from the best-run file of the original TopoAE repo.
n_epochs = 100
lam = 0.43357738971723536
optimizer_lr = 0.000272683866289072
batch_size = 28

model = MLPAutoencoder_Spheres()  # taken from the original TopoAE code
topo_model = TopologicalAutoencoder(model, lam=lam)

optimizer = optim.Adam(topo_model.parameters(), lr=optimizer_lr, weight_decay=1e-5)

for epoch in tqdm(range(n_epochs)):
    topo_model.train()

    for batch, (x, y) in enumerate(train_loader):
        loss = topo_model(x)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

And what is weird is that I get something like this for the validation dataset

[Screenshot: latent embedding of the validation data, 2024-06-27]

I am tracking the loss, and it is decreasing very slowly. Might this be a problem with the architecture of the MLP? Because I tried with easier datasets (like a circle in 2D), and in those cases it sometimes works with hidden_dim = 32, but I needed a higher hidden dimension in the MLP to get some consistency; with a hidden dimension of 32, sometimes I got

[Screenshot: latent embedding with hidden_dim = 32, 2024-06-27]

I also tried the LinearAE, but the problem remains.

nzilberstein avatar Jun 27 '24 16:06 nzilberstein

I think one thing missing in the code above is the normalization of the topological loss by the batch size. I added that and it works better, but I am still not able to reproduce the same results.
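Concretely, what I added is roughly the following (a rough sketch, not the library's API: I rebuild the wrapper from the example script, `SignatureLoss` and `VietorisRipsComplex` are the building blocks from `torch_topological.nn`, and I am assuming the base autoencoder's forward pass returns its reconstruction loss, as in the example):

import torch
from torch_topological.nn import SignatureLoss, VietorisRipsComplex


class NormalisedTopologicalAutoencoder(torch.nn.Module):
    """Variant of the example wrapper that divides the topological term by the batch size."""

    def __init__(self, model, lam=1.0):
        super().__init__()
        self.model = model                  # base autoencoder: provides .encode() and a reconstruction loss
        self.lam = lam
        self.loss = SignatureLoss(p=2)
        self.vr = VietorisRipsComplex(dim=0)

    def forward(self, x):
        z = self.model.encode(x)

        pi_x = self.vr(x)                   # persistence information of the input batch
        pi_z = self.vr(z)                   # persistence information of the latent codes

        geom_loss = self.model(x)           # reconstruction loss (assumption, see above)
        topo_loss = self.loss([x, pi_x], [z, pi_z])

        # The actual change: normalise the topological term by the batch size,
        # following the original TopoAE implementation.
        return geom_loss + self.lam * topo_loss / x.size(0)

So in the loop above I now build topo_model = NormalisedTopologicalAutoencoder(model, lam=lam) instead.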

nzilberstein avatar Jun 28 '24 04:06 nzilberstein

Interesting! Can you check what happens with a slightly different optimiser (as in the example code)? There are also some minor differences in the way we normalise features (see here for the original data set). These influence the choice of learning rate quite a lot.

Pseudomanifold avatar Jul 01 '24 07:07 Pseudomanifold

Mmm, no, I still don't get the results. They start to look similar, but not the same:

[Screenshot: latent embedding of the spheres data, 2024-07-02]

I ran enough epochs (it was no longer improving). I will keep trying to debug.

nzilberstein avatar Jul 02 '24 21:07 nzilberstein

OK, that's interesting, it does not seem to be any better than the regular AE, which makes me wonder about the strength of the topological regularisation. Did you try the point cloud normalisation as in the original code, i.e. here?

Pseudomanifold avatar Jul 03 '24 11:07 Pseudomanifold

Hi, sorry for the late response. It seems that the topological regularisation is indeed not as strong as in the original paper. I tried with the normalization and I still see the same behaviour. Very weird.

nzilberstein avatar Jul 13 '24 21:07 nzilberstein

Is it possible that the dataset has been generated differently? I thought I had ported over that code nicely but maybe there are some differences that I did not observe? Does it work if you change the strength parameter?
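For a quick check, something along these lines should tell you whether the topological term has any effect at all (just a sketch that reuses the names from your snippet; `run_training` is only a stand-in for the training loop you posted):

import torch.optim as optim


def run_training(topo_model, loader, n_epochs=100, lr=1e-3):
    # Stand-in for the training loop posted above.
    optimizer = optim.Adam(topo_model.parameters(), lr=lr, weight_decay=1e-5)
    for _ in range(n_epochs):
        topo_model.train()
        for x, _ in loader:
            loss = topo_model(x)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return topo_model


# With lam = 0 you should recover the plain autoencoder; if larger values barely
# change the embedding, the topological term is effectively inactive.
for lam in [0.0, 0.1, 0.5, 1.0, 5.0]:
    topo_model = run_training(
        TopologicalAutoencoder(MLPAutoencoder_Spheres(), lam=lam),
        train_loader,
    )
    # ...visualise / evaluate the latent space for each value of lam here.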

Pseudomanifold avatar Jul 14 '24 20:07 Pseudomanifold

Here is the dataset generation code:

n_spheres = 11
r = 5
d = 100

dataset_train = Spheres(n_spheres=n_spheres,
                        n_samples=500,
                        r=r,
                        d=d)


# Note: `sample_from_sphere` is used below but not shown here.
def create_sphere_dataset(n_samples=500, n_spheres=11, d=100, r=5, seed=None):
    """Create data set of high-dimensional spheres.

    Create the `SPHERES` data set described in Moor et al. [Moor20a]_. The
    data set consists of `n` spheres enclosed by a single larger sphere. It
    is a perfect example of simple manifolds, arranged in a simple pattern,
    that is nevertheless challenging for embedding algorithms.

    Parameters
    ----------
    n_samples : int
        Number of points to sample per sphere.

    n_spheres : int
        Total number of spheres to create. The algorithm will always
        create the *last* sphere to enclose the previous ones. Hence,
        if `n_spheres = 3`, two spheres will be enclosed by a larger
        one.

    d : int
        Dimension of spheres to sample from. A `d`-sphere will be
        embedded in `d+1` dimensions.

    r : float
        Radius of smaller spheres. The radius of the larger enclosing
        sphere will be `5 * r`.

    seed : int, instance of `np.random.Generator`, or `None`
        Seed for the random number generator, or an instance of such
        a generator. If set to `None`, the default random number
        generator will be used.

    Returns
    -------
    Tuple of `np.array`, `np.array`
        Array containing the coordinates of the spheres. The second
        array contains the respective labels, ranging from `0` to
        `n_spheres - 1`. This array can be used for visualisation
        purposes.

    Notes
    -----
    The original version of this code was authored by Michael Moor.

    References
    ----------
    .. [Moor20a] M. Moor et al., "Topological Autoencoders",
        *Proceedings of the 37th International Conference on Machine
        Learning*, PMLR 119, pp. 7045--7054, 2020.
    """
    rng = np.random.default_rng(seed)

    variance = (n_spheres - 1) / np.sqrt(d)
    shift_matrix = rng.normal(0, variance, [n_spheres, d+1])
  
    spheres = []
    n_datapoints = 0
    for i in np.arange(n_spheres - 1):
        sphere = sample_from_sphere(n=n_samples, d=d, r=r)
        spheres.append(sphere + shift_matrix[i, :])
        n_datapoints += n_samples

    # Build additional large surrounding sphere:
    n_samples_big = 10 * n_samples
    big = sample_from_sphere(n=n_samples_big, d=d, r=r*5)
    spheres.append(big)
    n_datapoints += n_samples_big

    X = np.concatenate(spheres, axis=0)
    y = np.zeros(n_datapoints)

    label_index = 0

    for index, data in enumerate(spheres):
        n_sphere_samples = data.shape[0]
        y[label_index:label_index + n_sphere_samples] = index
        label_index += n_sphere_samples

    return X, y

def normalize_features(data_train, data_test):
    """Normalize features to zero mean and unit variance.

    Args:
        data_train: training data, used to compute the mean and standard deviation.
        data_test: test data, transformed with the training statistics.

    Returns:
        (transformed_data_train, transformed_data_test)

    """
    mean = np.mean(data_train, axis=0, keepdims=True)
    std = np.std(data_train, axis=0, keepdims=True)
    
    transformed_train = (data_train - mean) / std

    # mean = np.mean(data_test, axis=0, keepdims=True)
    # std = np.std(data_test, axis=0, keepdims=True)
    transformed_test = (data_test - mean) / std
    return transformed_train, transformed_test


class ManifoldDataset(Dataset):
    def __init__(self, data, position, train, test_fraction, random_seed):
        train_data, test_data, train_pos, test_pos = train_test_split(
            data, position, test_size=test_fraction, random_state=random_seed)
        self.train_data, self.test_data = normalize_features(
            train_data, test_data)
        self.train_pos, self.test_pos = train_pos, test_pos
        self.data = self.train_data if train else self.test_data
        self.pos = self.train_pos if train else self.test_pos

    def __getitem__(self, index):
        return self.data[index], self.pos[index]

    def __len__(self):
        return len(self.data)

class Spheres(ManifoldDataset):
    def __init__(self, train=True, n_samples=500, d=100, n_spheres=11, r=5,
                test_fraction=0.1, seed=42):
        # Here, `pos` are actually class labels; this just conforms with the parent class.
        data, labels = create_sphere_dataset(n_samples=n_samples, d=d, n_spheres=n_spheres, r=r, seed=seed)
        pos = labels
        data = data.astype(np.float32)
        pos = pos.astype(np.float32)
        _rnd = np.random.RandomState(seed)

        self.dimension = data.shape[1]

        super().__init__(data, pos, train, test_fraction, _rnd)

Compared to the original function, it looks the same to me. I tried different strengths, but the result still looks the same. I don't know if someone else has tried and reproduced the results; maybe I am missing something.

nzilberstein avatar Jul 16 '24 01:07 nzilberstein

Hmm :thinking: I don't see the normalisation in the torch-topological code. I think the best thing would be to compare the outputs of the two functions—sorry if this is somewhat tedious!
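For instance, something along these lines (just a sketch: `create_sphere_dataset` is the torch-topological version you pasted above, and `create_sphere_dataset_topoae` is a placeholder name for the function copied out of the original TopoAE repository):

import numpy as np

# Generate the SPHERES data with both implementations, same parameters, same seed,
# then compare basic summary statistics.
X_new, y_new = create_sphere_dataset(n_samples=500, n_spheres=11, d=100, r=5, seed=42)
X_old, y_old = create_sphere_dataset_topoae(n_samples=500, n_spheres=11, d=100, r=5, seed=42)

for name, (X, y) in [("torch-topological", (X_new, y_new)), ("TopoAE", (X_old, y_old))]:
    print(name, X.shape, np.bincount(y.astype(int)))
    for label in np.unique(y):
        P = X[y == label]
        radii = np.linalg.norm(P - P.mean(axis=0), axis=1)
        print(f"  sphere {int(label)}: mean radius = {radii.mean():.2f}")

Even with identical seeds the individual samples may differ, because the two implementations use their random number generators differently; what should match are the array shapes, the per-class counts, and the radii (roughly r = 5 for the small spheres and 5 * r = 25 for the enclosing one).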

Pseudomanifold avatar Jul 16 '24 13:07 Pseudomanifold

No, I meant the original function from the TopoAE repo. I agree that there is no normalization in this library.

No problem at all, actually thanks for the quick responses. I will keep debugging.

nzilberstein avatar Jul 16 '24 16:07 nzilberstein

I think the normalisation makes a big difference since it changes all persistence diagram features. I think we also further normalise during the autoencoding process...let me check again!
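To illustrate the first point (a minimal sketch, not library code): standardising the features changes every pairwise distance of the point cloud, and those distances are exactly what the birth and death values of the Vietoris-Rips diagrams are computed from.

import numpy as np

rng = np.random.default_rng(0)
X = 5.0 * rng.normal(size=(200, 101))    # toy stand-in for one batch of the SPHERES data

# Feature-wise standardisation, as in the original TopoAE pipeline.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)


def max_pairwise_distance(P):
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).max()


# The scale of the distances, and hence of all persistence-based features, changes:
print(max_pairwise_distance(X), max_pairwise_distance(X_std))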

Pseudomanifold avatar Jul 16 '24 17:07 Pseudomanifold