Frédéric Branchaud-Charron comments


Additional/new approach to estimating class overlap

Hello, yeah, I think this is a great idea. We always work with distances, so this should work with both cosine and Euclidean distance. We would need to invert them in both...
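The point about working from distances and inverting them can be sketched as follows. This is a minimal sketch assuming NumPy feature matrices of shape `[N, num_features]`. The helper names `pairwise_distances` and `distance_to_similarity` are hypothetical, and the `1 / (1 + d)` inversion is an assumed choice for illustration, not the formula the library actually uses.

```python
import numpy as np


def pairwise_distances(X, metric="euclidean"):
    """Hypothetical helper: (N, N) matrix of distances between the rows of X."""
    X = np.asarray(X, dtype=np.float64)
    if metric == "euclidean":
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding
    if metric == "cosine":
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        Xn = X / np.maximum(norms, 1e-12)  # avoid division by zero
        # Cosine distance = 1 - cosine similarity, clipped at 0 for rounding
        return np.maximum(1.0 - Xn @ Xn.T, 0.0)
    raise ValueError(f"unknown metric: {metric}")


def distance_to_similarity(D):
    """Assumed inversion: map distances in [0, inf) to similarities in (0, 1]."""
    return 1.0 / (1.0 + D)


X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
S_euclid = distance_to_similarity(pairwise_distances(X, "euclidean"))
S_cosine = distance_to_similarity(pairwise_distances(X, "cosine"))
```

Because both metrics go through the same inversion step, the rest of the pipeline only ever sees similarities, whichever distance was chosen.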

Dataset shape

Hello, do you have some code I could look at? We expect an array with shape `[N, num_features]`, so you need to flatten the images.

You can flatten the images with:

```python
# Flatten each image into one feature vector: (N, H, W, C) -> (N, H*W*C)
X_train = X_train.reshape((X_train.shape[0], -1))
```
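To make the expected shapes concrete, here is a small worked example. It assumes a hypothetical batch of 100 CIFAR-like RGB images stored as `(N, H, W, C)`; the zero-filled array only stands in for real data.

```python
import numpy as np

# Hypothetical batch of 100 RGB images of size 32x32, shape (N, H, W, C)
X_train = np.zeros((100, 32, 32, 3), dtype=np.float32)

# Flatten each image into one feature vector: (N, H, W, C) -> (N, H*W*C)
X_flat = X_train.reshape((X_train.shape[0], -1))
print(X_flat.shape)  # (100, 3072)
```

The `-1` lets NumPy infer the feature dimension (here 32 * 32 * 3 = 3072), so the same line works for any image size.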

Ah, I see the issue: you must not call `to_categorical` here. We expect the labels as an array with a single dimension:

```python
# Flatten the labels to shape (N,)
y_train = y_train.reshape([-1])
```
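If the labels have already been one-hot encoded (for example by `to_categorical`), reshaping alone is not enough. The hypothetical helper below is a minimal sketch, assuming NumPy arrays, that accepts a flat vector, an `(N, 1)` column, or an `(N, num_classes)` one-hot matrix and returns integer labels of shape `(N,)`.

```python
import numpy as np


def to_label_vector(y):
    """Hypothetical helper: return integer class labels with shape (N,).

    Accepts labels of shape (N,), column vectors of shape (N, 1),
    or one-hot matrices of shape (N, num_classes).
    """
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] > 1:
        # One-hot encoded: recover the class index of each row
        return np.argmax(y, axis=1)
    # Already class indices: just drop the extra dimension(s)
    return y.reshape([-1])


y_column = np.array([[3], [1], [0]])  # shape (3, 1)
y_onehot = np.eye(4)[[3, 1, 0]]       # shape (3, 4), one-hot
```

Both `to_label_vector(y_column)` and `to_label_vector(y_onehot)` give `[3, 1, 0]`.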

Awesome! So it's a bit easier than notMNIST:

![image](https://user-images.githubusercontent.com/8976546/150164283-ad795a15-de34-4d2f-b80a-c5594f0b5d85.png)

I think this is the average over 20 runs (it's been a while), but the standard deviation was very small, as you can see in Figure 2.

Referencing the CVPR paper is perfect, thank you.

When it is available, send me a link and I'll add it to the README :)

Yeah, sure. For the paper, we obtained CIFAR10 embeddings using an autoencoder and ran t-SNE on them. We used MultiCoreTSNE.

CNN encoder code: https://github.com/Dref360/spectral_metric/blob/master/experiments/embedding/cnn_autoencoder.py

t-SNE code: https://github.com/Dref360/spectral_metric/blob/master/experiments/embedding/tsne.py

To compare datasets,...
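The embedding-then-t-SNE step can be sketched as follows. This is an illustration only: it swaps in scikit-learn's `TSNE` in place of MultiCoreTSNE, and it uses random placeholder vectors instead of the autoencoder embeddings from the linked scripts. The sample count, feature size, and perplexity are assumed values.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder embeddings: 60 samples with 64 features each. In the setup
# described above, these would come from the CNN autoencoder's encoder.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 64)).astype(np.float32)

# Project to 2-D for visualization; perplexity must be smaller than n_samples.
tsne = TSNE(n_components=2, perplexity=10.0, init="pca", random_state=0)
points_2d = tsne.fit_transform(embeddings)
print(points_2d.shape)  # (60, 2)
```

Fixing `random_state` makes the projection reproducible between runs, which helps when several datasets are plotted side by side.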

Add the option of saving in parquet instead of arrow

I think [`Dataset.to_parquet`](https://huggingface.co/docs/datasets/v1.10.2/package_reference/main_classes.html#datasets.Dataset.to_parquet) is what you're looking for. Let me know if I'm wrong.