
Data augmentation strategies

fire opened this issue 1 year ago • 56 comments

In https://github.com/lucidrains/meshgpt-pytorch/pull/6

For each mesh I generate augments_per_item augmentations (e.g. 200), then use that to index into the dataset.

Using a seed, I augment with the following strategy.

What do you think?

import random
import numpy as np
from scipy.spatial.transform import Rotation as R

scale = random.uniform(0.8, 1.2)  # Uniform scaling
rotation = R.from_euler('y', random.uniform(-180, 180), degrees=True)  # Random rotation around y-axis
translation = np.array([random.uniform(-0.5, 0.5) for _ in [0, 2]])  # Random translation in x and z directions

The goal is for a chair item to be rotated, moved, or scaled, but stay upright.

Edited:

The idea is to have a chair be displaced but stay under gravity, so it keeps its lowest vertex position.
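Concretely, something like the following is what I have in mind (augment_grounded is just an illustrative name, assuming y-up vertices as an (N, 3) numpy array):

import random
import numpy as np
from scipy.spatial.transform import Rotation as R

def augment_grounded(vertices: np.ndarray) -> np.ndarray:
    scale = random.uniform(0.8, 1.2)                                        # uniform scaling
    rotation = R.from_euler('y', random.uniform(-180, 180), degrees=True)   # random rotation around y-axis
    translation = np.array([random.uniform(-0.5, 0.5), 0.0, random.uniform(-0.5, 0.5)])  # shift in x and z only

    out = rotation.apply(vertices * scale) + translation
    # "under gravity": shift y so the lowest vertex keeps its original height
    out[:, 1] -= out[:, 1].min() - vertices[:, 1].min()
    return out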

fire avatar Dec 13 '23 17:12 fire

yup sounds good! just put all the functions into one file, say augment.py, and if you want to go the distance, have ways to compose / chain any number of augmentations
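e.g. a minimal way to compose them, just as a sketch (the augmentation names are illustrative):

import numpy as np

# each augmentation is a function vertices -> vertices
def compose(*augments):
    def apply(vertices: np.ndarray) -> np.ndarray:
        for augment in augments:
            vertices = augment(vertices)
        return vertices
    return apply

# chain any number of augmentations, e.g.
# augment = compose(random_scale, random_y_rotation, random_xz_shift)
# vertices = augment(vertices)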

lucidrains avatar Dec 13 '23 18:12 lucidrains

@fire scale and rotation will go a long way

lucidrains avatar Dec 13 '23 18:12 lucidrains

image

Here's what my current augments do.

fire avatar Dec 13 '23 18:12 fire

vs original

image

Edited:

There's a bias near the center D:

fire avatar Dec 13 '23 18:12 fire

image

The bias is removed.

fire avatar Dec 13 '23 18:12 fire

I have to go for now.

https://github.com/lucidrains/meshgpt-pytorch/pull/6/files#diff-bb1e7e12bca15c4f2fd0faa464db85f6e8cb35c55454247f94c31bfc1483c3bbR100-R150

See def augment_mesh(self, base_mesh, augment_count, augment_idx):

Edited: removed seed

fire avatar Dec 13 '23 19:12 fire

@lucidrains Can you post something for me to extract the resulting mesh from the autoencoder?

fire avatar Dec 13 '23 19:12 fire

You mentioned the topic of overfitting as a first step.

I added the Blender monkey as a validation of mesh input through an autoencoder as an initial step.

I want to send another monkey to the autoencoder and get the same monkey out again. How do I do that?
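Roughly, I'm hoping for a round trip like this (I'm guessing at the API here; tokenize / decode_from_codes_to_faces may differ in the current version):

# guessed API: encode the monkey to codes, decode back, then compare against the input mesh
codes = autoencoder.tokenize(vertices = vertices, faces = faces)
recon_faces, face_mask = autoencoder.decode_from_codes_to_faces(codes)
# recon_faces should hold per-face vertex coordinates that can be written back out as a mesh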

fire avatar Dec 13 '23 20:12 fire

I was able to train 1 step that outputs a garbage glb 🎉

fire avatar Dec 13 '23 20:12 fire

You mentioned the topic of overfitting as a first step.

I added the Blender monkey as a validation of mesh input through an autoencoder as an initial step.

I want to send another monkey to the autoencoder and get the same monkey out again. How do I do that?

I have been using Marcus's provided notebook file to try that; I am also getting bad obj results. I am going to try the latest @lucidrains changes tomorrow in this notebook. Maybe you can try it or give it a look; or maybe you might be ahead of what I am using. 😆 Thanks! https://drive.google.com/file/d/1gpLjbnH1WUH6U50MJKrw-8BV6S_-3KH1/view?usp=sharing

adeerAI avatar Dec 13 '23 21:12 adeerAI

image

I am getting bad mesh results too, but it's trying. The selected mesh is the output; the background is the base mesh.

fire avatar Dec 13 '23 22:12 fire

Just for testing purposes, give it a go without the data augmentation. I think the model needs some more improvements, plus it will take a long time to train with the data augmentation. In the paper they used 28,000 shapes and trained the encoder on 2x A100 for 2 days and the transformer on 4x A100 for 5 days. So it will need lots of training data and time.

When I have been successful, the encoder loss was below 0.200-0.250 and the transformer loss was around 0.00007. So if you can get the loss down to those levels using the data augmentation, it will probably work, but that will require lots of training.

image

Here are some details from the paper: they only use scaling and jitter-shift. So remove translation & rotation and see if that helps.
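i.e. something like this, with only scaling and a jitter-shift (the ranges here are my guess, not taken from the paper):

import numpy as np

def paper_style_augment(vertices: np.ndarray) -> np.ndarray:
    # per-axis scaling plus a small jitter-shift; ranges are illustrative
    scale = np.random.uniform(0.9, 1.1, size = 3)
    jitter = np.random.uniform(-0.1, 0.1, size = 3)
    return vertices * scale + jitter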

MarcusLoppe avatar Dec 13 '23 22:12 MarcusLoppe

I am currently at:

loss: 1.255
loss: 1.500
loss: 1.786
loss: 1.596
loss: 1.941
loss: 1.583
loss: 1.895
loss: 1.904

So maybe I can dream about 0.200 - 0.250 loss.

fire avatar Dec 13 '23 22:12 fire

I am currently at:

loss: 1.255
loss: 1.500
loss: 1.786
loss: 1.596
loss: 1.941
loss: 1.583
loss: 1.895
loss: 1.904

So maybe I can dream about 0.200 - 0.250 loss.

How many steps is that at? I require about 2000 steps, since 200 x 10 epochs = 2000. Also, implement tqdm, since print can slow things down quite a lot.

Try only doing scaling and see; it will probably go better.

You can give it a go with my forked version @ https://github.com/MarcusLoppe/meshgpt-pytorch/tree/main

The data MeshDataset expects is an array of:

obj_data = {"texts": "chair", "vertices": vertices, "faces": faces} 
import torch
from torch.utils.data import Dataset, DataLoader 
from tqdm import tqdm

class MeshDataset(Dataset): 
    def __init__(self, obj_data): 
        self.obj_data = obj_data
        print(f"Got {len(obj_data)} data")

    def __len__(self):
        return len(self.obj_data)

    def __getitem__(self, idx):
       return  self.obj_data[idx] 

from meshgpt_pytorch import (
    MeshTransformerTrainer,
    MeshAutoencoderTrainer
)

autoencoder_trainer = MeshAutoencoderTrainer(model = autoencoder,learning_rate = 1e-3, warmup_steps = 10,dataset = dataset,batch_size=4,grad_accum_every=1,num_train_steps=1)

autoencoder_trainer.train(10, True)

max_length =  max(len(d["faces"]) for d in dataset if "faces" in d)
max_seq =  max_length * 6
print(max_length)
print(max_seq)
transformer = MeshTransformer(
    autoencoder,
    dim = 16,
    max_seq_len = max_seq,
    #condition_on_text = True
)
 
 
trainer = MeshTransformerTrainer(model = transformer,warmup_steps = 10, dataset = dataset,learning_rate = 1e-2,batch_size=2,grad_accum_every=1,num_train_steps=1)
trainer.train(10)

MarcusLoppe avatar Dec 13 '23 23:12 MarcusLoppe

These are my current settings, which run for 200 steps. The outlined mesh is the output. You can see my code in the pull request.

run = wandb.init(
    project="meshgpt-pytorch",
    
    config={
        "learning_rate": 1e-2,
        "architecture": "MeshGPT",
        "dataset": dataset_directory,
        "num_train_steps": 200,
        "warmup_steps": 1,
        "batch_size": 4,
        "grad_accum_every": 1,
        "checkpoint_every": 20,
        "device": str(device),
        "autoencoder": {
            "dim": 512,
            "encoder_depth": 6,
            "decoder_depth": 6,
            "num_discrete_coors": 128,
        },
        "dataset_size": dataset.__len__(),
    }
)

image

image

fire avatar Dec 13 '23 23:12 fire

You are right that I should ensure that we're within unit-square distance and do fewer augmentations, though.
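For example, a unit-cube normalization could look like this (just a sketch, not the code in the PR):

import numpy as np

def normalize_to_unit_cube(vertices: np.ndarray) -> np.ndarray:
    # center the mesh and scale it so the longest side fits in [-0.5, 0.5]
    vmin, vmax = vertices.min(axis = 0), vertices.max(axis = 0)
    center = (vmin + vmax) / 2
    extent = (vmax - vmin).max()
    return (vertices - center) / extent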

fire avatar Dec 13 '23 23:12 fire

You are right that I should ensure that we're within unit-square distance and do fewer augmentations, though.

I think that generating two objects is causing some issues; try using a single box.

I tried your s_bed_full.glb file and the result was pretty good, though it's not so smooth. There would probably be a better result with data augmentation. The right side is the generated one.

image image

MarcusLoppe avatar Dec 14 '23 00:12 MarcusLoppe

https://imgsli.com/ is very good for image comparisons.

fire avatar Dec 14 '23 00:12 fire

Writing down an idea: it should be possible to go over the 10-million-item 3D set, find a small set of items in a small set of classes similar to the paper, and label them manually (e.g. via path name).

fire avatar Dec 14 '23 00:12 fire

Writing down an idea: it should be possible to go over the 10-million-item 3D set, find a small set of items in a small set of classes similar to the paper, and label them manually (e.g. via path name).

Training on 10 million might be overkill, and going over 28,000 shapes might cost a bit too much $$$. ShapeNet has 50k 3D models, with almost a paragraph of description text each.

Renting an A100 at $0.79 per hour:
Training the encoder on 2x A100 for 2 days: 2 × 48 h × $0.79 ≈ $75.84
Training the transformer on 4x A100 for 5 days: 4 × 120 h × $0.79 ≈ $379

However, the H100 promises good performance, but at around $2-3 an hour.

https://imgsli.com/ is very good for image comparisons.

Seems pretty good, but probably not for 3D models

MarcusLoppe avatar Dec 14 '23 00:12 MarcusLoppe

I can't use ShapeNet, but I'm sure we can find 10 classes of 100 models, like ShapeNet, in that 10 million dataset.

fire avatar Dec 14 '23 00:12 fire

I can't use ShapeNet, but I'm sure we can find 10 classes of 100 models, like ShapeNet, in that 10 million dataset.

I think it's fine; there are many free sources, though the trouble might be finding a dataset with descriptions. But that is in the future; I think someone can get access from ShapeNet. The bigger issue is the GPU bill, though Phil/lucidrains might be able to improve the models so much that the training time goes down dramatically.

But after the model is trained, inference will be a big issue for users; if it's going to generate complex 3D models, it might not work on consumer hardware. The recent performance boost is a good sign that performance and efficiency are on the right track.

https://github.com/timzhang642/3D-Machine-Learning#3d_models

MarcusLoppe avatar Dec 14 '23 01:12 MarcusLoppe

I want to mention that getting the indices in the right order, making sure they fit in the box, and making sure the mesh isn't inside out are problems too.

If you're interested in training the head, it's in the dataset. I can't get the autoencoder below 0.5 loss.

fire avatar Dec 14 '23 18:12 fire

I want to mention that getting the indices in the right order, making sure they fit in the box, and making sure the mesh isn't inside out are problems too.

If you're interested in training the head, it's in the dataset. I can't get the autoencoder below 0.5 loss.

How many examples/steps of the same 3D mesh did you train it on? I trained for 10-20 epochs @ 2000 examples and got 0.19 loss. I think you are training on too few examples; it needs massive amounts of data to model. And if you do data augmentation you'll need even more data, maybe 30-40 epochs or more.

I was able to generate a pretty good 3D mesh; it's not as smooth, but a very good result for such a small amount of training data. The transformer & encoder aren't good at generalizing with little training data, but that will resolve itself when training with much more data.

3D mesh: https://file.io/6JIueypFnRyT

image

MarcusLoppe avatar Dec 14 '23 22:12 MarcusLoppe

I was using the wrong strategy. You were using many identical copies of the mesh and then some augments; I was doing the opposite.

fire avatar Dec 15 '23 00:12 fire

I was using the wrong strategy. You were using many identical copies of the mesh and then some augments; I was doing the opposite.

I might have worded that badly, but no, I'm using the same model without any augmentations. Try training for 10/20 epochs @ 2000 items per dataset and let me know. Kaggle has some awesome free GPUs.
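With the MeshDataset above, that would look roughly like this (illustrative):

# repeat the same un-augmented mesh 2000 times so the autoencoder can overfit it
obj_datas = [{"texts": "chair", "vertices": vertices, "faces": faces} for _ in range(2000)]
dataset = MeshDataset(obj_datas)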

MarcusLoppe avatar Dec 15 '23 01:12 MarcusLoppe

Here's how I interpreted it:

  1. model * multiple
  2. model * multiple * augments

You were doing 2000 (same) * 1 * 1.

I was trying 1 * 2000 (augmented) * 1.

Thanks for telling me! I'm trying your suggestion.

fire avatar Dec 15 '23 01:12 fire

Here's how I interpreted it:

1. model * multiple

2. model * multiple * augments

You were doing 2000 (same) * 1 * 1.

I was trying 1 * 2000 (augmented) * 1.

Thanks for telling me! I'm trying your suggestion.

No problem. I posted this in another issue, but I think this might help you: according to the paper, they sort the vertices in z-y-x order, then sort the faces by their lowest vertex index.
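As a rough sketch of that ordering (assuming numpy arrays; not the exact code from the paper):

import numpy as np

def sort_mesh(vertices: np.ndarray, faces: np.ndarray):
    # sort vertices by z, then y, then x (np.lexsort treats the last key as primary)
    order = np.lexsort((vertices[:, 0], vertices[:, 1], vertices[:, 2]))
    vertices = vertices[order]

    # remap face indices to the new vertex order
    remap = np.empty(len(order), dtype = np.int64)
    remap[order] = np.arange(len(order))
    faces = remap[faces]

    # sort faces by their lowest vertex index
    faces = faces[np.argsort(faces.min(axis = 1))]
    return vertices, faces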

Also, I'm currently training on 6 3D mesh chairs. Each chair has 1500 examples, with 3 augmentation versions, so each 3D mesh file has a total of 500 x 3 = 1500 examples.

The total is 12,000 examples.

To give you some idea of why you need to train for 2 days on two A100s, watch how slow the progress is (33 minutes in):


Epoch 1/20: 100%|██████████| 1125/1125 [03:29<00:00,  5.38it/s, loss=0.296]
Epoch 1 average loss: 0.7889469708336724
Epoch 2/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.52it/s, loss=0.307]
Epoch 2 average loss: 0.29623086002137927
Epoch 3/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.28] 
Epoch 3 average loss: 0.2731376721594069
Epoch 4/20: 100%|██████████| 1125/1125 [03:22<00:00,  5.54it/s, loss=0.248]
Epoch 4 average loss: 0.25995001827345954
Epoch 5/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.239]
Epoch 5 average loss: 0.251056260228157
Epoch 6/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.53it/s, loss=0.217]
Epoch 6 average loss: 0.24529405222998726
Epoch 7/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.227]
Epoch 7 average loss: 0.24055371418264176
Epoch 8/20: 100%|██████████| 1125/1125 [03:22<00:00,  5.54it/s, loss=0.221]
Epoch 8 average loss: 0.23791699058479732
Epoch 9/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.245]
Epoch 9 average loss: 0.23742892943488228
Epoch 10/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.208]
Epoch 10 average loss: 0.23614923742082383
Epoch 11/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.53it/s, loss=0.219]
Epoch 11 average loss: 0.23556399891111585

MarcusLoppe avatar Dec 15 '23 01:12 MarcusLoppe

https://github.com/lucidrains/meshgpt-pytorch/issues/11#issuecomment-1856353929 was the verification of the z-y-x order and sorting the faces by their lowest vertex index. Note that I am using the convention that gives me that result, like Y-Z-X, but it follows their requirement of being sorted vertically.

fire avatar Dec 15 '23 02:12 fire

@MarcusLoppe on your branch, can you add a feature where on the first quit it saves, and on the second quit it quits? Then we can restart from a checkpoint.
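Something along these lines is what I mean (just a sketch; it assumes the trainer exposes a save() method, so the exact call may need adjusting):

import signal
import sys

def install_quit_handler(trainer, checkpoint_path = "mesh-autoencoder.ckpt.pt"):
    state = {"interrupted": False}

    def handler(signum, frame):
        if not state["interrupted"]:
            # first Ctrl-C: save a checkpoint and keep going
            state["interrupted"] = True
            trainer.save(checkpoint_path)  # assumed trainer.save(); adjust to the actual API
            print("Checkpoint saved, press Ctrl-C again to quit")
        else:
            # second Ctrl-C: exit; resume later from the checkpoint
            sys.exit(0)

    signal.signal(signal.SIGINT, handler)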

fire avatar Dec 15 '23 02:12 fire