meshgpt-pytorch
Data augmentation strategies
In https://github.com/lucidrains/meshgpt-pytorch/pull/6
For each mesh I generate augments_per_item (like 200), then I use it to index into the dataset.
Using a seed, I augment with this strategy:
What do you think?
scale = random.uniform(0.8, 1.2) # Uniform scaling
rotation = R.from_euler('y', random.uniform(-180, 180), degrees=True) # Random rotation around y-axis
translation = np.array([random.uniform(-0.5, 0.5) for _ in [0, 2]]) # Random translation in x and z directions
The goal is for a chair item to be rotated, moved, or scaled, but to stay upright.
Edited:
The idea is to have the chair be displaced, but under gravity, so it keeps its lowest vertex position.
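For illustration, a minimal sketch of that strategy applied to an (N, 3) vertex array, re-grounding the lowest vertex so the mesh stays under gravity (apply_augment is an illustrative name, not the PR's code):
import random
import numpy as np
from scipy.spatial.transform import Rotation as R

def apply_augment(vertices: np.ndarray) -> np.ndarray:
    scale = random.uniform(0.8, 1.2)                                       # uniform scaling
    rotation = R.from_euler('y', random.uniform(-180, 180), degrees=True)  # rotate around y only, so the chair stays upright
    translation = np.zeros(3)
    translation[[0, 2]] = [random.uniform(-0.5, 0.5) for _ in range(2)]    # translate in x and z only

    augmented = rotation.apply(vertices * scale) + translation
    augmented[:, 1] -= augmented[:, 1].min() - vertices[:, 1].min()        # keep the lowest vertex at its original height
    return augmented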
yup sounds good! just put all the functions into one file, say augment.py, and if you want to go the distance, have ways to compose / chain any number of augmentations
@fire scale and rotation will go a long way
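One possible way to chain augmentations (a sketch only; compose_augments and the per-augment function names are placeholders, not an augment.py API):
from functools import reduce
import numpy as np

def compose_augments(*augments):
    # return a single augment that applies each given augment in order
    def composed(vertices: np.ndarray) -> np.ndarray:
        return reduce(lambda v, fn: fn(v), augments, vertices)
    return composed

# e.g. chain any number of them into one callable:
# augment = compose_augments(random_scale, random_y_rotation, random_xz_translation)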
Here's what my current augments do.
vs original
Edited:
There's a bias near the center D:
The bias is removed.
I have to go for now.
https://github.com/lucidrains/meshgpt-pytorch/pull/6/files#diff-bb1e7e12bca15c4f2fd0faa464db85f6e8cb35c55454247f94c31bfc1483c3bbR100-R150
See def augment_mesh(self, base_mesh, augment_count, augment_idx):
Edited: removed seed
@lucidrains Can you post something for me to extract the resulting mesh from the autoencoder?
You mentioned the topic of overfitting as a first step.
I added the Blender monkey as a validation of mesh input through an autoencoder as an initial step.
I want to send another monkey to the autoencoder and get the same monkey out again. How do I do that?
I was able to train for 1 step; it outputs a garbage glb 🎉
I have been using the Notebook file Marcus provided to try that, and I am also getting bad obj results. I am going to try the latest @lucidrains changes in this notebook tomorrow; maybe you can give it a try or a look, or maybe you might be ahead of what I am using. 😆 Thanks! https://drive.google.com/file/d/1gpLjbnH1WUH6U50MJKrw-8BV6S_-3KH1/view?usp=sharing
I am getting bad mesh results too, but it's trying. The selected mesh is the output; the background is the base mesh.
Just for testing purposes, give it a go without the data augmentation. I think the model needs some more improvements, plus it will take a long time to train with data augmentation. In the paper they used 28,000 shapes and trained the encoder on 2x A100 for 2 days and the transformer on 4x A100 for 5 days. So it will need lots of training data and time.
When I have been successful, the encoder loss was around 0.200-0.250 and the transformer loss was around 0.00007. So if you can get the loss down to those levels while using data augmentation, it will probably work, but that will require lots of training.
Here are some details from the paper: they only use scaling and jitter-shift. So remove translation & rotation and see if that helps.
I am currently at:
loss: 1.255
loss: 1.500
loss: 1.786
loss: 1.596
loss: 1.941
loss: 1.583
loss: 1.895
loss: 1.904
So maybe I can dream about 0.200 - 0.250 loss.
How many steps is that at? I require about 2000 steps, since 200 x 10 epochs = 2000. Also, implement tqdm, since print can slow things down quite a lot.
Try only doing scaling and see; it will probably go better.
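On the tqdm point, a generic sketch of what that could look like (not the trainer's internal loop; it assumes equally-sized meshes so the default collate works, and that the autoencoder's forward pass returns its loss as in the README):
from tqdm import tqdm
from torch.utils.data import DataLoader

def run_epoch(model, dataset, optimizer, epoch, num_epochs, batch_size = 4):
    loader = DataLoader(dataset, batch_size = batch_size, shuffle = True)
    progress = tqdm(loader, desc = f"Epoch {epoch + 1}/{num_epochs}")
    for batch in progress:
        loss = model(vertices = batch["vertices"], faces = batch["faces"])  # assumes the forward pass returns the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        progress.set_postfix(loss = loss.item())  # show the running loss on the bar instead of printing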
You can give it a go with my forked version @ https://github.com/MarcusLoppe/meshgpt-pytorch/tree/main
The data MeshDataset expects is an array of:
obj_data = {"texts": "chair", "vertices": vertices, "faces": faces}
import torch
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm

class MeshDataset(Dataset):
    def __init__(self, obj_data):
        self.obj_data = obj_data
        print(f"Got {len(obj_data)} data")

    def __len__(self):
        return len(self.obj_data)

    def __getitem__(self, idx):
        return self.obj_data[idx]
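For reference, one way to build such entries from mesh files, assuming trimesh is available (load_obj_entry is an illustrative helper, not part of the fork):
import torch
import trimesh

def load_obj_entry(path: str, label: str) -> dict:
    # load a single mesh and convert it to the dict format MeshDataset expects
    mesh = trimesh.load(path, force = 'mesh')
    vertices = torch.tensor(mesh.vertices, dtype = torch.float32)
    faces = torch.tensor(mesh.faces, dtype = torch.long)
    return {"texts": label, "vertices": vertices, "faces": faces}

# dataset = MeshDataset([load_obj_entry("chair.glb", "chair")])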
from meshgpt_pytorch import (
MeshTransformerTrainer,
MeshAutoencoderTrainer
)
autoencoder_trainer = MeshAutoencoderTrainer(
    model = autoencoder,
    learning_rate = 1e-3,
    warmup_steps = 10,
    dataset = dataset,
    batch_size = 4,
    grad_accum_every = 1,
    num_train_steps = 1
)
autoencoder_trainer.train(10, True)
max_length = max(len(d["faces"]) for d in dataset if "faces" in d)
max_seq = max_length * 6
print(max_length)
print(max_seq)
transformer = MeshTransformer(
    autoencoder,
    dim = 16,
    max_seq_len = max_seq,
    # condition_on_text = True
)

trainer = MeshTransformerTrainer(
    model = transformer,
    warmup_steps = 10,
    dataset = dataset,
    learning_rate = 1e-2,
    batch_size = 2,
    grad_accum_every = 1,
    num_train_steps = 1
)
trainer.train(10)
These are my current settings, which is 200 steps. The outlined mesh is the output. You can see my code in the pull request.
run = wandb.init(
    project="meshgpt-pytorch",
    config={
        "learning_rate": 1e-2,
        "architecture": "MeshGPT",
        "dataset": dataset_directory,
        "num_train_steps": 200,
        "warmup_steps": 1,
        "batch_size": 4,
        "grad_accum_every": 1,
        "checkpoint_every": 20,
        "device": str(device),
        "autoencoder": {
            "dim": 512,
            "encoder_depth": 6,
            "decoder_depth": 6,
            "num_discrete_coors": 128,
        },
        "dataset_size": dataset.__len__(),
    }
)
You are right that I should ensure we're within unit-square distance and do fewer augmentations, though.
I think that generating two objects is causing some issues; try using a single box.
I tried your s_bed_full.glb file and the result was pretty good, although it's not so smooth. There would probably be a better result with data augmentation. The right side is the generated one.
https://imgsli.com/ is very good for image comparisons.
Writing down an idea: it should be possible to go over the 10 million 3D item set, find a small set of items in a small set of classes similar to the paper, and label them manually (like via path name).
Training on 10 million might be overkill, and going over 28,000 shapes might cost a bit too much $$$. ShapeNet has 50k 3D models with almost a paragraph of description text each.
Renting an A100 at $0.79 per hour: training the encoder on 2x A100 for 2 days is 2 x 48 h x $0.79 = $75.84; training the transformer on 4x A100 for 5 days is 4 x 120 h x $0.79 ≈ $379.
However, the H100 promises good performance, but at around $2-3 an hour.
imgsli seems pretty good, but probably not for 3D models.
I can't use ShapeNet, but I'm sure we can find 10 classes of 100 models each, like ShapeNet, in that 10 million item dataset.
I think it's fine, there are many free sources; the trouble might be finding a dataset with descriptions. But that is in the future, and I think someone can get access to ShapeNet. The bigger issue is the GPU bill; however, Phil/lucidrains might be able to improve the models so much that the training time goes down dramatically.
But after the model is trained, inference will be a big issue for users: if it's going to generate complex 3D models, it might not work on consumer hardware. The recent performance boost is a good sign that the performance and efficiency are on the right track.
https://github.com/timzhang642/3D-Machine-Learning#3d_models
I want to mention that getting the indices in the right order, and making sure they fit in the box and are not inside out, are problems too.
If you're interested in training the head, it's in the dataset. I can't get the autoencoder below 0.5 loss.
How many examples/steps of the same 3D mesh did you train it on? I trained for 10-20 epochs @ 2000 examples and got 0.19 loss. I think you are training on too few examples; it needs massive amounts of data to model the mesh. And if you do data augmentation you'll need even more data, maybe 30-40 epochs or more.
I was able to generate a pretty good 3D mesh; it's not as smooth, but a very good result for such a small amount of training data. The transformer & encoder aren't good at generalizing with little training data, but that will resolve itself when training with much more data.
3D mesh: https://file.io/6JIueypFnRyT
I was using the wrong strategy. You were using many identical copies of the mesh and then some augments. I was doing the opposite.
I might have worded that badly, but no, I'm using the same model without any augmentations. But train for 10-20 epochs @ 2000 items per dataset and let me know. Kaggle has some awesome free GPUs.
Here's how I interpreted it:
- model * multiple
- model * multiple * augments
You were doing 2000 (same) * 1 * 1.
I was trying 1 * 2000 (augmented) * 1.
Thanks for telling me! I'm trying your suggestion.
No problem. I posted this in another issue, but I think this might help you: according to the paper, they sort the vertices in z-y-x order, then sort the faces by their lowest vertex index.
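A sketch of that ordering with numpy (sort_mesh is an illustrative name; the repo's own implementation may differ):
import numpy as np

def sort_mesh(vertices: np.ndarray, faces: np.ndarray):
    # sort vertices by z, then y, then x (np.lexsort treats the last key as the primary one)
    order = np.lexsort((vertices[:, 0], vertices[:, 1], vertices[:, 2]))
    vertices_sorted = vertices[order]

    # remap face indices to point at the re-ordered vertices
    remap = np.empty_like(order)
    remap[order] = np.arange(len(order))
    faces_remapped = remap[faces]

    # sort faces by their lowest vertex index
    faces_sorted = faces_remapped[np.argsort(faces_remapped.min(axis = 1))]
    return vertices_sorted, faces_sorted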
Also, I'm currently training on about 6 3D mesh chairs. Each chair has 1500 examples: there are 3 augmentation versions, so each 3D mesh file has a total of 500 x 3 = 1500 examples.
The total is 12 000 examples.
To give you some idea of why you need to train for 2 days on two A100s, watch how slow the progress is (33 minutes of running):
Epoch 1/20: 100%|██████████| 1125/1125 [03:29<00:00, 5.38it/s, loss=0.296]
Epoch 1 average loss: 0.7889469708336724
Epoch 2/20: 100%|██████████| 1125/1125 [03:23<00:00, 5.52it/s, loss=0.307]
Epoch 2 average loss: 0.29623086002137927
Epoch 3/20: 100%|██████████| 1125/1125 [03:23<00:00, 5.54it/s, loss=0.28]
Epoch 3 average loss: 0.2731376721594069
Epoch 4/20: 100%|██████████| 1125/1125 [03:22<00:00, 5.54it/s, loss=0.248]
Epoch 4 average loss: 0.25995001827345954
Epoch 5/20: 100%|██████████| 1125/1125 [03:23<00:00, 5.54it/s, loss=0.239]
Epoch 5 average loss: 0.251056260228157
Epoch 6/20: 100%|██████████| 1125/1125 [03:23<00:00, 5.53it/s, loss=0.217]
Epoch 6 average loss: 0.24529405222998726
Epoch 7/20: 100%|██████████| 1125/1125 [03:23<00:00, 5.54it/s, loss=0.227]
Epoch 7 average loss: 0.24055371418264176
Epoch 8/20: 100%|██████████| 1125/1125 [03:22<00:00, 5.54it/s, loss=0.221]
Epoch 8 average loss: 0.23791699058479732
Epoch 9/20: 100%|██████████| 1125/1125 [03:23<00:00, 5.54it/s, loss=0.245]
Epoch 9 average loss: 0.23742892943488228
Epoch 10/20: 100%|██████████| 1125/1125 [03:23<00:00, 5.54it/s, loss=0.208]
Epoch 10 average loss: 0.23614923742082383
Epoch 11/20: 100%|██████████| 1125/1125 [03:23<00:00, 5.53it/s, loss=0.219]
Epoch 11 average loss: 0.23556399891111585
https://github.com/lucidrains/meshgpt-pytorch/issues/11#issuecomment-1856353929 was the verification of the z-y-x order and of sorting the faces by their lowest vertex index. Note that I am using the convention that gives me that result, like Y-Z-X, but it followed their requirement of being sorted vertically.
@MarcusLoppe on your branch, can you add a feature where the first quit saves and the second quit actually quits? Then we can restart from a checkpoint.
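For what it's worth, a minimal sketch of that behaviour with a SIGINT handler; install_quit_handler is hypothetical and trainer.save(path) is an assumption about the trainer API:
import signal
import sys

def install_quit_handler(trainer, path = "mesh-autoencoder.ckpt.pt"):
    # first Ctrl+C saves a checkpoint, second Ctrl+C actually quits
    state = {"quit_requested": False}

    def handle_sigint(signum, frame):
        if not state["quit_requested"]:
            state["quit_requested"] = True
            print("Saving checkpoint... press Ctrl+C again to quit.")
            trainer.save(path)  # assumed save method on the trainer
        else:
            sys.exit(0)

    signal.signal(signal.SIGINT, handle_sigint)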