ImportError: cannot import name 'MeshDataset' from 'meshgpt_pytorch'

Open StephenYangjz opened this issue 11 months ago • 5 comments

Hi, I am running the demo and it seems that MeshDataset cannot be imported from meshgpt_pytorch. Any help, @lucidrains, would be greatly appreciated!

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[5], line 4
      2 import gc
      3 import os
----> 4 from meshgpt_pytorch import MeshDataset
      6 project_name = "demo_mesh"
      8 working_dir = f'.\{project_name}'

ImportError: cannot import name 'MeshDataset' from 'meshgpt_pytorch' (/home/stephen/anaconda3/envs/meshgpt/lib/python3.9/site-packages/meshgpt_pytorch/__init__.py)

StephenYangjz avatar Mar 11 '24 01:03 StephenYangjz

Hi Stephen,

MeshDataset is a class I created for meshgpt_pytorch. I made a pull request for it, but I'm not sure why it wasn't accepted. Whatever the reason, the only differences between my fork and meshgpt are the modified trainer class (it trains by epochs instead and gets progress reports from tqdm) and MeshDataset.

If you'd like to use my MeshDataset you can install my fork or just copy and paste MeshDataset into your code.
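
A quick way to tell whether the installed package is the one that includes MeshDataset is to check if the top-level module exports the name before importing it. The following is only a minimal sketch: `exports_name` is a hypothetical helper written for illustration, not part of meshgpt_pytorch or the fork.

```python
import importlib


def exports_name(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed in this environment at all.
        return False
    return hasattr(module, attr)


if exports_name("meshgpt_pytorch", "MeshDataset"):
    print("MeshDataset is available")
else:
    print("MeshDataset is missing: install the fork or copy the class into your code")
```

If the check reports the class as missing right after installing the fork, the notebook may still be using a different environment or an older install.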

MarcusLoppe avatar Mar 11 '24 15:03 MarcusLoppe

That resolves it (__init__.py also needs to be updated). Thanks!

StephenYangjz avatar Mar 12 '24 01:03 StephenYangjz

Hi @MarcusLoppe, I don't think I had this issue before, but now I'm getting:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 11
      1 # autoencoder_trainer = MeshAutoencoderTrainer(model =autoencoder ,warmup_steps = 10, dataset = dataset, num_train_steps=100,
      2 #                                              batch_size=8,
      3 #                                              grad_accum_every=2,
      4 #                                              learning_rate = 1e-2)
      5 # loss = autoencoder_trainer.train(280,stop_at_loss = 0.7, diplay_graph= True)
      7 autoencoder_trainer = MeshAutoencoderTrainer(model =autoencoder ,warmup_steps = 10, dataset = dataset, num_train_steps=100,
      8                                              batch_size=8,
      9                                              grad_accum_every=2,
     10                                              learning_rate = 4e-3)
---> 11 loss = autoencoder_trainer.train(280,stop_at_loss = 0.28, diplay_graph= True)

TypeError: train() got an unexpected keyword argument 'stop_at_loss'

Do you by any chance have any pointers? Thank you!

StephenYangjz avatar Mar 12 '24 19:03 StephenYangjz

Oh, I'm not sure. The train function in my fork is defined as: def train(self, num_epochs, stop_at_loss = None, diplay_graph = False):

A notebook kernel can keep using a stale copy of a package after it has been changed or reinstalled, so have you tried restarting the notebook kernel?
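
One way to confirm which `train` the kernel has actually loaded is to inspect its signature. This is a rough sketch: `accepts_kwarg` is a hypothetical helper, and the two classes are stand-ins made up for illustration (the first copies the signature quoted above, the second is an arbitrary method without that keyword). In the notebook you would pass `MeshAutoencoderTrainer.train` instead.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all also accepts arbitrary keyword names.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-in with the signature quoted in this thread (illustration only).
class ForkStyleTrainer:
    def train(self, num_epochs, stop_at_loss=None, diplay_graph=False):
        pass


# Stand-in for a trainer whose train() lacks the keyword (hypothetical).
class OtherTrainer:
    def train(self):
        pass


print(accepts_kwarg(ForkStyleTrainer.train, "stop_at_loss"))  # True
print(accepts_kwarg(OtherTrainer.train, "stop_at_loss"))      # False
```

If the check returns False for the real trainer, printing `meshgpt_pytorch.__file__` shows which installation the kernel imported.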

I'm currently running the code below and it's working. By the way, I should have removed one of the autoencoder_trainer blocks so there is only one. I found it better to start training at a low learning rate, since this keeps the commit loss steadier, and I don't really notice any improvement from a higher learning rate at the start.

Also, target a batch size of 64: if you have enough VRAM, set batch_size to 64 and grad_accum_every to 1. A larger batch size means faster training. For training on a large dataset, you can set commit_loss_weight to 0.25, otherwise the commit loss will shoot up into the 100s. This way it puts pressure on the encoder to compress the tokens better.

Otherwise, try to reach an effective batch size of 64 by adjusting grad_accum_every so that batch_size * grad_accum_every = 64.

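from meshgpt_pytorch import MeshAutoencoderTrainer

save_name = "16k_2_4"
batch_size = 16

# Assumes `autoencoder` and `dataset` were created in earlier notebook cells.
autoencoder.commit_loss_weight = 0.25
autoencoder_trainer = MeshAutoencoderTrainer(
    model = autoencoder,
    warmup_steps = 100,
    dataset = dataset,
    num_train_steps = 100,
    batch_size = batch_size,
    grad_accum_every = 4,  # 16 * 4 = effective batch size of 64
    learning_rate = 1e-4,
    checkpoint_every_epoch = 1,
)
# `diplay_graph` is spelled as in the fork's train() signature.
loss = autoencoder_trainer.train(480, stop_at_loss = 0.2, diplay_graph = True)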
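
The effective batch size rule above (batch_size * grad_accum_every = 64) can be sketched as a small calculation. `grad_accum_steps` is a hypothetical helper, not part of the library, and rounding up when the target is not evenly divisible is an assumption made here.

```python
def grad_accum_steps(batch_size: int, target_effective: int = 64) -> int:
    """Smallest grad_accum_every with batch_size * grad_accum_every >= target."""
    if batch_size <= 0 or target_effective <= 0:
        raise ValueError("batch_size and target_effective must be positive")
    # Ceiling division: round up so the effective batch is never below target.
    return -(-target_effective // batch_size)


for bs in (8, 16, 64):
    steps = grad_accum_steps(bs)
    print(f"batch_size={bs:>2} grad_accum_every={steps} effective={bs * steps}")
```

For comparison, the configuration shown in the traceback earlier in this thread (batch_size=8, grad_accum_every=2) gives an effective batch size of 16.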

MarcusLoppe avatar Mar 12 '24 20:03 MarcusLoppe

Thank you! I reinstalled the package and the error went away. It seems to have been a kernel/package issue.

StephenYangjz avatar Mar 13 '24 16:03 StephenYangjz