Chen-Hsuan Lin
Hi @Ziba-li, the multi-GPU setup (i.e. distributed training) enables training with larger batch sizes. It doesn't increase the per-iteration training speed, but it will be much faster to train each...
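For context, a minimal sketch of a multi-GPU launch (the trailing `...` stands in for your usual arguments; adjust the process count to however many GPUs you have):

```sh
# Single-node distributed launch on 4 GPUs. Per-iteration speed stays
# roughly the same as on one GPU, but the effective batch size (and thus
# the progress made per iteration) scales with the number of processes.
torchrun --nproc_per_node=4 train.py ...
```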
I'm not sure about the communication overhead of the 4090, but we didn't see such an issue with A100s. If you could help pinpoint where the additional overhead is coming from (and...
In our toy example and the Colab, we use the test set of the Lego sequence instead of the training set. This is to simulate a smooth camera trajectory that...
Hi @xiemeilong, in addition to the above @mli0603 mentioned, we also have a fix (#41) on the scripts. If you were extracting the mesh from an earlier checkpoint, please pull...
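As a quick sketch (assuming the default remote and branch names), updating the checkout before retrying would look like:

```sh
# Pull the latest scripts so the fix from #41 is included,
# then retry the mesh extraction step.
git pull origin main
```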
@xiemeilong @zz7379 we have pushed an update to `main` yesterday that fixed a checkpoint issue which may be related. Could you pull and try running the pipeline again? Please let...
@yuxuJava789 if you are training with the default config, this is expected at 20k iterations. You would need to run to 500k iterations to get the final results. If you...
@derrick-xwp your results look fine. Could you elaborate on what the concern is?
Hi @ZirongChan, could you post the full error log? Thanks!
This seems to be an issue on the W&B side. We don't support TensorBoard right now, but PRs are welcome if you'd like to help add this support.
To disable distributed training, you can run `python train.py --single_gpu ...` instead of `torchrun --nproc_per_node=1 train.py ...` and it should work.
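For clarity, the two invocations side by side (the trailing `...` stands in for your usual arguments):

```sh
# Single-GPU run without the distributed launcher:
python train.py --single_gpu ...

# Equivalent single-process run through torchrun, which still goes
# through torch.distributed initialization:
torchrun --nproc_per_node=1 train.py ...
```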