OpenLRM
Pretrained models config files
Hi, could you provide the configuration files you used to train the models that are available on Hugging Face? I noticed that the one in the repo refers to the small model, but I would like to try fine-tuning the base and large ones.
Hi,
You can simply change the model and dataset configs based on the differences described in model_card.md.
Here's an example.
Every published model contains a config.json file with this info. See here for example: https://huggingface.co/zxhezexin/openlrm-obj-small-1.1/tree/main
You can also fetch this configuration with the following code:
import transformers
model_config = transformers.PretrainedConfig.from_pretrained("zxhezexin/openlrm-obj-base-1.1")
print(model_config)
PretrainedConfig {
"camera_embed_dim": 1024,
"encoder_feat_dim": 768,
"encoder_freeze": false,
"encoder_model_name": "dinov2_vitb14_reg",
"encoder_type": "dinov2",
"rendering_samples_per_ray": 96,
"transformer_dim": 768,
"transformer_heads": 12,
"transformer_layers": 12,
"transformers_version": "4.28.1",
"triplane_dim": 48,
"triplane_high_res": 64,
"triplane_low_res": 32
}
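To see exactly which values you would need to change between sizes, you can diff the published configs. Here's a rough sketch (assuming the large checkpoint follows the same naming scheme, i.e. zxhezexin/openlrm-obj-large-1.1):

import transformers

# Repo ids: the small and base ones are referenced above; the large one is
# assumed to follow the same naming scheme.
repo_ids = [
    "zxhezexin/openlrm-obj-small-1.1",
    "zxhezexin/openlrm-obj-base-1.1",
    "zxhezexin/openlrm-obj-large-1.1",
]

# Fetch each config.json from the Hub and keep it as a plain dict.
configs = {rid: transformers.PretrainedConfig.from_pretrained(rid).to_dict() for rid in repo_ids}

# Print only the keys whose values differ across the three sizes.
for key in sorted(set().union(*configs.values())):
    values = [configs[rid].get(key) for rid in repo_ids]
    if len({str(v) for v in values}) > 1:
        print(key, dict(zip(repo_ids, values)))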
Personally, to start from the pretrained weights, I've changed the code to load the pretrained model directly:
from openlrm.utils.hf_hub import wrap_model_hub

class LRMTrainer(Trainer):
    ...
    # Modified _build_model so the trainer loads pretrained weights from the Hub
    def _build_model(self, cfg):
        assert (
            cfg.experiment.type == "lrm"
        ), f"Config type {cfg.experiment.type} does not match with runner {self.__class__.__name__}"
        from openlrm.models import ModelLRM
        model_class = wrap_model_hub(ModelLRM)
        model = model_class.from_pretrained(cfg.experiment.pretrained)
        return model
You can replace cfg.experiment.pretrained with "zxhezexin/openlrm-obj-base-1.1" directly, or add a pretrained key to your config.
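For a quick sanity check outside the trainer, the same wrapper can also load the published weights directly (the parameter-count print is just illustrative):

from openlrm.models import ModelLRM
from openlrm.utils.hf_hub import wrap_model_hub

# Load the published base checkpoint straight from the Hub.
model_class = wrap_model_hub(ModelLRM)
model = model_class.from_pretrained("zxhezexin/openlrm-obj-base-1.1")
print(sum(p.numel() for p in model.parameters()), "parameters")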
@da2r-20 This is amazing! Thanks!
Hi @ZexinHe, thank you for your advice. I'm wondering if I can raise the resolutions much higher than 336, for example to 1008 (since the patch size is 14)? My goal is to improve the texture quality of the inference results. I'm fine-tuning openlrm-mix-large-1.1 with 1000 pairs of data, but the training results are not good.
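For reference, the patch-token count grows quickly with resolution (quick arithmetic, assuming the encoder tokenizes the full source image at patch size 14):

# ViT patch-token count per source resolution, assuming patch size 14.
patch = 14
for res in (336, 1008):
    assert res % patch == 0, f"{res} is not divisible by {patch}"
    tokens = (res // patch) ** 2
    print(f"{res}px -> {tokens} patch tokens")
# 336px -> 576 tokens, 1008px -> 5184 tokens (9x more)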
training data
There are 1000 custom glb files, all processed through blender_script.py properly.
I know the number of data is not enough, so I'm currently just trying overfitting.
train-sample.yaml
experiment:
    type: lrm
    seed: 42
    parent: lrm-objaverse
    child: small-dummyrun

model:
    camera_embed_dim: 1024
    rendering_samples_per_ray: 128
    transformer_dim: 1024
    transformer_layers: 16
    transformer_heads: 16
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 80
    encoder_type: dinov2
    encoder_model_name: dinov2_vitb14_reg
    encoder_feat_dim: 768
    encoder_freeze: false

dataset:
    subsets:
        -   name: objaverse
            root_dirs:
                - "/home/ubuntu/training-tokyo/OpenLRM/views"
            meta_path:
                train: "/home/ubuntu/training-tokyo/OpenLRM/train_uids.json"
                val: "/home/ubuntu/training-tokyo/OpenLRM/val_uids.json"
            sample_rate: 1.0
    sample_side_views: 3
    source_image_res: 1008  # higher resolution
    render_image:
        low: 512  # higher resolution
        high: 1008  # higher resolution
        region: 64
    normalize_camera: true
    normed_dist_to_center: auto
    num_train_workers: 4
    num_val_workers: 2
    pin_mem: true

train:
    mixed_precision: bf16  # REPLACE THIS BASED ON GPU TYPE
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 3  # reduced it because of the CUDA OOM error
    accum_steps: 1
    epochs: 2000  # modified it for overfitting
    debug_global_steps: null

val:
    batch_size: 2  # modified
    global_step_period: 1000
    debug_batches: null

saver:
    auto_resume: true
    load_model: "/home/ubuntu/training-tokyo/OpenLRM/model.safetensors"  # this refers to "zxhezexin/openlrm-mix-large-1.1"
    checkpoint_root: ./exps/checkpoints
    checkpoint_global_steps: 1000
    checkpoint_keep_level: 5

logger:
    stream_level: WARNING
    log_level: INFO
    log_root: ./exps/logs
    tracker_root: ./exps/trackers
    enable_profiler: false
    trackers:
        - tensorboard
    image_monitor:
        train_global_steps: 100
        samples_per_log: 4

compile:
    suppress_errors: true
    print_specializations: true
    disable: true
training result
[TRAIN STEP]loss=0.21, loss_pixel=0.0265, loss_perceptual=0.184, loss_tv=0.424, lr=3.04e-13: 100%|█| 60000/60000 [15:40:28<00:00, 1.06s/it]
As you can see above, the loss is still too high, and the inference results from this checkpoint are not good.
Inference result from the previously trained checkpoint:
I really need to increase the texture resolution. Could you please give me some advice on that?
Hi boss, can I ask a question related to fine-tuning?
Hi @joshkiller, I'm very new to AI, not an expert. However, I'd be happy to help with anything I can!
I was wondering if someone can fine-tune a model without changing its general behavior. For example, I find that a model like Stable Diffusion sometimes generates images that can't be used for 3D reconstruction. How, and with what kind of data, can we remedy that problem so that the model only generates complete, standalone objects? I'm doing my master's internship on a text-to-3D pipeline.
@joshkiller
I was wondering if someone can fine-tune a model without changing its general behavior.
Usually what you are describing can be achieved with LoRA and its derivatives, but I'm not sure OpenLRM can help you here. OpenLRM does single-image -> 3D reconstruction;
text -> 3D is a different yet related task, and there are other models available for it.
If you want to create data using OpenLRM you could, say, take image-text pairs and get the 3D representations from OpenLRM. There has been some work in other tasks showing that synthetic data pairs generated by a trained model can be beneficial, but for text -> 3D my opinion is that you need good data to improve.
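To make the LoRA idea concrete, here is a minimal illustrative sketch (not something OpenLRM ships): the pretrained weight stays frozen and only a small low-rank update is trained, which is why the general behavior is largely preserved.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a pretrained nn.Linear with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights so the general behavior is kept.
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Low-rank factors: B starts at zero, so training begins exactly at the base model.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)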
Thanks a lot for your answers. I will try to delve deeper into LoRA than I have so far.
@hayoung-jeremy I'm also trying to fine-tune the same model. Currently it manages to overfit, but with some issues: I can overfit to the shape of the object well, but the textures get lost and the overall look of the inferred object appears blurry compared to the pretrained model.
@ZexinHe I've also noticed that the original paper uses perceptual_weight=2.0.
Training with this weight didn't improve my results, though.