OpenLRM
Pretrained models config files
Hi, could you provide the configuration files you used to train the models that are available on Hugging Face? I noticed that the one in the repo refers to the small model, but I would like to try fine-tuning the base and large ones.
Hi,
You can simply change the model and dataset configs based on the differences described in model_card.md.
Here's an example.
Every published model contains a config.json file with this info. See here for example: https://huggingface.co/zxhezexin/openlrm-obj-small-1.1/tree/main
You can also fetch this configuration with the following code:
import transformers
model_config = transformers.PretrainedConfig.from_pretrained("zxhezexin/openlrm-obj-base-1.1")
print(model_config)
PretrainedConfig {
"camera_embed_dim": 1024,
"encoder_feat_dim": 768,
"encoder_freeze": false,
"encoder_model_name": "dinov2_vitb14_reg",
"encoder_type": "dinov2",
"rendering_samples_per_ray": 96,
"transformer_dim": 768,
"transformer_heads": 12,
"transformer_layers": 12,
"transformers_version": "4.28.1",
"triplane_dim": 48,
"triplane_high_res": 64,
"triplane_low_res": 32
}
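To see exactly which values you would need to change between sizes, you can diff the published configs. Here's a rough sketch (assuming the large checkpoint follows the same naming scheme, i.e. zxhezexin/openlrm-obj-large-1.1):

import transformers

# Repo ids: the small and base ones are referenced above; the large one is
# assumed to follow the same naming scheme.
repo_ids = [
    "zxhezexin/openlrm-obj-small-1.1",
    "zxhezexin/openlrm-obj-base-1.1",
    "zxhezexin/openlrm-obj-large-1.1",
]

# Fetch each config.json from the Hub and keep it as a plain dict.
configs = {rid: transformers.PretrainedConfig.from_pretrained(rid).to_dict() for rid in repo_ids}

# Print only the keys whose values differ across the three sizes.
for key in sorted(set().union(*configs.values())):
    values = [configs[rid].get(key) for rid in repo_ids]
    if len({str(v) for v in values}) > 1:
        print(key, dict(zip(repo_ids, values)))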
Personally, to start from the pretrained weights, I've changed the code to load the pretrained model directly:
from openlrm.utils.hf_hub import wrap_model_hub

class LRMTrainer(Trainer):
    ...
    # Modified _build_model so the trainer loads pretrained weights from the Hub
    def _build_model(self, cfg):
        assert (
            cfg.experiment.type == "lrm"
        ), f"Config type {cfg.experiment.type} does not match with runner {self.__class__.__name__}"
        from openlrm.models import ModelLRM
        model_class = wrap_model_hub(ModelLRM)
        model = model_class.from_pretrained(cfg.experiment.pretrained)
        return model
You can replace cfg.experiment.pretrained with "zxhezexin/openlrm-obj-base-1.1" directly, or add a pretrained key to your config.
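For a quick sanity check outside the trainer, the same wrapper can also load the published weights directly (the parameter-count print is just illustrative):

from openlrm.models import ModelLRM
from openlrm.utils.hf_hub import wrap_model_hub

# Load the published base checkpoint straight from the Hub.
model_class = wrap_model_hub(ModelLRM)
model = model_class.from_pretrained("zxhezexin/openlrm-obj-base-1.1")
print(sum(p.numel() for p in model.parameters()), "parameters")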
@da2r-20 This is amazing! Thanks!
Hi @ZexinHe, thank you for your advice. I'm wondering if I can raise the resolutions much higher than 336, for example to 1008 (since the patch size is 14)? My goal is to improve the texture quality of the inference results. I'm fine-tuning openlrm-mix-large-1.1 with 1000 pairs of data, but the training results are not good.
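For reference, the patch-token count grows quickly with resolution (quick arithmetic, assuming the encoder tokenizes the full source image at patch size 14):

# ViT patch-token count per source resolution, assuming patch size 14.
patch = 14
for res in (336, 1008):
    assert res % patch == 0, f"{res} is not divisible by {patch}"
    tokens = (res // patch) ** 2
    print(f"{res}px -> {tokens} patch tokens")
# 336px -> 576 tokens, 1008px -> 5184 tokens (9x more)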
training data
There are 1000 custom glb files, all processed through blender_script.py properly.
I know the number of data is not enough, so I'm currently just trying overfitting.
train-sample.yaml
experiment:
    type: lrm
    seed: 42
    parent: lrm-objaverse
    child: small-dummyrun

model:
    camera_embed_dim: 1024
    rendering_samples_per_ray: 128
    transformer_dim: 1024
    transformer_layers: 16
    transformer_heads: 16
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 80
    encoder_type: dinov2
    encoder_model_name: dinov2_vitb14_reg
    encoder_feat_dim: 768
    encoder_freeze: false

dataset:
    subsets:
        -   name: objaverse
            root_dirs:
                - "/home/ubuntu/training-tokyo/OpenLRM/views"
            meta_path:
                train: "/home/ubuntu/training-tokyo/OpenLRM/train_uids.json"
                val: "/home/ubuntu/training-tokyo/OpenLRM/val_uids.json"
            sample_rate: 1.0
    sample_side_views: 3
    source_image_res: 1008  # higher resolution
    render_image:
        low: 512  # higher resolution
        high: 1008  # higher resolution
        region: 64
    normalize_camera: true
    normed_dist_to_center: auto
    num_train_workers: 4
    num_val_workers: 2
    pin_mem: true

train:
    mixed_precision: bf16  # REPLACE THIS BASED ON GPU TYPE
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 3  # reduced it because of the CUDA OOM error
    accum_steps: 1
    epochs: 2000  # modified it for overfitting
    debug_global_steps: null

val:
    batch_size: 2  # modified
    global_step_period: 1000
    debug_batches: null

saver:
    auto_resume: true
    load_model: "/home/ubuntu/training-tokyo/OpenLRM/model.safetensors"  # this refers to "zxhezexin/openlrm-mix-large-1.1"
    checkpoint_root: ./exps/checkpoints
    checkpoint_global_steps: 1000
    checkpoint_keep_level: 5

logger:
    stream_level: WARNING
    log_level: INFO
    log_root: ./exps/logs
    tracker_root: ./exps/trackers
    enable_profiler: false
    trackers:
        - tensorboard
    image_monitor:
        train_global_steps: 100
        samples_per_log: 4

compile:
    suppress_errors: true
    print_specializations: true
    disable: true
training result
[TRAIN STEP]loss=0.21, loss_pixel=0.0265, loss_perceptual=0.184, loss_tv=0.424, lr=3.04e-13: 100%|█| 60000/60000 [15:40:28<00:00, 1.06s/it]
As you can see above, the loss is still too high, and the inference results from this checkpoint are not good.
Inference result from the previously trained checkpoint:
I really need to increase the texture resolution. Could you please give me some advice on that?
Hi boss, can I ask a question related to fine-tuning?
Hi @joshkiller, I'm very new to AI, not an expert. However, I'd be happy to help with anything I can!
I was wondering if someone can fine-tune a model without changing its general behavior. For example, I find that a model like Stable Diffusion sometimes generates images that can't be used for 3D reconstruction. How, and with what kind of data, can we remedy that problem so that the model only generates complete, standalone objects? I'm doing my master's internship on a text-to-3D pipeline.
@joshkiller
I was wondering if someone can fine-tune a model without changing its general behavior.
Usually what you are describing can be achieved with LoRA and its derivatives, but I'm not sure OpenLRM can help you here. OpenLRM does single-image -> 3D reconstruction;
text -> 3D is a different yet related task, and there are other models available for it.
If you want to create data using OpenLRM you could, say, take image-text pairs and get the 3D representations from OpenLRM. There has been some work in other tasks showing that synthetic data pairs generated by a trained model can be beneficial, but for text -> 3D my opinion is that you need good data to improve.
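To make the LoRA idea concrete, here is a minimal illustrative sketch (not something OpenLRM ships): the pretrained weight stays frozen and only a small low-rank update is trained, which is why the general behavior is largely preserved.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a pretrained nn.Linear with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights so the general behavior is kept.
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Low-rank factors: B starts at zero, so training begins exactly at the base model.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)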
Thanks a lot for your answers. I will try to delve deeper into LoRA than I have so far.
@hayoung-jeremy I'm also trying to fine-tune the same model. Currently it manages to overfit, but with some issues: I can overfit to the shape of the object well, but the textures get lost and the overall look of the inferred object appears blurry compared to the pretrained model.
@ZexinHe I've also noticed that the original paper uses perceptual_weight=2.0.
Training with this weight didn't improve my results, though.