Ray Gao

16 comments by Ray Gao

The W&B step count warning is normal because we only resume from the last checkpoint, i.e. if the latest checkpoint is at step 5000 and the job crashed at step 5100, then it should resume at step...
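
For illustration, a minimal sketch (not fairchem's actual trainer loop) of why W&B complains on resume: the run restarts from the checkpointed step and re-logs steps W&B already recorded before the crash. The project name and run id are placeholders.

```python
# Minimal sketch of the resume-overlap that triggers the warning.
import wandb

run = wandb.init(project="demo", id="my-run", resume="allow")
start_step = 5000  # step stored in the latest checkpoint

for step in range(start_step, start_step + 200):
    # Steps 5000..5100 overlap with what was logged before the crash,
    # so W&B warns that the step count did not increase and skips them.
    run.log({"loss": 1.0 / (step + 1)}, step=step)
```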

For CUDA_VISIBLE_DEVICES to work, you need to have it set in the Slurm environment. We don't use this right now and instead just assign the device for the rank using [torch.cuda.set_device](https://github.com/facebookresearch/fairchem/blob/abcdf661926ce13e9d8cf3fe1d4484e58780a1fd/src/fairchem/core/common/distutils.py#L244)...
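
A minimal sketch of that per-rank device assignment, assuming the launcher exports LOCAL_RANK (as torchrun does); this mirrors the spirit of the linked code rather than reproducing it:

```python
# Each process picks its own GPU from LOCAL_RANK instead of relying
# on CUDA_VISIBLE_DEVICES being set by Slurm.
import os
import torch

local_rank = int(os.environ.get("LOCAL_RANK", 0))
if torch.cuda.is_available():
    torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}")
else:
    device = torch.device("cpu")
```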

Hi, yes: in finetuning we remove all the heads and re-initialize them from scratch (the weights of the backbone are retained), hence the accuracy will be lower, but it should...
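
A hypothetical sketch of that setup: the pretrained backbone weights are loaded while the heads keep their fresh random init. The `backbone`/`energy_head` names and checkpoint layout are illustrative, not fairchem's actual classes.

```python
# Keep pretrained backbone weights; heads stay randomly initialized.
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 64))
        self.energy_head = nn.Linear(64, 1)  # re-initialized for finetuning

model = Model()
pretrained = torch.load("pretrained.pt")  # state dict from pretraining
# Load only the backbone entries; strict=False leaves the heads untouched.
backbone_state = {k: v for k, v in pretrained.items() if k.startswith("backbone.")}
missing, unexpected = model.load_state_dict(backbone_state, strict=False)
```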

You can; this is basically the same as continuing to train the same model. It would mean that your data needs to be at the same level of DFT theory...

To train the full model with all the heads, the easiest way is to train the original UMA itself with the following YAML (if you are using uma-s): https://github.com/facebookresearch/fairchem/blob/main/configs/uma/training_release/uma_sm_conserve_finetune.yaml, i.e.:...
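
If you need to point that released config at your own data before launching, a sketch along these lines could work; the key names below are assumptions about the YAML schema, so check the actual file:

```python
# Copy the release config and adjust it (keys are illustrative only).
import yaml

with open("uma_sm_conserve_finetune.yaml") as f:
    cfg = yaml.safe_load(f)

# Hypothetical edit; the real schema may use different keys.
cfg.setdefault("dataset", {})["train_path"] = "/path/to/your/dft/data"

with open("my_uma_finetune.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```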

We would have the model artifact and the exact commit (or version) of the source code to run. The issue right now is that we're not tying the model to the code...
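
A minimal sketch (not fairchem's mechanism) of one way to tie an artifact to the code: record the current commit hash inside the checkpoint at save time, so the matching source version can be checked out when the model is loaded later.

```python
# Save the git commit alongside the model weights.
import subprocess
import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # stand-in for the real model
commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
torch.save({"state_dict": model.state_dict(), "code_commit": commit}, "ckpt.pt")

ckpt = torch.load("ckpt.pt")
print(f"checkout {ckpt['code_commit']} before loading this artifact")
```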