Trying to fine-tune a model but it fails. I need help and guidance.
Hi, I need some advice on fine-tuning a model. For some reason I can train a model from scratch using the following CLI:
If I try to explicitly pass --nclass 2, it crashes.
If I try the following command, it also crashes.
- Could you guide me in addressing this issue?
- Could you also show me how to load the custom model (see the sketch just below)?
I don't have a problem using the models already provided, so it must be something I am missing.
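Here is a minimal sketch of what I assume loading a custom model should look like with the cellpose-style Python API that Omnipose exposes; the model path and test image are placeholders, and the exact import and keyword arguments may differ depending on the installed version:

```python
import numpy as np
from cellpose_omni import models  # assuming the Omnipose fork of the cellpose API

# Placeholder path to fine-tuned weights saved under <train_dir>/models/
model_path = r"C:\Users\fsa\Desktop\bact_phase\train_sorted\5I_crop\models\my_custom_model"

# Load the custom weights instead of a built-in name like 'bact_phase_omni'
model = models.CellposeModel(gpu=False, pretrained_model=model_path)

# Run it on a grayscale image; channels=[0,0] and omni=True are assumptions here
img = np.random.rand(256, 256).astype(np.float32)  # placeholder image
masks, flows, styles = model.eval(img, channels=[0, 0], omni=True)
print(masks.shape)
```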
Error description shown below:
(omnipose) C:\Users\fsa>python -m omnipose --train --dir C:\Users\fsa\Desktop\bact_phase\train_sorted\5I_crop --mask_filter _masks --n_epochs 10 --pretrained_model bact_phase_omni --learning_rate 0.05 --diameter 0 --batch_size 16 --save_every 50 --RAdam
!NEW LOGGING SETUP! To see cellpose progress, set --verbose
No --verbose => no progress or info printed
2023-11-02 23:41:24,034 [INFO] >>>> using CPU
2023-11-02 23:41:24,034 [INFO] This model uses boundary field, setting nclasses=3.
2023-11-02 23:41:24,034 [INFO] Training omni model. Setting nclasses=3, RAdam=True
2023-11-02 23:41:24,038 [INFO] not all flows are present, will run flow generation for all images
2023-11-02 23:41:24,042 [INFO] pretrained model C:\Users\fsa.cellpose\models\bact_phase_omnitorch_0 is being used
2023-11-02 23:41:24,042 [INFO] median diameter set to 0 => no rescaling during training
2023-11-02 23:41:24,186 [INFO] No precomuting flows with Omnipose. Computed during training.
2023-11-02 23:41:24,205 [INFO] >>> Using RAdam optimizer
2023-11-02 23:41:24,206 [INFO] >>>> training network with 2 channel input <<<<
2023-11-02 23:41:24,206 [INFO] >>>> LR: 0.05000, batch_size: 16, weight_decay: 0.00001
2023-11-02 23:41:24,206 [INFO] >>>> ntrain = 5
2023-11-02 23:41:24,206 [INFO] >>>> nimg_per_epoch = 5
2023-11-02 23:41:24,206 [INFO] >>>> Start time: 23:41:24
C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\omnipose\utils.py:220: RuntimeWarning: invalid value encountered in divide
return module.clip((Y-lower_val)/(upper_val-lower_val),0,1)
C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\omnipose\utils.py:53: RuntimeWarning: invalid value encountered in cast
return np.uint16(rescale(im)*(2**16-1))
2023-11-02 23:41:27,116 [INFO] Train epoch: 0 | Time: 0.05min | last epoch: 0.00s | <sec/epoch>: 0.00s | <sec/batch>: 0.84s | <Batch Loss>: 1.140086 | <Epoch Loss>: 1.140086
2023-11-02 23:41:27,117 [INFO] saving network parameters to C:\Users\fsa\Desktop\bact_phase\train_sorted\5I_crop\models/cellpose_residual_on_style_on_concatenation_off_omni_abstract_nclasses_3_nchan_2_dim_2_5I_crop_2023_11_02_23_41_24.194542
Traceback (most recent call last):
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\omnipose\__main__.py", line 12, in <module>
After spending some time diagnosing the problem, I believe I found the issue.
In Omnipose's core.py, there is this block:
# percentile clipping augmentation
if aug_choices[1]:
dp = .1 # changed this from 10 to .1, as usual pipeline uses 0.01, 10 was way too high for some images
dpct = np.random.triangular(left=0, mode=0, right=dp, size=2) # weighted toward 0
imgi[k] = utils.normalize99(imgi[k],upper=100-dpct[0],lower=dpct[1])
This routine is applied to an image that has already been normalized. Normalizing an already-normalized image again produces NaN values, which then propagate through the rest of the code and make it exit with an error. I hope this helps the community if someone else runs into the same issue.
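For reference, here is a minimal standalone sketch of how that division can produce NaNs and the "invalid value encountered in divide" warning from the log; it re-implements the clipping line quoted above purely for illustration (the constant patch is made up), so treat it as an assumption about the failure mode rather than the exact code path:

```python
import numpy as np

# Standalone re-implementation of the percentile clipping quoted above
# (hypothetical; mirrors omnipose.utils.normalize99 for illustration only).
def normalize99(Y, lower=0.01, upper=99.99):
    lower_val, upper_val = np.percentile(Y, [lower, upper])
    # If the input is (nearly) constant, upper_val == lower_val and the
    # division is 0/0 -> NaN, matching the RuntimeWarning seen in the log.
    return np.clip((Y - lower_val) / (upper_val - lower_val), 0, 1)

patch = np.full((64, 64), 0.5, dtype=np.float32)  # already-normalized, constant crop
out = normalize99(patch)                          # RuntimeWarning: invalid value in divide
print(np.isnan(out).all())                        # True: NaNs propagate downstream
```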
@fsalfonzo thanks for reporting this. I haven't seen any issues with normalization, but I will check into it. Looks like you got this on the 5I_crop subset, so that is super helpful for debugging.