RunHarmony appears to be ignoring max_iter (and max.iter.harmony)
Issue Description
I was using a snippet from a couple of vignettes that still used max.iter.harmony. I got a warning that max.iter.harmony was deprecated, and another that RunHarmony did not converge after the default maximum of 25 iterations (I had it set to 50). Initially I thought it might be counting the double pass, so that my 50/2 was still 25, so I set it to 100 and used max_iter instead. This had no effect: I still get the warning about not converging in 25 iterations.
Reproducing Code Example
myobj <- RunHarmony(myobj, group.by.vars = "orig.ident", dims.use = RH_dims,
max_iter = 100, ncores = 8)
Error Message
Warning message:
did not converge in 25 iterations
Additional Comments
ncores also seems to have no effect: despite BLAS being available, the process still ran single-threaded. It was fast enough compared to other tasks, so this is less of a concern than not being able to iterate until it converges.
Hi @brianlamere,
There is also the parameter early_stop, which you need to set to FALSE. Have you tried this?
ncores also seems to have no effect: despite BLAS being available, the process still ran single-threaded. It was fast enough compared to other tasks, so this is less of a concern than not being able to iterate until it converges.
Unfortunately, there is no proper multithreading support, so that we don't affect the memory requirements negatively. There are some benefits if you use an OpenMP-enabled BLAS. Sorry for the confusion; the instructions are out of date.
there is also the parameter early_stop which you need to set to FALSE. Have you tried this?
I didn't; however, the function documentation only says:
max_iter
Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step.
early_stop
Enable early stopping for harmony. The harmonization process will stop when the change of objective function between corrections drops below 1e-4
Neither mentions the other; if it had, I would have set early_stop to FALSE. early_stop just says it stops when it has effectively reached its goal, but if I'm getting a warning that the data didn't converge, then that wouldn't be the case? I can set it regardless, but it's a merged data set and some of the merged items finish in 7 iterations; it's just a couple that don't finish by 25. Wouldn't setting early_stop to FALSE mean that it would just keep running for 50 rounds even if it finished in 7? It seems like early_stop and max_iter shouldn't be connected: one lets it stop at a minimum, the other defines the maximum. They're sort of opposites.
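To make the question concrete, here is a toy sketch of how a round cap and an early-stopping rule usually interact in an iterative loop. It is not harmony's actual code: run_rounds, the objective values, and the use of an absolute change are all assumptions made for illustration. Only the 1e-4 threshold and the parameter names max_iter and early_stop come from the documentation quoted above.

```r
# Toy stand-in for an iterative loop with a round cap and optional early stop.
# This is NOT harmony's implementation; run_rounds is a hypothetical helper.
run_rounds <- function(objective, max_iter = 10, early_stop = TRUE, tol = 1e-4) {
  max_iter <- min(max_iter, length(objective))
  rounds <- 0
  for (i in seq_len(max_iter)) {
    rounds <- i
    if (early_stop && i > 1) {
      # Assumption: "change" is the absolute difference between rounds.
      change <- abs(objective[i] - objective[i - 1])
      if (change < tol) break
    }
  }
  rounds
}

# Hypothetical objective values that flatten out quickly.
objective <- 100 + 50 * 0.1^(1:50)

rounds_early <- run_rounds(objective, max_iter = 50, early_stop = TRUE)
rounds_full  <- run_rounds(objective, max_iter = 50, early_stop = FALSE)
rounds_capped <- run_rounds(objective, max_iter = 5, early_stop = TRUE)
```

Under these assumptions, early_stop = TRUE ends the loop as soon as the objective flattens, early_stop = FALSE always runs to max_iter, and a small max_iter can cut the loop off before the tolerance is reached. Whether harmony behaves exactly this way would need to be checked against its source.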
I see:
Warning message: did not converge in 25 iterations
This is emitted from the k-means cluster centroid seeding. You can safely ignore this warning as this is only an initial step of the algorithm.
The proper way to see how many iterations it took to finish is to look at the convergence plot (see the plot_convergence parameter).
Hope this helps. Let me know what your actual iteration counts are.
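For reference, base R's stats::kmeans emits a warning with the same wording when it hits its own iter.max, independently of any outer loop. The sketch below uses only base R on random toy data. That harmony's centroid seeding goes through a k-means routine with a fixed cap of 25 is taken from the comment above and is not verified here; the values of centers and iter.max are arbitrary choices that make non-convergence likely.

```r
# Base R only; no harmony code is involved.
set.seed(42)
x <- matrix(rnorm(2000 * 10), ncol = 10)  # unstructured toy data

warnings_seen <- character(0)
fit <- withCallingHandlers(
  # A deliberately tiny iter.max so k-means is very unlikely to converge.
  kmeans(x, centers = 25, iter.max = 2),
  warning = function(w) {
    # Record the warning text instead of printing it.
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warnings_seen)
```

The call still returns a usable clustering in fit; the warning only reports that the k-means iterations were cut off at iter.max.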
You can safely ignore this warning as this is only an initial step of the algorithm.
I'm converging in 8; however, the data still aren't fully integrated, and the clusters made afterwards are still difficult to match to actual cells, as several markers are found in multiple clusters (the batch effect). Even if the warning can safely be ignored, it seems to be the only thing giving me a warning in my entire pipeline, and it feels related to my lingering batch effects.
Note that I reverted to Seurat 4 because of the layers. I had gone through quite a lot of acrobatics to use RunPCA and RunHarmony with Seurat 5, so I restarted with Seurat 4 and now have a much cleaner, faster-running tool. It still has the same issue with RunHarmony on the RNA data: it says it converges in 8 iterations, but then immediately afterwards also says it didn't converge after 25 iterations, with a max_iter of 50.
Edit: I added plot_convergence = TRUE and it seems to have changed the behavior of the internal processes? I was tracing all of harmony and couldn't find how to change max_iter for that internal call, but with the plot turned on I no longer get the warning about not converging in 25 iterations. If I close my session, revert my changes, and rerun, I get the warning again; simply adding that flag removes the warning about 25 iterations. The traces I'm looking at don't help explain why.
The convergence plot shows 8 integrations, 122 clustering steps.