
Increasing loss on ParametricUMAP training with Keras 3.7 on GPU

Open • jacobgolding opened this issue (lmcinnes/umap#1211) 4 months ago • 1 comment

While testing various configurations for #1200 and #1180, I found some unusual behavior. I have access to two different environments at the moment (a Linux VM and a Mac with an M2). All tests were done with the MNIST_Landmarks notebook. When training a normal ParametricUMAP model for the first time, the loss increases over time on Keras 3.7 (after a short initial period of decreasing), but training works just fine on 3.6. I don't have time to dig into the version differences now. I don't think this is related to the issues linked above, but it could be.
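For anyone comparing runs across versions, the symptom described above (a short initial decrease followed by a sustained increase) can be checked mechanically from the per-epoch loss values. This is only a minimal sketch: the function name `loss_rebounds` and the 5% `tolerance` are hypothetical choices for illustration, not something taken from the notebook or from umap. It assumes the losses are available as a plain sequence of numbers, such as the list Keras exposes as `history.history["loss"]`.

```python
def loss_rebounds(losses, tolerance=0.05):
    """Return True if the loss ends clearly above its earlier minimum.

    `losses` is a per-epoch sequence of loss values.
    `tolerance` is a hypothetical relative threshold (5% by default) used to
    ignore small fluctuations around the minimum.
    """
    losses = [float(x) for x in losses]
    if len(losses) < 3:
        return False  # too short to distinguish a rebound from noise

    lowest = min(losses)
    lowest_epoch = losses.index(lowest)

    # The minimum has to occur before the final epoch, and the final loss
    # has to exceed that minimum by more than the tolerance.
    ended_higher = losses[-1] > lowest + tolerance * abs(lowest)
    return lowest_epoch < len(losses) - 1 and ended_higher
```

A run whose loss keeps falling returns `False`, while a run that dips and then climbs returns `True`, which makes it easy to compare the same notebook under Keras 3.6 and 3.7 without eyeballing plots.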

| OS/Hardware | Python | Keras | tensorflow | torch | umap | Works? |
|---|---|---|---|---|---|---|
| Mac, M2 | 3.12.11 | 3.6 | 2.19.0 | 2.7.1 | 0.5.9.post2 | Yes |
| Mac, M2 | 3.12.11 | >=3.7 | 2.19.0 | 2.7.1 | 0.5.9.post2 | No (loss increases) |
| Mac, cpu | 3.12.11 | 3.6 | 2.19.0 | 2.7.1 | 0.5.9.post2 | Yes |
| Mac, cpu | 3.12.11 | >=3.7 | 2.19.0 | 2.7.1 | 0.5.9.post2 | Yes |
| Linux, cpu | 3.12.2 | 3.6 | 2.17.0 | 2.6.0 | 0.5.9.post2 | Yes |
| Linux, cpu | 3.12.2 | >=3.7 | 2.17.0 | 2.6.0 | 0.5.9.post2 | Yes |
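To make it easier for others to add rows to the table above, the version columns can be collected with the standard library alone. This is a sketch under assumptions: the helper name `environment_report` is made up for illustration, and the distribution names in `PACKAGES` are the usual PyPI names (`umap-learn` for umap); a different TensorFlow or torch build may be installed under another name, in which case it shows up as "not installed".

```python
import platform
from importlib import metadata

# Assumed PyPI distribution names; adjust if your build uses different ones.
PACKAGES = ("keras", "tensorflow", "torch", "umap-learn")


def environment_report(packages=PACKAGES):
    """Collect OS, hardware, Python and package versions into a dict."""
    report = {
        "system": platform.system(),    # e.g. "Darwin" or "Linux"
        "machine": platform.machine(),  # e.g. "arm64" or "x86_64"
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            report[name] = "not installed"
    return report
```

Printing the items of `environment_report()` gives one line per column. It does not say whether training actually ran on the GPU or the CPU, so that still has to be noted by hand.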

jacobgolding • Jul 31 '25 10:07

All up this seems very odd. Intermittent hardware- and software-dependent bugs are deeply confusing.

lmcinnes • Jul 31 '25 18:07