
CT Tool for Ariane-133: Snapshot generation halt

Open val-terry opened this issue 11 months ago • 0 comments

Hello, I have a question about the Ariane-133 design (from MacroPlacement). I ran the CT tool for 11 days (Intel Xeon, 132 GB RAM, 3 collect jobs, no GPUs), but it appears to have stopped generating snapshots after day two. On TensorBoard, I also noticed that the losses plateaued around day two. Is there a reason for this behavior?

Additionally, is there a way to know when the tool is done? It seems to have finished generating snapshots but continued to run. Should it stop on its own at a certain point?

Lastly, I am curious why the checkpoints directory was empty: no checkpoints were created, even after 31k steps.

Thank you so much!

snapshot results taken on 01/26/25: Image

Tensorboard Results: plateau occurs after 1.782 days: Image
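
To make "plateau" concrete, here is a minimal sketch of the kind of check I have in mind. It is not part of Circuit Training; the function name `has_plateaued`, the window size, and the tolerance are all hypothetical choices for illustration. It compares the mean loss over the most recent window of steps against the mean over the window before it.

```python
from statistics import mean


def has_plateaued(losses, window=100, rel_tol=0.01):
    """Hypothetical helper: True if the mean of the last `window` loss values
    differs from the mean of the preceding `window` values by less than
    `rel_tol` (as a relative change)."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two full windows
    recent = mean(losses[-window:])
    previous = mean(losses[-2 * window:-window])
    # Guard against division by zero when the previous mean is ~0.
    denom = max(abs(previous), 1e-12)
    return abs(recent - previous) / denom < rel_tol
```

The loss values could be exported from TensorBoard as a plain list of floats and passed in directly. A check like this only says the curve is flat; it does not say whether training has converged or stalled.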

all jobs ended manually on day 11: Image

val-terry, Jan 30 '25 19:01