dlrover

Torch Trainer Hook

Open Antlera opened this issue 11 months ago • 2 comments

The goal of this issue is to add a hook (callback) system to our PyTorch trainer so that the trainer can start resource monitoring and time reporting when training begins. The hook should integrate cleanly into the training process and must not interfere with the main training tasks. It should be able to trigger our resource reporter and, potentially, other monitors we add in the future. We can implement the mechanism by hand or adopt a callback mechanism similar to the one in PyTorch Lightning.
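To make the proposal concrete, here is a minimal sketch of the hand-written variant. Every name in it (`TrainerHook`, `TimingHook`, `Trainer`, `on_train_begin`, `on_train_end`) is hypothetical and not taken from the existing code base. `TimingHook` only stands in for the real resource reporter, and the trainer is reduced to a plain loop over batches so the example runs without PyTorch.

```python
import time


class TrainerHook:
    """Base class: subclasses override only the events they care about."""

    def on_train_begin(self, trainer):
        pass

    def on_train_end(self, trainer):
        pass


class TimingHook(TrainerHook):
    """Hypothetical stand-in for the resource/time reporter."""

    def __init__(self):
        self.start = None
        self.elapsed = None

    def on_train_begin(self, trainer):
        self.start = time.perf_counter()

    def on_train_end(self, trainer):
        self.elapsed = time.perf_counter() - self.start


class Trainer:
    """Simplified trainer: step_fn is called once per batch."""

    def __init__(self, step_fn, hooks=()):
        self.step_fn = step_fn
        self.hooks = list(hooks)
        self.steps_done = 0

    def _fire(self, event):
        # A failing hook is logged and skipped so that monitoring
        # can never abort or corrupt the training run itself.
        for hook in self.hooks:
            try:
                getattr(hook, event)(self)
            except Exception as exc:
                print(f"hook {type(hook).__name__}.{event} failed: {exc}")

    def fit(self, batches):
        self._fire("on_train_begin")
        try:
            for batch in batches:
                self.step_fn(batch)
                self.steps_done += 1
        finally:
            # Fire the end event even if a training step raises.
            self._fire("on_train_end")


timer = TimingHook()
trainer = Trainer(step_fn=lambda batch: None, hooks=[timer])
trainer.fit(range(3))
print(f"steps: {trainer.steps_done}, elapsed: {timer.elapsed:.6f}s")
```

Further monitors would be added by subclassing `TrainerHook` and passing the instance in `hooks`, with no change to the training loop. Catching exceptions inside `_fire` is one possible way to meet the "should not interfere" requirement. A real reporter might also need to run in a background thread or process, which this sketch does not cover.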

Antlera, Jul 28 '23 14:07