Results 8 comments of Sijin Li

> So to be clear, things are working for a long while, and then suddenly they fail?
>
> Have you seen this happen just one time, or does it...

Is there any update? @rb-determined-ai :)

> So far, I was not able to reproduce this but I will keep trying. Can you check the size of the object on S3 involved in the upload: `ml-checkpoint/51085318-5dd5-45c2-81fd-d3ad495f541c/tensorboard/experiment/729/trial/725/events.out.tfevents.1664468248.exp-729-trial-725-0-729.d7a76451-81d9-49e4-b2b2-61d46293cf29.6.390.0`...

> Please let us know which version of minion you are using.
>
> Until we come up with a more permanent solution, would you mind trying the following workaround:...

> Did you get a chance to try the workaround?

We started trying it about 3 days ago, and so far so good. By the way, the same error has happened...

> sharing this with our webui team now.

Please comment here when the fixed version is released, thanks!

> > the training metrics can not be shown in the bottom table but it can be plotted on the line chart.
>
> We think this might be an...

After testing the workaround for more than a week on 4 different tasks, I can confirm it works well. Thank you, @mpkouznetsov! Would you merge it into master and release in...