Zhanghao Wu

Results 315 comments of Zhanghao Wu

Hi, thank you for asking! Without the training command you used, I am not sure what causes the problem. Maybe you could check the training data and validation data to...

Thank you for asking! We will prepare the tensorboard file for release.

Thank you for asking! We do not support converting the model to ONNX format. We would appreciate it if anyone would like to contribute that. ;)

This is a great catch! It was very hard to find. A quick question: if the cluster is ready on the first attempt, checking the actual termination state may cause...

Is this still a problem after PR #969 was merged, @infwinston?

> Seems like this can affect non-spot jobs as well? Yes, since they only run CPU tasks right now, and the resource requirement of a non-spot job is not configurable,...

> Is it possible for the user to organize their files into a local hierarchy however they want, then simply specify the root of the hierarchy in the source? >...

[Here](https://docs.google.com/document/d/1l0h1I4eB5wptVAFK137oKB5C_7LJRPkJAEu9_EkvFPY/edit?usp=sharing) are some possible solutions for dealing with the `workdir`, from a discussion with @concretevitamin and @romilbhardwaj, but for now, we can stick with modifying...

Related to this: `ray up` creates a new spot controller because the private key changed in the Ray YAML.