liuzhe-lz

Results 24 comments of liuzhe-lz

~~NNI does not support ARM platform. Please use x86 Python with Rosetta.~~ ~~Or you can try to build from source, by changing `x64` [here](https://github.com/microsoft/nni/blob/master/setup_ts.py#L79) to `arm64`. NNI code itself is...

> > > NNI does not support ARM platform. Please use x86 Python with Rosetta. > > > Or you can try to build from source, by changing `x64` [here](https://github.com/microsoft/nni/blob/master/setup_ts.py#L79)...

Please print `os.environ['CUDA_VISIBLE_DEVICES']` to the log and tell us its value.
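
A minimal sketch of one way to do this from trial code. The logger name `trial` and the helper `log_cuda_visible_devices` are hypothetical names chosen for illustration; `os.environ.get` is used so that an unset variable is reported instead of raising `KeyError`.

```python
import logging
import os

# Hypothetical logger name; any logger your trial already uses works as well.
logger = logging.getLogger("trial")


def log_cuda_visible_devices() -> str:
    """Log the value of CUDA_VISIBLE_DEVICES and return it."""
    # Fall back to a marker string when the variable is not set at all.
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>")
    logger.info("CUDA_VISIBLE_DEVICES=%s", value)
    return value


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_cuda_visible_devices()
```

Calling the helper at the very start of the trial script shows the value the process was launched with, before any framework changes it.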

It is a per-machine config for remote mode.

```yaml
maxTrialNumber: 20
trialCommand: python main.py
trialCodeDirectory: .
trialGpuNumber: 2
trialConcurrency: 4
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: remote
  reuseMode:...
```

How often does your trial code report intermediate results? If your epochs are short, you can try reporting an intermediate result once every 10 epochs.
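
A rough sketch of throttled reporting, assuming a simple epoch loop. The names `run_epochs`, `train_one_epoch`, and `report` are hypothetical; in an NNI trial the `report` callable would typically be `nni.report_intermediate_result`, but it is injected here so the sketch runs without NNI installed.

```python
from typing import Callable, List


def run_epochs(
    num_epochs: int,
    report_every: int,
    train_one_epoch: Callable[[int], float],
    report: Callable[[float], None],
) -> List[int]:
    """Train for num_epochs and report the metric only every report_every epochs.

    Returns the list of epochs at which a report was sent.
    """
    reported_epochs = []
    for epoch in range(1, num_epochs + 1):
        metric = train_one_epoch(epoch)
        # Skip reporting unless this epoch is a multiple of the interval.
        if epoch % report_every == 0:
            report(metric)
            reported_epochs.append(epoch)
    return reported_epochs


if __name__ == "__main__":
    # Stand-in training step and reporter, for demonstration only.
    run_epochs(35, 10, lambda epoch: float(epoch), print)
```

With 35 epochs and an interval of 10, the reporter is called at epochs 10, 20, and 30 instead of 35 times.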

We'll consider adding this feature in the next release. You can open a PR if you want.

Discussion iteration 1 conclusions:

### Log files

Each HPO experiment writes 3 files:

- ~/nni-experiments/EXPERIMENT-ID/logs/experiment.log
- ~/nni-experiments/EXPERIMENT-ID/logs/nnimanager.log
- ~/nni-experiments/EXPERIMENT-ID/logs/dispatcher.log

Each NAS multi-trial experiment writes 2 files:

- ~/nni-experiments/EXPERIMENT-ID/logs/experiment.log
- ~/nni-experiments/EXPERIMENT-ID/logs/nnimanager.log

...
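
The layout described in these conclusions can be sketched as a small path helper. This only illustrates the file names listed above; `expected_log_files` and the `kind` values `"hpo"` and `"nas"` are hypothetical names, not part of NNI.

```python
from pathlib import Path
from typing import List, Optional

# File names taken from the lists in the discussion conclusions.
LOG_FILES = {
    "hpo": ["experiment.log", "nnimanager.log", "dispatcher.log"],
    "nas": ["experiment.log", "nnimanager.log"],
}


def expected_log_files(
    experiment_id: str, kind: str, root: Optional[Path] = None
) -> List[Path]:
    """Return the log file paths expected for an experiment of the given kind."""
    if kind not in LOG_FILES:
        raise ValueError(f"unknown experiment kind: {kind!r}")
    # Default root mirrors the ~/nni-experiments directory used above.
    base = root if root is not None else Path("~/nni-experiments").expanduser()
    log_dir = base / experiment_id / "logs"
    return [log_dir / name for name in LOG_FILES[kind]]


if __name__ == "__main__":
    for path in expected_log_files("EXPERIMENT-ID", "hpo"):
        print(path)
```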

Please check that you have sufficient disk space and permission to write to the ~/nni-experiments/ directory.
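
One way to run both checks from Python, as a sketch using only the standard library. The helper name `check_experiment_dir` and the 1 GiB threshold are assumptions for illustration; pick a threshold that matches your experiment.

```python
import os
import shutil
from pathlib import Path


def check_experiment_dir(path: str = "~/nni-experiments",
                         min_free_bytes: int = 1 << 30) -> dict:
    """Report free disk space and write permission for the experiment directory."""
    target = Path(path).expanduser()
    # If the directory does not exist yet, probe its nearest existing ancestor.
    probe = target
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    free = shutil.disk_usage(probe).free
    return {
        "checked_path": str(probe),
        "free_bytes": free,
        "enough_space": free >= min_free_bytes,
        "writable": os.access(probe, os.W_OK),
    }


if __name__ == "__main__":
    print(check_experiment_dir())
```

If `writable` is false or `enough_space` is false, the experiment is unlikely to be able to create its files under that directory.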

```
[2022-06-23 02:11:31] ERROR (NNIManager) Dispatcher error: read ECONNRESET
[2022-06-23 02:11:31] ERROR (NNIManager) Error: Dispatcher stream error, tuner may have crashed.
```

It seems something is wrong in the tuner. What's the...