Using NNI on Colab: {"error":"File not found: /content/log/mo5q17fb/trials/L9du6/trial.log"}

Open ABChh26 opened this issue 2 years ago • 2 comments

Describe the issue: When I use NNI on Colab, it always shows this message: {"error":"File not found: /content/log/mo5q17fb/trials/L9du6/trial.log"}

Environment:

  • NNI version:
  • Training service (local|remote|pai|aml|etc):
  • Client OS:
  • Server OS (for remote mode only):
  • Python version:
  • PyTorch/TensorFlow version:
  • Is conda/virtualenv/venv used?:
  • Is running in Docker?:

Configuration:

  • Experiment config (remember to remove secrets!):
  • Search space:

Log message:

  • nnimanager.log:
  • dispatcher.log:
  • nnictl stdout and stderr:

How to reproduce it?:

ABChh26 avatar Sep 10 '22 07:09 ABChh26

{"error":"File not found: /root/nni-experiments/rgiu2kbx/trials/rQUUZ/trial.log"}

ABChh26 avatar Sep 11 '22 04:09 ABChh26

Have you tried running cat /root/nni-experiments/rgiu2kbx/trials/rQUUZ/trial.log? Does the file actually exist?
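A minimal sketch of the same check from a Colab cell, using only the Python standard library. The helper name describe_trial_log is hypothetical, and the path is the one from the error message above; substitute your own experiment and trial IDs.

```python
from pathlib import Path


def describe_trial_log(log_path: str) -> str:
    """Report whether a trial log exists and, if not, the deepest parent directory that does."""
    path = Path(log_path)
    if path.is_file():
        return f"found: {path} ({path.stat().st_size} bytes)"
    # Walk upward from the log file until an existing directory is found.
    for parent in path.parents:
        if parent.is_dir():
            return f"missing: {path} (deepest existing directory: {parent})"
    # Fallback for the unusual case where no ancestor directory exists.
    return f"missing: {path} (no parent directory exists)"


if __name__ == "__main__":
    print(describe_trial_log("/root/nni-experiments/rgiu2kbx/trials/rQUUZ/trial.log"))
```

Knowing which directory is the deepest one that exists helps tell apart a wrong experiment directory from a trial that never started writing its log.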

ultmaster avatar Sep 13 '22 02:09 ultmaster

Hi @ABChh26, could you help confirm the log directory?

Lijiaoa avatar Sep 22 '22 07:09 Lijiaoa

Just set it to the folder where you want the files to go. Format: experimentWorkingDirectory: "xx", where xx should preferably be an absolute path.
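As an illustration of the experimentWorkingDirectory advice, here is a small sketch that checks whether the configured value is an absolute path before an experiment is launched. It treats the config as a plain dictionary; the function name check_working_directory is hypothetical and is not part of NNI. POSIX path rules are assumed because Colab runs Linux.

```python
from pathlib import PurePosixPath


def check_working_directory(config: dict) -> list:
    """Return warnings about experimentWorkingDirectory; an empty list means no concerns."""
    warnings = []
    workdir = config.get("experimentWorkingDirectory")
    if workdir is None:
        warnings.append("experimentWorkingDirectory is not set; the default location will be used")
    elif not PurePosixPath(workdir).is_absolute():
        warnings.append(f"experimentWorkingDirectory {workdir!r} is relative; an absolute path is recommended")
    return warnings


if __name__ == "__main__":
    for message in check_working_directory({"experimentWorkingDirectory": "exp"}):
        print(message)
```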

ABChh26 avatar Sep 22 '22 08:09 ABChh26

Why did you write a few lines of CSS code? I don't think that is useful. Please answer this question, @ABChh26:

Have you tried running cat /root/nni-experiments/rgiu2kbx/trials/rQUUZ/trial.log? Does the file actually exist?

Lijiaoa avatar Sep 22 '22 09:09 Lijiaoa

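Just set it to the folder where you want the files to go. Format: experimentWorkingDirectory: "xx", where xx should preferably be an absolute path.

What I actually want to confirm with you is this: when you run the command cat /root/nni-experiments/rgiu2kbx/trials/rQUUZ/trial.log, what output do you get?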

Lijiaoa avatar Sep 23 '22 07:09 Lijiaoa

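Just set it to the folder where you want the files to go. Format: experimentWorkingDirectory: "xx", where xx should preferably be an absolute path.

I tried this method and it did not solve the problem. When I run the py file in a terminal, I can see NNI returning intermediate results, but the trial.log file is never generated. Changing the NNI version does not help either. It used to work with nni==2.5.0, but now it does not.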

siwuxei avatar Oct 06 '22 02:10 siwuxei

What I actually want to confirm with you is this: when you run the command cat /root/nni-experiments/rgiu2kbx/trials/rQUUZ/trial.log, what output do you get?

I haven't tried it.

ABChh26 avatar Oct 09 '22 12:10 ABChh26

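I tried this method and it did not solve the problem. When I run the py file in a terminal, I can see NNI returning intermediate results, but the trial.log file is never generated. Changing the NNI version does not help either. It used to work with nni==2.5.0, but now it does not.

It may still be a path problem. By default, the nni-experiments folder is not under /content. I may have solved my own problem by accident, sorry about that, folks! Here is my config.yml. Each time, I change to the folder that contains train_reg.py and then run the experiment from the ipynb file:

searchSpaceFile: space.json
experimentName: pyg
experimentWorkingDirectory: "/content/drive/MyDrive/exp"
trialCommand: python train_reg.py
trialGpuNumber: 1
trialConcurrency: 2
maxExperimentDuration: 24h
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: local
  useActiveGpu: true
  maxTrialNumberPerGpu: 2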

ABChh26 avatar Oct 09 '22 12:10 ABChh26

I have solved this problem; please see #5146. The cause was the {trialGpuNumber: 1, trialConcurrency: 2} setting. Change it to {trialGpuNumber: 1, trialConcurrency: 1} and it works. This issue can be closed.
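A minimal sketch of that adjustment, treating the experiment config as a plain Python dictionary with the keys shown in the config.yml earlier in this thread. The helper name with_single_concurrency is hypothetical and not an NNI API; it only changes trialConcurrency and does not explain why the original combination failed.

```python
import copy


def with_single_concurrency(config: dict) -> dict:
    """Return a copy of the config with trialConcurrency set to 1; the input is left unchanged."""
    fixed = copy.deepcopy(config)
    fixed["trialConcurrency"] = 1
    return fixed


if __name__ == "__main__":
    # Values copied from the config.yml posted in this thread.
    config = {
        "trialCommand": "python train_reg.py",
        "trialGpuNumber": 1,
        "trialConcurrency": 2,
        "trainingService": {"platform": "local", "useActiveGpu": True, "maxTrialNumberPerGpu": 2},
    }
    fixed = with_single_concurrency(config)
    print(fixed["trialGpuNumber"], fixed["trialConcurrency"])
```

When editing config.yml by hand, the equivalent change is to set trialConcurrency: 1 and leave trialGpuNumber: 1 as it is.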

siwuxei avatar Oct 09 '22 13:10 siwuxei

I have solved this problem; please see #5146. The cause was the {trialGpuNumber: 1, trialConcurrency: 2} setting. Change it to {trialGpuNumber: 1, trialConcurrency: 1} and it works. This issue can be closed.

@ABChh26 you could follow this suggestion to fix your issue. Thanks!

Lijiaoa avatar Oct 10 '22 01:10 Lijiaoa

I'll close this issue because the problem has been resolved. If you still have any questions, please feel free to reopen it. Thanks.

Lijiaoa avatar Oct 12 '22 01:10 Lijiaoa