Question with local module

Open jing-bi opened this issue 4 years ago • 10 comments

I would say the automation together with the agent is amazing and I love this feature so much.

However, I got stuck at this point. The steps I took:

  1. Ran task init in my train.py, which imports a few local modules such as config.py.
  2. After running, I got a completed task on the server.
  3. I cloned the task and put it into the queue.
  4. The agent pulled the task and started installing the required packages.

Everything was fine until the script started running, when this error appeared:
Traceback (most recent call last):
  File "/user/.clearml/venvs-builds/3.9/code/train.py", line 8, in <module>
    from config import checkpoint
ModuleNotFoundError: No module named 'config'

Then I checked the code directory and got this:

ls .clearml/venvs-builds/3.9/code
>>> train.py

It seems the agent only copied the entry file and not the others. Can you please explain which files are captured when running task init manually, and how the agent decides which files are copied to the venvs/code directory?

jing-bi avatar Dec 01 '21 04:12 jing-bi
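
For readers hitting the same ModuleNotFoundError, it can help to list which imports in the entry script resolve to files sitting next to it, since those are the ones that go missing when only the entry file is copied. The sketch below uses only the Python standard library; the helper name local_imports is hypothetical and this is not how ClearML itself analyses a script.

```python
import ast
from pathlib import Path
from typing import List


def local_imports(script_path: str) -> List[str]:
    """List top-level imports of a script that resolve to files beside it.

    Hypothetical diagnostic helper, not part of ClearML.
    """
    script = Path(script_path)
    tree = ast.parse(script.read_text())

    # Collect the top-level name of every absolute import in the script.
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])

    # Keep only names that exist as a module or package in the script's folder.
    folder = script.parent
    return sorted(
        name
        for name in names
        if (folder / f"{name}.py").is_file()
        or (folder / name / "__init__.py").is_file()
    )
```

Calling local_imports("train.py") in the layout described above would be expected to include config, which is a hint that the script depends on files beyond the entry point.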

Besides, is there an option to have the agent use my existing conda env to run the code?

jing-bi avatar Dec 01 '21 13:12 jing-bi

Hi @jing-bi ,

How did you run your initial task? Was it from a Git repository containing your code? If you only executed a file not in a Git repository, ClearML will only store that specific file in the task's metadata and thus will only copy that file when running the task using the agent.

jkhenning avatar Dec 01 '21 16:12 jkhenning
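
A quick way to check the condition described in the reply above is to see whether the entry script lives inside a Git working tree on the machine where it is first executed. The following is a minimal sketch using only the standard library; find_git_root is a hypothetical helper, and looking for a .git entry in parent directories is an assumption about what "in a Git repository" means here, not ClearML's actual detection logic.

```python
from pathlib import Path
from typing import Optional


def find_git_root(script_path: str) -> Optional[Path]:
    """Return the nearest ancestor directory containing a .git entry, or None.

    Hypothetical diagnostic helper, not part of ClearML.
    """
    start = Path(script_path).resolve()
    # Begin the search in the script's directory (or the path itself if it is a directory).
    current = start if start.is_dir() else start.parent
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    return None
```

If find_git_root("train.py") returns None on the machine that runs the initial task, the situation matches the standalone-file case described above, where only that file would be stored with the task.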

Thanks for the reply!

I was running the code from a subfolder of the repo, which might have caused the problem.

Is there any way I can control how the agent rolls out the task? For example, can I ask it not to copy the code and to run from my existing folder, and can I ask it to use my conda env instead of reinstalling all packages?

jing-bi avatar Dec 01 '21 16:12 jing-bi

Check the task's Execution section - what does it contain?

jkhenning avatar Dec 01 '21 16:12 jkhenning

I know what's wrong: I use PyCharm to run the code remotely. It is indeed a Git repo on my local machine, but on the remote it is not.

jing-bi avatar Dec 01 '21 17:12 jing-bi

Hi, I am also facing this issue. What I understood from the above is that it fetches requirements.txt from Git if it is there; am I correct?

I ask because I tried Hyperparameter Optimisation and it created multiple experiments, and every experiment failed, stating that some module could not be installed or some module was not found. Once they fail, I re-clone the experiment and manually edit the installed packages section. The installed packages section also shows, e.g.:

# Python 3.10.0 | packaged by conda-forge | (default, Nov 10 2021, 13:20:59) [MSC v.1916 64 bit (AMD64)]
clearml @ git+https://github.com/allegroai/trains.git@eede2b6***************************#egg=clearml
google_cloud_storage == 2.9.0
tensorflow == 2.12.0
tensorflow_intel == 2.12.0

I am not using google cloud storage at all.

I am not sure what the issue is here.

hotshotdragon avatar Jun 15 '23 10:06 hotshotdragon

@hotshotdragon can you attach the actual error you're seeing?

jkhenning avatar Jun 15 '23 15:06 jkhenning

resolved the issue, thanks

hotshotdragon avatar Jun 21 '23 13:06 hotshotdragon

What was it?

jkhenning avatar Jun 21 '23 14:06 jkhenning

Hi jing-bi, how did you resolve the problem?

Traceback (most recent call last):
  File "/user/.clearml/venvs-builds/3.9/code/train.py", line 8, in <module>
    from config import checkpoint
ModuleNotFoundError: No module named 'config'

GuudMan avatar Sep 30 '24 01:09 GuudMan