
'NoneType' object has no attribute 'get_logger' - clearML with pytorch distributed

Open sholevs66 opened this issue 2 years ago • 4 comments

Hi, I'm trying to follow your examples and use ClearML with a PyTorch distributed run.

My script looks as follows:

import argparse
import os

from clearml import Task, Logger

def main(args):
    if int(os.environ.get('LOCAL_RANK', 0)) == 0:
        task = Task.init(project_name='DETR', task_name='all_bn_detr')

    for epoch in range(args.start_epoch, args.epochs):
        train_stats = train_one_epoch(
            model, criterion, data_loader_train, optimizer, device, epoch,
            args.clip_max_norm)

        Task.current_task().get_logger().report_scalar("test", "mAP", iteration=epoch, value=a.stats[0])

if __name__ == '__main__':
    args = parser.parse_args()
    main(args)

I'm getting error messages saying:

File "main.py", line 275, in main
    Task.current_task().get_logger().report_scalar("test", "mAP", iteration=epoch, value=a.stats[0])
AttributeError: 'NoneType' object has no attribute 'get_logger'

When I change the report call to Logger.current_logger().report_scalar("train", "loss bbox", iteration=epoch, value=train_stats['loss_bbox']), I get a similar error.

What am I missing?

Thanks

sholevs66 avatar Jun 07 '22 06:06 sholevs66

Hi,

if int(os.environ.get('LOCAL_RANK', 0)) == 0: task = Task.init(project_name='DETR', task_name='all_bn_detr')

How is a task associated with your execution if the script doesn't enter the conditional statement? You have to make sure a task is associated with the execution before you call current_task or current_logger. If your code never associates a task with its execution (i.e. it never calls Task.init), those functions return None, which produces exactly this kind of error.
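One way to see this concretely: if only rank 0 calls Task.init, every other rank must skip (or guard) its reporting calls. A minimal sketch, assuming the same LOCAL_RANK convention as the script above; report_scalar_safe is a hypothetical helper, not part of the clearml API:

```python
import os

def report_scalar_safe(title, series, iteration, value):
    """Hypothetical guard: only report on rank 0, where Task.init ran.

    On any other rank, or if no task is attached to this process,
    skip reporting instead of crashing on a None task.
    """
    if int(os.environ.get('LOCAL_RANK', 0)) != 0:
        return False  # this rank never called Task.init, so there is no task
    from clearml import Task  # assumed installed, as in the original script
    task = Task.current_task()
    if task is None:  # defensive: still no task attached to this process
        return False
    task.get_logger().report_scalar(title, series, iteration=iteration, value=value)
    return True
```

With this guard, calling report_scalar_safe(...) on a non-zero rank is a no-op rather than an AttributeError.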

DavidNativ avatar Jun 07 '22 07:06 DavidNativ

Hey. Thanks for the reply! The reason I got into writing:

if int(os.environ.get('LOCAL_RANK', 0)) == 0: task = Task.init(project_name='DETR', task_name='all_bn_detr')

is that my run command is of the form python -m torch.distributed.launch --nproc_per_node=8 ... If I don't use the above condition, I get 8 different projects for a single run in the ClearML UI (see attached screenshot).

sholevs66 avatar Jun 07 '22 08:06 sholevs66

I suggest launching the distributed training from within your main script file, so that all the nodes report into one unique task, created in the main process.

Here is a detailed example: https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_distributed_example.py
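The structure of that example, sketched with the stdlib multiprocessing module standing in for torch.distributed.launch (the clearml calls are shown only as comments, since the point here is the process layout, not the API):

```python
import multiprocessing as mp

def worker(rank, task_id, queue):
    # In the real example each spawned worker reports into the one task
    # created in the main process (e.g. via Task.current_task()).
    # Here we just record that every rank targets the same task id.
    queue.put((rank, task_id, "scalar from rank %d" % rank))

def main(world_size=4):
    # In the real example: task = Task.init(project_name=..., task_name=...)
    task_id = "single-task-id"  # stand-in for task.id
    ctx = mp.get_context("fork")  # fork keeps this sketch self-contained on Linux
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(r, task_id, queue))
             for r in range(world_size)]
    for p in procs:
        p.start()
    reports = [queue.get() for _ in range(world_size)]
    for p in procs:
        p.join()
    return reports
```

Because the task is created once in the launching process and the workers are spawned from it, all ranks end up reporting into that single task instead of creating one task per process.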

DavidNativ avatar Jun 07 '22 08:06 DavidNativ

> I suggest launching the distributed training from within your main script file, so that all the nodes report into one unique task, created in the main process.
>
> Here is a detailed example: https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_distributed_example.py

Thanks!

sholevs66 avatar Jun 07 '22 13:06 sholevs66