
Error while training on a custom dataset, in the validation phase

Open insookim43 opened this issue 1 year ago • 3 comments

Hi, thank you for the great work. :>

When I train on custom data, I hit this error in the validation phase of a training epoch. [two images attached]

According to the error message, the cause seems to be that factory.py has no entries for my custom dataset's sequences. So I added the custom sequence names to factory.py:

# custom data
for split in [
    'video-BzZspxAweF8AnKhWK',
    'video-FkqCGijjAKpABetZZ',
    'video-PGdt7pJChnKoJDt35',
    'video-RMxN6a4CcCeLGu4tA',
    'video-YnfPeH8i2uBWmsSd2',
    'video-dvZBYnphN2BwdMKBc',
    'video-hnbGXq3nNPjBbc7CL',
    'video-msNEBxJE5PPDqenBM',
]:
    DATASETS[split] = (lambda kwargs: [DemoSequence(**kwargs), ])
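
One thing to note about the loop above: the lambda never uses the loop variable, so every key maps to the same factory. The following is only a sketch of how a name could be bound per entry. The `DemoSequence` class here is a stand-in so the snippet runs on its own, and the `seq_name` keyword and `root_dir` value are hypothetical; whether the real class accepts such an argument would need to be checked in the repository.

```python
# Stand-in for the real DemoSequence so this sketch is self-contained (hypothetical).
class DemoSequence:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


DATASETS = {}

CUSTOM_SEQUENCES = ['video-BzZspxAweF8AnKhWK', 'video-FkqCGijjAKpABetZZ']

for seq_name in CUSTOM_SEQUENCES:
    # Each entry is a factory: it takes a kwargs dict and returns a list of sequences.
    # The default argument name=seq_name binds the current name when the lambda is
    # defined; a plain closure would only ever see the last value of the loop variable.
    DATASETS[seq_name] = (
        lambda kwargs, name=seq_name: [DemoSequence(seq_name=name, **kwargs)]
    )

# Hypothetical call: build the sequence list for one registered name.
sequences = DATASETS['video-BzZspxAweF8AnKhWK']({'root_dir': 'data/custom'})
print(sequences[0].kwargs)
```

With this pattern each registry key produces a sequence object that knows which video it belongs to, instead of eight identical entries.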
  

But then I get the following error.

Traceback (most recent call last):
  File "/app/TMOT/src/train.py", line 410, in <module>
    train(args)
  File "/app/TMOT/src/train.py", line 353, in train
    val_stats, _ = evaluate(
                   ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/TMOT/src/trackformer/engine.py", line 331, in evaluate
    eval_summary, eval_summary_str = evaluate_mot_accums(
                                     ^^^^^^^^^^^^^^^^^^^^
  File "/app/TMOT/src/trackformer/util/track_utils.py", line 407, in evaluate_mot_accums
    summary = mh.compute_many(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/motmetrics/metrics.py", line 310, in compute_many
    assert names is None or len(names) == len(dfs)
                            ^^^^^^^^^^^^^^^^^^^^^^

My command was this: python src/train.py with flir_adas_v2 deformable multi_frame tracking output_dir=models/flir_adas_v2_deformable_multi_frame resume=/app/TMOT/models/r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint_hidden_dim_288.pth epochs=20

From the traceback, I see the error is raised inside evaluate_mot_accums. My first thought was that this function is meant for evaluating MOT datasets rather than custom ones, so could I simply skip the MOT evaluation phase?
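
The failing line in the traceback is `assert names is None or len(names) == len(dfs)`, which means the number of sequence names passed to `compute_many` differs from the number of results passed with them. A minimal sketch of the same check with a readable message is below; `check_names_match` and the example values are hypothetical and not part of trackformer or motmetrics.

```python
def check_names_match(names, accums):
    """Mirror the assertion from the traceback: one name per accumulator."""
    if names is not None and len(names) != len(accums):
        raise ValueError(
            f"{len(names)} sequence names but {len(accums)} accumulators"
        )


# Hypothetical values: two validation sequences, but only one result was produced.
names = ['seqD', 'seqE']
accums = ['accumulator-for-seqD']

try:
    check_names_match(names, accums)
except ValueError as err:
    message = str(err)
    print(message)
```

Running a check like this just before the metrics call would show which of the two lists is shorter, which narrows down whether sequences are missing from the evaluation loop or from the name list.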

I am now going to look into the lines in the traceback and try to resolve the error step by step, but I think I am missing something here.
Any advice on this problem would be appreciated. Thank you in advance.

Best, Insoo.

insookim43, Nov 16 '23 04:11

I generated the train/val sets using the logic from src/generating_coco_from_mot17.py (adapted to my custom dataset). I used different sequences for the training and validation sets (for example, train_split contains {seqA, seqB, seqC} and val_split contains {seqD, seqE}). I need to look into the COCO generation from MOT17 again; I must have lost some detail when generating the COCO annotations from my custom data.
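
Since the training and validation splits use different sequences, one quick sanity check is whether every validation sequence name has a registry entry. The sketch below is only an illustration: `find_unregistered` is a hypothetical helper, and the dictionary and list stand in for `DATASETS` and for the sequence names of the validation split.

```python
def find_unregistered(val_sequences, registered):
    """Return validation sequence names that have no registry entry, sorted."""
    return sorted(set(val_sequences) - set(registered))


registered = {'seqA': None, 'seqB': None, 'seqD': None}  # stand-in for DATASETS
val_sequences = ['seqD', 'seqE']                         # stand-in for the val split

missing = find_unregistered(val_sequences, registered)
print(missing)  # ['seqE']
```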

insookim43, Nov 17 '23 01:11

@insookim43 Hi, I am also facing the same issue! Have you resolved it? Can you help me, please?

ajaypediredla14, Dec 15 '23 11:12


Hello, I am working on this now. If I solve the error, I will leave a comment later. Sorry for the late reply.

insookim43, Jan 02 '24 09:01