
Reproducing the performance on CAD-120


Hello, when I tried to reproduce the performance on CAD-120, I found that my results are lower than those of the trained model you released.

Here is my test.log:

Subject1 Affordance Prediction

              precision    recall  f1-score   support

     movable     0.7432    0.6857    0.7133      3093
  stationary     0.9241    0.9068    0.9154     18192
   reachable     0.5653    0.6768    0.6161      3388
    pourable     0.8354    0.8408    0.8381       471
      pourto     0.9068    0.5372    0.6747       471
 containable     0.8285    0.6708    0.7414       641
   drinkable     0.7320    0.7774    0.7540       274
    openable     0.7340    0.8104    0.7703       538
   placeable     0.8115    0.7976    0.8045      2574
   closeable     0.4842    0.8270    0.6108       185
   cleanable     0.7322    0.9926    0.8428       135
     cleaner     0.7600    0.8444    0.8000       135

    accuracy                         0.8337     30097
   macro avg     0.7548    0.7806    0.7568     30097
weighted avg     0.8423    0.8337    0.8360     30097

Affordance Recognition

              precision    recall  f1-score   support

     movable     0.7447    0.8040    0.7732      3632
  stationary     0.9114    0.9552    0.9328     20368
   reachable     0.6744    0.6543    0.6642      2447
    pourable     0.8473    0.6011    0.7033       554
      pourto     0.8028    0.5217    0.6324       554
 containable     0.4983    0.4639    0.4805       319
   drinkable     0.9727    0.4944    0.6556       360
    openable     0.8726    0.8930    0.8827      1243
   placeable     0.7956    0.5303    0.6364      2033
   closeable     0.8347    0.9120    0.8716       659
   cleanable     0.9506    0.9371    0.9438       493
     cleaner     0.9597    0.8702    0.9128       493

    accuracy                         0.8628     33155
   macro avg     0.8221    0.7198    0.7574     33155
weighted avg     0.8607    0.8628    0.8579     33155

Sub-activity Prediction

              precision    recall  f1-score   support

reaching     0.7303    0.7755    0.7522      3484
  moving     0.7631    0.6850    0.7219      3470
 pouring     0.8688    0.8577    0.8632       471
  eating     0.3873    0.6000    0.4707       335
drinking     0.7168    0.7299    0.7233       274
 opening     0.7167    0.7900    0.7515       538
 placing     0.7746    0.7758    0.7752      2578
 closing     0.7017    0.9027    0.7896       185
    null     0.8816    0.6572    0.7531       884
cleaning     0.6176    0.9333    0.7434       135

    accuracy                         0.7433     12354
   macro avg     0.7159    0.7707    0.7344     12354
weighted avg     0.7530    0.7433    0.7450     12354

Sub-activity Recognition

              precision    recall  f1-score   support

reaching     0.6706    0.7080    0.6888      2507
  moving     0.6927    0.8235    0.7524      4305
 pouring     0.8387    0.5632    0.6739       554
  eating     0.0000    0.0000    0.0000       272
drinking     1.0000    0.4667    0.6364       360
 opening     0.8500    0.8206    0.8350      1243
 placing     0.7773    0.6548    0.7108      2025
 closing     0.8029    0.9211    0.8580       659
    null     0.7732    0.7789    0.7761      1357
cleaning     0.9789    0.8458    0.9075       493

    accuracy                         0.7424     13775
   macro avg     0.7384    0.6583    0.6839     13775
weighted avg     0.7390    0.7424    0.7341     13775
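(For context: these tables follow the layout of scikit-learn's classification_report. The labels in the sketch below are made up, but this is roughly how such a report is produced:)

```python
# Minimal sketch: the tables above match scikit-learn's classification_report
# layout. The labels here are made up for illustration only.
from sklearn.metrics import classification_report

y_true = ["movable", "stationary", "reachable", "stationary"]
y_pred = ["movable", "stationary", "stationary", "stationary"]

# digits=4 matches the four decimal places in the logs above.
print(classification_report(y_true, y_pred, digits=4))
```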

F1@0.1 metric.
Affordance Prediction F1@0.1: 0.9047
Affordance Recognition F1@0.1: 0.9091
Sub-activity Prediction F1@0.1: 0.8923
Sub-activity Recognition F1@0.1: 0.8687

F1@0.25 metric.
Affordance Prediction F1@0.25: 0.8561
Affordance Recognition F1@0.25: 0.8703
Sub-activity Prediction F1@0.25: 0.8506
Sub-activity Recognition F1@0.25: 0.8289

F1@0.5 metric.
Affordance Prediction F1@0.5: 0.7222
Affordance Recognition F1@0.5: 0.7585
Sub-activity Prediction F1@0.5: 0.7069
Sub-activity Recognition F1@0.5: 0.6869
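(For readers unfamiliar with F1@k: it is the segmental F1 score of Lea et al., where a predicted segment counts as a true positive if its temporal IoU with an unmatched ground-truth segment of the same label is at least k. A rough sketch under that assumption, using simple (label, start, end) tuples rather than the repo's actual data structures:)

```python
# Rough sketch of segmental F1@k (Lea et al., 2017). A predicted segment is a
# true positive if its IoU with an unmatched ground-truth segment of the same
# label is >= k. The (label, start, end) tuples are an assumption for
# illustration, not the repository's actual data structures.
def f1_at_k(gt_segments, pred_segments, k):
    matched = [False] * len(gt_segments)
    tp = 0
    for p_label, p_start, p_end in pred_segments:
        best_iou, best_idx = 0.0, -1
        for i, (g_label, g_start, g_end) in enumerate(gt_segments):
            if matched[i] or g_label != p_label:
                continue
            inter = max(0, min(p_end, g_end) - max(p_start, g_start))
            union = (p_end - p_start) + (g_end - g_start) - inter
            iou = inter / union if union > 0 else 0.0
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx >= 0 and best_iou >= k:
            matched[best_idx] = True
            tp += 1
    fp = len(pred_segments) - tp
    fn = len(gt_segments) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```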

Summary Performance for Cross-validation.
affordance_prediction-micro_precision     Values: [0.8337]  Mean: 0.8337  Std: 0.0000
affordance_prediction-micro_recall        Values: [0.8337]  Mean: 0.8337  Std: 0.0000
affordance_prediction-micro_f1            Values: [0.8337]  Mean: 0.8337  Std: 0.0000
affordance_prediction-macro_precision     Values: [0.7548]  Mean: 0.7548  Std: 0.0000
affordance_prediction-macro_recall        Values: [0.7806]  Mean: 0.7806  Std: 0.0000
affordance_prediction-macro_f1            Values: [0.7568]  Mean: 0.7568  Std: 0.0000
affordance_recognition-micro_precision    Values: [0.8628]  Mean: 0.8628  Std: 0.0000
affordance_recognition-micro_recall       Values: [0.8628]  Mean: 0.8628  Std: 0.0000
affordance_recognition-micro_f1           Values: [0.8628]  Mean: 0.8628  Std: 0.0000
affordance_recognition-macro_precision    Values: [0.8221]  Mean: 0.8221  Std: 0.0000
affordance_recognition-macro_recall       Values: [0.7198]  Mean: 0.7198  Std: 0.0000
affordance_recognition-macro_f1           Values: [0.7574]  Mean: 0.7574  Std: 0.0000
sub-activity_prediction-micro_precision   Values: [0.7433]  Mean: 0.7433  Std: 0.0000
sub-activity_prediction-micro_recall      Values: [0.7433]  Mean: 0.7433  Std: 0.0000
sub-activity_prediction-micro_f1          Values: [0.7433]  Mean: 0.7433  Std: 0.0000
sub-activity_prediction-macro_precision   Values: [0.7159]  Mean: 0.7159  Std: 0.0000
sub-activity_prediction-macro_recall      Values: [0.7707]  Mean: 0.7707  Std: 0.0000
sub-activity_prediction-macro_f1          Values: [0.7344]  Mean: 0.7344  Std: 0.0000
sub-activity_recognition-micro_precision  Values: [0.7424]  Mean: 0.7424  Std: 0.0000
sub-activity_recognition-micro_recall     Values: [0.7424]  Mean: 0.7424  Std: 0.0000
sub-activity_recognition-micro_f1         Values: [0.7424]  Mean: 0.7424  Std: 0.0000
sub-activity_recognition-macro_precision  Values: [0.7384]  Mean: 0.7384  Std: 0.0000
sub-activity_recognition-macro_recall     Values: [0.6583]  Mean: 0.6583  Std: 0.0000
sub-activity_recognition-macro_f1         Values: [0.6839]  Mean: 0.6839  Std: 0.0000
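(Side note: for single-label multi-class evaluation, micro precision, micro recall, and micro F1 all equal plain accuracy, which is why each micro triple above repeats the same number, e.g. 0.8337 for affordance prediction. A quick check with made-up labels:)

```python
# Quick check that micro precision = micro recall = micro F1 = accuracy for
# single-label multi-class predictions. The label arrays are made up.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 1, 0, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 0]

p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
assert p == r == f == accuracy_score(y_true, y_pred)
```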

Summary F1@k results.

affordance_prediction
Overlap: 0.1
Values: [0.9047]
Mean: 0.9047	Std: 0.0000

Overlap: 0.25
Values: [0.8561]
Mean: 0.8561	Std: 0.0000

Overlap: 0.5
Values: [0.7222]
Mean: 0.7222	Std: 0.0000

affordance_recognition
Overlap: 0.1
Values: [0.9091]
Mean: 0.9091	Std: 0.0000

Overlap: 0.25
Values: [0.8703]
Mean: 0.8703	Std: 0.0000

Overlap: 0.5
Values: [0.7585]
Mean: 0.7585	Std: 0.0000

sub-activity_prediction
Overlap: 0.1
Values: [0.8923]
Mean: 0.8923	Std: 0.0000

Overlap: 0.25
Values: [0.8506]
Mean: 0.8506	Std: 0.0000

Overlap: 0.5
Values: [0.7069]
Mean: 0.7069	Std: 0.0000

sub-activity_recognition
Overlap: 0.1
Values: [0.8687]
Mean: 0.8687	Std: 0.0000

Overlap: 0.25
Values: [0.8289]
Mean: 0.8289	Std: 0.0000

Overlap: 0.5
Values: [0.6869]
Mean: 0.6869	Std: 0.0000

And the parameter settings are shown below:

hs512_e40_bs16_lr0.001_sc-None_h2h-False_h2o-True_o2h-True_o2o-True_m-v2-v1-att-v3-False-True_sd-0.1-True_os-ind_dn-1-gs_pf-e0s0_c0_sp-0_ihs-False_ios-False_al-1.0_bl-False-1.0-1.0_sl-True-False-4.0-1.0_fl0-0.0_mt-False_pt-True_gc0.0_ds3_Subject1

metadata:
  model_name: assign
  input_type: multiple
parameters:
  add_segment_length: 0  # length of the segment to the segment-level rnn. 0 is off and 1 is on.
  add_time_position: 0  # absolute time position to the segment-level rnn. 0 is off and 1 is on.
  time_position_strategy: s  # input time position to segment [s] or discrete update [u].
  positional_encoding_style: e  # e [embedding] or p [periodic].
  attention_style: v3  # v1 [concat], v2 [dot-product], v3 [scaled_dot-product], v4 [general]
  bias: true
  cat_level_states: 0  # concatenate first and second level hidden states for predictor MLPs. 0 is off and 1 is on.
  discrete_networks_num_layers: 1  # depth of the state change detector MLP.
  discrete_optimization_strategy: gs  # straight-through [st] or gumbel-sigmoid [gs]
  filter_discrete_updates: true  # false  # maxima filter for soft output of state change detector.
  hidden_size: 512  # 2
  message_humans_to_human: false  # true  # only meaningful for the bimanual dataset
  message_human_to_objects: true
  message_objects_to_human: true
  message_objects_to_object: true
  message_segment: true
  message_type: v2  # v1 [relational] or v2 [non-relational]
  message_granularity: v1  # v1 [generic] or v2 [specific]
  message_aggregation: att  # mean_pooling [mp] or attention [att]
  object_segment_update_strategy: ind  # same_as_human [sah], independent [ind], or conditional_on_human [coh]
  share_level_mlps: 0  # whether to share [1] or not [0] the prediction MLPs of the levels.
  update_segment_threshold: 0.1  # 0.5  # [0.0, 1.0)
optimization:
  batch_size: 16  # 2
  clip_gradient_at: 0.0
  epochs: 40  # 2
  learning_rate: 1e-3
  val_fraction: 0.1
misc:
  anticipation_loss_weight: 1.0
  budget_loss:
    add: false
    human_weight: 1.0
    object_weight: 1.0
  first_level_loss_weight: 0.0  # if positive, first level does frame-level prediction
  impose_segmentation_pattern: 0  # 0 [no pattern], 1 [all ones]
  input_human_segmentation: false
  input_object_segmentation: false
  make_attention_distance_based: false  # only meaningful if message_aggregation is attention
  multi_task_loss_learner: false
  pretrained: true  # false  # unfortunately I need two entries for the checkpoint name
  pretrained_path: null  # specified parameters must match parameters of the pre-trained model
  segmentation_loss:
    add: true  # false
    pretrain: false
    sigma: 4.0  # 0.0  # Gaussian smoothing
    weight: 1.0
logging:
  root_log_dir: ${env:PWD}/outputs/${data.name}/${metadata.model_name}
  checkpoint_name: "hs${parameters.hidden_size}_e${optimization.epochs}_bs${optimization.batch_size}_lr${optimization.learning_rate}_sc-${data.scaling_strategy}_h2h-${parameters.message_humans_to_human}_h2o-${parameters.message_human_to_objects}_o2h-${parameters.message_objects_to_human}_o2o-${parameters.message_objects_to_object}_m-${parameters.message_type}-${parameters.message_granularity}-${parameters.message_aggregation}-${parameters.attention_style}-${misc.make_attention_distance_based}-${parameters.message_segment}_sd-${parameters.update_segment_threshold}-${parameters.filter_discrete_updates}_os-${parameters.object_segment_update_strategy}_dn-${parameters.discrete_networks_num_layers}-${parameters.discrete_optimization_strategy}_pf-${parameters.positional_encoding_style}${parameters.add_time_position}${parameters.time_position_strategy}${parameters.add_segment_length}_c${parameters.cat_level_states}_sp-${misc.impose_segmentation_pattern}_ihs-${misc.input_human_segmentation}_ios-${misc.input_object_segmentation}_al-${misc.anticipation_loss_weight}_bl-${misc.budget_loss.add}-${misc.budget_loss.human_weight}-${misc.budget_loss.object_weight}_sl-${misc.segmentation_loss.add}-${misc.segmentation_loss.pretrain}-${misc.segmentation_loss.sigma}-${misc.segmentation_loss.weight}_fl${parameters.share_level_mlps}-${misc.first_level_loss_weight}_mt-${misc.multi_task_loss_learner}_pt-${misc.pretrained}_gc${optimization.clip_gradient_at}_ds${data.downsampling}_${data.cross_validation_test_subject}"
  log_dir: ${logging.root_log_dir}/${logging.checkpoint_name}
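(This looks like a Hydra/OmegaConf config, so checkpoint_name is resolved by string interpolation into the run name quoted above. A minimal sketch of that resolution, reproducing only a fragment of the config and assuming OmegaConf is the backend:)

```python
# Sketch of how checkpoint_name resolves via interpolation, assuming the
# config is handled by OmegaConf (as Hydra configs are). Only a fragment of
# the real config is reproduced here.
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    """
    parameters:
      hidden_size: 512
    optimization:
      epochs: 40
      batch_size: 16
    logging:
      checkpoint_name: hs${parameters.hidden_size}_e${optimization.epochs}_bs${optimization.batch_size}
    """
)
print(cfg.logging.checkpoint_name)  # hs512_e40_bs16
```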

Could you please tell me which parameters I set incorrectly? I reproduced the performance on a single V100 GPU. Looking forward to your reply, thanks!

SISTMrL · Nov 25, 2021

Hi SISTMrL,

I'll check that and get back to you as soon as I can.

Regards, Romero

RomeroBarata · Nov 25, 2021

Hi RomeroBarata, thanks!

SISTMrL · Nov 26, 2021

Hi @SISTMrL, have you solved the problem?

Hi @RomeroBarata, could you please just provide the original YAML? Why do we need to "guess" the original config? Thank you :)

coldmanck · Dec 15, 2021