human_object_interaction
Reproducing the performance on CAD-120
Hello, when I try to reproduce the performance on CAD-120, my results are lower than those of the trained model you released.
Here is my test.log:
Subject1
Affordance Prediction
               precision    recall  f1-score   support
movable           0.7432    0.6857    0.7133      3093
stationary        0.9241    0.9068    0.9154     18192
reachable         0.5653    0.6768    0.6161      3388
pourable          0.8354    0.8408    0.8381       471
pourto            0.9068    0.5372    0.6747       471
containable       0.8285    0.6708    0.7414       641
drinkable         0.7320    0.7774    0.7540       274
openable          0.7340    0.8104    0.7703       538
placeable         0.8115    0.7976    0.8045      2574
closeable         0.4842    0.8270    0.6108       185
cleanable         0.7322    0.9926    0.8428       135
cleaner           0.7600    0.8444    0.8000       135
accuracy                              0.8337     30097
macro avg         0.7548    0.7806    0.7568     30097
weighted avg      0.8423    0.8337    0.8360     30097
Affordance Recognition
               precision    recall  f1-score   support
movable           0.7447    0.8040    0.7732      3632
stationary        0.9114    0.9552    0.9328     20368
reachable         0.6744    0.6543    0.6642      2447
pourable          0.8473    0.6011    0.7033       554
pourto            0.8028    0.5217    0.6324       554
containable       0.4983    0.4639    0.4805       319
drinkable         0.9727    0.4944    0.6556       360
openable          0.8726    0.8930    0.8827      1243
placeable         0.7956    0.5303    0.6364      2033
closeable         0.8347    0.9120    0.8716       659
cleanable         0.9506    0.9371    0.9438       493
cleaner           0.9597    0.8702    0.9128       493
accuracy                              0.8628     33155
macro avg         0.8221    0.7198    0.7574     33155
weighted avg      0.8607    0.8628    0.8579     33155
Sub-activity Prediction
               precision    recall  f1-score   support
reaching          0.7303    0.7755    0.7522      3484
moving            0.7631    0.6850    0.7219      3470
pouring           0.8688    0.8577    0.8632       471
eating            0.3873    0.6000    0.4707       335
drinking          0.7168    0.7299    0.7233       274
opening           0.7167    0.7900    0.7515       538
placing           0.7746    0.7758    0.7752      2578
closing           0.7017    0.9027    0.7896       185
null              0.8816    0.6572    0.7531       884
cleaning          0.6176    0.9333    0.7434       135
accuracy                              0.7433     12354
macro avg         0.7159    0.7707    0.7344     12354
weighted avg      0.7530    0.7433    0.7450     12354
Sub-activity Recognition
               precision    recall  f1-score   support
reaching          0.6706    0.7080    0.6888      2507
moving            0.6927    0.8235    0.7524      4305
pouring           0.8387    0.5632    0.6739       554
eating            0.0000    0.0000    0.0000       272
drinking          1.0000    0.4667    0.6364       360
opening           0.8500    0.8206    0.8350      1243
placing           0.7773    0.6548    0.7108      2025
closing           0.8029    0.9211    0.8580       659
null              0.7732    0.7789    0.7761      1357
cleaning          0.9789    0.8458    0.9075       493
accuracy                              0.7424     13775
macro avg         0.7384    0.6583    0.6839     13775
weighted avg      0.7390    0.7424    0.7341     13775
F1@0.1 metric.
Affordance Prediction F1@0.1: 0.9047
Affordance Recognition F1@0.1: 0.9091
Sub-activity Prediction F1@0.1: 0.8923
Sub-activity Recognition F1@0.1: 0.8687
F1@0.25 metric.
Affordance Prediction F1@0.25: 0.8561
Affordance Recognition F1@0.25: 0.8703
Sub-activity Prediction F1@0.25: 0.8506
Sub-activity Recognition F1@0.25: 0.8289
F1@0.5 metric.
Affordance Prediction F1@0.5: 0.7222
Affordance Recognition F1@0.5: 0.7585
Sub-activity Prediction F1@0.5: 0.7069
Sub-activity Recognition F1@0.5: 0.6869
Summary Performance for Cross-validation.
affordance_prediction-micro_precision Values: [0.8337] Mean: 0.8337 Std: 0.0000
affordance_prediction-micro_recall Values: [0.8337] Mean: 0.8337 Std: 0.0000
affordance_prediction-micro_f1 Values: [0.8337] Mean: 0.8337 Std: 0.0000
affordance_prediction-macro_precision Values: [0.7548] Mean: 0.7548 Std: 0.0000
affordance_prediction-macro_recall Values: [0.7806] Mean: 0.7806 Std: 0.0000
affordance_prediction-macro_f1 Values: [0.7568] Mean: 0.7568 Std: 0.0000
affordance_recognition-micro_precision Values: [0.8628] Mean: 0.8628 Std: 0.0000
affordance_recognition-micro_recall Values: [0.8628] Mean: 0.8628 Std: 0.0000
affordance_recognition-micro_f1 Values: [0.8628] Mean: 0.8628 Std: 0.0000
affordance_recognition-macro_precision Values: [0.8221] Mean: 0.8221 Std: 0.0000
affordance_recognition-macro_recall Values: [0.7198] Mean: 0.7198 Std: 0.0000
affordance_recognition-macro_f1 Values: [0.7574] Mean: 0.7574 Std: 0.0000
sub-activity_prediction-micro_precision Values: [0.7433] Mean: 0.7433 Std: 0.0000
sub-activity_prediction-micro_recall Values: [0.7433] Mean: 0.7433 Std: 0.0000
sub-activity_prediction-micro_f1 Values: [0.7433] Mean: 0.7433 Std: 0.0000
sub-activity_prediction-macro_precision Values: [0.7159] Mean: 0.7159 Std: 0.0000
sub-activity_prediction-macro_recall Values: [0.7707] Mean: 0.7707 Std: 0.0000
sub-activity_prediction-macro_f1 Values: [0.7344] Mean: 0.7344 Std: 0.0000
sub-activity_recognition-micro_precision Values: [0.7424] Mean: 0.7424 Std: 0.0000
sub-activity_recognition-micro_recall Values: [0.7424] Mean: 0.7424 Std: 0.0000
sub-activity_recognition-micro_f1 Values: [0.7424] Mean: 0.7424 Std: 0.0000
sub-activity_recognition-macro_precision Values: [0.7384] Mean: 0.7384 Std: 0.0000
sub-activity_recognition-macro_recall Values: [0.6583] Mean: 0.6583 Std: 0.0000
sub-activity_recognition-macro_f1 Values: [0.6839] Mean: 0.6839 Std: 0.0000
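A note for anyone comparing these numbers: the macro averages are unweighted means over classes, so a single class with zero score has a large effect. The short check below recomputes the Sub-activity Recognition macro F1 from the per-class F1 values in the log, then repeats it without the `eating` class, which scored 0.0000. The second figure is only an illustration of how much that one class costs; it is not a metric the repository reports.

```python
# Per-class F1 values copied from the "Sub-activity Recognition" table in the log.
per_class_f1 = {
    "reaching": 0.6888,
    "moving": 0.7524,
    "pouring": 0.6739,
    "eating": 0.0000,
    "drinking": 0.6364,
    "opening": 0.8350,
    "placing": 0.7108,
    "closing": 0.8580,
    "null": 0.7761,
    "cleaning": 0.9075,
}


def macro_average(scores):
    """Unweighted mean over classes (the usual definition of a macro average)."""
    return sum(scores.values()) / len(scores)


macro_f1 = macro_average(per_class_f1)

# Illustration only: the same mean with the zero-scoring class left out.
without_eating = {k: v for k, v in per_class_f1.items() if k != "eating"}
macro_f1_without_eating = macro_average(without_eating)

print(f"macro F1 (all classes):     {macro_f1:.4f}")
print(f"macro F1 (without eating):  {macro_f1_without_eating:.4f}")
```

If the released model recognises `eating` at all on this split, that class alone could account for a noticeable part of the macro F1 gap.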
Summary F1@k results.
affordance_prediction
Overlap: 0.1
Values: [0.9047]
Mean: 0.9047 Std: 0.0000
Overlap: 0.25
Values: [0.8561]
Mean: 0.8561 Std: 0.0000
Overlap: 0.5
Values: [0.7222]
Mean: 0.7222 Std: 0.0000
affordance_recognition
Overlap: 0.1
Values: [0.9091]
Mean: 0.9091 Std: 0.0000
Overlap: 0.25
Values: [0.8703]
Mean: 0.8703 Std: 0.0000
Overlap: 0.5
Values: [0.7585]
Mean: 0.7585 Std: 0.0000
sub-activity_prediction
Overlap: 0.1
Values: [0.8923]
Mean: 0.8923 Std: 0.0000
Overlap: 0.25
Values: [0.8506]
Mean: 0.8506 Std: 0.0000
Overlap: 0.5
Values: [0.7069]
Mean: 0.7069 Std: 0.0000
sub-activity_recognition
Overlap: 0.1
Values: [0.8687]
Mean: 0.8687 Std: 0.0000
Overlap: 0.25
Values: [0.8289]
Mean: 0.8289 Std: 0.0000
Overlap: 0.5
Values: [0.6869]
Mean: 0.6869 Std: 0.0000
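For readers unfamiliar with the F1@k numbers: they are presumably the segmental F1 score at an overlap (IoU) threshold k, as commonly used in action segmentation. The sketch below shows one common formulation. It is not taken from this repository, the function names are hypothetical, and the repository's evaluation code may differ in details.

```python
def to_segments(labels):
    """Collapse a per-frame label sequence into (label, start, end) runs, end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


def segmental_f1(predicted, target, overlap):
    """F1 over segments: a predicted segment is a true positive if its best
    same-label ground-truth segment has IoU >= overlap and is not yet matched."""
    pred_segments = to_segments(predicted)
    true_segments = to_segments(target)
    matched = [False] * len(true_segments)
    tp = fp = 0
    for label, p_start, p_end in pred_segments:
        best_iou, best_idx = 0.0, -1
        for idx, (t_label, t_start, t_end) in enumerate(true_segments):
            if t_label != label:
                continue
            intersection = max(0, min(p_end, t_end) - max(p_start, t_start))
            union = max(p_end, t_end) - min(p_start, t_start)
            iou = intersection / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= overlap and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = matched.count(False)  # ground-truth segments nobody matched
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy example: the predicted boundary is one frame early.
target = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1, 1, 1]
for k in (0.1, 0.25, 0.5):
    print(k, segmental_f1(predicted, target, k))
```

Because the score is computed over segments rather than frames, small boundary shifts are tolerated at low thresholds, which is consistent with F1@0.1 being higher than F1@0.5 in the log.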
The parameter settings are shown below (checkpoint name first, then the config):
hs512_e40_bs16_lr0.001_sc-None_h2h-False_h2o-True_o2h-True_o2o-True_m-v2-v1-att-v3-False-True_sd-0.1-True_os-ind_dn-1-gs_pf-e0s0_c0_sp-0_ihs-False_ios-False_al-1.0_bl-False-1.0-1.0_sl-True-False-4.0-1.0_fl0-0.0_mt-False_pt-True_gc0.0_ds3_Subject1
metadata:
  model_name: assign
  input_type: multiple
parameters:
  add_segment_length: 0  # length of the segment to the segment-level rnn. 0 is off and 1 is on.
  add_time_position: 0  # absolute time position to the segment-level rnn. 0 is off and 1 is on.
  time_position_strategy: s  # input time position to segment [s] or discrete update [u].
  positional_encoding_style: e  # e [embedding] or p [periodic].
  attention_style: v3  # v1 [concat], v2 [dot-product], v3 [scaled_dot-product], v4 [general]
  bias: true
  cat_level_states: 0  # concatenate first and second level hidden states for predictors MLPs. 0 is off and 1 is on.
  discrete_networks_num_layers: 1  # depth of the state change detector MLP.
  discrete_optimization_strategy: gs  # straight-through [st] or gumbel-sigmoid [gs]
  filter_discrete_updates: true  #false  # maxima filter for soft output of state change detector.
  hidden_size: 512  #2
  message_humans_to_human: false  #True  # only meaningful for the bimanual dataset
  message_human_to_objects: true
  message_objects_to_human: true
  message_objects_to_object: true
  message_segment: true
  message_type: v2  # v1 [relational] or v2 [non-relational]
  message_granularity: v1  # v1 [generic] or v2 [specific]
  message_aggregation: att  # mean_pooling [mp] or attention [att]
  object_segment_update_strategy: ind  # same_as_human [sah], independent [ind], or conditional_on_human [coh]
  share_level_mlps: 0  # whether to share [1] or not [0] the prediction MLPs of the levels.
  update_segment_threshold: 0.1  #0.5  # [0.0, 1.0)
optimization:
  batch_size: 16  #2
  clip_gradient_at: 0.0
  epochs: 40  #2
  learning_rate: 1e-3
  val_fraction: 0.1
misc:
  anticipation_loss_weight: 1.0
  budget_loss:
    add: false
    human_weight: 1.0
    object_weight: 1.0
  first_level_loss_weight: 0.0  # if positive, first level does frame-level prediction
  impose_segmentation_pattern: 0  # 0 [no pattern], 1 [all ones]
  input_human_segmentation: false
  input_object_segmentation: false
  make_attention_distance_based: false  # only meaningful if message_aggregation is attention
  multi_task_loss_learner: false
  pretrained: true  #false  # unfortunately I need two entries for the checkpoint name
  pretrained_path: null  # specified parameters must match parameters of the pre-trained model
  segmentation_loss:
    add: true  #false
    pretrain: false
    sigma: 4.0  #0.0  # Gaussian smoothing
    weight: 1.0
logging:
  root_log_dir: ${env:PWD}/outputs/${data.name}/${metadata.model_name}
  checkpoint_name: "hs${parameters.hidden_size}_e${optimization.epochs}_bs${optimization.batch_size}_\
    lr${optimization.learning_rate}_sc-${data.scaling_strategy}_\
    h2h-${parameters.message_humans_to_human}_\
    h2o-${parameters.message_human_to_objects}_\
    o2h-${parameters.message_objects_to_human}_\
    o2o-${parameters.message_objects_to_object}_\
    m-${parameters.message_type}-${parameters.message_granularity}-${parameters.message_aggregation}\
    -${parameters.attention_style}-${misc.make_attention_distance_based}-${parameters.message_segment}_\
    sd-${parameters.update_segment_threshold}-${parameters.filter_discrete_updates}_\
    os-${parameters.object_segment_update_strategy}_\
    dn-${parameters.discrete_networks_num_layers}-${parameters.discrete_optimization_strategy}_\
    pf-${parameters.positional_encoding_style}${parameters.add_time_position}\
    ${parameters.time_position_strategy}${parameters.add_segment_length}_\
    c${parameters.cat_level_states}_\
    sp-${misc.impose_segmentation_pattern}_\
    ihs-${misc.input_human_segmentation}_ios-${misc.input_object_segmentation}_\
    al-${misc.anticipation_loss_weight}_\
    bl-${misc.budget_loss.add}-${misc.budget_loss.human_weight}-${misc.budget_loss.object_weight}_\
    sl-${misc.segmentation_loss.add}-${misc.segmentation_loss.pretrain}\
    -${misc.segmentation_loss.sigma}-${misc.segmentation_loss.weight}_\
    fl${parameters.share_level_mlps}-${misc.first_level_loss_weight}_\
    mt-${misc.multi_task_loss_learner}_pt-${misc.pretrained}_\
    gc${optimization.clip_gradient_at}_ds${data.downsampling}_${data.cross_validation_test_subject}"
  log_dir: ${logging.root_log_dir}/${logging.checkpoint_name}
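As a quick sanity check that the config and the checkpoint name describe the same run, the name can be rebuilt from the values in the config. The sketch below uses plain Python string formatting rather than the repository's own config resolution. Two things are assumptions: the `data` values (scaling strategy, downsampling, test subject) are not in the posted config and are read off the checkpoint name, and the underscore separators are inferred from that name too.

```python
# Values copied from the posted config.
parameters = {
    "hidden_size": 512, "message_humans_to_human": False,
    "message_human_to_objects": True, "message_objects_to_human": True,
    "message_objects_to_object": True, "message_type": "v2",
    "message_granularity": "v1", "message_aggregation": "att",
    "attention_style": "v3", "message_segment": True,
    "update_segment_threshold": 0.1, "filter_discrete_updates": True,
    "object_segment_update_strategy": "ind", "discrete_networks_num_layers": 1,
    "discrete_optimization_strategy": "gs", "positional_encoding_style": "e",
    "add_time_position": 0, "time_position_strategy": "s",
    "add_segment_length": 0, "cat_level_states": 0, "share_level_mlps": 0,
}
optimization = {"epochs": 40, "batch_size": 16, "learning_rate": 1e-3,
                "clip_gradient_at": 0.0}
misc = {
    "make_attention_distance_based": False, "impose_segmentation_pattern": 0,
    "input_human_segmentation": False, "input_object_segmentation": False,
    "anticipation_loss_weight": 1.0, "first_level_loss_weight": 0.0,
    "multi_task_loss_learner": False, "pretrained": True,
    "budget_loss": {"add": False, "human_weight": 1.0, "object_weight": 1.0},
    "segmentation_loss": {"add": True, "pretrain": False, "sigma": 4.0, "weight": 1.0},
}
# Assumption: not in the posted config; read off the checkpoint name.
data = {"scaling_strategy": None, "downsampling": 3,
        "cross_validation_test_subject": "Subject1"}


def build_checkpoint_name(p, o, m, d):
    """Join the fields in the order of the checkpoint_name template."""
    bl, sl = m["budget_loss"], m["segmentation_loss"]
    parts = [
        f"hs{p['hidden_size']}", f"e{o['epochs']}", f"bs{o['batch_size']}",
        f"lr{o['learning_rate']}", f"sc-{d['scaling_strategy']}",
        f"h2h-{p['message_humans_to_human']}", f"h2o-{p['message_human_to_objects']}",
        f"o2h-{p['message_objects_to_human']}", f"o2o-{p['message_objects_to_object']}",
        f"m-{p['message_type']}-{p['message_granularity']}-{p['message_aggregation']}"
        f"-{p['attention_style']}-{m['make_attention_distance_based']}-{p['message_segment']}",
        f"sd-{p['update_segment_threshold']}-{p['filter_discrete_updates']}",
        f"os-{p['object_segment_update_strategy']}",
        f"dn-{p['discrete_networks_num_layers']}-{p['discrete_optimization_strategy']}",
        f"pf-{p['positional_encoding_style']}{p['add_time_position']}"
        f"{p['time_position_strategy']}{p['add_segment_length']}",
        f"c{p['cat_level_states']}", f"sp-{m['impose_segmentation_pattern']}",
        f"ihs-{m['input_human_segmentation']}", f"ios-{m['input_object_segmentation']}",
        f"al-{m['anticipation_loss_weight']}",
        f"bl-{bl['add']}-{bl['human_weight']}-{bl['object_weight']}",
        f"sl-{sl['add']}-{sl['pretrain']}-{sl['sigma']}-{sl['weight']}",
        f"fl{p['share_level_mlps']}-{m['first_level_loss_weight']}",
        f"mt-{m['multi_task_loss_learner']}", f"pt-{m['pretrained']}",
        f"gc{o['clip_gradient_at']}", f"ds{d['downsampling']}",
        f"{d['cross_validation_test_subject']}",
    ]
    return "_".join(parts)


# Checkpoint name as posted in this issue.
reported = ("hs512_e40_bs16_lr0.001_sc-None_h2h-False_h2o-True_o2h-True_o2o-True_"
            "m-v2-v1-att-v3-False-True_sd-0.1-True_os-ind_dn-1-gs_pf-e0s0_c0_sp-0_"
            "ihs-False_ios-False_al-1.0_bl-False-1.0-1.0_sl-True-False-4.0-1.0_"
            "fl0-0.0_mt-False_pt-True_gc0.0_ds3_Subject1")
name = build_checkpoint_name(parameters, optimization, misc, data)
print("config matches checkpoint name:", name == reported)
```

Running the same comparison against the name of the released checkpoint would show which fields, if any, differ from the settings used here.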
Could you please tell me which parameters I have set incorrectly? I ran the reproduction on a single V100 GPU. Looking forward to your reply, thanks!
Hi SISTMrL,
I'll check that and get back to you as soon as I can.
Regards, Romero
Hi RomeroBarata, thanks!
Hi @SISTMrL have you solved the problem?
Hi @RomeroBarata could you please just provide the original yaml? Why do we kind of need to "guess" the original config? Thank you :)