pytorch-vsumm-reinforce

How was user_summary (binary vectors) generated?

Open leon20121005 opened this issue 4 years ago • 7 comments

Hi, as the title says, there is a key called user_summary in the dataset eccv16_dataset_tvsum_google_pool5.h5.

I am wondering how the 20 annotations originally provided in TVSum were converted into those 20 binary vectors.

Thank you.

leon20121005 avatar Dec 22 '19 00:12 leon20121005

@leon20121005

Hi~!! Could you show the 20 binary vectors?

SinDongHwan avatar Dec 22 '19 04:12 SinDongHwan

Yes, for example, I printed 20 binary vectors for video 10 with the following code:

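import h5py
import numpy

# Open the dataset read-only and load the 20 user summaries for video_10
with h5py.File('eccv16_dataset_tvsum_google_pool5.h5', 'r') as dataset:
    target = dataset['video_10']['user_summary'][...]

for each_target in target:
    print(each_target)
    # Count how many frames are labelled 0 and how many are labelled 1
    unique, counts = numpy.unique(each_target, return_counts=True)
    print(dict(zip(unique, counts)))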

And the result will be:

[0. 0. 0. ... 0. 0. 0.]
{0.0: 3443, 1.0: 552}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3425, 1.0: 570}
[0. 0. 0. ... 1. 1. 1.]
{0.0: 3398, 1.0: 597}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3409, 1.0: 586}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3402, 1.0: 593}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3416, 1.0: 579}
[0. 0. 0. ... 1. 1. 1.]
{0.0: 3398, 1.0: 597}
[0. 0. 0. ... 1. 1. 1.]
{0.0: 3402, 1.0: 593}
[0. 0. 0. ... 1. 1. 1.]
{0.0: 3397, 1.0: 598}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3418, 1.0: 577}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3396, 1.0: 599}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3443, 1.0: 552}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3410, 1.0: 585}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3406, 1.0: 589}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3431, 1.0: 564}
[0. 0. 0. ... 0. 0. 0.]
{0.0: 3413, 1.0: 582}
[0. 0. 0. ... 1. 1. 1.]
{0.0: 3396, 1.0: 599}
[0. 0. 0. ... 1. 1. 1.]
{0.0: 3414, 1.0: 581}
[0. 0. 0. ... 1. 1. 1.]
{0.0: 3414, 1.0: 581}
[0. 0. 0. ... 1. 1. 1.]
{0.0: 3397, 1.0: 598}

In each pair of lines, the first is a binary vector and the second is the value counts for that vector.
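As a quick sanity check on the counts above, the fraction of frames labelled 1 in each vector can be computed directly. This is only a minimal sketch over three of the printed counts; `selected_fraction` is a hypothetical helper and not part of the repository.

```python
def selected_fraction(counts):
    """Return the share of frames labelled 1.0 in a {value: count} dict."""
    total = sum(counts.values())
    return counts.get(1.0, 0) / total


# Three of the value counts printed above for video_10 (3995 frames each)
examples = [
    {0.0: 3443, 1.0: 552},
    {0.0: 3425, 1.0: 570},
    {0.0: 3396, 1.0: 599},
]

for counts in examples:
    print(f"{selected_fraction(counts):.3f}")
```

Across all 20 vectors the number of selected frames ranges from 552 to 599 out of 3995, so every summary covers roughly 13.8% to 15.0% of the video. That pattern suggests a length budget of about 15% rather than a fixed per-frame score threshold, although the output alone does not prove it.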

leon20121005 avatar Dec 22 '19 22:12 leon20121005

@leon20121005 Hi~!!

h5py.File('eccv16_dataset_tvsum_google_pool5.h5', 'r')['video_10']['user_summary'] is a set of 20 binary vectors.

[0. 0. 0. ... 0. 0. 0.] {0.0: 3443, 1.0: 552}

The number of '0.0' entries is 3443 and the number of '1.0' entries is 552.

The indices with value '1' in [0. 0. 0. ... 0. 0. 0.] are the ground truth.

Why do you want to convert to 20 binary vectors?

SinDongHwan avatar Dec 23 '19 15:12 SinDongHwan

@SinDongHwan

Hi, what I mean is that the original user annotations from TVSum take values in {1, 2, 3, 4, 5} for each frame, while this dataset has values in {0, 1} for each frame. So there might be a threshold or something similar used to convert them?

leon20121005 avatar Dec 23 '19 22:12 leon20121005

@leon20121005

Hi, I got it!! I think {0, 1} is "user_summary" and {1, 2, 3, 4, 5} is "user_score". I don't know how the conversion is done, so you will need to ask @KaiyangZhou. My guess is that the threshold on the average for each frame is 0.5.

SinDongHwan avatar Dec 24 '19 01:12 SinDongHwan

Please refer to https://github.com/anaghazachariah/video_summary_generaton. This is my repo, where I implemented the project.

anaghazachariah avatar Sep 16 '20 06:09 anaghazachariah

Refer to: Zhang, K.; Chao, W.-L.; Sha, F.; and Grauman, K. 2016b. Video summarization with long short-term memory. In ECCV, 766–782. Springer.
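If user_summary follows the keyshot protocol from the paper cited above, the conversion would look roughly like the sketch below: average one annotator's frame scores within each shot, then choose shots with a 0/1 knapsack so that the selected frames stay within a length budget. Everything here is an assumption for illustration: the function name, the 15% default budget, the shot boundaries, and the toy scores are hypothetical, and this is not confirmed to be the code that produced the .h5 file.

```python
import numpy as np


def scores_to_keyshot_summary(frame_scores, shot_bounds, budget=0.15):
    """Convert per-frame scores into a binary keyshot summary.

    frame_scores: 1-D sequence of importance scores, one per frame.
    shot_bounds:  list of (start, end) frame indices, end exclusive.
    budget:       maximum fraction of frames in the summary (assumed value).
    """
    frame_scores = np.asarray(frame_scores, dtype=float)
    n_frames = len(frame_scores)
    capacity = int(n_frames * budget)
    lengths = [end - start for start, end in shot_bounds]
    values = [frame_scores[start:end].mean() for start, end in shot_bounds]

    # 0/1 knapsack: best[i, c] is the highest total shot score achievable
    # with the first i shots using at most c frames.
    n_shots = len(shot_bounds)
    best = np.zeros((n_shots + 1, capacity + 1))
    for i in range(1, n_shots + 1):
        w, v = lengths[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i, c] = best[i - 1, c]
            if w <= c and best[i - 1, c - w] + v > best[i, c]:
                best[i, c] = best[i - 1, c - w] + v

    # Backtrack through the table to mark the frames of the chosen shots.
    summary = np.zeros(n_frames, dtype=np.float32)
    c = capacity
    for i in range(n_shots, 0, -1):
        if best[i, c] != best[i - 1, c]:
            start, end = shot_bounds[i - 1]
            summary[start:end] = 1.0
            c -= lengths[i - 1]
    return summary


# Toy example: 20 frames in 4 shots of 5 frames, with shot means 1, 5, 2, 3
scores = np.repeat([1.0, 5.0, 2.0, 3.0], 5)
shots = [(0, 5), (5, 10), (10, 15), (15, 20)]
summary = scores_to_keyshot_summary(scores, shots, budget=0.5)
print(summary)  # frames 5-9 and 15-19 are selected
```

Because whole shots are selected under a frame budget, the number of ones can fall slightly below the budget and differ between annotators, which would be consistent with the 552 to 599 range printed earlier in this thread.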

wss321 avatar Mar 25 '21 07:03 wss321