meshed-memory-transformer
Can captions be generated for new images outside the COCO dataset?
Hi,
Can captions be generated for new images outside the COCO dataset?
Say, for example, I want to generate a caption for my profile picture; is that possible with this code?
The documentation isn't much help on how to pass an image directly to test.py.
Regards, Vinod
The first step is to run feature extraction on the image, producing data in the following format:
| Field Name | Field Info |
|---|---|
| boxes | An array of shape (N, 4) containing the [Top, Left, Bottom, Right] boundaries for all bounding boxes |
| cls_prob | An array of shape (N, 1601) containing the class probabilities of all bounding boxes |
| features | An array of shape (N, 2048) containing the features of all bounding boxes |
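To make the expected shapes concrete, here is a minimal sketch with dummy arrays. The value of N and the interpretation of 1601 as 1600 Visual Genome object classes plus background are my assumptions, not something stated in the repo:

```python
import numpy as np

# Dummy placeholders illustrating the expected format for one image.
# N = number of detected regions (the value here is arbitrary).
N = 36

boxes = np.zeros((N, 4), dtype=np.float32)        # [top, left, bottom, right] per region
cls_prob = np.zeros((N, 1601), dtype=np.float32)  # class probabilities per region
features = np.zeros((N, 2048), dtype=np.float32)  # region feature vectors
```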
I am trying to perform the same task as @Vinod-13, and I generated the three fields (boxes, cls_prob, and features). However, I couldn't find a way to feed this information to the model. Should I prepare it in .hdf5 format, just like the coco_detections.hdf5 file? And @beecadox, if that is the case, could you tell me how to create it, or provide the script to create the .hdf5 file?
> Can captions be generated for new images outside the COCO dataset? Say, for example, I want to generate a caption for my profile picture; is that possible with this code?

@Vinod-13 did you manage to generate a caption for a new image?
@DesaleF you can use the h5py library to do so. Once you have your arrays for each image, you can run something like:

```python
import h5py

filename_to_write = "detections.hdf5"
with h5py.File(filename_to_write, "a") as data_file:
    data_file.create_dataset("boxes", data=boxes)
    data_file.create_dataset("cls_prob", data=cls_prob)
    data_file.create_dataset("features", data=features)
```

where `boxes`, `cls_prob`, and `features` are your three arrays corresponding to an image, as described by @beecadox.
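As a side note (an assumption on my part, worth verifying against your copy of the repo): coco_detections.hdf5 appears to store one dataset per image, keyed by the COCO image id, e.g. `'<image_id>_features'`. If you mimic that naming, you may need fewer changes to the loader. A sketch:

```python
import h5py

image_id = 1  # hypothetical id you assign to your image

with h5py.File("detections.hdf5", "a") as data_file:
    # Per-image keys in the style I believe coco_detections.hdf5 uses.
    data_file.create_dataset('%d_boxes' % image_id, data=boxes)
    data_file.create_dataset('%d_cls_prob' % image_id, data=cls_prob)
    data_file.create_dataset('%d_features' % image_id, data=features)
```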
Unless your file matches their layout exactly, you then need to modify the code in data/field.py (ImageDetectionsField.preprocess) so it reads from your file format instead of theirs. After that, it is probably enough to follow their code in test.py (predict_captions), returning the generated sentence instead of the scores.
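For that last step, here is a rough sketch of what returning a sentence could look like. It follows the beam-search call and decoding in test.py as I read them; the `model.beam_search` arguments, `text_field.decode`, and the helper name `generate_caption` are assumptions to check against your checkout:

```python
import itertools
import torch

def generate_caption(model, features, text_field, device='cuda'):
    # features: the (N, 2048) region-feature array loaded from your hdf5 file
    model.eval()
    images = torch.from_numpy(features).unsqueeze(0).to(device)  # batch of one image
    with torch.no_grad():
        # beam search as in test.py: max length 20, beam size 5
        out, _ = model.beam_search(images, 20, text_field.vocab.stoi['<eos>'],
                                   5, out_size=1)
    words = text_field.decode(out, join_words=False)[0]
    # collapse consecutive repeated tokens, as test.py does, and join into a sentence
    return ' '.join(k for k, _ in itertools.groupby(words))
```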
Can any of you share the full code to do this? I'm quite lost.
Hi, I understand that image_id and img_boxes are extracted from Faster R-CNN (with ResNet-101).
But I was wondering how you extracted cls_prob and what it means. @DesaleF, can you help me? Could you provide me with the script to extract cls_prob?