meshed-memory-transformer
Can captions be generated for new images outside the COCO dataset?
Hi,
Can captions be generated for new images outside the COCO dataset?
Say, for example, I want to generate a caption for my profile picture; is that possible with this code?
The documentation isn't much help on how to pass an image directly to test.py.
Regards, Vinod
The first step is to run feature extraction on the image, producing data in the following format:
| Field Name | Field Info |
|---|---|
| boxes | An array of shape (N, 4) containing the [Top, Left, Bottom, Right] boundaries for all bounding boxes |
| cls_prob | An array of shape (N, 1601) containing the class probabilities of all bounding boxes |
| features | An array of shape (N, 2048) containing the features of all bounding boxes |
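To make the expected shapes concrete, here is a minimal sketch with dummy arrays. The value of N and the interpretation of 1601 as 1600 Visual Genome object classes plus background are my assumptions, not something stated in the repo:

```python
import numpy as np

# Dummy placeholders illustrating the expected format for one image.
# N = number of detected regions (the value here is arbitrary).
N = 36

boxes = np.zeros((N, 4), dtype=np.float32)        # [top, left, bottom, right] per region
cls_prob = np.zeros((N, 1601), dtype=np.float32)  # class probabilities per region
features = np.zeros((N, 2048), dtype=np.float32)  # region feature vectors
```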
I am trying to perform the same task as @Vinod-13, and I generated the three fields (boxes, cls_prob, and features). However, I couldn't find a way to feed this information to the model. Should I prepare it in .hdf5 format, just like the coco_detections.hdf5 file? And @beecadox, if that is the case, could you tell me how to create it, or provide the script to create the .hdf5 file?
> Can captions be generated for new images outside the COCO dataset? Say, for example, I want to generate a caption for my profile picture; is that possible with this code?

@Vinod-13 did you manage to generate a caption for a new image?
@DesaleF you can use the h5py library to do so. Once you have your arrays for each image, you can run something like:

```python
import h5py

filename_to_write = "detections.hdf5"
with h5py.File(filename_to_write, "a") as data_file:
    data_file.create_dataset("boxes", data=boxes)
    data_file.create_dataset("cls_prob", data=cls_prob)
    data_file.create_dataset("features", data=features)
```

where `boxes`, `cls_prob`, and `features` are your three arrays corresponding to an image, as described by @beecadox.
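As a side note (an assumption on my part, worth verifying against your copy of the repo): coco_detections.hdf5 appears to store one dataset per image, keyed by the COCO image id, e.g. `'<image_id>_features'`. If you mimic that naming, you may need fewer changes to the loader. A sketch:

```python
import h5py

image_id = 1  # hypothetical id you assign to your image

with h5py.File("detections.hdf5", "a") as data_file:
    # Per-image keys in the style I believe coco_detections.hdf5 uses.
    data_file.create_dataset('%d_boxes' % image_id, data=boxes)
    data_file.create_dataset('%d_cls_prob' % image_id, data=cls_prob)
    data_file.create_dataset('%d_features' % image_id, data=features)
```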
Unless your file matches their layout exactly, you then need to modify the code in data/field.py (ImageDetectionsField.preprocess) so it reads from your file format instead of theirs. After that, it is probably enough to follow their code in test.py (predict_captions), returning the generated sentence instead of the scores.
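For that last step, here is a rough sketch of what returning a sentence could look like. It follows the beam-search call and decoding in test.py as I read them; the `model.beam_search` arguments, `text_field.decode`, and the helper name `generate_caption` are assumptions to check against your checkout:

```python
import itertools
import torch

def generate_caption(model, features, text_field, device='cuda'):
    # features: the (N, 2048) region-feature array loaded from your hdf5 file
    model.eval()
    images = torch.from_numpy(features).unsqueeze(0).to(device)  # batch of one image
    with torch.no_grad():
        # beam search as in test.py: max length 20, beam size 5
        out, _ = model.beam_search(images, 20, text_field.vocab.stoi['<eos>'],
                                   5, out_size=1)
    words = text_field.decode(out, join_words=False)[0]
    # collapse consecutive repeated tokens, as test.py does, and join into a sentence
    return ' '.join(k for k, _ in itertools.groupby(words))
```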
Can any of you share the full code to do this? I'm quite lost.
Hi, I understand that image_id and img_boxes are extracted from Faster R-CNN (with ResNet-101).
But I was wondering how you extracted cls_prob and what it means. @DesaleF, can you help me? Could you provide me with the script to extract cls_prob?