EddieKro
Thanks for the reply, will be waiting for the weights!
I got all the models by running `path/to/azcopy copy https://biglmdiag.blob.core.windows.net/vinvl/model_ckpts/* <local-dir> --recursive`. The nocaps models are located in `model_ckpts/image_captioning/`, though this is surely not the best way to download them.
It's probably VinVL, not VIVO
I've managed to run inference on custom images by extracting a 2048-dimensional feature vector for each bbox and then concatenating to it the box coordinates divided by the image width and height...
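A minimal NumPy sketch of that concatenation step. The comment above is cut off, so the exact position layout is an assumption: this version appends six values per box (both corners plus box width and height, each divided by the image width or height), giving the 2054-dimensional (2048 + 6) region vectors that Oscar/VinVL captioning code commonly expects. If your model takes only the four corner coordinates, drop the last two columns. The function name `build_region_features` is hypothetical.

```python
import numpy as np


def build_region_features(features, boxes, img_w, img_h):
    """Append normalized box geometry to per-region visual features.

    features: (N, 2048) array, one row per detected box.
    boxes:    (N, 4) array of [x1, y1, x2, y2] in pixels.
    Returns an (N, 2048 + 6) float32 array.
    """
    features = np.asarray(features, dtype=np.float32)
    boxes = np.asarray(boxes, dtype=np.float32)
    if features.shape[0] != boxes.shape[0]:
        raise ValueError("need exactly one box per feature vector")
    x1, y1, x2, y2 = boxes.T
    # Assumed 6-value position encoding: corners plus box size, all normalized.
    pos = np.stack(
        [
            x1 / img_w, y1 / img_h,   # top-left corner
            x2 / img_w, y2 / img_h,   # bottom-right corner
            (x2 - x1) / img_w,        # box width
            (y2 - y1) / img_h,        # box height
        ],
        axis=1,
    )
    return np.concatenate([features, pos], axis=1)


if __name__ == "__main__":
    feats = np.random.default_rng(0).random((3, 2048))
    boxes = np.array([[0, 0, 320, 240], [100, 50, 200, 150], [10, 20, 630, 470]])
    out = build_region_features(feats, boxes, img_w=640, img_h=480)
    print(out.shape)  # (3, 2054)
```

Normalizing by image size keeps the position values in [0, 1] regardless of the input resolution, which is why the same model can be applied to custom images of arbitrary size.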
@liutianling It's quite a process :)
1. Extract image features for a folder of images using sg_benchmark as described [here](https://github.com/microsoft/scene_graph_benchmark/issues/7#issuecomment-819357369) (you'll have to create some `.tsv` and `.lineindex` files first and...
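A rough sketch of creating those input files for a folder of images, under stated assumptions: the thread does not spell out the layout sg_benchmark expects, so this assumes one `key<TAB>base64-encoded image` row per image plus an index file holding each row's byte offset, one per line, so a reader can `seek()` straight to any row. Check the repo's TSV reader for the exact column layout and index file extension it looks for. The function name `write_image_tsv` and the extension filter are hypothetical.

```python
import base64
import os

# Hypothetical filter: which files in the folder count as images.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def write_image_tsv(image_dir, tsv_path, index_path):
    """Write one TSV row per image and a byte-offset index for those rows.

    Row format (assumed): <image key>\t<base64 of the raw image file>\n
    Index format (assumed): the byte offset of each row, one per line.
    Returns the number of images written.
    """
    offsets = []
    with open(tsv_path, "wb") as tsv:
        for name in sorted(os.listdir(image_dir)):
            key, ext = os.path.splitext(name)
            path = os.path.join(image_dir, name)
            if ext.lower() not in IMAGE_EXTS or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                encoded = base64.b64encode(f.read())
            offsets.append(tsv.tell())  # where this row starts in the TSV
            tsv.write(key.encode("utf-8") + b"\t" + encoded + b"\n")
    with open(index_path, "w") as idx:
        idx.writelines(f"{off}\n" for off in offsets)
    return len(offsets)
```

Storing byte offsets rather than line numbers lets a data loader fetch row *i* in constant time without scanning the (often very large) TSV from the start.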