
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

16 UniVL issues

Please tell me where the code for multimodal sentiment analysis is. Thank you!

I want to input only text features or only video features into UniVL. The paper says that a single transformer combines the text representation **T** and the video representation **V**. Could you...

Some important files that all Microsoft projects should have are missing from this repository. A pull request has been opened to add the missing file(s). When the...

In dataloaders/README.md: ``` This file is generated from `youcookii_annotations_trainval.json`, which can be downloaded from [official webpage](http://youcook2.eecs.umich.edu/download). ``` However, I downloaded **youcookii_annotations_trainval.tar.gz** from ![image](https://user-images.githubusercontent.com/15980746/177303125-4a6c69d5-2d89-4db0-bf9d-050d81b1a17d.png), extracted youcookii_annotations_trainval.json, and then found that **youcookii_annotations_trainval.json has...

I followed the steps to download all the necessary dependencies and data. When running the code, this error is thrown: `in main raise subprocess.CalledProcessError(returncode=process.returncode, subprocess.CalledProcessError: Command...
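For context: a `CalledProcessError` raised from a launcher's `main` (likely a wrapper such as `torch.distributed.launch`, though the snippet above is truncated) only reports that a child process exited with a nonzero status. The underlying cause is normally in the child's own traceback, printed earlier in the log. A minimal stdlib sketch of that mechanism, where the crashing one-line child script is a hypothetical stand-in for a training worker:

```python
import subprocess
import sys

# Hypothetical stand-in for a training worker that crashes on startup.
child_code = "raise RuntimeError('real failure inside the worker')"

try:
    # check=True makes the parent raise CalledProcessError on a nonzero exit code.
    subprocess.run(
        [sys.executable, "-c", child_code],
        check=True,
        capture_output=True,
        text=True,
    )
except subprocess.CalledProcessError as err:
    exit_code = err.returncode   # the parent only learns the exit status...
    child_stderr = err.stderr    # ...the real error is in the child's traceback
    print("launcher saw exit code", exit_code)
    print(child_stderr.strip().splitlines()[-1])
```

When debugging, scrolling up to the first traceback emitted by a worker (or running the training script directly in a single process) usually shows the actual failure.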

Hi! From your paper and the README.md in dataloaders/ of https://github.com/microsoft/UniVL, I infer that the CSV file you used differs from the original CSV file. It is mentioned that 1.2M videos...

Please accept this contribution adding the standard Microsoft SECURITY.MD :lock: file to help the community understand the security policy and how to safely report security issues. GitHub uses the presence...

Convert the input list of arrays to a NumPy array and negate it for further computation; otherwise the code throws an error.
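The error this change addresses comes from applying unary minus to a plain Python list, which is not defined; converting to an `ndarray` first makes the negation elementwise. A minimal sketch of the idea, where the helper name `to_negated_array` and the sample scores are hypothetical and the inner arrays are assumed to share one shape:

```python
import numpy as np

def to_negated_array(x):
    """Hypothetical helper: stack a list of equally shaped arrays and negate it."""
    arr = np.asarray(x)  # list of 1-D arrays -> one 2-D ndarray
    return -arr          # elementwise negation, defined for ndarrays

scores = [np.array([0.1, 0.9]), np.array([0.5, 0.2])]

try:
    -scores  # unary minus is not defined for a Python list
except TypeError as err:
    print("list negation fails:", err)

neg = to_negated_array(scores)
print(neg)
```

Negating before an ascending sort is a common way to rank values in descending order, which is one plausible reason for the negation here.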

Hello, I am trying to run your code, but I keep running into issues with distributed training. Is it possible to run it without distributed training?

Hi, impressive work! How can I extract features from my own video-text datasets to fine-tune the model?