CMU-MultimodalSDK
Implementing missing modality during test time
Hi Amir,
I have read about co-learning, where we can train a model on 3 modalities but use only one modality at test time. I am struggling to understand how this would be implemented in code. Once we train a model with 3 modalities, it will expect 3 modalities at test time. Do we need to handle this scenario by passing zeros or some random values for the modalities to be dropped? Please help, and share any sample implementation of this if possible. Thanks a lot for all your support.
Hi @AnilRahate,
Yes, you train with 3 modalities and pass zeros for the missing ones during test time. You might expect the model to crash and burn, since zeros would most likely cause instabilities. However, this does not happen; the models actually outperform the scenario where only the modalities available at both train and test time are used for training. The moral of the story is this:
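A minimal sketch of what "pass zeros for the missing modalities" can look like at inference. The fusion model, the modality dimensions, and the function names here are all assumptions for illustration, not the actual co-learning architecture from the paper:

```python
import numpy as np

# Hypothetical trimodal fusion model: a single hidden layer over the
# concatenated modalities. Dimensions below are assumptions.
rng = np.random.default_rng(0)
TEXT_DIM, AUDIO_DIM, VIDEO_DIM, HIDDEN = 300, 74, 35, 64

# Stand-in for trained parameters.
W1 = rng.standard_normal((TEXT_DIM + AUDIO_DIM + VIDEO_DIM, HIDDEN)) * 0.01
W2 = rng.standard_normal((HIDDEN, 1)) * 0.01

def predict(text, audio, video):
    # The model always expects all three modalities concatenated.
    x = np.concatenate([text, audio, video], axis=-1)
    h = np.maximum(x @ W1, 0.0)  # ReLU
    return h @ W2

batch = 4
text = rng.standard_normal((batch, TEXT_DIM))  # only modality available at test time
audio = np.zeros((batch, AUDIO_DIM))           # missing modality -> zeros
video = np.zeros((batch, VIDEO_DIM))           # missing modality -> zeros

pred = predict(text, audio, video)
print(pred.shape)  # (4, 1)
```

The key point is that the network's input signature never changes; the missing modalities are simply replaced with zero tensors of the shapes seen during training.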
The available (train/test) modalities, with help from the parameters of the network, can learn the dynamics of another modality up to a certain theoretical bound explained in the paper.
An example: say we have no information regarding the meaning of the words "good" and "helpful". The word "good" appears in a positive sentence 90% of the time in our dataset, but the word "helpful" appears equally often in negative and positive sentences. Assume the two words don't occur in similar sentences either. As far as an i.i.d. approach would go, "good" is a positive word and "helpful" is a neutral word. However, say during training we observe a unique link between the two: the speakers smile when uttering both "good" and "helpful". The two words are therefore linked via the smile. The physical location of this link would be in the parameters of the network, and the network may end up bringing the two representations closer together.
Hi @A2Zadeh, in the co-learning paper, IMDB data was used during test time. The IMDB dataset I came across has text samples of more than 100 words, whereas the MOSI text length from the SDK is 20 or 50. I would like to know the text sample length used for IMDB for the accuracies reported in the co-learning paper.