multimodal-deep-learning
modality issue
Can this model be run using only the audio and video modalities? If so, how well does it perform?