prompt2model
Support for non-text modalities (images, speech, video)
Currently prompt2model is limited to text-input, text-output tasks. The underlying framework can certainly handle different modalities, and it would be great to see prompt2model handle other types of tasks as well (such as image classification/generation, speech tasks, etc.).
But we'll probably need to think about several things, such as:
- How do we pick appropriate base models and datasets for each modality? (See the retrieval sketch after this list.)
- What do we do about dataset generation?
- In the case of non-text output, how do we adjust our evaluation? (See the metric sketch after this list.)
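
For the first question, one option is to key retrieval off Hugging Face Hub task tags. Below is a rough sketch under that assumption; the task-to-tag mapping, the downloads-based ranking, and the function names are hypothetical and not part of prompt2model's existing retrievers.

```python
# Hypothetical sketch: modality-aware model/dataset retrieval keyed off
# Hugging Face Hub task tags. Not part of prompt2model today.
from huggingface_hub import HfApi

api = HfApi()

# Illustrative mapping from a parsed task type to a Hub pipeline tag.
TASK_TO_PIPELINE_TAG = {
    "image_classification": "image-classification",
    "speech_recognition": "automatic-speech-recognition",
    "image_generation": "text-to-image",
}


def retrieve_candidates(task_type: str, limit: int = 5):
    """Return the most-downloaded Hub models and datasets for a modality."""
    tag = TASK_TO_PIPELINE_TAG[task_type]
    models = api.list_models(filter=tag, sort="downloads", direction=-1, limit=limit)
    # Datasets on the Hub carry tags like "task_categories:image-classification".
    datasets = api.list_datasets(
        filter=f"task_categories:{tag}", sort="downloads", direction=-1, limit=limit
    )
    return [m.id for m in models], [d.id for d in datasets]


if __name__ == "__main__":
    model_ids, dataset_ids = retrieve_candidates("image_classification")
    print("candidate models:", model_ids)
    print("candidate datasets:", dataset_ids)
```

Ranking by downloads is just a placeholder here; a real implementation would presumably reuse whatever retrieval prompt2model already does for text models, extended with a modality filter.

For the evaluation question, non-text output would likely need different metrics than the text-similarity metrics used today. As one hedged example, here is a minimal sketch that scores generated images with FID via torchmetrics (the metric choice, tensor shapes, and the dummy data are assumptions for illustration; torchmetrics' FID also requires the torch-fidelity extra).
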
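```python
# Hypothetical sketch: evaluating an image-generation task with FID,
# as one example of a non-text metric prompt2model could plug in.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=64)  # small feature dim keeps the example light

# Stand-ins for reference images and model outputs: (N, 3, H, W) uint8 tensors.
real_images = torch.randint(0, 256, (16, 3, 64, 64), dtype=torch.uint8)
generated_images = torch.randint(0, 256, (16, 3, 64, 64), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(generated_images, real=False)
print("FID:", fid.compute().item())
```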
We can start discussing the necessary steps on this issue and implement the pieces bit by bit. We'd be happy for contributions!