feat: add audio-text search
Audio-Text Search
This feature adds audio as a new modality to jina now. It introduces an audio-text bi-modal search scenario, showcased with a demo dataset of music data. Further audio cases might be added later, e.g. environmental sounds, which are also well supported by pre-trained models (e.g. audio-clip).
The following is a running list of the changes required to realize the audio case in jina now:
- [ ] add case hierarchy (top level is the modality combination, second level is the specific dataset) to the CLI dialog (#107)
- [x] prepare dataset for music-text demo and upload to storage
- [x] prepare custom executor for the demo case and push to hub
- [x] update data loading logic to work with new dataset (#120)
- [ ] update fine-tuning logic to work with audio data
- [ ] update frontend app to work with the audio-text data (users can search with text or choose pre-selected songs as queries, similar to the existing image case)
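
To illustrate the case hierarchy from the first item, here is a minimal sketch of a two-level lookup that a CLI dialog could walk through: first the modality combination, then the dataset. It is not taken from the jina now code base; `CASES`, `list_modalities`, `list_datasets`, and the dataset names are hypothetical placeholders.

```python
# Hypothetical two-level case hierarchy (illustrative only, not jina now code).
# Top level: modality combination. Second level: available demo datasets.
# The dataset names below are placeholders, not real dataset identifiers.
CASES = {
    "image-text": ["example-image-dataset"],
    "audio-text": ["example-music-dataset"],
}


def list_modalities(cases):
    """Return the top-level choices (modality combinations), sorted by name."""
    return sorted(cases)


def list_datasets(cases, modality):
    """Return the second-level choices (datasets) for one modality combination."""
    if modality not in cases:
        raise ValueError(f"unknown modality combination: {modality!r}")
    return list(cases[modality])
```

A dialog built on this would prompt with `list_modalities(CASES)` first and then with `list_datasets(CASES, chosen_modality)`, so a further audio case would only need a new entry under `"audio-text"`.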