feat: combine embeddings
Right now it is painful to combine embeddings from different nesting levels and set them at the top level, at least until https://github.com/jina-ai/docarray/issues/461 is solved.
So this should do the following: it takes an access path (or a list of access paths?), e.g. `@.[image, main_text]`, and combines the embeddings of the selected docs using one of `'sum'`, `'mean'`, `'concat'`, or a provided model.
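A minimal sketch of the intended combine step, assuming numpy embeddings; the function name `combine` and its signature are illustrative placeholders, not the final API:

```python
import numpy as np

def combine(embeddings, combiner='mean'):
    """Combine a list of 1-D chunk embeddings into a single vector."""
    stacked = np.stack(embeddings)      # shape: (n_chunks, dim)
    if combiner == 'sum':
        return stacked.sum(axis=0)      # shape: (dim,)
    if combiner == 'mean':
        return stacked.mean(axis=0)     # shape: (dim,)
    if combiner == 'concat':
        return stacked.reshape(-1)      # shape: (n_chunks * dim,)
    if callable(combiner):              # user-provided model / callable
        return combiner(stacked)
    raise ValueError(f'unknown combiner: {combiner!r}')

# e.g. fuse the image and main_text chunk embeddings of one doc:
root_embedding = combine([np.ones(128), np.zeros(128)], combiner='mean')
```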
It should handle the following cases:
- [x] numpy embedding / predefined combiner
  - [x] test
- [x] torch embedding / predefined combiner
  - [x] test
- [ ] tf embedding / predefined combiner
  - [ ] test
- [ ] paddle embedding / predefined combiner
  - [ ] test
- [x] torch embedding model (see the sketch after this list)
  - [ ] test
- [ ] tf embedding model
  - [ ] test
- [ ] paddle embedding model
  - [ ] test
- [ ] onnx embedding model
  - [ ] test
- [x] numpy embedding / callable
  - [ ] test
- [x] torch embedding / callable
  - [ ] test
- [x] tf embedding / callable
  - [ ] test
- [x] paddle embedding / callable
  - [ ] test
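For the embedding-model cases above, the combiner would be a callable, e.g. a small trainable torch module that projects the stacked chunk embeddings down to one fused vector. The `Fuser` module below is a hypothetical illustration, not part of the proposal:

```python
import torch

class Fuser(torch.nn.Module):
    """Toy model-based combiner: concatenate chunk embeddings, then project."""

    def __init__(self, n_chunks: int, dim: int):
        super().__init__()
        self.proj = torch.nn.Linear(n_chunks * dim, dim)

    def forward(self, stacked: torch.Tensor) -> torch.Tensor:
        # stacked: (n_chunks, dim) -> flattened (n_chunks * dim,) -> (dim,)
        return self.proj(stacked.reshape(-1))

fuser = Fuser(n_chunks=2, dim=128)
fused = fuser(torch.ones(2, 128))  # shape: (128,)
```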
Other ToDos:
- [ ] docs
- [ ] rename the method: `fuse_embeddings()`?
- [ ] refactor: the code is currently quite ugly
- [ ] refactor examples in the docs
Possible follow-up PRs:
- [ ] enable a flag `uniform_nesting` which tells us that every doc in the da has the same number of relevant chunks. This would allow us to vectorize the combine operation (see the sketch after this list)
- [ ] implement `to_numpy`
- [ ] implement a flag to discard the chunk embeddings after the root embedding has been set
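To illustrate the `uniform_nesting` idea from the first item above: if every doc has the same number of chunks, all chunk embeddings fit into a single `(n_docs, n_chunks, dim)` tensor, so the combine step becomes one vectorized reduction instead of a per-doc Python loop. A rough numpy sketch:

```python
import numpy as np

# With uniform nesting, chunk embeddings stack into one 3-D tensor ...
n_docs, n_chunks, dim = 1000, 2, 128
all_embeddings = np.random.rand(n_docs, n_chunks, dim)

# ... and 'mean'-combining all docs is a single reduction over axis 1:
root_embeddings = all_embeddings.mean(axis=1)  # shape: (n_docs, dim)
```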
Closes #512