Reimplement NN ensemble using PyTorch instead of TensorFlow

Open • osma opened this issue 3 months ago • 1 comment

The current NN ensemble backend is implemented using TensorFlow, and it is the only part of the Annif codebase that depends on TensorFlow. I think it would make sense to try to get rid of the TensorFlow dependency, which means reimplementing the NN ensemble backend using PyTorch.

Reasons:

PyTorch and TensorFlow provide very similar functionality. Both are quite large libraries (hundreds of megabytes for a CPU-only variant; several GB for CUDA or ROCm variants). I'm not sure if they can be used at the same time from the same Python process. Using just one of them would make things easier.
The proposed PECOS / X(R)-Transformer backend, in PR #798, is based on PyTorch. We are also looking for a new implementation of fastText (see #795), and one of the candidates is built on PyTorch. In addition, DNB is working on a new Embedding Based Matching backend (see #855), which uses PyTorch. So it looks like Annif will soon need to depend on PyTorch anyway.
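
To see whether a given environment currently carries both libraries, something like the following could be used. This is only a minimal sketch: the helper name `installed_frameworks` is hypothetical and not part of Annif, and it only checks whether the top-level `tensorflow` and `torch` modules can be located, without importing either of them.

```python
from importlib.util import find_spec

# Top-level module names of the two frameworks discussed above.
FRAMEWORKS = ("tensorflow", "torch")


def installed_frameworks(names=FRAMEWORKS):
    """Return the names that can be located, without importing them.

    Hypothetical helper for illustration only; not part of Annif.
    """
    return [name for name in names if find_spec(name) is not None]


if __name__ == "__main__":
    found = installed_frameworks()
    print("Installed:", ", ".join(found) or "none")
    if len(found) > 1:
        print("Both frameworks are present in this environment.")
```

Because `find_spec` only locates a top-level module rather than loading it, the check itself should not trigger any interaction between the two libraries.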

Sep 16 '25 13:09 osma

> I'm not sure if they can be used at the same time from the same Python process.

As it happens, they cannot. I ran into this when trying to train an nn_ensemble backend that was using an xtransformer source. I'm afraid I don't have the error message at hand, only a note that it referenced "daemon child process".

+1 for this feature

Sep 25 '25 19:09 mjsuhonos