JFastText

Add Java binding for `getSentenceVector`

Open bxshi opened this issue 6 years ago • 3 comments

This PR adds a Java binding for `getSentenceVector`. This method can return subword-based embeddings for out-of-vocabulary (OOV) words. Unlike `getWordVector`, `getSentenceVector` can still compute an embedding when its input is OOV, using the subwords that are in the vocabulary.
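To illustrate the idea of building an embedding for an unseen word from known subwords, here is a minimal toy sketch. It is not JFastText's API and not the binding added in this PR: the class `SubwordEmbeddingSketch`, its methods, and the tiny hand-filled n-gram table are all hypothetical, and the real library hashes n-grams into buckets rather than storing them in a map. The sketch only assumes that a word is wrapped in `<` and `>` boundary markers, split into character n-grams, and embedded as the average of the n-gram vectors that are known.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; names and storage are assumptions for illustration only.
public class SubwordEmbeddingSketch {
    // Toy lookup table: character n-gram -> vector.
    private final Map<String, float[]> ngramVectors = new HashMap<>();
    private final int dim; // embedding dimension
    private final int n;   // n-gram length

    public SubwordEmbeddingSketch(int dim, int n) {
        this.dim = dim;
        this.n = n;
    }

    public void putNgram(String ngram, float[] vector) {
        if (vector.length != dim) {
            throw new IllegalArgumentException("expected a vector of length " + dim);
        }
        ngramVectors.put(ngram, vector.clone());
    }

    // Character n-grams of the word wrapped in boundary markers, e.g. "<cat>".
    public static List<String> charNgrams(String word, int n) {
        String wrapped = "<" + word + ">";
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= wrapped.length(); i++) {
            out.add(wrapped.substring(i, i + n));
        }
        return out;
    }

    // Average of the vectors of all known n-grams; the zero vector if none is known.
    public float[] embed(String word) {
        float[] sum = new float[dim];
        int hits = 0;
        for (String gram : charNgrams(word, n)) {
            float[] v = ngramVectors.get(gram);
            if (v == null) {
                continue; // unknown n-gram contributes nothing
            }
            for (int i = 0; i < dim; i++) {
                sum[i] += v[i];
            }
            hits++;
        }
        if (hits > 0) {
            for (int i = 0; i < dim; i++) {
                sum[i] /= hits;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        SubwordEmbeddingSketch model = new SubwordEmbeddingSketch(2, 3);
        model.putNgram("<ca", new float[] {1f, 0f});
        model.putNgram("at>", new float[] {0f, 1f});
        // "cart" is not a known word here, but it shares the n-gram "<ca" with "cat".
        System.out.println(Arrays.toString(model.embed("cart")));
    }
}
```

In this toy setup, a word with no known n-grams gets the zero vector, which is the situation the subword-based path is meant to avoid for words that share at least some n-grams with the training vocabulary.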

I also modified the test cases slightly to test output embeddings for OOV words.

This is a useful method in my use case, so I'm submitting a PR in case others also want this. Feel free to comment. Thanks!

bxshi, Jan 23 '19 05:01

I am trying to merge all the pull requests on my fork. @bx If you like, please add a pull request there. However, I noticed that a check has failed.

carschno, Jul 23 '19 14:07

Hi @carschno, based on the CI error, it seems the CI environment is trying to install Java 8, whereas the install script (`install-jdk.sh`) only supports feature releases 9 to 13.

Installing oraclejdk8
$ export JAVA_HOME=~/oraclejdk8
$ export PATH="$JAVA_HOME/bin:$PATH"
$ ~/bin/install-jdk.sh --target "/Users/travis/oraclejdk8" --workspace "/Users/travis/.cache/install-jdk" --feature "8" --license "BCL"
install-jdk.sh 2019-01-18 II
Expected feature release number in range of 9 to 13, but got: 8
The command "~/bin/install-jdk.sh --target "/Users/travis/oraclejdk8" --workspace "/Users/travis/.cache/install-jdk" --feature "8" --license "BCL"" failed and exited with 3 during .

bxshi, Jul 23 '19 16:07

Hi @carschno, after I updated the Java version in the Travis CI config, all the tests pass. Thank you!

bxshi, Jul 23 '19 16:07