
Why does JFastText only allow models trained with JFastText?

Open ali3assi opened this issue 7 years ago • 25 comments

Hello,

How can I read a pretrained model? I tried to load the existing .vec and .bin files, but loading the model raises an exception. It looks like the format is incompatible and JFastText only allows models trained with JFastText.

ali3assi avatar Nov 24 '17 17:11 ali3assi

You can upgrade the fastText within the cpp folder to the released version. Then run mvn clean install. The compiled jar package with dependency will be compatible with newer pre-trained models.

lidalei avatar Jan 05 '18 08:01 lidalei

@lidalei I got the errors below when upgrading fastText. Could you check this at your convenience? Thanks.

In file included from /Users/xichen/Desktop/NLP project/JFastText/target/classes/com/github/jfasttext/jniFastTextWrapper.cpp:102:
In file included from /Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fasttext_wrapper_javacpp.h:13:
/Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fasttext_wrapper.cc:83:18: warning: 'getVector' is deprecated: getVector is being deprecated and replaced by getWordVector. [-Wdeprecated-declarations]
        fastText.getVector(vec, word);
                 ^
/Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fastText/src/fasttext.h:63:3: note: 'getVector' has been explicitly marked deprecated here
  FASTTEXT_DEPRECATED(
  ^
/Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fastText/src/utils.h:15:50: note: expanded from macro 'FASTTEXT_DEPRECATED'
# define FASTTEXT_DEPRECATED(msg) __attribute__((__deprecated__(msg)))
                                                 ^
In file included from /Users/xichen/Desktop/NLP project/JFastText/target/classes/com/github/jfasttext/jniFastTextWrapper.cpp:102:
In file included from /Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fasttext_wrapper_javacpp.h:13:
/Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fasttext_wrapper.cc:84:38: error: 'data_' is a protected member of 'fasttext::Vector'
        return std::vector<real>(vec.data_, vec.data_ + vec.m_);
                                     ^
/Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fastText/src/vector.h:26:23: note: declared protected here
    std::vector<real> data_;
                      ^
In file included from /Users/xichen/Desktop/NLP project/JFastText/target/classes/com/github/jfasttext/jniFastTextWrapper.cpp:102:
In file included from /Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fasttext_wrapper_javacpp.h:13:
/Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fasttext_wrapper.cc:84:49: error: 'data_' is a protected member of 'fasttext::Vector'
        return std::vector<real>(vec.data_, vec.data_ + vec.m_);
                                                ^
/Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fastText/src/vector.h:26:23: note: declared protected here
    std::vector<real> data_;
                      ^
In file included from /Users/xichen/Desktop/NLP project/JFastText/target/classes/com/github/jfasttext/jniFastTextWrapper.cpp:102:
In file included from /Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fasttext_wrapper_javacpp.h:13:
/Users/xichen/Desktop/NLP project/JFastText/src/main/java/../cpp/fasttext_wrapper.cc:84:61: error: no member named 'm_' in 'fasttext::Vector'
        return std::vector<real>(vec.data_, vec.data_ + vec.m_);

xikunlun001 avatar Feb 02 '18 18:02 xikunlun001

'getVector' is deprecated and has been replaced by getWordVector. Besides, the Vector class was rewritten: you can no longer access the data_ or m_ members of a vector. Instead, you have to use vector.data() and vector.size(). I'd suggest having a look at my fork: https://github.com/lidalei/JFastText

lidalei avatar Feb 02 '18 21:02 lidalei

@lidalei Thanks. I just used the code in your fork but got the following error in loadModel. Could you have a look?

Exception in thread "main" java.lang.UnsatisfiedLinkError: com.github.jfasttext.FastTextWrapper$FastTextApi.checkModel(Ljava/lang/String;)Z
	at com.github.jfasttext.FastTextWrapper$FastTextApi.checkModel(Native Method)
	at com.github.jfasttext.JFastText.loadModel(JFastText.java:29)
	at com.github.jfasttext.JFastText.main(JFastText.java:203)

xikunlun001 avatar Feb 03 '18 00:02 xikunlun001

Could you release your code?

lidalei avatar Feb 03 '18 10:02 lidalei

Hello @lidalei, I just installed your code: https://github.com/lidalei/JFastText

I took the two jars generated in JFastText/target/ and added them to the build path in Eclipse.

In my TestDriver class I declared:

import com.github.jfasttext.JFastText;

public class TestDriver {

	public static void main(String[]args){
		JFastText jft = new JFastText();
		
		
	}
}

Running this code, I get the following exception:

Exception in thread "main" java.lang.UnsatisfiedLinkError: no jniFastTextWrapper in java.library.path
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
	at java.lang.Runtime.loadLibrary0(Runtime.java:870)
	at java.lang.System.loadLibrary(System.java:1122)
	at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1191)
	at org.bytedeco.javacpp.Loader.load(Loader.java:953)
	at org.bytedeco.javacpp.Loader.load(Loader.java:854)
	at com.github.jfasttext.FastTextWrapper.<clinit>(FastTextWrapper.java:11)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.bytedeco.javacpp.Loader.load(Loader.java:913)
	at org.bytedeco.javacpp.Loader.load(Loader.java:854)
	at com.github.jfasttext.FastTextWrapper$FastTextApi.<clinit>(FastTextWrapper.java:442)
	at com.github.jfasttext.JFastText.<init>(JFastText.java:23)
	at TestDriver.main(TestDriver.java:6)

Any idea how to solve this issue please?

ali3assi avatar Apr 13 '18 15:04 ali3assi

You should use only jfasttext-0.1.0-jar-with-dependencies.jar, which is generated by running mvn clean install.
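"no jniFastTextWrapper in java.library.path" means the JVM could not find a native library for the current platform. As a rough diagnostic sketch (the class name PlatformInfo is hypothetical and not part of JFastText), the following prints the values that usually decide which native binary is looked up, so they can be compared with the platform the jar was built on:

```java
/** Hypothetical diagnostic helper; it uses only standard JVM system properties. */
final class PlatformInfo {

    /** Summarises the OS, the CPU architecture and the native library search path. */
    static String describe() {
        return "os.name=" + System.getProperty("os.name")
                + ", os.arch=" + System.getProperty("os.arch")
                + ", java.library.path=" + System.getProperty("java.library.path");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the jar was compiled on a different operating system or architecture than the one reported here, the bundled native library most likely will not load, and rebuilding with mvn clean install on the target machine is the first thing to try.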

lidalei avatar Apr 13 '18 17:04 lidalei

Btw, you should clone https://github.com/lidalei/fastText into the subfolder 'src/main/cpp/fastText' to compile the native library. @TamouzeAssi @xikunlun001

lidalei avatar Apr 13 '18 17:04 lidalei

Unfortunately, it does not work on Windows.

ali3assi avatar Apr 13 '18 21:04 ali3assi

The problem still exists. We are trying to load a pretrained model. When we read the model with jft.loadModel(path/to/pretrained_model), we get the following exception:

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Model file's format is not compatible with this JFastText version!

Note that we took the new fork of JFastText, deleted the files in src/cpp/fasttext, cloned fastText there again, and then ran mvn clean install.

Any idea how to solve this problem?

ali3assi avatar Apr 13 '18 23:04 ali3assi

@TamouzeAssi "Model file's format is not compatible with this JFastText version" means you should train your model with the corresponding fastText or JFastText. Don't install fasttext with pip; that package is not official. Follow this to install the Python binding: https://github.com/facebookresearch/fastText/tree/master/python.
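To see why a given .bin file is rejected, it can help to look at its first bytes. The sketch below rests on assumptions that should be verified against the fastText checkout in src/main/cpp/fastText: that the file starts with two 32-bit integers (a magic number, then a format version), that they are stored little-endian, and that the magic value is 793712314. The class name ModelHeader is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: reads the first two int32 values of a model file. */
final class ModelHeader {

    // Assumption: the magic number used by recent fastText releases.
    static final int ASSUMED_MAGIC = 793712314;

    /** Returns {magic, version}, decoded as little-endian 32-bit integers. */
    static int[] read(InputStream in) throws IOException {
        byte[] raw = new byte[8];
        new DataInputStream(in).readFully(raw); // throws EOFException on short files
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        return new int[] { buf.getInt(), buf.getInt() };
    }
}
```

Calling ModelHeader.read(new FileInputStream(path)) on the failing model and on one that loads gives two pairs to compare. If the first value differs from ASSUMED_MAGIC, the file was probably written by a fastText version from before the header was introduced, or it is not a binary model at all.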

lidalei avatar Apr 15 '18 11:04 lidalei

@lidalei First, thank you for your help. I will try to clone fastText from the link you mentioned into the cpp subfolder of JFastText; please correct me if I'm wrong.

I want to use a model pretrained with the fastText library, like wiki.en. That model was trained by fastText, which is different from JFastText. I don't want to train it again, for several reasons. Thank you.

ali3assi avatar Apr 15 '18 14:04 ali3assi

You can download word embeddings from https://fasttext.cc/docs/en/pretrained-vectors.html. I haven't tried them but believe they work. JFastText relies on fastText. If JFastText complains, it means the model was trained with a version of fastText that is not compatible with the fastText JFastText is using.

lidalei avatar Apr 15 '18 21:04 lidalei

@lidalei Sorry, but it is still not working. The same exception is raised when I try to load word embeddings from l.

I cloned your fork of JFastText, deleted the folder cpp/fastText, cloned it again from where you said, and then ran mvn clean install. The exception is still there.

Can you please describe the steps, or try to load a pretrained model using JFastText yourself?

ali3assi avatar Apr 18 '18 07:04 ali3assi

@TamouzeAssi I guess you were trying to load a word embedding. That cannot be loaded as a model! Try to load a model from https://fasttext.cc/docs/en/language-identification.html.

lidalei avatar Apr 18 '18 16:04 lidalei

@TamouzeAssi If that does not work, try to use the Python interface of fastText to load your model.

lidalei avatar Apr 18 '18 19:04 lidalei

@lidalei: the model lid.176.bin from https://fasttext.cc/docs/en/language-identification.html can be loaded in JFastText without any error.

But with wiki.en from https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md, loading fails with the incompatible file format exception.

By the way, do you have a good reference to a model trained on Wikipedia? I'm looking to use the vector embeddings so that I can cover OOV words.

ali3assi avatar Apr 19 '18 14:04 ali3assi

@TamouzeAssi There is no problem with this. Pretrained vectors are just a word embedding that represents each word as a vector. They do not perform any classification task. A classifier is built on top of the word embedding. For example, you can represent a sentence as the mean of its words' vectors and train a classifier to classify an unknown sentence.
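The mean-vector idea can be sketched in a few lines. This is only an illustration under assumptions: the word vectors are already available as a Map from word to float[], tokens are split on whitespace, and words missing from the map are skipped. The class name SentenceVectors is hypothetical.

```java
import java.util.Map;

/** Hypothetical helper: averages the vectors of the known words in a sentence. */
final class SentenceVectors {

    /**
     * Returns the element-wise mean of the vectors of all tokens found in
     * {@code vectors}; returns an all-zero vector when no token is known.
     */
    static float[] mean(String sentence, Map<String, float[]> vectors, int dim) {
        float[] sum = new float[dim];
        int known = 0;
        for (String token : sentence.trim().split("\\s+")) {
            float[] v = vectors.get(token);
            if (v == null) {
                continue; // out-of-vocabulary word: ignored in this sketch
            }
            for (int j = 0; j < dim; j++) {
                sum[j] += v[j];
            }
            known++;
        }
        if (known > 0) {
            for (int j = 0; j < dim; j++) {
                sum[j] /= known;
            }
        }
        return sum;
    }
}
```

The resulting float[] can then be used as the feature vector of the sentence for whatever classifier is trained on top.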

I don't have a model for you. It really depends on your task. What do you want to achieve?

lidalei avatar Apr 19 '18 15:04 lidalei

@lidalei For the moment I just want to compute the similarity between two sentences that contain some noise (typos). I used word2vec but got bad results due to OOV words, so I switched to fastText to make use of the subword information.
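For reference, a common way to compare two sentence vectors is cosine similarity. The sketch below is an assumption about the intended metric, not something JFastText provides; the class name Cosine is hypothetical.

```java
/** Hypothetical helper: cosine similarity between two equally sized vectors. */
final class Cosine {

    /** Returns a value in [-1, 1], or 0 when either vector has zero length. */
    static double similarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // avoid dividing by zero for empty or all-zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```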

ali3assi avatar Apr 19 '18 15:04 ali3assi

@TamouzeAssi Have a look at https://radimrehurek.com/gensim/models/word2vec.html#module-gensim.models.word2vec

lidalei avatar Apr 19 '18 15:04 lidalei

@lidalei I used gensim to get the word2vec model, but I developed my algorithm in Java, and gensim cannot be used from Java even if we use Jython.

ali3assi avatar Apr 19 '18 15:04 ali3assi

@TamouzeAssi I will add the function to my JFastText repo and let you know as soon as it is done.

lidalei avatar Apr 19 '18 15:04 lidalei

@TamouzeAssi It won't be ready soon enough to help you. I'd suggest you check

void FastText::loadVectors(std::string filename) {
  std::ifstream in(filename);
  std::vector<std::string> words;
  std::shared_ptr<Matrix> mat; // temp. matrix for pretrained vectors
  int64_t n, dim;
  if (!in.is_open()) {
    throw std::invalid_argument(filename + " cannot be opened for loading!");
  }
  in >> n >> dim;
  if (dim != args_->dim) {
    throw std::invalid_argument(
        "Dimension of pretrained vectors (" + std::to_string(dim) +
        ") does not match dimension (" + std::to_string(args_->dim) + ")!");
  }
  mat = std::make_shared<Matrix>(n, dim);
  for (size_t i = 0; i < n; i++) {
    std::string word;
    in >> word;
    words.push_back(word);
    dict_->add(word);
    for (size_t j = 0; j < dim; j++) {
      in >> mat->at(i, j);
    }
  }
  in.close();

  dict_->threshold(1, 0);
  input_ = std::make_shared<Matrix>(dict_->nwords()+args_->bucket, args_->dim);
  input_->uniform(1.0 / args_->dim);

  for (size_t i = 0; i < n; i++) {
    int32_t idx = dict_->getId(words[i]);
    if (idx < 0 || idx >= dict_->nwords()) continue;
    for (size_t j = 0; j < dim; j++) {
      input_->at(idx, j) = mat->at(i, j);
    }
  }
}

and write some Java code to read pretrained vectors.
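A minimal Java counterpart of the reading loop above might look like the sketch below. It assumes the textual .vec layout that the C++ code parses: a header line "n dim", followed by one line per word holding the word and dim numbers separated by whitespace. The class name VecReader is hypothetical and not part of JFastText.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses the textual .vec format into a word-to-vector map. */
final class VecReader {

    /** Reads "n dim" followed by up to n lines of "word v1 ... v_dim". */
    static Map<String, float[]> read(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String header = in.readLine();
        if (header == null) {
            throw new IOException("Empty vector file");
        }
        String[] counts = header.trim().split("\\s+");
        if (counts.length != 2) {
            throw new IOException("Malformed header: " + header);
        }
        int n = Integer.parseInt(counts[0]);
        int dim = Integer.parseInt(counts[1]);

        Map<String, float[]> vectors = new LinkedHashMap<>();
        String line;
        while (vectors.size() < n && (line = in.readLine()) != null) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != dim + 1) {
                continue; // skip blank or malformed lines in this sketch
            }
            float[] v = new float[dim];
            for (int j = 0; j < dim; j++) {
                v[j] = Float.parseFloat(parts[j + 1]);
            }
            vectors.put(parts[0], v);
        }
        return vectors;
    }
}
```

It could be called as VecReader.read(Files.newBufferedReader(Paths.get("wiki.en.vec"), StandardCharsets.UTF_8)). Note that a .vec file only contains vectors for in-vocabulary words, so this approach alone does not give subword-based vectors for OOV words.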

lidalei avatar Apr 19 '18 15:04 lidalei

@lidalei Thank you, I will try to write similar code. By the way, please let me know when you add the function to your JFastText repo.

ali3assi avatar Apr 19 '18 15:04 ali3assi

val fasttext = new JFastText()
fasttext.loadModel("/home/work/XX/model/model.bin")

java.lang.IllegalArgumentException: Model file doesn't exist!
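Before calling loadModel, it may be worth checking the path from the same process, since the file has to be visible to the JVM that actually runs the code. The following is a hedged sketch using only java.nio; the class name ModelPathCheck is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: reports why a model path might not be loadable. */
final class ModelPathCheck {

    /** Returns null when the path looks loadable, otherwise a short reason. */
    static String problem(String modelPath) {
        Path p = Paths.get(modelPath).toAbsolutePath();
        if (!Files.exists(p)) {
            return "no such file: " + p;
        }
        if (!Files.isRegularFile(p)) {
            return "not a regular file: " + p;
        }
        if (!Files.isReadable(p)) {
            return "not readable by this process: " + p;
        }
        return null;
    }
}
```

If problem(...) returns a message for a path that exists on your own machine, the code is probably running somewhere else (for example on a cluster worker) or under a user without read permission.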

renzherl avatar Nov 20 '18 09:11 renzherl