Arabic support

Open mzeidhassan opened this issue 7 years ago • 6 comments

First, let me thank you for such a great project. It really looks promising.

I see that you use Stanford CoreNLP, which apparently supports Arabic. Does this mean that WiseOwl can handle Arabic Q&A? If yes, can you please let me know what is needed to make it work for Arabic?

Thanks again!

mzeidhassan avatar Mar 09 '17 03:03 mzeidhassan

I am glad that you liked it.

We currently use only the Stanford English models. It is possible to port WiseOwl to support Arabic, but that would require a few models to be trained. For the Stanford models, you can use the Arabic ones provided by Stanford. You will also have to train a model for answer type classification using Apache OpenNLP (MaxEnt). I am not sure whether Solr is able to index Arabic text.

asmehra95 avatar Mar 09 '17 16:03 asmehra95

Thank you so much for your reply. Solr can index Arabic without problems. Is there any guide or tutorial on how to train a model using OpenNLP? That would make things easier for me.

Thanks again for sharing your great project with us.

mzeidhassan avatar Mar 10 '17 04:03 mzeidhassan

I suggest you start with the OpenNLP documentation at: https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html Focus on the training part; it should be easy to follow. Your major task will be to find a dataset of questions with their corresponding answer types. We used a very simple version taken from Taming Text. You can find out more about it in chapter 8 of Taming Text.
If you are not able to find such a corpus, you can generate your own, but make sure you have enough questions for the model to perform well.
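
If you do build your own corpus, the sketch below shows one way to lay it out. It assumes the OpenNLP Document Categorizer training format (one sample per line, the category first, then whitespace, then the text); the format that WiseOwl's own trainer expects may differ, so check before relying on it. The class name AnswerTypeCorpus, its methods, and the LOCATION and PERSON labels are made up for illustration, and the English questions are placeholders. Java strings are Unicode, so Arabic questions would go through the same code unchanged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WiseOwl or OpenNLP.
// Builds training lines of the form "<answerType> <question>", one per sample.
class AnswerTypeCorpus {

    // Formats a single sample. The answer type must be one token, because
    // the first whitespace-delimited token on the line is read as the category.
    static String formatSample(String answerType, String question) {
        String type = answerType.trim();
        if (type.isEmpty() || type.matches(".*\\s.*")) {
            throw new IllegalArgumentException("answer type must be a single token: " + answerType);
        }
        // Collapse line breaks and repeated spaces so the sample stays on one line.
        String cleaned = question.trim().replaceAll("\\s+", " ");
        return type + " " + cleaned;
    }

    // Formats a map of question -> answer type, preserving the map's iteration order.
    static List<String> formatAll(Map<String, String> questionToType) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : questionToType.entrySet()) {
            lines.add(formatSample(entry.getValue(), entry.getKey()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder samples; a real corpus needs many questions per answer type.
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("Where is Cairo?", "LOCATION");
        samples.put("Who wrote the Muqaddimah?", "PERSON");
        for (String line : formatAll(samples)) {
            System.out.println(line);
        }
    }
}
```

Writing the resulting lines to a UTF-8 text file gives you something you can inspect by eye before passing it to whichever trainer you end up using.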

asmehra95 avatar Mar 10 '17 09:03 asmehra95

Thanks a million for your support and for pointing me in the right direction. I appreciate it.

I will try to get the dataset first for Arabic and see how it goes.

Thanks again!

mzeidhassan avatar Mar 12 '17 21:03 mzeidhassan

You're welcome! Let me know if you need further help, and tell me whether you are able to obtain a dataset for it.

asmehra95 avatar Mar 13 '17 17:03 asmehra95

Hi @asmehra95: When I run this code, this.numDocuments = (int) dfCounter.getCount("__ all__"); it always returns 0. Is that correct?

Thanks.

hohuynh avatar Dec 27 '17 15:12 hohuynh