Carmen Heger

Results 10 comments of Carmen Heger

One way to "easily" get multilingual data is to machine-translate. `pip install googletrans` (and then use `Translator(service_urls=["translate.google.com/gen204"])`). These are older Google Translate versions, and worse quality than prod, but it's...
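
A minimal sketch of what that looks like in code (assuming the synchronous googletrans 3.x API; the sentence and language pair are just examples):

```python
# Sketch: machine-translate a sentence with googletrans.
# The service URL mirrors the one mentioned above; quality is below the
# production Google Translate API.
from googletrans import Translator

translator = Translator(service_urls=["translate.google.com/gen204"])
result = translator.translate("How does the virus spread?", src="en", dest="de")
print(result.text)
```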

Multilingual resources can also easily be found using Linguee and checking the sources of the sentences it finds for a language pair, e.g. for DE: https://www.linguee.com/english-german/search?source=auto&query=coronavirus

For fun, I just added simple BLEU scoring to the results: https://public-mlflow.deepset.ai/#/experiments/55/runs/be5705cb1ddb4326a10f262732f5bd96 (BLEU is an n-gram-based (1-4 grams) comparison between strings; it can be done on sentence level, but with counting +1 and...
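
For reference, sentence-level BLEU with smoothing can be computed like this (my own minimal sketch with NLTK, not the scoring code behind the linked run; the sentences are made up):

```python
# Sentence-level BLEU with smoothing, using NLTK.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the patient should stay at home for two weeks".split()
hypothesis = "the patient should remain home for two weeks".split()

# Default BLEU uses 1-4 grams with equal weights; smoothing avoids a zero
# score when a higher-order n-gram has no match in the reference.
smoothie = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smoothie)
print(f"Sentence-level BLEU: {score:.3f}")
```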

One possibility is to use PPDB to generate additional paraphrased questions: http://paraphrase.org/#/download
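
A rough sketch of how that could look, assuming the standard pipe-delimited PPDB 2.0 format (`LHS ||| phrase ||| paraphrase ||| features ||| alignment ||| entailment`) and a naive word-level substitution; the file path and question are placeholders:

```python
# Sketch: generate paraphrased questions from a PPDB download.
from collections import defaultdict

def load_ppdb(path, max_entries=100000):
    # Collect phrase -> {paraphrases} from the pipe-delimited PPDB file.
    paraphrases = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= max_entries:
                break
            fields = line.split(" ||| ")
            if len(fields) < 3:
                continue
            phrase, paraphrase = fields[1].strip(), fields[2].strip()
            paraphrases[phrase].add(paraphrase)
    return paraphrases

def paraphrase_question(question, paraphrases):
    # Naive single-word substitution; real usage would restrict to
    # equivalence entailments and filter by the PPDB score.
    variants = []
    tokens = question.lower().split()
    for idx, tok in enumerate(tokens):
        for alt in paraphrases.get(tok, []):
            variants.append(" ".join(tokens[:idx] + [alt] + tokens[idx + 1:]))
    return variants

# Example (hypothetical file name):
# pp = load_ppdb("ppdb-2.0-s-lexical")
# print(paraphrase_question("What are the symptoms of coronavirus?", pp))
```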

Yes, that's an option. Translation quality could suffer, though, from the short query lengths. I'm currently exploring translation quality. Thanks!

The `googletrans` lib does not work reliably, so I made a free trial account on MS Azure, also because they offer up to 2M characters of translation for free per...

And the MS Translator script: https://github.com/stedomedo/COVID-QA/blob/auto_translators/data/translators/ms_translate.py MS Translator is supposed to be quite good for Arabic. For other languages, Google or DeepL are better options (afaik they don't offer free credits)...
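
For context, a bare-bones call to the Azure Translator Text REST API (v3.0) looks roughly like this; this is my own minimal example rather than the linked ms_translate.py, and the key/region are placeholders:

```python
# Sketch: translate a batch of strings via the Azure Translator Text API v3.0.
import requests

def ms_translate(texts, target_lang, subscription_key, region="westeurope"):
    url = "https://api.cognitive.microsofttranslator.com/translate"
    params = {"api-version": "3.0", "to": target_lang}
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
    }
    body = [{"text": t} for t in texts]
    response = requests.post(url, params=params, headers=headers, json=body)
    response.raise_for_status()
    return [item["translations"][0]["text"] for item in response.json()]

# Example:
# print(ms_translate(["What are the symptoms?"], "ar", "<YOUR_KEY>"))
```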

@tholor @Timoeller I have a question about the (desired) search workflow. Is it: user query -> match query to question with BERT -> search with Elasticsearch (tf-idf, BM25)? So...
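
To make sure I read that workflow correctly, here is a rough illustration (my interpretation, not the project's actual pipeline; the model name and index are placeholders):

```python
# Sketch: user query -> BERT similarity to known questions -> BM25 search.
from sentence_transformers import SentenceTransformer, util
from elasticsearch import Elasticsearch

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # example encoder
known_questions = [
    "What are the symptoms of COVID-19?",
    "How long is the incubation period?",
]
question_embeddings = model.encode(known_questions, convert_to_tensor=True)

def search(user_query, es_index="documents"):
    # Step 1: match the user query to the closest known question with BERT.
    query_embedding = model.encode(user_query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, question_embeddings)[0]
    best_question = known_questions[int(scores.argmax())]

    # Step 2: search Elasticsearch with that question (BM25 is the default
    # similarity; elasticsearch-py 8.x style call).
    es = Elasticsearch("http://localhost:9200")
    return es.search(index=es_index, query={"match": {"text": best_question}})
```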

One idea for "simple" transfer learning: in machine translation, [this technique](https://www.aclweb.org/anthology/W18-6325/) is commonly used when you have a low-resource language. Basically, you build a model for language Y...
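
As a very rough sketch of that idea with Hugging Face transformers (assumed details, not the linked paper's setup; the checkpoint, dataset, and hyperparameters are placeholders):

```python
# Sketch: start from a model trained on a high-resource pair (the "parent"),
# then fine-tune the same weights on the low-resource pair (the "child").
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

parent_checkpoint = "Helsinki-NLP/opus-mt-en-de"  # example high-resource parent
tokenizer = AutoTokenizer.from_pretrained(parent_checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(parent_checkpoint)

args = Seq2SeqTrainingArguments(
    output_dir="child-model",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
)

# `low_resource_dataset` stands in for a tokenized parallel corpus of the
# child (low-resource) language pair.
# trainer = Seq2SeqTrainer(model=model, args=args,
#                          train_dataset=low_resource_dataset,
#                          tokenizer=tokenizer)
# trainer.train()
```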