helo-word
Domain-specific corpus
Hi, could you explain how to incorporate a domain-specific corpus when training the model? My work involves identifying n-grams prevalent in medical texts, such as "sudden infant death syndrome", which appears in only a handful of instances in the corpus files. Are there any scripts we can tweak to include additional files, and if so, how? Otherwise, can the current model perform well across domains?
I am having the same problem. Did you solve the issue?