fastrtext
This application has requested the Runtime to terminate it
I am running the supervised learning algorithm on
- 350k Mails and 55 Categories
execute(
commands = c("supervised", "-input", train_tmp_file_txt, "-output", tmp_file_model, "-dim", 100, "-lr", 1, "-epoch", 20, "-wordNgrams", 2, "-verbose", 1)
)
It's difficult to make a reproducible example, since the data is confidential. I also ran a test with dummy data, where the code did not fail with over 2m "text elements". But I am very willing to do my best to make it more reproducible if possible.
Issue: When running the execute part of the code, RStudio crashes with:
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. terminate called recursively
Is this a known error?
What I tried:
- Running with a smaller sample size. 50 categories still works, so it seems size dependent. (Transferring to a larger machine is difficult for me.)
- Researching the error message. It seems to happen for wrong variable names: https://stackoverflow.com/questions/38970756/r-3-3-1-crashing-with-dplyr, or an error in the data.table package: https://stackoverflow.com/questions/8616862/r-terminates-with-error-message-this-application-has-requested-the-runtime-to-t. As I am "just feeding more data", it does not seem related.
Maybe there is a way to test the code easily outside R.
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2 packrat_0.5.0
Hardware info:
I have sometimes seen strange behaviour; each time it was not because of the dataset size but because of a few lines with encoding errors. Have you tried splitting your dataset into parts and checking each of them?
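A minimal sketch of such a check in base R, assuming the training file is meant to be UTF-8 (that may not hold for your data). `find_bad_lines` and `check_chunks` are hypothetical helper names, not part of fastrtext, and the demo lines are made up:

```r
# Hypothetical helper: indices of lines that are not valid UTF-8.
find_bad_lines <- function(lines) {
  which(!validUTF8(lines))
}

# Hypothetical helper: number of invalid lines per chunk of `chunk_size` lines.
check_chunks <- function(lines, chunk_size) {
  chunk_id <- ceiling(seq_along(lines) / chunk_size)
  vapply(split(lines, chunk_id),
         function(chunk) sum(!validUTF8(chunk)),
         integer(1))
}

# Demo data: one line deliberately contains bytes that are invalid in UTF-8.
good  <- "__label__a hello world"
bad   <- rawToChar(as.raw(c(0x5f, 0xff, 0xfe)))
lines <- c(good, bad, good)

bad_idx <- find_bad_lines(lines)
print(bad_idx)
print(check_chunks(lines, chunk_size = 2))

# For the real data one would read the file first, for example:
# lines <- readLines(train_tmp_file_txt, warn = FALSE)
```

Any chunk with a non-zero count is a candidate for closer inspection before it is passed to training.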
Thank you so much for the fast response. I have analysed the error further:
- Yes, I chunked the data into two parts: Chunk 1: categories 1:50, Chunk 2: categories 35:85 (85 categories in total). Then it works, so it seems that all data can be processed successfully.
- Therefore, it looked to me like a hardware bottleneck. I tried to monitor CPU and memory usage. From 12-13 categories onward, CPU usage quickly rises to 100%. Memory usage seems to be fine. So it's hard for me to tell whether the CPU is the bottleneck.
- Tomorrow I will get access to two additional machines which might have more power.
Is it easy to extract the C++ code that is run? Then I could try to run it without RStudio and exclude RStudio as a potential error source.
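One possible way to do this, sketched under assumptions: the arguments given to `execute()` appear to follow the command-line interface of the upstream fastText program, so the same run could be started from a separately compiled `fasttext` binary. The binary name, its presence on the PATH, and the file names `train.txt` and `model` are placeholders, not details from this thread. The snippet only builds and prints the command line:

```r
# Placeholder paths (assumptions, replace with the real files).
train_file <- "train.txt"
model_file <- "model"

# Same arguments as in the execute() call above; numbers are coerced to strings.
args <- c("supervised", "-input", train_file, "-output", model_file,
          "-dim", 100, "-lr", 1, "-epoch", 20, "-wordNgrams", 2, "-verbose", 1)

# Command line for a standalone fastText binary (assumed to be called "fasttext").
cmd <- paste(c("fasttext", args), collapse = " ")
cat(cmd, "\n")

# To launch it as a separate process instead of inside the R session (not run here):
# status <- system2("fasttext", args = args)
```

A lighter-weight check would be to run the unchanged R script with `Rscript` from a terminal, which keeps R and fastrtext but takes RStudio out of the picture.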