fastrtext: "This application has requested the Runtime to terminate it"

I am running the supervised learning algorithm on 350k mails with 55 categories:

```r
library(fastrtext)
execute(commands = c("supervised", "-input", train_tmp_file_txt, "-output", tmp_file_model,
                     "-dim", 100, "-lr", 1, "-epoch", 20, "-wordNgrams", 2, "-verbose", 1))
```

It's difficult to make a reproducible example, since the data is confidential. I also ran a test with dummy data, where the code did not fail even with over 2M "text elements". But I am very willing to do my best to make it more reproducible if possible.
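One way to build a shareable reproducible example is to generate a synthetic training file with the same shape as the real one (350k lines, 55 labels). The sketch below is only an assumption about what "similar" means: the helper `make_dummy_train_file()`, its parameters and the `cat<N>` label names are hypothetical, while the `__label__` prefix is fastText's default label marker for supervised input. Random words will not mimic real mail text, so this may not trigger the crash.

```r
# Hypothetical helper: write a synthetic fastText training file with
# n_docs lines spread over n_labels categories, one
# "__label__cat<N> word word ..." entry per line.
make_dummy_train_file <- function(path, n_docs = 350000, n_labels = 55,
                                  words_per_doc = 50, vocab_size = 20000,
                                  seed = 1) {
  set.seed(seed)
  vocab <- paste0("w", seq_len(vocab_size))
  labels <- sample(seq_len(n_labels), n_docs, replace = TRUE)
  # Each document is a bag of random words drawn from a shared vocabulary
  docs <- vapply(seq_len(n_docs), function(i) {
    paste(sample(vocab, words_per_doc, replace = TRUE), collapse = " ")
  }, character(1))
  writeLines(paste0("__label__cat", labels, " ", docs), path, useBytes = TRUE)
  invisible(path)
}
```

The returned path can be passed straight to the `execute()` call, e.g. `train_tmp_file_txt <- make_dummy_train_file(tempfile(fileext = ".txt"))`.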

Issue: when running the `execute()` call, RStudio crashes with:

```
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. terminate called recursively
```

Is this a known error?

What I tried:

- Running with a smaller sample: 50 categories still works, so the crash seems to depend on data size. (Moving to a larger machine is difficult for me.)
- Researching the error message. It seems to occur with wrong variable names (https://stackoverflow.com/questions/38970756/r-3-3-1-crashing-with-dplyr) or with an error in the data.table package (https://stackoverflow.com/questions/8616862/r-terminates-with-error-message-this-application-has-requested-the-runtime-to-t). As I am "just feeding more data", neither seems related.
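To pin down the size at which the crash starts, the training file can be cut down to a chosen set of categories and the `execute()` call repeated on each subset. A minimal sketch, assuming exactly one label per line at the start of the line; `subset_by_labels()` is a hypothetical helper, not part of fastrtext.

```r
# Hypothetical helper: copy only the lines whose leading label is in
# keep_labels (given without the "__label__" prefix) to out_path.
# Returns the number of kept lines invisibly.
subset_by_labels <- function(in_path, out_path, keep_labels) {
  lines <- readLines(in_path, encoding = "UTF-8")
  # The first whitespace-delimited token is the label, e.g. "__label__cat7"
  first_tok <- sub("\\s.*$", "", lines)
  keep <- first_tok %in% paste0("__label__", keep_labels)
  writeLines(lines[keep], out_path, useBytes = TRUE)
  invisible(sum(keep))
}
```

Bisecting between the largest label count that works and the smallest that crashes narrows the threshold in a few runs.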

Is there perhaps an easy way to test the code outside R?
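As far as I know, fastrtext is built on the fastText C++ sources, and `execute()` takes the same arguments as the standalone `fasttext` command-line tool. So a rough way to test outside the R session is to build fastText separately and pass it the same arguments. The sketch below assumes a `fasttext` binary is on the `PATH`; `fasttext_args()` and `run_fasttext_cli()` are hypothetical helpers.

```r
# Same arguments as the execute() call, as a character vector
fasttext_args <- function(input, output) {
  c("supervised", "-input", input, "-output", output,
    "-dim", 100, "-lr", 1, "-epoch", 20, "-wordNgrams", 2, "-verbose", 1)
}

# Run a standalone fastText binary (assumed name "fasttext") with those
# arguments; returns the process exit status.
run_fasttext_cli <- function(input, output, binary = "fasttext") {
  exe <- Sys.which(binary)
  if (!nzchar(exe)) stop("fastText binary not found on PATH: ", binary)
  # system2() does not quote arguments itself, so quote the file paths
  system2(exe, fasttext_args(shQuote(input), shQuote(output)))
}
```

If the standalone binary also crashes on the full file, the problem is in fastText or the data rather than in R or RStudio.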




```
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2    packrat_0.5.0
```

Hardware info: (screenshot not reproduced)

Dec 11 '19 10:12 Toniiiio

I have sometimes seen strange behaviour, and each time it was caused not by dataset size but by a few lines with encoding errors. Have you tried splitting your dataset into parts and checking each of them?
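A quick way to look for such lines before splitting is to check every line of the training file for invalid UTF-8 with base R's `validUTF8()` (available since R 3.3.0, so also on the 3.4.2 shown above). This is a sketch; `find_invalid_utf8()` is a hypothetical name, and it only detects byte-level encoding problems, not other malformed content.

```r
# Return the line numbers of a text file that are not valid UTF-8.
find_invalid_utf8 <- function(path) {
  # Read raw lines without re-encoding; validUTF8() inspects the bytes
  lines <- readLines(path, warn = FALSE)
  which(!validUTF8(lines))
}
```

Running it on `train_tmp_file_txt` and inspecting (or dropping) the reported lines is cheaper than repeated training runs.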

Dec 11 '19 20:12 pommedeterresautee

Thank you so much for the fast response. I have analysed the error further:

Yes, I split the data into two chunks: chunk 1 with categories 1:50 and chunk 2 with categories 35:85 (85 categories in total). Training works on each chunk, so all of the data seems to be processed successfully.
Therefore, it looked to me like a hardware bottleneck. I monitored CPU and memory usage: from 12-13 categories on, CPU usage quickly rises to 100%, while memory usage seems fine. So it is hard for me to tell whether the CPU is the bottleneck.
Tomorrow I will get access to two additional machines that might have more power.
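A back-of-the-envelope check of the hardware hypothesis, based on my reading of fastText's defaults: with `-wordNgrams 2`, fastText keeps a hashed n-gram table of `-bucket` rows (2,000,000 by default), so the input matrix has (vocabulary + bucket) rows of `dim` 4-byte floats, and labels only add a small labels × dim output matrix. Going from 55 to 85 categories therefore barely changes memory. The 100% CPU is also expected during training, since fastText runs `-thread` worker threads (12 by default). `ft_matrix_bytes()` is a hypothetical helper and the vocabulary size in the usage line is made up.

```r
# Rough size in bytes of fastText's two weight matrices for supervised
# training: (n_words + bucket) x dim input rows plus n_labels x dim output
# rows, each entry a 4-byte float. Assumes the default bucket of 2e6.
ft_matrix_bytes <- function(n_words, n_labels, dim = 100, bucket = 2e6) {
  input_rows <- n_words + bucket
  4 * dim * (input_rows + n_labels)
}
```

For example, `ft_matrix_bytes(n_words = 500000, n_labels = 85) / 2^30` gives roughly 0.93 GiB, of which the 30 extra labels account for only 12,000 bytes.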

Is it easy to extract the C++ code that is run? Then I could try running it without RStudio and rule RStudio out as a potential source of the error.
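Short of extracting the C++ code, a simpler way to rule RStudio out is to run the same training script in a separate R process with `Rscript`: a native crash then ends only that child process, and its exit status shows whether it failed. A sketch, assuming the `execute()` call has been saved to a script file; `run_in_rscript()` and the script name are hypothetical.

```r
# Run an R script in a fresh R process via Rscript and return its exit
# status. A non-zero status (or a crash) stays outside the RStudio session.
# Example script content (hypothetical file "train_fastrtext.R"):
#   library(fastrtext)
#   execute(commands = c("supervised", "-input", "train.txt", "-output", "model", ...))
run_in_rscript <- function(script_path) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, shQuote(script_path))
  if (status != 0) warning("Rscript exited with status ", status)
  status
}
```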

Dec 11 '19 23:12 Toniiiio
