Rserve
Rserve for real-time performance tips
I am developing a REST service that uses a machine-learning model in R. The service will be used by the web page of a well-known company, so the traffic will be high and it must run 24/7. My idea is to use Rserve to compute the predictions from the model in R. Although I have experimented for some time with Rserve, sharing the connection among consecutive requests, I still have some doubts:
- Could the performance degrade over time?
- Is it good practice to share the session among consecutive requests?
- Is Rserve a good approach for this real-time service?
Q: Could the performance degrade over time? A: The performance doesn't degrade (to the best of my knowledge) because each connection starts from the same, clean process. We have Rserve servers running for months without issues.
Q: Is it good practice to share the session among consecutive requests? A: For REST, no. It is useful when you have state you want to keep server-side, but that is not the case for REST. However, you do want to share the initial state (e.g., a loaded model) with all requests, and that's fine.
Q: Is Rserve a good approach for this real-time service? A: Yes, if you know what you're doing. Rserve itself is stable and performant, so it really depends on the code you will be executing. Make sure you know your math: it's all about the CPU cycles your code needs. Rserve will spread the work across cores, one connection per process, but your code's cost is what truly determines scalability. Note that there is a small serial forking overhead (typically single-digit milliseconds) which limits the rate at which a single server process can accept new connections. In production we typically put a proxy (e.g., nginx) in front of a scalable, distributed Rserve farm.
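To illustrate the per-request pattern this implies on the client side, here is a minimal Java sketch (not from the original exchange). It assumes a hypothetical scoring function score_request() has been preloaded into the Rserve master process, so every forked connection sees the same initial state; the function name, host, port, and JSON payload are placeholders.

import org.rosuda.REngine.REXP;
import org.rosuda.REngine.Rserve.RConnection;

// One request = one connection: Rserve forks a fresh R process that already
// contains the preloaded state (e.g., the model), so requests cannot affect
// each other or the master process.
public class ScoreOnce {
    public static void main(String[] args) throws Exception {
        RConnection c = new RConnection("localhost", 6311);  // fork happens here
        try {
            // score_request() is a hypothetical function preloaded on the server
            REXP result = c.eval("score_request('{\"x\": 1.5}')");
            System.out.println("prediction: " + result.asDouble());
        } finally {
            c.close();  // discard the forked process; the shared state is untouched
        }
    }
}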
Thanks so much for the clear response
Complementing this thread: I have exactly the same use case as @alvsanand, but Rserve doesn't seem to perform as well as stated. If I go above 4 requests/second, Rserve basically hangs.
I am by no means an expert in R, so I may be doing something wrong, but I wanted your opinion.
I have a randomForest model that I run behind Rserve. I send data to it as a JSON string through the Java client, and the data is converted to a proper R structure using jsonlite's fromJSON(). The first time I run this code, fromJSON() takes ~150 ms. If I maintain the same connection to Rserve, the second time it takes only ~4 ms. That's great, and I could create a connection pool to keep connections open, but the problem arises when I have more than 4 connections in parallel. The service basically hangs and requests start taking seconds.
So my questions are:
- Is there any way to avoid this loading cost for every new connection when I call fromJSON()? Do you have any idea what it could be?
- Do you have any performance tests for Rserve? I would like to know how many requests per second it can handle in different cases.
First, this is really an R question, not an Rserve question. Rserve itself has a very small overhead (more below), so all the time spent is purely related to R and your code. It's impossible to give you a precise answer due to the lack of details (e.g., there are at least three different packages implementing fromJSON()), but it looks as if the time in your code is mostly spent loading and attaching packages. If you know which packages you'll use, you should pre-load them before you start the service (see the source and eval configuration directives).
Second, Rserve can easily handle hundreds of connections; we use it routinely that way and it has been tested in many big companies. So the real question is what your R code does and what resources it uses. Again, regular R rules apply; it has nothing to do with Rserve itself. If your code uses 100% of a CPU and you have only 4 cores, then obviously they get saturated at 4 connections. So profile your R code (unrelated to Rserve) and that will give you an estimate of the expected performance.
In practice, the user's R code takes the vast majority of the compute time, so Rserve's overhead is negligible. However, there are special cases when your scripts are very fast (we're talking single-digit milliseconds). On a typical server machine the forking overhead of a connection is on the order of 3 ms. After that, each connection is an independent process, so you're only bound by the number of cores you have. However, the first 3 ms are serial, since the forking is done by the server process, so the effective rate is limited to ~30 new connections per second (note that this does not limit the number of parallel connections). If that is not enough, you can use the fork configuration directive to start multiple parallel servers to spread the load.
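On the client side, the usual way to stay under that serial-fork rate is to reuse connections rather than open one per request. Here is a minimal sketch (my own illustration, not from the thread) of a fixed-size RConnection pool in Java: each pooled connection pays the fork cost once, and every subsequent request borrows a warm connection, so the ~30 connections/second limit no longer constrains the request rate. The pool size, host, and port are placeholder assumptions.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.rosuda.REngine.REXP;
import org.rosuda.REngine.Rserve.RConnection;

// Minimal connection pool: pay the fork cost once per pooled connection,
// then reuse the warm connections for every request.
public class RservePool {
    private final BlockingQueue<RConnection> pool;

    public RservePool(String host, int port, int size) throws Exception {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++)
            pool.put(new RConnection(host, port));  // serial fork cost paid here, once per connection
    }

    // Borrow a warm connection, evaluate, and return it to the pool.
    public REXP eval(String expr) throws Exception {
        RConnection c = pool.take();
        try {
            return c.eval(expr);
        } finally {
            pool.put(c);
        }
    }
}

A web-service handler would then call pool.eval(...) per request instead of creating a new RConnection each time.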
Thanks for the quick response, Simon. I thought it was probably something in the R code, but I wanted to make sure.
I was using the source directive, but now I am loading the library in a different manner. My fromJSON is from jsonlite. The way I run Rserve looks like:
library(jsonlite)
library(Rserve)
Rserve::run.Rserve(debug = FALSE, port = 6311, remote = TRUE, auth = FALSE,
                   args = "--no-save", config.file = "/etc/Rserve.conf",
                   maxinbuf = 4000000)
I found the mailing list, so I sent a more detailed version of this question there.
I have a similar problem related to near-real-time predictions, as described in this thread and here.
I am running a Java application which contacts the Rserve server to score some data, and I want to do this in near real time. If I use a single-threaded Java application to send sequential requests to one Rserve instance, I achieve an average response time between 5 and 6 ms. If I increase the number of Java threads, thereby increasing the number of requests and also simulating parallel requests, the performance drops significantly to 25 ms per request.
Is this expected due to request handling, or did I configure something wrong on the server? I also tried to manually increase the number of Rserve instances on different ports, but the results are the same. Is there a better solution for my purposes?
What exactly is your setup? The fixed cost I mentioned above is associated with the fork-on-connect, i.e., if you create multiple RConnection objects to the same Rserve instance, you pay the cost of creating those connections, but once they are up, you can issue parallel requests independently to any of them without incurring any additional cost.
I did some testing and I cannot replicate any issues related to parallel connections. With a simple test that merely does eval("NULL") I see
connect: 10553µs
eval: 7068µs
eval: 211µs
eval: 78µs
eval: 73µs
eval: 69µs
eval: 73µs
eval: 69µs
and eventually it settles around 40µs. The numbers don't change as I increase the number of parallel connections until you start saturating all the cores (it started getting close to saturation at 12 parallel connections on a 16-core (32 HT) machine, when the 40µs slowly rose toward 50µs). Note that Rserve was running at ~85% CPU usage while java was at ~40% during the test.
This suggests that Rserve itself has very minimal overhead if used over an existing connection (40µs is really negligible compared to any actual work in R), so everything really depends on the R code. The relatively expensive first eval is likely due to the fact that everything in R has to get into the cache (parser, environments, etc.) before you get the fast performance, and possibly also the JIT in Java.
Also, I would expect the numbers to be lower if you use C, since there is quite a bit of Java overhead involved as well, but I didn't run that test. Finally, Java cannot use Unix sockets, so I suspect that the initial connect being closer to 10 ms is due to Java, not Rserve, since the typical fork cost on that machine is below 4 ms. The Java code:
import org.rosuda.REngine.REngine;
import org.rosuda.REngine.Rserve.RConnection;

// measure the one-time connect (fork) cost
long t1 = System.nanoTime();
RConnection c = new RConnection();
long t2 = System.nanoTime();
System.out.println("connect: " + (t2 - t1) / 1000 + "µs");
REngine eng = (REngine) c;
long i = 0;
while (true) {
    // measure the round-trip cost of a trivial evaluation
    long t3 = System.nanoTime();
    c.parseAndEval("NULL");
    long t4 = System.nanoTime();
    i++;
    if (i < 8)
        System.out.println("eval: " + (t4 - t3) / 1000 + "µs");
    else if ((i & 0xfff) == 0)  // after warm-up, print every 4096th timing
        System.out.print("eval: " + (t4 - t3) / 1000 + "µs \r");
}
I conducted the test on Ubuntu 14.04.3 LTS (Linux 3.13.0), Rserve 1.8-5 (one instance running on loopback TCP/IP), R 3.2.3, Oracle Java 1.8.0_101, 2 x E5-2667 v2 @ 3.30GHz
PS: Just to clarify, the suggestion to use multiple Rserve instances for low-latency jobs is not meant as one instance per connection, since every instance has the same connect cost; rather, it is intended to spread the serial fork cost across multiple cores, so it really only makes sense on servers with many cores. Say you have one server instance maxed out at 15 conn/s: with two you'll get 30 conn/s, with 4 you'll get 60 conn/s, etc. But you must leave room for the actual R code's CPU usage as well, so you typically want at least twice as many cores as you have instances, and in most cases even more.
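As an illustration of that point (my own sketch, not code from the thread): if you run, say, four Rserve instances on consecutive ports, the client can simply rotate new connections across them so that each instance handles only a quarter of the connection-setup rate. The port numbers and host are placeholders.

import java.util.concurrent.atomic.AtomicInteger;
import org.rosuda.REngine.Rserve.RConnection;

// Spread the serial fork cost over several Rserve instances by
// rotating new connections across their ports (round-robin).
public class RoundRobinConnector {
    private final int[] ports = {6311, 6312, 6313, 6314};  // placeholder ports, one per instance
    private final AtomicInteger next = new AtomicInteger();

    public RConnection connect(String host) throws Exception {
        int port = ports[Math.floorMod(next.getAndIncrement(), ports.length)];
        return new RConnection(host, port);
    }
}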
Thank you for taking the time and looking into this :)
I follow you on the forking and I think I understand it. To clarify: I set up a connection to the Rserve server as you do. Then I iterate over a JSON list, which I parse and evaluate in R. My list contains 500 JSON strings. In R I transform each string into a data frame and then use a preloaded model to score it. The result is returned. It's a three-liner in R, and if I preload everything in R, it scores very fast, < 1 ms.
Printing out the processing results for this example in ms:
[23.8396], [11.590236], [7.09057], [6.886749], [6.90031], [7.266447], [5.715192], ...
It stabilizes at about 5 ms, with some outliers reaching up to 10 and going down to 3 ms. Processing all 500 strings takes about 3083 ms, roughly 3 s. Very good results, and that's how I want them. (The ping is about 1 ms and probably explains some of the variance.)
However, I am performing each scoring in sequential order. This is a limitation I would like to overcome, so I wanted to test the performance under an increased request load by sending requests in parallel. In the results above I am using one Java thread iterating sequentially over the JSON list. Therefore, I suspect that Rserve only uses one CPU core during the 3 s runtime (at most two, while forking the connection in the beginning). Is this correct? At least that is what I infer from my Java evaluation statement: it waits until the result is returned, then sends the next JSON string with the next eval statement.
I simply added another thread doing the same thing: connecting to Rserve and then iterating sequentially over the JSON list. For this test I only gave each thread 250 JSON strings, to see how the results compare to the single-thread solution. The results are worse:
Thread 1: [36.647827], [39.746638], [41.766351], [12.736725], [28.635956], [29.959145], [17.718],
Thread 2: [45.432654], [28.606369], [23.555237], [29.09784], [11.35765], [30.898528], [29.064965],
Total processing time: 6.5 s. I expected it to be close to 2 s. I don't really know what is happening here. Is my assumption that Rserve only utilizes one core while answering the sequential requests simply wrong? I can also provide the Java and R code if you would like.
System: CentOS release 6.7 (Final), Rserve 1.8-5, R version 3.2.2, Java 1.8.0_101, 1x Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (maybe I already reached the limit of the CPU with only one thread?)
Again thank you for helping me!
R is always sequential; it doesn't support threading. If you want parallelism, you have to open multiple connections and use them in parallel; Rserve supports that.
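For completeness, here is a minimal sketch of what that looks like from the Java side (my own illustration, not code from the thread): each worker thread opens its own RConnection, and the evaluations then run truly in parallel, one forked R process per connection. The thread count, host, port, and the scoring expression are placeholders.

import java.util.ArrayList;
import java.util.List;
import org.rosuda.REngine.Rserve.RConnection;

// One RConnection per thread: each connection is its own forked R process,
// so the evaluations run in parallel across CPU cores.
public class ParallelScoring {
    public static void main(String[] args) throws Exception {
        int nThreads = 4;  // placeholder: match this to your core count
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            Thread w = new Thread(() -> {
                try {
                    RConnection c = new RConnection("localhost", 6311);
                    for (int i = 0; i < 125; i++)   // this thread's share of the work
                        c.eval("Sys.sleep(0)");      // placeholder for the real scoring call
                    c.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            w.start();
            workers.add(w);
        }
        for (Thread w : workers) w.join();  // wait for all threads to finish
    }
}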