Logging: the request timings being logged are wrong
Each time you make a HTTP request to OpenRefine it logs (what I assume is) its response time:
07:39:07.484 [ refine] GET /command/core/get-version (5019ms)
These timings does, however, not seem to work (again assuming it's response time).
To Reproduce
Steps to reproduce the behavior:
- Make a HTTP request to
/command/core/get-versioneach ~five seconds
Script
#!/bin/bash
URL="http://127.0.0.1:3333/command/core/get-version"
while true; do
start_time=$(date +%s.%N)
response=$(curl -s -w "\n%{time_total}\n" -o /dev/null "$URL")
end_time=$(date +%s.%N)
response_time=$(echo "$response" | tail -n 1)
printf "Response Time: %.5f seconds\n" "$response_time"
sleep 5
done
Current Results
OpenRefine prints values around ~5000ms to the terminal / log
Expected Behavior
OpenRefine prints values that match the time it takes for a request to finish.
Screenshots
OpenRefine output next to the output of the script above:
Versions
- Operating System: Fedora 37
- JRE or JDK Version: 17.0.6
- OpenRefine: 3.8/main
Additional context
I might be completely wrong assuming the intended result is for OpenRefine to print the response time but if so the output is confusing.
It's not response time. It's execution time for the entire command to complete, along with log4j to apply any filtering to the appender.
get-version, has to do a lot of JVM system calls starting here:
https://github.com/OpenRefine/OpenRefine/blob/b82ce2a341a8d051a9e082379be6d50bea35c8c9/main/src/com/google/refine/commands/GetVersionCommand.java#L58-L67
We could give the true http response time, then further down this would need to change to acknowledge only that it completed acquiring the GET and logging that time. Which is not a true Request -> "do some work" -> Response but would just be a Request then immediately Respond and acknowledge that we got a request: https://github.com/OpenRefine/OpenRefine/blob/b82ce2a341a8d051a9e082379be6d50bea35c8c9/main/src/com/google/refine/commands/GetVersionCommand.java#L71-L73
For obvious reasons, we don't want to fake a timing to handle an entire command stack.
It's not response time. It's execution time for the entire command to complete, along with log4j to apply any filtering to the appender.
It does not seem to have anything to do with execution time either, as shown above. If I modify the script above to send a request every 30 seconds OpenRefine will output a timing of around 30000ms.
Indeed it's not the response time, it's the time elapsed since the last request. Printing the response time would feel more useful to me.
I would like to work on this issue. Please assign it to me
Indeed it's not the response time, it's the time elapsed since the last request. Printing the response time would feel more useful to me.
It's actually the time elapsed since the last log message. In the context of the logging module, higher level information like response time isn't available. The best way to think of this number is as a convenience helper to keep from having to compute the delta between two timestamps in your head.
07:39:02.465 [
07:39:07.484 [
At higher logging levels where you've got things like:
Starting request
> Starting subtask1
> Ending subtask1
> Starting subtask2
> Ending subtask2
Responding to request
the delta times actually provide more useful information (but then you need to add up the subtask times in your head, so you have the opposite problem).
In the context of command processing, the response time could be added to that specific log message if deemed valuable. That would be an easy change that we'd be happy to review a PR for.
The logging code is here:
https://github.com/OpenRefine/OpenRefine/blob/74069eaa1b66d97d06d5916c7f91e883da4eba0f/modules/core/src/main/java/com/google/refine/RefineServlet.java#L170-L181
note that in the default configuration that the logging is for the request, not the response. Enabling trace for the refine logger will provide before & after logging for request processing.