apm-agent-python
Make OT binding support explicit timestamps.
What does this pull request do?
The OpenTracing API allows explicitly setting the start and end timestamps of spans, but the implementation in this project does not support that. This patch fixes it.
Closes #1126
💚 CLA has been signed
:green_heart: Build Succeeded
Build stats
- Start Time: 2021-10-13T16:27:38.163+0000
- Duration: 31 min 37 sec
- Commit: d16b3cc19033da48037cb31b1ef44d5decb690de
Test stats :test_tube:
| Test | Results |
|---|---|
| Failed | 0 |
| Passed | 10007 |
| Skipped | 8977 |
| Total | 18984 |
:green_heart: Flaky test report
Tests succeeded.
:robot: GitHub comments
To re-run your PR in the CI, just comment with:
- `/test`: Re-trigger the build.
jenkins run the tests please
Also, @AndreiPashkin, please sign the CLA when you get a chance.
Hmm, but I've signed it, here is the PDF: https://www.dropbox.com/s/oufsg1mc17hlqvy/Please_DocuSign_Individual_Contributor_Licen%20%281%29.pdf?dl=0
Why doesn't the check turn green?
I'm also puzzled - how can I run tests so that I can use PDB for debugging locally? Could you please advise? I'd like to add one test case.
Looks like CLA went green, maybe it was delayed.
> I'm also puzzled - how can I run tests so that I can use PDB for debugging locally?
I don't tend to use PDB, but running the tests should be as simple as installing the most relevant requirements file (at minimum, reqs-base.txt) and then running py.test to run all the tests. I'm guessing PDB can attach to pytest runs without issue -- most of my debugging is done either via logging or via the VS Code debugger.
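For anyone following along, here is a minimal sketch of how PDB can be attached to a pytest run; the `tests/` path below is an illustrative assumption, not necessarily this repo's actual layout:

```python
# Sketch: pytest's --pdb flag drops into the debugger when a test fails.
# Shell equivalent (paths are assumptions):
#   pip install -r reqs-base.txt
#   py.test --pdb tests/
import sys

# Building the invocation as a list avoids shell-quoting issues when it
# is eventually handed to subprocess.run(cmd).
cmd = [sys.executable, "-m", "pytest", "--pdb", "tests/"]
print(" ".join(cmd))
```

Alternatively, a `breakpoint()` call placed inside a test works the same way when pytest is run with `-s` (capture disabled).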
Let me actually write a test.
@basepi, @beniwohli, I've observed something strange when testing my changes.
Here is my test-script:
```python
import logging
import sys
import time

from elasticapm import Client
from elasticapm.contrib.opentracing import Tracer

client = Client({
    'SERVICE_NAME': 'test-with_changes',
    'SERVER_URL': 'http://localhost:8888'
})
tracer = Tracer(client_instance=client)
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)

START = time.time()
span_1 = tracer.start_span('parent', start_time=START)
span_2 = tracer.start_span('child', child_of=span_1, start_time=START + 2)
span_2.finish(finish_time=START + 10)
span_1.finish(finish_time=START + 15)
```
When I install the latest release with `pip install -I elastic-apm` and execute the script, I see this:
Both parent and child are visible, but the timestamps recorded for them are incorrect (and making them correct is the purpose of this PR).
When I install my changes (`pip install -I https://github.com/AndreiPashkin/apm-agent-python/archive/explicit-timestamps-opentracing.zip`) and re-run the script (changing the service name to better observe the differences), I see the following:

- Only the parent is seen.
- Timestamps have gigantic values.
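My guess at what produces those gigantic values, as a sketch: it looks like a wall-clock timestamp ends up in a field that the agent later subtracts from a monotonic reading. This is an assumption about the cause, not something confirmed by the maintainers:

```python
import time

# time.time() returns seconds since the epoch (~1.6e9 and up), while
# time.perf_counter() counts seconds from an arbitrary recent origin
# (typically system boot). Subtracting values from these two different
# clock domains yields a "duration" on the order of the epoch timestamp
# itself - comparable to the ~1.59e12 ms duration in the payload below.
wall_start = time.time()
mono_now = time.perf_counter()

bogus_duration_ms = (wall_start - mono_now) * 1000
print(bogus_duration_ms)  # huge, nothing like the real ~15000 ms
```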
I've also captured the payload that was sent to the Elastic APM instance in both cases.
Without changes:
```json
{"metadata": {"service": {"name": "test-without_changes", "environment": null, "version": null, "agent": {"name": "python", "version": "5.6.0"}, "language": {"name": "python", "version": "3.7.7"}, "runtime": {"name": "CPython", "version": "3.7.7"}}, "process": {"pid": 30721, "ppid": 2071, "argv": ["test_elastic.py"], "title": null}, "system": {"hostname": "andrei-yoga514", "architecture": "x86_64", "platform": "linux"}}}
{"span": {"id": "da6e65e8787355b7", "transaction_id": "f5c2f6d6740a0cd4", "trace_id": "4366c9a4a3cbc6c54abbf84e10d4d8a5", "parent_id": "f5c2f6d6740a0cd4", "name": "child", "type": "code", "subtype": "custom", "action": null, "sync": true, "timestamp": 1589642748818084, "duration": 0.07035498856566846}}
{"transaction": {"id": "f5c2f6d6740a0cd4", "trace_id": "4366c9a4a3cbc6c54abbf84e10d4d8a5", "name": "parent", "type": "custom", "duration": 0.24842699349392205, "result": "", "timestamp": 1589642748818004, "sampled": true, "span_count": {"started": 1, "dropped": 0}, "context": {"tags": {}}}}
```
With changes:
```json
{"metadata": {"service": {"name": "test-with_changes", "environment": null, "version": null, "agent": {"name": "python", "version": "5.6.0"}, "language": {"name": "python", "version": "3.7.7"}, "runtime": {"name": "CPython", "version": "3.7.7"}}, "process": {"pid": 31801, "ppid": 2071, "argv": ["test_elastic.py"], "title": null}, "system": {"hostname": "andrei-yoga514", "architecture": "x86_64", "platform": "linux"}}}
{"span": {"id": "a1a2d1d77c17595e", "transaction_id": "0af7966685c1e27a", "trace_id": "7adccfb5c60bad87b80e290334e6ee53", "parent_id": "0af7966685c1e27a", "name": "child", "type": "code", "subtype": "custom", "action": null, "sync": true, "timestamp": 3179173112604596, "duration": 8000.0, "stacktrace": [{"abs_path": "test_elastic.py", "filename": "test_elastic.py", "module": "__main__", "function": "<module>", "lineno": 21, "library_frame": false}]}}
{"transaction": {"id": "0af7966685c1e27a", "trace_id": "7adccfb5c60bad87b80e290334e6ee53", "name": "parent", "type": "custom", "duration": 1589530226905.7947, "result": "", "timestamp": 1589642898698801, "sampled": true, "span_count": {"started": 1, "dropped": 0}, "context": {"tags": {}}}}
```
Does anyone have any ideas about what could be wrong?
@beniwohli, can you also please explain this code: http://github.com/elastic/apm-agent-python/blob/master/elasticapm/traces.py#L140-L140
Why is it that when `start` is provided, both `self.timestamp` and `self.start_time` are assigned to it, but when `start` is not provided, `self.timestamp` is assigned an epoch timestamp while `self.start_time` is assigned the result of `time.perf_counter()` (which is usually equal to `time.monotonic()`)?
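To make the question concrete, here is my paraphrase of the branch being asked about, reconstructed from the question itself rather than copied from the actual traces.py source:

```python
import time

def init_times(start=None):
    """Hypothetical reconstruction of the questioned logic: one
    caller-supplied value serves both fields; otherwise each field
    is taken from its own clock."""
    if start is not None:
        # The same value lands in both the wall-clock field and the
        # monotonic field - the clock-domain mixing this thread is about.
        timestamp = start_time = start
    else:
        timestamp = time.time()           # epoch seconds, for reporting
        start_time = time.perf_counter()  # monotonic, for measuring duration
    return timestamp, start_time
```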
@beniwohli
> it's not bulletproof as far as I can tell (e.g. the system clock could jump right after you measured the delta, but before applying it to the non-monotonic timestamp).
But if the system clock is jumping, then the agent is going to send skewed data in any case - current master offers no protection against that either, and my PR does not change that.
TimestampConversionUtil simply converts timestamps between clocks; it neither adds nor removes any protection against clock jumps.
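For readers following along, a minimal sketch of what a conversion util like this could look like; the class name comes from the PR, but the body is my assumption, not the actual implementation:

```python
import time

class TimestampConversionUtil:
    """Maps timestamps between the wall clock and the monotonic clock
    using a delta captured once at construction (hypothetical sketch)."""

    def __init__(self):
        # Capture both clocks back to back; a system-clock jump after
        # this point skews converted values, per the caveat quoted above.
        self._wall_anchor = time.time()
        self._mono_anchor = time.perf_counter()

    def wall_to_monotonic(self, wall_ts):
        return self._mono_anchor + (wall_ts - self._wall_anchor)

    def monotonic_to_wall(self, mono_ts):
        return self._wall_anchor + (mono_ts - self._mono_anchor)
```

The conversion is a pure offset shift, so round-tripping a timestamp through both methods returns the original value.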
As such, I'm a bit wary to introduce this into a major code path of the agent. However, I'd be open to it if you moved the usage of TimestampConversionUtil to the opentracing bridge.
Should I add another private argument for a system-clock timestamp and pass it from the OT bridge along with the monotonic timestamp?
@beniwohli Could you please reply to my previous comment?
:green_heart: Flaky test report
Tests succeeded.
Test stats :test_tube:
| Test | Results |
|---|---|
| Failed | 0 |
| Passed | 9189 |
| Skipped | 6837 |
| Total | 16026 |
@basepi, @beniwohli, could you please look into it?
@beniwohli As far as I can tell, the new TimestampConversionUtil is only used within the OT bridge, and even then only if a start time is defined, so I don't think this is actually going to introduce that additional complexity across core transaction/span duration measurements.
For my sanity, @AndreiPashkin, can you give me the tl;dr of why that class is necessary? Are we just trying to avoid the unlikely but possible chance that we measure the difference between the clocks at the point of a daylight-saving change, and thus convert with an hour of error in the duration? Are there other examples of jumps in non-monotonic clocks that we're trying to detect or prevent?
Hey @AndreiPashkin -- I'd still like to get this in. I ended up writing a similar (but a bit simpler) timestamp conversion util for some AWS lambda work I was doing. It was just merged in #1366. Can you merge from upstream/master and switch to that new util? We can then get this merged.
Thanks!
Closing this due to lack of reply.