apm-agent-python Make OT binding support explicit timestamps.

What does this pull request do?

OpenTracing API allows to explicitly set start and end timestamps of spans but the implementation in this project does not allow that. That patch fixes it.

Closes #1126

May 08 '20 20:05 AndreiPashkin

💚 CLA has been signed

May 08 '20 20:05 cla-checker-service[bot]

:green_heart: Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Start Time: 2021-10-13T16:27:38.163+0000
Duration: 31 min 37 sec
Commit: d16b3cc19033da48037cb31b1ef44d5decb690de

Test stats :test_tube:

Test	Results
Failed	0
Passed	10007
Skipped	8977
Total	18984

:green_heart: Flaky test report

Tests succeeded.

:robot: GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.

May 08 '20 20:05 apmmachine

jenkins run the tests please

May 08 '20 21:05 basepi

Also, @AndreiPashkin, please sign the CLA when you get a chance.

May 08 '20 21:05 basepi

Also, @AndreiPashkin, please sign the CLA when you get a chance.

Hmm, but I've signed it, here is the PDF: https://www.dropbox.com/s/oufsg1mc17hlqvy/Please_DocuSign_Individual_Contributor_Licen%20%281%29.pdf?dl=0

Why doesn't the check turn green?

I'm also puzzled - how can I run tests so that I can use PDB for debugging locally? Could you please advise? I'd like to add one test case.

May 09 '20 05:05 AndreiPashkin

Looks like CLA went green, maybe it was delayed.

I'm also puzzled - how can I run tests so that I can use PDB for debugging locally?

I don't tend to use PDB, but running the tests should be as simple as installing the requirements file that is most relevant (at minimum, reqs-base.txt), and then run py.test to run all the tests. I'm guessing PDB can attach to pytest runs without issue -- most of my debugging is either via logging or via the VSCode debugger.

May 11 '20 17:05 basepi

Let me actually write a test.

May 13 '20 10:05 AndreiPashkin

@basepi, @beniwohli, I've observed some strange when tested my changes.

Here is my test-script:

import logging
import sys
import time


from elasticapm import Client
from elasticapm.contrib.opentracing import Tracer


client = Client({
    'SERVICE_NAME': 'test-with_changes',
    'SERVER_URL': 'http://localhost:8888'
})
tracer = Tracer(client_instance=client)

logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)

START = time.time()

span_1 = tracer.start_span('parent', start_time=START)
span_2 = tracer.start_span('child', child_of=span_1, start_time=START + 2)
span_2.finish(finish_time=START + 10)
span_1.finish(finish_time=START + 15)

When I install the latest release with pip install -I elastic-apm and execute it - I see such a picture: Both parent and child are visible but they have incorrect timestamps recorded (and making it recorded correctly is the purpose of this PR).

When I install my changes (pip install -I https://github.com/AndreiPashkin/apm-agent-python/archive/explicit-timestamps-opentracing.zip) and re-run my script (changing the service name to better observe the differences) - I see the following:

Only the parent is seen.
Timestamps have gigantic values.

I've also captured payload that was sent to the Elastic APM instance in both cases.

Without changes:

{"metadata": {"service": {"name": "test-without_changes", "environment": null, "version": null, "agent": {"name": "python", "version": "5.6.0"}, "language": {"name": "python", "version": "3.7.7"}, "runtime": {"name": "CPython", "version": "3.7.7"}}, "process": {"pid": 30721, "ppid": 2071, "argv": ["test_elastic.py"], "title": null}, "system": {"hostname": "andrei-yoga514", "architecture": "x86_64", "platform": "linux"}}}
{"span": {"id": "da6e65e8787355b7", "transaction_id": "f5c2f6d6740a0cd4", "trace_id": "4366c9a4a3cbc6c54abbf84e10d4d8a5", "parent_id": "f5c2f6d6740a0cd4", "name": "child", "type": "code", "subtype": "custom", "action": null, "sync": true, "timestamp": 1589642748818084, "duration": 0.07035498856566846}}
{"transaction": {"id": "f5c2f6d6740a0cd4", "trace_id": "4366c9a4a3cbc6c54abbf84e10d4d8a5", "name": "parent", "type": "custom", "duration": 0.24842699349392205, "result": "", "timestamp": 1589642748818004, "sampled": true, "span_count": {"started": 1, "dropped": 0}, "context": {"tags": {}}}}

With changes:

{"metadata": {"service": {"name": "test-with_changes", "environment": null, "version": null, "agent": {"name": "python", "version": "5.6.0"}, "language": {"name": "python", "version": "3.7.7"}, "runtime": {"name": "CPython", "version": "3.7.7"}}, "process": {"pid": 31801, "ppid": 2071, "argv": ["test_elastic.py"], "title": null}, "system": {"hostname": "andrei-yoga514", "architecture": "x86_64", "platform": "linux"}}}
{"span": {"id": "a1a2d1d77c17595e", "transaction_id": "0af7966685c1e27a", "trace_id": "7adccfb5c60bad87b80e290334e6ee53", "parent_id": "0af7966685c1e27a", "name": "child", "type": "code", "subtype": "custom", "action": null, "sync": true, "timestamp": 3179173112604596, "duration": 8000.0, "stacktrace": [{"abs_path": "test_elastic.py", "filename": "test_elastic.py", "module": "__main__", "function": "<module>", "lineno": 21, "library_frame": false}]}}
{"transaction": {"id": "0af7966685c1e27a", "trace_id": "7adccfb5c60bad87b80e290334e6ee53", "name": "parent", "type": "custom", "duration": 1589530226905.7947, "result": "", "timestamp": 1589642898698801, "sampled": true, "span_count": {"started": 1, "dropped": 0}, "context": {"tags": {}}}}

Does anyone have any ideas about what could be wrong?

May 16 '20 15:05 AndreiPashkin

@beniwohli, can you also please explain this code: http://github.com/elastic/apm-agent-python/blob/master/elasticapm/traces.py#L140-L140

Why is start provided both self.timestamp and self.start_time are assigned to it but if start is not provided self.timestamp is assigned epoch timestamp and self.start_time is assigned result of time.perf_time() which is usually equals to time.monotonic()?

May 16 '20 16:05 AndreiPashkin

@beniwohli

it's not bulletproof as far as I can tell (e.g. the system clock could jump right after you measured the delta, but before applying it to the non-monotonic timestamp).

But if system clock is jumping then the agent is going to send skewed data in any case - so there is no bullet-proofness against that in current master already and my PR does not change that. TimestampConversionUtil simply converts timestamps between clocks without adding or removing anything in regards to protection against clock jumps.

As such, I'm a bit wary to introduce this into a major code path of the agent. However, I'd be open to it if you moved the usage of TimestampConversionUtil to the opentracing bridge.

Should I add another private argument for system-clock timestamp and pass it from the OT bridge along with the monotonic timestamp?

Jun 06 '20 09:06 AndreiPashkin

@beniwohli Could you please reply on my prev. comment?

Aug 26 '20 09:08 AndreiPashkin

:green_heart: Flaky test report

Tests succeeded.

Expand to view the summary

Test stats :test_tube:

Test	Results
Failed	0
Passed	9189
Skipped	6837
Total	16026

Oct 27 '20 17:10 apmmachine

@basepi, @beniwohli, could you please look into it?

Mar 17 '21 11:03 AndreiPashkin

@beniwohli As far as I can tell, the new TimestampConversionUtil is only used within the OT bridge, and even then only if a start time is defined, so I don't think this is actually going to introduce that additional complexity across core transaction/span duration measurements.

For my sanity, @AndreiPashkin, can you give me the tl;dr of why that class is necessary? Are we just trying to avoid the unlikely, but possible chance that we could measure the difference between the clocks at the point of a daylight saving change, and thus convert with an hour mistake in the duration? Are there other examples of jumps in non-monotonic clocks that we're trying to detect/prevent?

Jun 10 '21 17:06 basepi

Hey @AndreiPashkin -- I'd still like to get this in. I ended up writing a similar (but a bit simpler) timestamp conversion util for some AWS lambda work I was doing. It was just merged in #1366. Can you merge from upstream/master and switch to that new util? We can then get this merged.

Thanks!

Oct 29 '21 14:10 basepi

Closing this due to lack of reply.

Aug 25 '22 18:08 basepi

apm-agent-python apm-agent-python copied to clipboard

Make OT binding support explicit timestamps.

What does this pull request do?

:green_heart: Build Succeeded

Build stats

Test stats :test_tube:

:green_heart: Flaky test report

:robot: GitHub comments

:green_heart: Flaky test report

Test stats :test_tube:

apm-agent-python
apm-agent-python copied to clipboard