opencensus-python Handle unicode and str in exporters for py2

Before this commit, in py2 only byte strings were exported and in py3 only unicode strings were exported.

I'm not sure I'm doing it right, but it's at least an opening for a discussion.

Fixes #273

Aug 23 '18 06:08 guewen

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

Aug 23 '18 06:08 googlebot

I signed it!

Hey, I signed the CLA :)

Aug 23 '18 07:08 guewen

CLAs look good, thanks!

Aug 23 '18 07:08 googlebot

I know the tests don't pass on py3, but before refining I'd like confirmation that I understood the goal correctly and that I'm not heading in the wrong direction.

Aug 23 '18 07:08 guewen

Thanks for the PR @guewen. Using the six types to solve the problem of exporting unicode attribute values looks right to me.

In general though this problem is a big can of worms, and this PR makes it clear that we need to be more careful about internal use of the str type.

If I understand correctly: it looks like the library assumes string-valued attribute values are always strs. This is usually a safe assumption, but it means that we can't store non-ASCII characters in python 2.x. This is a problem for code that naively uses unicodes in place of strs since we'll silently fail to export these attributes.

So in python 2.x, strs are byte strings, and ASCII is the default codec for implicit conversions to and from unicode. Decoding a byte string with any valid encoding gets you a unicode... which is not a str:

>>> type(b'abc')
str

>>> type(b'abc'.decode('utf-8'))
unicode

>>> isinstance(b'abc'.decode('utf-8'), str)
False

And in python 3.x strs are effectively python 2.x's unicodes, and byte strings are demoted to bytes with no implicit encoding:

>>> type(b'abc')
bytes

>>> type(b'abc'.decode('utf-8'))
str
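
To make the type difference concrete, here is a minimal sketch of a check that accepts both text and byte strings on either version. The names `text_type` and `binary_type` are local stand-ins modeled on what six provides (so the sketch has no third-party dependency), `export_string_value` is a hypothetical helper rather than the library's actual exporter code, and the choice of UTF-8 for decoding is an assumption:

```python
import sys

PY2 = sys.version_info[0] == 2

# Local stand-ins for six.text_type / six.binary_type (assumption: this
# mirrors six; the real PR may use six directly).
if PY2:
    text_type = unicode  # noqa: F821 (only evaluated on python 2.x)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def export_string_value(value):
    """Hypothetical helper: return value as text, or None if not a string.

    Byte strings are decoded as UTF-8, which is an assumption; the thread
    does not name an encoding.
    """
    if isinstance(value, binary_type):
        return value.decode('utf-8')
    if isinstance(value, text_type):
        return value
    return None  # non-string values are handled elsewhere
```

With a check like this, `export_string_value(b'abc')` and `export_string_value(u'abc')` both return text on 2.x and 3.x, whereas a bare `isinstance(value, str)` check would reject one of the two depending on the version.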

We have to support both versions of python, and have to support non-ASCII characters in attribute values. But the spec also says to truncate these strings to 256 bytes without specifying an encoding.

In python 2.x/3.x, decoding a byte string with any valid encoding gets you a unicode/str, which is itself stored internally as unicode, using up to 4 bytes per character depending on the python implementation. Among other problems, this means that we might truncate a 256-character string down to 64 characters even if it's possible to encode it with ASCII. This is a moot point for now since it doesn't look like we're actually truncating these strings, but it does suggest we have to be careful making changes like this that add decode calls where byte strings would otherwise stay byte strings.
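
As a rough illustration of why the encoding matters for a byte limit, the sketch below truncates text to 256 bytes under an assumed UTF-8 encoding (the spec, as noted above, does not specify one). `truncate_utf8` is a hypothetical helper, not something the library provides:

```python
MAX_BYTES = 256  # limit mentioned in the thread; UTF-8 is an assumption


def truncate_utf8(text, max_bytes=MAX_BYTES):
    """Hypothetical helper: cut text so its UTF-8 form fits in max_bytes."""
    encoded = text.encode('utf-8')
    if len(encoded) <= max_bytes:
        return text
    # 'ignore' drops a trailing multi-byte character split by the cut,
    # so the result is always valid text.
    return encoded[:max_bytes].decode('utf-8', 'ignore')


ascii_text = u'a' * 256
# The same 256 characters take 256 bytes in UTF-8 but 1024 bytes in a
# fixed 4-bytes-per-character encoding such as UTF-32.
utf8_size = len(ascii_text.encode('utf-8'))
utf32_size = len(ascii_text.encode('utf-32-le'))
```

Counting 4 bytes per character would leave only 64 of those 256 ASCII characters under a 256-byte limit, while counting UTF-8 bytes keeps all of them.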

Jan 22 '19 23:01 c24t

Which is all to say: the direction looks good, but there may be some unintended consequences.

Jan 22 '19 23:01 c24t

Thanks for your detailed answer. In particular, I wasn't aware of the 256-byte truncation (I'm new to the subject).

Feb 13 '19 16:02 guewen
