OCI CLI JSON output is not Unicode-encoded as required by the spec

Open triatic opened this issue 1 year ago • 16 comments

When executing commands such as oci compute instance list, PHP's json_decode() can fail on the JSON output. The output can contain non-ASCII characters, and they are not encoded as Unicode as the JSON specification requires.

OCI CLI version 3.50.0 (MSI package), Windows 10 version 10.0.19045.5011

triatic avatar Nov 12 '24 13:11 triatic

The CLI does not output non-Unicode characters. If you have an example, please share it and we will investigate.

NupurGupta3101 avatar Nov 12 '24 13:11 NupurGupta3101

mb_detect_encoding() reports ASCII for the output of oci compute instance list. Note that this JSON output contained non-ASCII characters that were not encoded as Unicode.

C:\>php -r "var_dump(mb_detect_encoding(shell_exec('oci compute instance list --compartment-id ocid1.tenancy.oc1..removed')));"
string(5) "ASCII"

triatic avatar Nov 12 '24 14:11 triatic

Can you please share the output of the oci command, or at least the start of it, where it shows data as below? Are there any errors or warnings?

{
  "data": {
    "items": [
      {

Which Python version do you use?

adizohar avatar Nov 12 '24 14:11 adizohar

I'm using the newest Windows OCI CLI MSI package downloaded from GitHub, which bundles Python. The JSON is formatted correctly, apart from the non-Unicode characters.

The line that breaks things is this:

"processor-description": "3.0 GHz Ampere® Altra™",

Start of output:

C:\>oci compute instance list --compartment-id ocid1.tenancy.oc1..removed
{
  "data": [
    {
      "agent-config": {
... etc

triatic avatar Nov 12 '24 14:11 triatic

I asked Python version :) I tried to run and didn't see any non ascii. I will wait for the OCI CLI team to respond.

adizohar avatar Nov 12 '24 15:11 adizohar

> I asked Python version :)

Whatever the MSI package installs? I can see python38.dll in the installation directory, and I do not have Python globally installed in Windows.

triatic avatar Nov 12 '24 15:11 triatic

Thank you for that

adizohar avatar Nov 12 '24 15:11 adizohar

> I tried to run and didn't see any non ascii

"3.0 GHz Ampere® Altra™" contains non-ASCII characters: ® and ™. The problem for me is that oci also does not emit them as Unicode, as the JSON spec requires.

triatic avatar Nov 12 '24 15:11 triatic

Understood, it is the processor type. Nupur, please take this up with the OCI CLI team: "processor-description": "3.0 GHz Ampere® Altra™"

adizohar avatar Nov 12 '24 15:11 adizohar

@adizohar just to clarify, are you saying that only ASCII characters should be returned in oci's JSON output, and that the expected fix is to remove ® and ™ from it?

triatic avatar Nov 12 '24 17:11 triatic

No, I don't believe this is a bug or an issue that needs to be fixed. I have asked the OCI CLI team to take a look. In the meantime, you can filter out the non-ASCII characters before ingesting the JSON, or use the OCI Python SDK to read and handle these characters.
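
A minimal sketch of the first suggestion (filtering out non-ASCII bytes before parsing), written in Python since the CLI bundles it. The function name `parse_ascii_only` is hypothetical, and the approach is lossy: characters such as ® and ™ are dropped rather than converted.

```python
import json


def parse_ascii_only(raw: bytes):
    """Drop every byte outside 7-bit ASCII, then parse the rest as JSON.

    Lossy by design: any non-ASCII character disappears from the values.
    """
    cleaned = bytes(b for b in raw if b < 0x80)
    return json.loads(cleaned.decode("ascii"))


# The line quoted in this thread, encoded the way the report describes.
sample = '{"processor-description": "3.0 GHz Ampere® Altra™"}'
result = parse_ascii_only(sample.encode("cp1252"))
```

This only helps when losing those characters is acceptable for the consumer of the data.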

adizohar avatar Nov 12 '24 17:11 adizohar

Ok. At the moment I am converting oci's output from ASCII to UTF-8 where the ® and ™ characters are present, which prevents json_decode() from failing.
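
A rough Python sketch of this kind of workaround, not the PHP code actually used here. It assumes the legacy encoding is Windows-1252, as stated later in this thread; `decode_cli_output` and its `fallback` parameter are hypothetical names.

```python
import json


def decode_cli_output(raw: bytes, fallback: str = "cp1252"):
    """Parse JSON bytes, trying UTF-8 first and a legacy code page second.

    Assumption: output that is not valid UTF-8 is in the fallback code page.
    A few byte values are undefined in cp1252, so the fallback decode can
    still raise UnicodeDecodeError on unexpected input.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback)
    return json.loads(text)


sample = '{"processor-description": "3.0 GHz Ampere® Altra™"}'
from_cp1252 = decode_cli_output(sample.encode("cp1252"))
from_utf8 = decode_cli_output(sample.encode("utf-8"))
```

Trying UTF-8 first means the same code keeps working if the output is already valid UTF-8.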

triatic avatar Nov 12 '24 17:11 triatic

According to https://thesmsworks.co.uk/unicode-detector ® and ™ are unicode characters.

NupurGupta3101 avatar Nov 13 '24 14:11 NupurGupta3101

> According to https://thesmsworks.co.uk/unicode-detector ® and ™ are unicode characters.

They can be encoded in Unicode, but OCI CLI encodes them in Windows-1252, which is not valid for JSON: https://en.wikipedia.org/wiki/Windows-1252
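
To illustrate the distinction, here is a small Python sketch that encodes the line from this report both ways. It does not call the CLI; the sample document and the helper names `is_valid_utf8` and `parses_as_json` are made up for the example.

```python
import json

# The line from the report, wrapped as a small JSON document.
text = '{"processor-description": "3.0 GHz Ampere® Altra™"}'

utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")  # ® becomes 0xAE, ™ becomes 0x99


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def parses_as_json(raw: bytes) -> bool:
    """Return True if json.loads accepts the raw bytes."""
    try:
        json.loads(raw)
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
```

The same characters parse fine as UTF-8 bytes, while the single bytes 0xAE and 0x99 are not valid UTF-8 on their own, so a strict JSON parser rejects the Windows-1252 form.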

triatic avatar Nov 13 '24 19:11 triatic

Can you please share the output received (without any further parsing) from oci-cli when you run this command directly or via a script? That will make things clearer.
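
One way to show the raw bytes without pasting the whole output is to list only the bytes outside 7-bit ASCII. A minimal Python sketch; `non_ascii_bytes` is a hypothetical helper, and capturing the CLI output (for example with `subprocess.run(..., capture_output=True)`) is left to the caller.

```python
def non_ascii_bytes(raw: bytes):
    """Return (offset, hex value) for every byte outside 7-bit ASCII."""
    return [(i, f"0x{b:02X}") for i, b in enumerate(raw) if b >= 0x80]


# Stand-in for captured output: "Ampere® Altra™" as Windows-1252 bytes.
captured = "Ampere® Altra™".encode("cp1252")
report = non_ascii_bytes(captured)
```

Offsets and byte values contain no identifiers, so they can be shared as-is.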

NupurGupta3101 avatar Nov 14 '24 08:11 NupurGupta3101

Are you happy for me to edit out unique identifiers from the output?

triatic avatar Nov 14 '24 14:11 triatic