dbx
Formatting Emoji Symbol messages in Azure Devops with Windows agent
Expected Behavior
The command below fails when run in Azure DevOps on a Windows agent (it works fine on a Linux agent): "dbx deploy --deployment-file config/adb_deployment.yaml --workflow training-pipeline"
I suspect the emoji (the snake symbol) in the log message is causing this, because the Windows system is unable to encode it.
Current Behavior
| 16
| 17 class IncrementalEncoder(codecs.IncrementalEncoder):
| 18     def encode(self, input, final=False):
| > 19         return codecs.charmap_encode(input,self.errors,encoding_table
| 20
| 21 class IncrementalDecoder(codecs.IncrementalDecoder):
| 22     def decode(self, input, final=False):
|
| +-------------------------------- locals ---------------------------------+
| | final = False
| | input = '[dbx][2022-09-07 07:39:26.949] \U0001f40d Building a Python-based project\r\n'
| | self = <encodings.cp1252.IncrementalEncoder object at 0x0000021BF10E9160>
| +-------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f40d' in position 31: character maps to <undefined>
Steps to Reproduce (for bugs)
Run the latest dbx package (dbx==0.7.4) in Azure DevOps on a Windows agent.
Your Environment
Azure DevOps with Windows Latest Agent
- dbx version used: 0.7.4
- Databricks Runtime version: 10.4.x-scala2.12
In testing, every version up to dbx==0.6.12 works well. The issue starts with dbx==0.7.0 and persists through the latest version (0.7.4).
Fwiw, I've seen this issue in Azure DevOps pipelines with Windows agents in other OSS, notably Prefect. I ended up monkey patching their code to remove the offending character before it was printed. Here is the offending line in their code, also a Unicode character.
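A minimal sketch of that kind of workaround, assuming the goal is just to replace any character the output encoding cannot represent before it is printed. The helper names `to_encodable` and `safe_print` are hypothetical and are not part of Prefect, dbx, or any other library mentioned here.

```python
import sys


def to_encodable(text: str, encoding: str) -> str:
    """Replace characters that `encoding` cannot represent with '?'."""
    # errors="replace" substitutes '?' for each unencodable character
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text: str) -> None:
    """Print `text` after downgrading it to the stream's encoding."""
    # The encoding attribute can be missing or None on replaced streams,
    # so fall back to ASCII in that case (an assumption of this sketch).
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    print(to_encodable(text, encoding))
```

This trades the emoji for a `?` in the log rather than failing the pipeline, and it has to be applied at every place the library prints.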
I'm not sure what's the root cause of this. Linking the ticket in the relevant library.
I know a ticket was opened on Typer's GH, but I just thought it was worth pointing this out - this same bug happens in the build pipeline for this repo. It just gets handled better so it doesn't cause the pipeline to fail.
I've been chasing this down all week and have determined that:
a) this is a Windows-only problem, and it mainly shows up on Windows agents (as opposed to the general Python user's workstation) because commands run by the Windows agent have their output redirected to a file. This affects the Windows agents used by Azure DevOps and GitHub Actions; I can't speak to any other CI tool's images.
b) there are a few solutions you can use to resolve this, but they all have to happen in the pipeline, and cannot be implemented by libraries (so far as I can tell, at least)
- You can add the environment variable 'PYTHONUTF8' to your pipeline with the value of 1
- You can add the environment variable 'PYTHONIOENCODING' to your pipeline with the value of 'utf8'
- You can call python/python.exe with the argument -X utf8 before your script (e.g. python -X utf8 ./path/to/python_file.py)
Note: If using an environment variable you will need to set this on the pipeline itself, not in a run command.
variables:
- name: PYTHONUTF8
  value: 1

env:
  PYTHONIOENCODING: "utf8"
Quick explanation
This is caused by the Windows agent running any commands provided in a way that pipes output to a file. You can cause the same error in your own terminal (if you have a Windows machine) by running a Python command or a Python library CLI and using > to redirect the output to a file.
PS C:\Users\UserName> python -c "print('└')"
└
PS C:\Users\UserName> python -c "print('└')" > test_file.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>
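For anyone without a Windows machine, the same failure can be approximated from Python on any OS. This is only a sketch: it forces the child interpreter's stdio encoding to cp1252 through PYTHONIOENCODING instead of relying on the Windows locale, and the helper name `run_child` is made up for illustration.

```python
import os
import subprocess
import sys


def run_child(io_encoding: str) -> subprocess.CompletedProcess:
    """Run a child Python that prints U+2514 with its stdout captured."""
    env = dict(os.environ)
    # Start from a clean slate: drop any inherited UTF-8 mode setting.
    env.pop("PYTHONUTF8", None)
    # Force the child's stdin/stdout/stderr encoding.
    env["PYTHONIOENCODING"] = io_encoding
    return subprocess.run(
        [sys.executable, "-c", "print('\\u2514')"],
        env=env,
        capture_output=True,  # stdout is a pipe here, not a console
    )


broken = run_child("cp1252")  # child exits non-zero with UnicodeEncodeError
fixed = run_child("utf8")     # child exits 0 and writes UTF-8 bytes
```

Inspecting `broken.stderr` should show the same 'charmap' traceback, while `fixed.stdout` holds the UTF-8 bytes for the box-drawing character.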
Because the output is sent to a file and not to the Windows console, all of the tricks that libraries like Click, Typer, and Rich employ to print Unicode on Windows consoles are not applicable. And because the file that the output is being redirected to is opened outside of user control, you cannot specify an encoding of UTF8 to resolve this. The file will be opened using the preferred locale encoding (locale.getpreferredencoding(False)), which is usually not a Unicode-compatible code page. For the hosted Windows agents it is cp1252, which is why the error messages show something similar to File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode.
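To check what a given agent is actually doing, a small diagnostic script can be run as a pipeline step. This is a sketch; the function name `describe_output_encoding` is hypothetical, and the values it reports depend on the agent image and Python version.

```python
import locale
import sys


def describe_output_encoding() -> dict:
    """Collect the settings that decide how stdout text gets encoded."""
    return {
        # Encoding Python will use when writing to stdout
        "stdout_encoding": sys.stdout.encoding,
        # False when output is redirected to a file or pipe
        "stdout_is_tty": sys.stdout.isatty(),
        # Locale encoding used for files opened without an explicit encoding
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when PYTHONUTF8=1 or -X utf8 is in effect
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    # Write to stderr so the report is separate from the redirected stdout.
    for key, value in describe_output_encoding().items():
        print(f"{key}: {value}", file=sys.stderr)
```

Running it once with and once without PYTHONUTF8 or PYTHONIOENCODING set on the step should show whether the setting is reaching the Python process.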
This could be resolved if the Windows agents did not redirect the output of commands to files, or if Windows had a default code page that was Unicode-compatible. I believe the latter may be happening in Windows 11, although it seems to be introducing problems of its own.
Thanks a lot, @NodeJSmith & Kevin Deldyck. Your solution helped me a lot. I added the below to the pipeline task:
env:
  PYTHONIOENCODING: "utf8"
Hi @sasi143, can you please show me how to add this to a task in Prefect? I don't know how to add that.