google-batch provider: log lines containing % are mangled (e.g. "0% Default" becomes "0%!D(MISSING)efault")
With the recent google-batch provider, I'm noticing that any stdout/stderr output containing % is broken. Maybe it is being parsed as an old-style format string?
For example, when I print the output of nvidia-smi, which shows GPU utilization as a percentage, I see this:
Thu Jan 2 22:40:08 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 66C P8 14W / 70W | 3MiB / 15360MiB | 0%!D(MISSING)efault |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
The 0% value is correct, but 0%!D(MISSING)efault is not: that cell should read 0% followed by Default (the compute mode).
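The %!D(MISSING) marker matches the error text Go's fmt package writes when a line is used as a format string with no arguments: in "0%      Default", the run of spaces after % is read as flags, D becomes the verb, the missing operand produces %!D(MISSING), and "efault" is left over. Assuming that is what happens somewhere in the Batch logging path (not confirmed in this thread), one possible stopgap is to double every % before the output reaches the log, since %% renders as a single %. This is a minimal sketch; escape_percent is a hypothetical helper name, and if the underlying bug is fixed the doubled signs would then appear literally.

```shell
#!/usr/bin/env bash
# Hypothetical stopgap, ASSUMING the Batch log pipeline treats each output
# line as a Go-style format string: "%%" is rendered as a literal "%",
# so doubling every "%" should make the logged text come out as intended.
escape_percent() {
  sed 's/%/%%/g'
}

# Example: escape nvidia-smi output, but only if the tool is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi | escape_percent
fi
```

Because the escaping changes what the task itself prints, it is probably best applied only to output that is known to contain % and only until the logging issue is resolved.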
Thanks for reporting, @rivershah. I'm able to reproduce something similar with a quick test:
dsub \
--provider google-batch \
--project "${MY_PROJECT}" \
--logging "${MY_BUCKET}" \
--regions us-central1 \
--command 'echo "%hel%lo%"'
In the resulting stdout file I see:
%!h(MISSING)el%!l(MISSING)o%!
Will update here as I learn more.
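To check whether existing stdout/stderr logs were affected, one option is to search them for the markers seen in the output above: Go's fmt package writes %!<verb>(MISSING) for a verb with no operand and %!(NOVERB) for a trailing %. This is a rough heuristic sketch; find_mangled_lines is a hypothetical name, the file path in the usage comment is a placeholder, and legitimate output that happens to contain these strings would also match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, with line numbers, every line in the given file
# that contains a Go fmt verb-error marker such as %!D(MISSING) or %!(NOVERB).
# Like grep, it exits non-zero when no such lines are found.
find_mangled_lines() {
  grep -nE '%!(.\(MISSING\)|\(NOVERB\))' "$1"
}

# Example usage against a locally downloaded copy of a task's stdout log
# (placeholder path):
#   find_mangled_lines ./stdout.log
```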
Any further updates on this, please?
Hi @rivershah, I've neglected to follow up with the Batch API team on this; I will ping them on that thread.
Can this please be fixed? Some critical logs are malformed because of this. Thank you.
Hi @rivershah!
I just pinged the Batch API team again. This time I included this script, which demonstrates the issue within Batch logging alone, without using dsub.
I have also filed this bug in the Compute Engine issue tracker (https://partnerissuetracker.corp.google.com/issues/440126124). I filed it under Compute Engine because I didn't see a component specific to the Batch API in the tracker's list. Feel free to comment on or +1 this issue.