Dipannita Shaw

5 issues by Dipannita Shaw

This change comprises:
- Adding consumption of the optional `use_sample_trusted_cert` flag for embedded devices.
- Fixing some logging.

Integrate Goodput library with MaxText

This PR includes:
- Install the Goodput dependency (`ml-goodput-measurement` in requirements.txt)
- Add config options to enable Goodput
- Update MaxText's train.py to use Goodput APIs...

Quick prototype to compute Goodput based on total step time and job time on MaxText.
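The prototype's exact formula is not shown here, but a computation "based on total step time and job time" can be sketched as the share of job wall-clock time spent inside training steps. This is a minimal sketch assuming every recorded step counts as productive time; `compute_goodput` and its parameters are hypothetical names, not part of MaxText or ml-goodput-measurement.

```python
from typing import Sequence


def compute_goodput(step_times_s: Sequence[float], job_time_s: float) -> float:
    """Return Goodput as a percentage of job time spent in training steps.

    Assumption: all recorded step time is productive; the prototype's real
    accounting (e.g. for re-run steps after a restart) may differ.
    """
    if job_time_s <= 0:
        raise ValueError("job_time_s must be positive")
    total_step_time_s = sum(step_times_s)
    if total_step_time_s > job_time_s:
        raise ValueError("total step time cannot exceed job time")
    return 100.0 * total_step_time_s / job_time_s


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 steps of 0.9 s each inside a 1,200 s job.
    steps = [0.9] * 1000
    print(f"Goodput: {compute_goodput(steps, 1200.0):.1f}%")
```

Everything outside the steps (startup, checkpoint restores, idle time after a failure) lowers the ratio, which is what makes it a useful signal for job efficiency.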

This change adds the following:
- Allows creating a monitor object that spins up a secondary "monitor & upload" thread to query Goodput of the job using the ml-goodput-measurement...
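A monitor that runs a secondary "monitor & upload" thread can be sketched with a daemon thread and a stop event. This is a minimal sketch under stated assumptions: `GoodputMonitor`, `query_fn`, `upload_fn`, and `interval_s` are hypothetical names, and the two callables are placeholders for whatever ml-goodput-measurement actually provides; none of that library's real APIs are shown.

```python
import threading
import time
from typing import Callable, Optional


class GoodputMonitor:
    """Hypothetical monitor that queries and uploads Goodput on a background thread.

    query_fn and upload_fn are caller-supplied placeholders, not
    ml-goodput-measurement APIs.
    """

    def __init__(
        self,
        query_fn: Callable[[], float],
        upload_fn: Callable[[float], None],
        interval_s: float = 30.0,
    ) -> None:
        self._query_fn = query_fn
        self._upload_fn = upload_fn
        self._interval_s = interval_s
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def _run(self) -> None:
        # Query and upload once per interval until stop() sets the event.
        while not self._stop_event.is_set():
            self._upload_fn(self._query_fn())
            # Event.wait returns as soon as stop() is called, so shutdown is prompt.
            self._stop_event.wait(self._interval_s)

    def start(self) -> None:
        """Spin up the secondary thread; daemon=True so it never blocks interpreter exit."""
        if self._thread is not None and self._thread.is_alive():
            return
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, name="goodput-monitor", daemon=True)
        self._thread.start()

    def stop(self) -> None:
        """Signal the thread to exit and wait for it to finish."""
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None


if __name__ == "__main__":
    uploaded = []
    # 87.5 is a dummy reading standing in for a real Goodput query.
    monitor = GoodputMonitor(query_fn=lambda: 87.5, upload_fn=uploaded.append, interval_s=0.1)
    monitor.start()
    time.sleep(0.35)  # stands in for the training loop running on the main thread
    monitor.stop()
    print(f"uploaded {len(uploaded)} Goodput readings")
```

Waiting on `threading.Event` rather than calling `time.sleep` lets `stop()` interrupt the wait immediately instead of after a full interval, so the training job can shut the monitor down without delay.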

Turning `enable_goodput_recording` and `monitor_goodput` on by default.

Tested:
- [x] GCE, ~1k steps: [run](https://screenshot.googleplex.com/6wFY2wqp8YHMmDC)
- [x] GKE, ~1k steps: [run](https://screenshot.googleplex.com/3TVyAVhSPx2KCFA)
- [x] Example [logs](https://pantheon.corp.google.com/logs/query;query=resource.type%3D%22k8s_container%22%0Aresource.labels.project_id%3D%22cloud-tpu-multipod-dev%22%0Aresource.labels.location%3D%22us-central2%22%0Aresource.labels.cluster_name%3D%22dishaw-xpk-test-3%22%0Aresource.labels.namespace_name%3D%22default%22%0Alabels.k8s-pod%2Fjobset_sigs_k8s_io%2Fjobset-name%3D%22dishaw-goodput-maxtext-job-12%22%20severity%3E%3DDEFAULT;storageScope=project;cursorTimestamp=2024-09-27T00:10:55.977598170Z;startTime=2024-09-25T16:06:01.570Z;endTime=2024-09-27T22:25:38.299Z?e=13803378&mods=allow_workbench_image_override&project=cloud-tpu-multipod-dev)
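Turning the two flags on by default means a run gets Goodput recording and monitoring unless the user explicitly opts out. The sketch below illustrates that defaults-plus-overrides behavior under assumptions: `DEFAULTS` and `resolve_config` are hypothetical, and MaxText's real config loading (base YAML plus command-line overrides) is not reproduced here.

```python
from typing import Any, Dict, Mapping

# Hypothetical defaults mirroring the two flags named in the PR; this is not
# MaxText's actual config mechanism.
DEFAULTS: Dict[str, Any] = {
    "enable_goodput_recording": True,
    "monitor_goodput": True,
}


def resolve_config(overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge user overrides on top of the defaults; unknown keys are rejected."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


if __name__ == "__main__":
    print(resolve_config({}))                          # both flags on by default
    print(resolve_config({"monitor_goodput": False}))  # user opts out of monitoring only
```

Rejecting unknown keys is a design choice in this sketch so that a misspelled flag fails loudly instead of silently leaving the default in place.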
