
Got some problems when uploading files

Open Kenny-Ch opened this issue 5 months ago • 5 comments

During training, I found that wandb suddenly could not upload wandb-metadata.json. After training, I tried to upload the files with wandb sync and got these errors:

wandb sync wandb/run-20240826_190835-g7b6iqjc/
Find logs at: /home/JIng/kenny/Project/personal_copilot/training/wandb/debug-cli.JIng.log
Syncing: http://localhost:8080/charly/personal-code-copilot/runs/g7b6iqjc ... wandb: ERROR Error uploading "code/train.py": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-metadata.json": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-summary.json": CommError, <Response [507]>
wandb: ERROR Error uploading "conda-environment.yaml": CommError, <Response [507]>
wandb: ERROR Error uploading "output.log": CommError, <Response [507]>
wandb: ERROR Error uploading "requirements.txt": CommError, <Response [507]>
wandb: ERROR Error uploading "config.yaml": CommError, <Response [507]>
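Every file in the sync output fails with the same status, 507. The following is a minimal Python sketch for summarizing such lines. It extracts each file name and HTTP status code, then maps the code to its standard reason phrase using the standard library's http.HTTPStatus. For reference, 507 is "Insufficient Storage". The function names are hypothetical helpers, not part of wandb, and the regex assumes the exact line format shown above.

```python
import re
from http import HTTPStatus

# Matches lines such as:
# wandb: ERROR Error uploading "config.yaml": CommError, <Response [507]>
UPLOAD_ERROR = re.compile(
    r'Error uploading "(?P<name>[^"]+)": \w+, <Response \[(?P<code>\d+)\]>'
)


def summarize_upload_errors(log_text: str) -> dict[int, list[str]]:
    """Group the names of files that failed to upload by HTTP status code."""
    failures: dict[int, list[str]] = {}
    for match in UPLOAD_ERROR.finditer(log_text):
        failures.setdefault(int(match["code"]), []).append(match["name"])
    return failures


def describe_status(code: int) -> str:
    """Return the standard reason phrase for a status code, if Python knows it."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "unknown status"


if __name__ == "__main__":
    sample = '''\
wandb: ERROR Error uploading "code/train.py": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-metadata.json": CommError, <Response [507]>
'''
    for code, names in summarize_upload_errors(sample).items():
        print(code, describe_status(code), names)
```

This only labels the failures. It does not show why the server returned 507.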

I also got an error when running wandb verify:

Default host selected: http://localhost:8080
Find detailed logs for this test at: /tmp/tmp5033o82e/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................Traceback (most recent call last):
  File "/home/JIng/miniconda3/envs/starcode-3b/bin/wandb", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/wandb/cli/cli.py", line 2960, in verify
    url_success, url = wandb_verify.check_graphql_put(api, host)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/wandb/sdk/verify/verify.py", line 400, in check_graphql_put
    contents = read_file.read()
               ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'read'
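Both failures happen during an upload step: the sync file uploads and the "Checking signed URL upload" step of wandb verify. HTTP 507 means Insufficient Storage, so one plausible cause is that the storage behind the local server is full, though the issue does not confirm this. The sketch below is a quick way to rule that out with Python's shutil.disk_usage. The path to check and the 1 GiB threshold are assumptions, not values from the issue. If the server runs in a container (also an assumption), run the check inside the container or against its mounted data volume.

```python
import shutil


def check_free_space(path: str, min_free_gib: float = 1.0) -> tuple[float, bool]:
    """Return (free GiB, meets threshold) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    free_gib = usage.free / 2**30
    return free_gib, free_gib >= min_free_gib


if __name__ == "__main__":
    # Hypothetical path: replace "." with the directory or volume that backs
    # the local W&B server's storage on your machine.
    free, ok = check_free_space(".")
    print(f"{free:.1f} GiB free -> {'OK' if ok else 'LOW'}")
```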

Here are some error logs I found in /var/log:

./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:34:12.313204066Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:00.058451284Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"task 24:garbage_collect_runs_v2 paused due to repeated failures"}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"task 24:garbage_collect_runs_v2 paused due to repeated failures"}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:00.058625177Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"task 33:FlatRunsMigrator paused due to repeated failures"}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"task 33:FlatRunsMigrator paused due to repeated failures"}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:12.317314097Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:36:12.317093934Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:37:12.316296925Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:38:12.315714385Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla.log:{"level":"ERROR","time":"2024-08-29T05:32:51.071134593Z","info":{"program":"gorilla","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":59},"data":{"dd.service":"gorilla","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b","http":{"url":"http://192.168.104.9/oidc/auth","method":"GET","headers":{"Host":"192.168.104.9","Connection":"close","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0","Accept-Encoding":"gzip, deflate","Accept-Language":"zh,en-US;q=0.9,en;q=0.8","X-Original-Uri":"/system-admin/static/css/main.c9951160.css.map","X-Forwarded-For":"192.168.104.9"}}},"message":"Not logged in","dd.trace_id":"10464612527120353434","error":{"kind":"*errors.errorString","message":"Not logged in"}}
./mysql.log:2024-08-29T05:33:11.670654Z 27 [Note] Aborted connection 27 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670658Z 22 [Note] Aborted connection 22 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670773Z 25 [Note] Aborted connection 25 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670709Z 23 [Note] Aborted connection 23 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670743Z 21 [Note] Aborted connection 21 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670767Z 24 [Note] Aborted connection 24 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670656Z 28 [Note] Aborted connection 28 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670788Z 17 [Note] Aborted connection 17 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670797Z 26 [Note] Aborted connection 26 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670889Z 20 [Note] Aborted connection 20 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670895Z 19 [Note] Aborted connection 19 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670958Z 18 [Note] Aborted connection 18 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.768660Z 7 [Note] Aborted connection 7 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194361Z 15 [Note] Aborted connection 15 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194462Z 8 [Note] Aborted connection 8 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194516Z 11 [Note] Aborted connection 11 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194523Z 9 [Note] Aborted connection 9 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194478Z 13 [Note] Aborted connection 13 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
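These excerpts repeat the same few messages many times. The following is a minimal Python sketch that collapses them into counts. It assumes the input is grep output in the ./<file>:<log line> form shown above. It also assumes the gorilla*.log lines are JSON objects with "level" and "message" keys, and that mysql.log lines follow the "[Note] Aborted connection ..." format. The summarize function is a hypothetical helper. It only makes the logs easier to scan and does not diagnose the upload failures.

```python
import json
import re
import sys
from collections import Counter

# Each input line is grep output: "<file>:<original log line>".
PREFIX = re.compile(r"^(?P<file>[^:]+):(?P<body>.*)$")
# MySQL notes such as "[Note] Aborted connection 27 to db: ... (Got an error reading communication packets)".
MYSQL_ABORT = re.compile(r"\[(?P<level>\w+)\] Aborted connection \d+ .*\((?P<reason>[^)]*)\)")


def summarize(lines):
    """Count (file, level, message) triples so repeated errors collapse into one row."""
    counts = Counter()
    for raw in lines:
        m = PREFIX.match(raw.strip())
        if not m:
            continue
        file, body = m["file"], m["body"]
        if body.startswith("{"):
            # gorilla*.log lines are JSON objects with "level" and "message" keys.
            try:
                record = json.loads(body)
            except json.JSONDecodeError:
                continue
            counts[(file, record.get("level", "?"), record.get("message", ""))] += 1
        else:
            abort = MYSQL_ABORT.search(body)
            if abort:
                message = f"Aborted connection ({abort['reason']})"
                counts[(file, abort["level"], message)] += 1
    return counts


if __name__ == "__main__":
    # Usage: pipe the grep output into this script on stdin.
    for (file, level, message), n in summarize(sys.stdin).most_common():
        print(f"{n:4d}  {file}  {level}  {message}")
```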

And here is the debug bundle: debug.zip

Kenny-Ch, Aug 29 '24 06:08