bhack

Results 1417 comments of bhack

@andrewkho I ran your command on the latest nightly (DDP, but not `torchrun`): `ps -eL | grep pt_ | nl`. With 8 GPUs and `num_workers=10` I get: `5657`

In which part of the code can we check the `num_workers` creation logic? How are we reaching `5657` entries? /cc @d4l3k @ppwwyyxx
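To see which thread names make up that total, the `ps -eL` output can be grouped by name instead of just counted. This is a minimal sketch, assuming the default procps column layout (`PID LWP TTY TIME CMD`); the helper names and the sample text are hypothetical and not taken from a real run. It matches the prefix on the command column only, so it is slightly stricter than `grep pt_`.

```python
import subprocess
from collections import Counter


def count_threads_by_name(ps_output: str, prefix: str = "pt_") -> Counter:
    """Count `ps -eL` rows whose command column starts with `prefix`."""
    counts = Counter()
    for line in ps_output.splitlines():
        # Assumed default layout: PID LWP TTY TIME CMD
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[0] == "PID":
            continue  # skip the header and malformed rows
        name = fields[4].strip()
        if name.startswith(prefix):
            counts[name] += 1
    return counts


def live_counts(prefix: str = "pt_") -> Counter:
    """Run `ps -eL` on the current machine and group matching threads."""
    out = subprocess.run(
        ["ps", "-eL"], capture_output=True, text=True, check=True
    ).stdout
    return count_threads_by_name(out, prefix)


# Hypothetical sample, for illustration only.
SAMPLE = """\
    PID     LWP TTY          TIME CMD
   1001    1001 pts/0    00:00:05 pt_main_thread
   1001    1002 pts/0    00:00:00 pt_main_thread
   1010    1010 pts/0    00:00:01 pt_data_worker
   1010    1011 pts/0    00:00:00 pt_data_worker
   1010    1012 pts/0    00:00:00 pt_data_worker
   2000    2000 ?        00:00:00 sshd
"""

sample_counts = count_threads_by_name(SAMPLE)
```

Calling `live_counts().most_common()` on the training host would list the thread names by frequency, which should show whether the total is dominated by `pt_data_worker` or by other `pt_`-prefixed threads.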

With `TORCHINDUCTOR_COMPILE_THREADS` set to `0` or `1`, 1 GPU and 15 workers: `ps -eL | grep pt_ | nl` -> `167` `pt_data_worker`
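For a rough comparison, the two reported counts can be normalized by the number of configured workers. This is only a sketch: it assumes one DDP process per GPU (so 8 ranks in the 8-GPU run), and the two numbers are not strictly comparable because `5657` counts every `pt_` match while `167` was reported for `pt_data_worker`. The function name is hypothetical.

```python
def threads_per_worker(thread_count: int, num_workers: int, num_ranks: int = 1) -> float:
    """Average number of matched threads per configured DataLoader worker."""
    total_workers = num_workers * num_ranks
    if total_workers <= 0:
        raise ValueError("num_workers and num_ranks must be positive")
    return thread_count / total_workers


# 1 GPU, 15 workers, 167 matched threads (TORCHINDUCTOR_COMPILE_THREADS=0 or 1)
single_gpu = threads_per_worker(167, num_workers=15)

# 8 GPUs, 10 workers each, 5657 matched threads (assumes one rank per GPU)
ddp = threads_per_worker(5657, num_workers=10, num_ranks=8)

print(f"single GPU: {single_gpu:.2f} threads per worker")
print(f"8-GPU DDP:  {ddp:.2f} threads per worker")
```

Under these assumptions the DDP run has several times more matched threads per worker than the single-GPU run, which is the gap the question above is asking about.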

/cc @ezyang in case you have feedback on this.

I suppose that with a little effort in the CI we could make a build (and the related sccache) closely aligned with the nightly container we publish every day...

> but we haven't been able to make it work with our S3 bucket IIRC (it keeps asking for AWS credentials). Is there any news on this? As I saw...

So if that comment is correct we could use `SCCACHE_S3_NO_CREDENTIALS=true`. The main point is to populate the bucket from a regularly distributed image like nightly-devel.
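A minimal sketch of what a read-only client setup could look like from a build script. `SCCACHE_BUCKET`, `SCCACHE_REGION` and `SCCACHE_S3_NO_CREDENTIALS` are sccache environment variables. The function name, bucket name and region are hypothetical, and using the `CMAKE_*_COMPILER_LAUNCHER` variables to route the build through sccache is an assumption about how the build is wired, not a description of the project's CI.

```python
import os
from typing import Dict, Optional


def build_readonly_sccache_env(
    bucket: str, region: str, base_env: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return an environment for anonymous, read-only sccache access to S3."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "SCCACHE_BUCKET": bucket,
            "SCCACHE_REGION": region,
            # Ask sccache not to look for AWS credentials.
            "SCCACHE_S3_NO_CREDENTIALS": "true",
            # Assumption: the build picks up compiler launchers from the env.
            "CMAKE_C_COMPILER_LAUNCHER": "sccache",
            "CMAKE_CXX_COMPILER_LAUNCHER": "sccache",
        }
    )
    # Precaution (an assumption, not a documented requirement here):
    # drop any ambient AWS credentials so anonymous mode is unambiguous.
    for key in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"):
        env.pop(key, None)
    return env


# Hypothetical bucket and region, for illustration only.
example_env = build_readonly_sccache_env(
    "example-nightly-cache",
    "us-east-1",
    base_env={"PATH": "/usr/bin", "AWS_ACCESS_KEY_ID": "dummy"},
)
```

The returned dictionary can be passed as `env=` to `subprocess.run` for the build command. Populating the bucket would still need a separate job with write credentials, for example the one that builds nightly-devel.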

I have some doubts about the current `devcontainer` case, as we are building the images on demand. It would work if we referenced nightly images in `devcontainer`.

I don't know whether the env changes are going to invalidate cache hits. In my past experience with Bazel in other projects, it was very fragile without exactly the same...

> Theoretically, after building nightly, we could upload the build cache in an one-way direction to S3, but it does sound complicated. As we compile nightlies from scratch it could...