Maozhou Ge

13 comments by Maozhou Ge

@seujung I tried main.py, but it failed during inference with the error message: "generation() missing 2 required positional arguments: 'path' and 'gen_size'". > $ python main.py --epochs=1 Namespace(batch_size=64, dilation_depth=10, epochs=1, generation=True,...
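That TypeError is Python's standard complaint when a function declaring required positional parameters is called without them. A minimal illustration of the error class and its fix (the function signature and argument values below are hypothetical stand-ins, not the repo's actual API):

```python
# Hypothetical stand-in for the repo's generation routine; the real
# signature in main.py may differ. This only reproduces the error class.
def generation(model, path, gen_size):
    return f"would write {gen_size} samples to {path}"

try:
    generation("model")  # 'path' and 'gen_size' are missing
except TypeError as e:
    # → generation() missing 2 required positional arguments: 'path' and 'gen_size'
    print(e)

# The fix is for the caller to supply every required argument:
print(generation("model", path="out.wav", gen_size=16000))
```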

@meatybobby Thanks for the confirmation; is there any plan to update the docs? BTW, it seems the A100 config is for 80GB, not 40GB; can you help confirm that too?

@merrymercy JAX has switched to the OpenXLA repo instead of TensorFlow: https://github.com/google/jax/commit/172a831219aa7d3524c0c8b5779dc29597a05810

@merrymercy Thanks, I noticed that `auto_sharding` has been upstreamed to OpenXLA: https://github.com/openxla/xla/tree/main/xla/hlo/experimental/auto_sharding Will the new code base reuse this part?

@jiaodong Thanks for your input. I also noticed that inter-op-only parallelism currently contributes more to the final performance than intra-op-only parallelism on your A100 cluster. But "the latest NVLink...

@jiaodong Thanks for your reply. I suppose we might get a better parallelism strategy from a global search space than from two-level sub search spaces, but I agree that the two level...

> @GHGmc2 Feel free to design a new algorithm that can search over the global space (for the new H100 cluster)! I wish I could someday. I believe we do...

> Thanks for the report. We recently reorganised the location of the python package (in #1526), but didn't update the documentation. > > Could you confirm that: > > ```...

> @GHGmc2 To help others diagnose, can you run the script `print_env.sh` from this repo and paste the results here? Thanks! Attached below: [print_env.log](https://github.com/rapidsai/rmm/files/15165250/print_env.log) Besides, I found more link...

> RMM has no concept of distributed memory parallelism built in, nor does it need to. > > What you need to arrange is that the different ranks in your...
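Since the quoted reply is truncated, here is one common way the "arrangement" between ranks is done: each process maps its global rank to a local GPU and then configures its own per-process RMM resource on that device. This is a sketch under assumptions; the `rmm.reinitialize` call is left commented out so it runs without GPUs, and the 4-GPUs-per-node figure is only an example.

```python
# Sketch: RMM is per-process, so "distributed" memory management is just
# each rank claiming its own GPU before creating its RMM memory resource.

def device_for_rank(rank: int, gpus_per_node: int) -> int:
    """Map a global rank to a local GPU index on its node."""
    return rank % gpus_per_node

# Example: 8 ranks across two nodes with 4 GPUs each.
assignment = {rank: device_for_rank(rank, 4) for rank in range(8)}
print(assignment)  # → {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}

# In a real job, each rank would then initialize RMM on its own device,
# e.g. (requires rmm and a GPU, so commented out in this sketch):
# import rmm
# rmm.reinitialize(devices=[device_for_rank(my_rank, 4)], pool_allocator=True)
```

The key point from the quoted reply survives the truncation: RMM does not coordinate across processes, so any rank-to-device policy (round-robin as above, or one set by the launcher via `CUDA_VISIBLE_DEVICES`) must be arranged by the application.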