Maozhou Ge

13 comments by Maozhou Ge

@seujung I tried main.py, but it failed during inference with the error message: "generation() missing 2 required positional arguments: 'path' and 'gen_size'". > $ python main.py --epochs=1 Namespace(batch_size=64, dilation_depth=10, epochs=1, generation=True,...
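That TypeError is Python's standard complaint when a function declaring required positional parameters is called without them. A minimal illustration of the error class and its fix (the function signature and argument values below are hypothetical stand-ins, not the repo's actual API):

```python
# Hypothetical stand-in for the repo's generation routine; the real
# signature in main.py may differ. This only reproduces the error class.
def generation(model, path, gen_size):
    return f"would write {gen_size} samples to {path}"

try:
    generation("model")  # 'path' and 'gen_size' are missing
except TypeError as e:
    # → generation() missing 2 required positional arguments: 'path' and 'gen_size'
    print(e)

# The fix is for the caller to supply every required argument:
print(generation("model", path="out.wav", gen_size=16000))
```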

@meatybobby Thanks for the confirmation; is there any plan to update the docs? BTW, it seems the A100 config is for 80GB, not 40GB; can you help confirm that too?

@merrymercy JAX has switched to the OpenXLA repo instead of TensorFlow: https://github.com/google/jax/commit/172a831219aa7d3524c0c8b5779dc29597a05810

@merrymercy Thanks, I noticed that `auto_sharding` has been upstreamed to OpenXLA: https://github.com/openxla/xla/tree/main/xla/hlo/experimental/auto_sharding Will the new code base reuse this part?

@jiaodong Thanks for your input. I also noticed that inter-op-only parallelism currently contributes more to the final performance than intra-op-only parallelism on your A100 cluster. But "the latest NVLink...

@jiaodong Thanks for your reply. I suppose we might get a better parallelism strategy from a global search space than from two-level sub search spaces, but I agree that the two level...

> @GHGmc2 Feel free to design a new algorithm that can search over the global space (for the new H100 cluster)! I wish I could someday. I believe we do...

> Thanks for the report. We recently reorganised the location of the python package (in #1526), but didn't update the documentation. > > Could you confirm that: > > ```...

> @GHGmc2 To help others diagnose, can you run the script `print_env.sh` from this repo and paste the results here? Thanks! Attached below: [print_env.log](https://github.com/rapidsai/rmm/files/15165250/print_env.log) Besides, I found more link...

> RMM has no concept of distributed memory parallelism built in, nor does it need to. > > What you need to arrange is that the different ranks in your...
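Since the quoted reply is truncated, here is one common way the "arrangement" between ranks is done: each process maps its global rank to a local GPU and then configures its own per-process RMM resource on that device. This is a sketch under assumptions; the `rmm.reinitialize` call is left commented out so it runs without GPUs, and the 4-GPUs-per-node figure is only an example.

```python
# Sketch: RMM is per-process, so "distributed" memory management is just
# each rank claiming its own GPU before creating its RMM memory resource.

def device_for_rank(rank: int, gpus_per_node: int) -> int:
    """Map a global rank to a local GPU index on its node."""
    return rank % gpus_per_node

# Example: 8 ranks across two nodes with 4 GPUs each.
assignment = {rank: device_for_rank(rank, 4) for rank in range(8)}
print(assignment)  # → {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}

# In a real job, each rank would then initialize RMM on its own device,
# e.g. (requires rmm and a GPU, so commented out in this sketch):
# import rmm
# rmm.reinitialize(devices=[device_for_rank(my_rank, 4)], pool_allocator=True)
```

The key point from the quoted reply survives the truncation: RMM does not coordinate across processes, so any rank-to-device policy (round-robin as above, or one set by the launcher via `CUDA_VISIBLE_DEVICES`) must be arranged by the application.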