Doug J
I ran this command: `python3 scripts/demo.py --prompt "Mountain Rainier in van Gogh's world"` and got the following errors: [20:11:19] model_container.cpp:87: Init AITemplate Runtime with 1 concurrency Traceback (most recent call...
FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
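The warning above names the replacement directly: pass `token` instead of `use_auth_token`. For call sites that still build their keyword arguments with the old name, a minimal sketch of a rename shim follows. The helper `migrate_auth_kwargs` is hypothetical (it is not part of Transformers), and it only rewrites a plain dict before the caller forwards it to a loading function.

```python
def migrate_auth_kwargs(kwargs):
    """Return a copy of kwargs with the deprecated `use_auth_token` key renamed to `token`.

    Hypothetical helper for illustration; not a Transformers API.
    If both keys are present, the explicit `token` value is kept.
    """
    updated = dict(kwargs)  # do not mutate the caller's dict
    if "use_auth_token" in updated:
        legacy_value = updated.pop("use_auth_token")
        updated.setdefault("token", legacy_value)
    return updated


# Example: keyword arguments written against the old argument name.
old_style = {"use_auth_token": "hf_example", "revision": "main"}
new_style = migrate_auth_kwargs(old_style)
```

The resulting dict can then be unpacked into the loading call, so the deprecated name never reaches the library.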
Hi, I am receiving this warning `WARNING:absl:SaveArgs.aggregate is deprecated, please use custom TypeHandler (https://orbax.readthedocs.io/en/latest/custom_handlers.html#typehandler) or contact Orbax team to migrate before May 1st, 2024. If your Pytree has empty ([],...
In the tutorial, you mentioned that we should use bf16 on TPUs. Does bf16 also work on GPUs?
### 🚀 The feature, motivation and pitch I want to benchmark the pre-training speed of Llama 405B, which means I don't need to download the pre-trained weights. I am wondering,...
### System Info Two issues: 1. It seems the current implementation of the flops counter counts the total flops of all the training steps, which causes the tflops/s/gpu metric...
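To illustrate the first point, here is a minimal sketch of computing throughput per step rather than from a running total. It assumes the counter reports cumulative FLOPs after each step; the function names and the sample numbers are made up for illustration and are not taken from the library under discussion.

```python
def cumulative_to_per_step(cumulative_flops):
    """Convert a running FLOP total (one entry per step) into per-step FLOP counts."""
    per_step = []
    previous = 0
    for total in cumulative_flops:
        per_step.append(total - previous)
        previous = total
    return per_step


def tflops_per_sec_per_gpu(step_flops, step_seconds, num_gpus):
    """Throughput of a single training step, in TFLOPs per second per GPU."""
    if step_seconds <= 0 or num_gpus <= 0:
        raise ValueError("step_seconds and num_gpus must be positive")
    return step_flops / step_seconds / num_gpus / 1e12


# Hypothetical numbers: three steps, each costing 2e15 FLOPs, 2 s per step, 8 GPUs.
cumulative = [2e15, 4e15, 6e15]
per_step = cumulative_to_per_step(cumulative)
throughput = [tflops_per_sec_per_gpu(f, 2.0, 8) for f in per_step]
```

Dividing the cumulative total by a single step's time would instead make the reported figure grow with the step count, which is the kind of inflation the report appears to describe.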
We tried multinode training with checkpoint saving. If dcn-dp=2 and dcn-fsdp=-1, the nodes in the first set of dcn-fsdp always time out. In the log below, we tried 16N...
Hi, it seems the package cannot calculate TFLOPs for training a multimodal model like meta-llama/Llama-3.2-11B-Vision-Instruct. Could you please add this functionality? Here is my code: ``` # Transformers Model, such...