Kyeongpil Kang
I got the same problem...
@HJ-harry Do you use Horovod? It happens when I use Horovod. I solved the problem by changing where the optimizer is wrapped, e.g. base_optimizer = Adam() base_optimizer = hvd.DistributedOptimizer(base_optimizer, named_parameters=...) hvd.broadcast_parameters(model.state_dict(),...
I got the same error when saving the Lookahead optimizer. The Lookahead optimizer should have a state_dict function.
@xbelonogov Thank you for your kind response. I look forward to this feature being added, because many language models such as BERT need special tokens such as cls, sep,...
I have the same issue. #323 Is there a solution to this problem? @TimDettmers @prajdabre
There is another issue. When I applied FSDP CPU offload with Adam8bit, I got the following error: ``` Expected a cuda device, but got: cpu Traceback (most recent call last):...
I use accelerate with FSDP. The following is my accelerate config: ``` { "compute_environment": "LOCAL_MACHINE", "deepspeed_config": {}, "distributed_type": "FSDP", "downcast_bf16": "no", "dynamo_config": {}, "fsdp_config": { "fsdp_auto_wrap_policy": "TRANSFORMER_BASED_WRAP", "fsdp_backward_prefetch_policy": "BACKWARD_PRE", "fsdp_offload_params":...
@krikit Do you happen to know roughly when it will be finished? I'd like to start using version 0.5 soon, haha.