kungfudante
Running on A100, driver version 551.78. Using the original workflow: [image] Got error message as below: ``` got prompt model_type EPS Using pytorch attention in VAE Using pytorch attention in...
# Scenario

*All changes are to the CUDA portion of snake-ai/main/train_cnn.py.*

I wanted to run more training processes in parallel, so I raised NUM_ENV to 128, but in practice GPU memory usage did not increase noticeably:

`NUM_ENV = 32` (PPO_7 in the chart) [image]

`NUM_ENV = 128` (PPO_9 in the chart) [image]

Training speed (steps divided by time) did not improve significantly either (see the chart at the end).

I then changed the batch_size parameter from the original 512 to 512*8. GPU memory utilization rose slightly, but training actually got slower.

```
NUM_ENV = 128
batch_size=512*8
```

(PPO_10 in the chart) [image]

Training chart: [image]

# Questions

1. Is my understanding and configuration of NUM_ENV wrong? What does the batch_size parameter control?
2....
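As a rough aid for the NUM_ENV and batch_size question above, here is a minimal sketch of the bookkeeping, assuming the script uses a Stable-Baselines3-style PPO (the preview does not confirm this). In that style of PPO, each rollout collects `num_env * n_steps` transitions, and `batch_size` is the minibatch size used for each gradient step. The values of `N_STEPS` and `N_EPOCHS` below are hypothetical placeholders, not taken from the post.

```python
import math


def ppo_update_shape(num_env, n_steps, batch_size, n_epochs):
    """Return (rollout_size, minibatches_per_epoch, gradient_steps_per_rollout).

    Assumes Stable-Baselines3-style PPO semantics: the rollout buffer holds
    num_env * n_steps transitions and is split into minibatches of batch_size.
    """
    rollout_size = num_env * n_steps
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    gradient_steps = minibatches_per_epoch * n_epochs
    return rollout_size, minibatches_per_epoch, gradient_steps


# Hypothetical hyperparameters, chosen only for illustration.
N_STEPS = 2048
N_EPOCHS = 4

# The three configurations mentioned in the post (PPO_7, PPO_9, PPO_10).
for label, num_env, batch_size in [
    ("PPO_7", 32, 512),
    ("PPO_9", 128, 512),
    ("PPO_10", 128, 512 * 8),
]:
    rollout, per_epoch, steps = ppo_update_shape(num_env, N_STEPS, batch_size, N_EPOCHS)
    print(f"{label}: rollout={rollout}, minibatches/epoch={per_epoch}, gradient steps={steps}")
```

Under these assumptions, raising NUM_ENV mainly changes how much experience is gathered before each update, while batch_size determines how many samples the GPU processes in a single gradient step. That would be consistent with GPU memory responding to batch_size rather than to NUM_ENV, though the actual cause in the post may differ.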
How do I deploy manually?
As the title says, I want to deploy on a different cloud. How should I go about it?