FlagAI
[Question]: `hostfile` configuration in aquila
Description
As described here and here, the hostfile should accept SSH aliases. However, the following script does not handle them: https://github.com/FlagAI-Open/FlagAI/blob/fa9b5a3498a71fd154d6224207b06d01f2b4864b/examples/Aquila/bmtrain_mgpu.sh#L28-L29
The slots value is obtained by comparing the ifconfig results with the hostfile, which only supports IP addresses.
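To illustrate the kind of lookup that would also accept aliases, here is a minimal sketch, not the script's actual logic. It assumes hostfile lines of the form `<host> slots=<N>`; the names `parse_hostfile`, `slots_for_local_node`, and the `resolve` parameter are hypothetical and exist only for this example.

```python
import socket


def parse_hostfile(text):
    """Map each hostfile entry (IP address or alias) to its slot count.

    Assumes lines of the form "<host> slots=<N>"; "#" starts a comment.
    """
    slots = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()
        if not parts:
            continue
        host = parts[0]
        for field in parts[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                slots[host] = int(value)
    return slots


def slots_for_local_node(text, local_addrs, resolve=socket.gethostbyname):
    """Return the slot count of the entry that refers to this node, or None.

    An entry matches if it is one of the local addresses as written, or if
    `resolve` maps it to one of them. `resolve` is a hypothetical hook.
    """
    local = set(local_addrs)
    for host, count in parse_hostfile(text).items():
        if host in local:
            return count
        try:
            if resolve(host) in local:
                return count
        except OSError:
            # The name could not be resolved; try the next entry.
            continue
    return None


if __name__ == "__main__":
    hostfile = "node1 slots=8\n192.168.1.2 slots=4\n"
    aliases = {"node1": "192.168.1.1"}  # stand-in for a real alias lookup
    print(slots_for_local_node(hostfile, ["192.168.1.1"],
                               lambda h: aliases.get(h, h)))
```

Note that the default resolver, `socket.gethostbyname`, only sees names known to the system resolver (DNS, `/etc/hosts`). Aliases defined solely in `~/.ssh/config` would need a custom `resolve` function.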
Alternatives
Using aliases in the hostfile returns:
warnings.warn(
usage: launch.py [-h] [--nnodes NNODES] [--nproc-per-node NPROC_PER_NODE]
[--rdzv-backend RDZV_BACKEND] [--rdzv-endpoint RDZV_ENDPOINT]
[--rdzv-id RDZV_ID] [--rdzv-conf RDZV_CONF] [--standalone]
[--max-restarts MAX_RESTARTS]
[--monitor-interval MONITOR_INTERVAL]
[--start-method {spawn,fork,forkserver}] [--role ROLE] [-m]
[--no-python] [--run-path] [--log-dir LOG_DIR] [-r REDIRECTS]
[-t TEE] [--node-rank NODE_RANK] [--master-addr MASTER_ADDR]
[--master-port MASTER_PORT] [--local-addr LOCAL_ADDR]
[--use-env]
training_script ...
launch.py: error: argument --nproc-per-node/--nproc_per_node: expected one argument
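The error above is consistent with the slots lookup returning nothing, so that `--nproc_per_node` is followed directly by the next option instead of a value. That is an assumption about how the script builds its command line, not something confirmed in this thread. The sketch below reproduces the same argparse behavior with a stand-in parser; it is not the real `launch.py`, and `build_parser` and `try_parse` are hypothetical names.

```python
import argparse


def build_parser():
    # Stand-in parser with two of the options shown in the usage message.
    parser = argparse.ArgumentParser(prog="launch.py")
    parser.add_argument("--nnodes")
    parser.add_argument("--nproc-per-node", "--nproc_per_node",
                        dest="nproc_per_node")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints the usage text and error to stderr, then exits.
        return None


if __name__ == "__main__":
    # Slots value present: parsing succeeds.
    print(try_parse(["--nproc_per_node", "8", "--nnodes", "2"]))
    # Slots value empty: the option is followed by another option,
    # so argparse reports "expected one argument".
    print(try_parse(["--nproc_per_node", "--nnodes", "2"]))
```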
Thank you for your feedback. We will fix it soon.
We will update the README. The hostfile currently supports only IP addresses.
Closing for now. If the problem persists, please reopen the issue. Thanks.