[Question]: `hostfile` configuration in Aquila

csyourui opened this issue • 2 comments

Description

As described here and here, the hostfile should accept SSH aliases. However, the following script (https://github.com/FlagAI-Open/FlagAI/blob/fa9b5a3498a71fd154d6224207b06d01f2b4864b/examples/Aquila/bmtrain_mgpu.sh#L28-L29) obtains the slots value by comparing the ifconfig output with the hostfile entries.

That comparison only works when the hostfile contains IP addresses.
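A minimal Python sketch of an alias-aware slot lookup, assuming a DeepSpeed-style hostfile with one `<host> slots=<n>` entry per line (the exact format parsed by bmtrain_mgpu.sh is not shown here). The names `parse_hostfile`, `ssh_resolve` and `local_slots` are hypothetical, and this is not the script's actual logic. Each entry is resolved with `ssh -G`, which prints the effective SSH client configuration (including the real `hostname` behind an alias) without connecting, and then with `socket.gethostbyname`, before being compared with the local IPs the script already collects from ifconfig:

```python
import socket
import subprocess
from typing import Callable, Dict, Iterable, Optional, Set


def parse_hostfile(lines: Iterable[str]) -> Dict[str, int]:
    """Parse '<host> slots=<n>' lines (assumed format) into {host: n}."""
    hosts: Dict[str, int] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank lines
        fields = line.split()
        slots = 1  # assumed default when an entry has no slots= field
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field[len("slots="):])
        hosts[fields[0]] = slots
    return hosts


def ssh_resolve(host: str) -> str:
    """Resolve an SSH alias (or hostname/IP) to an IPv4 address.

    `ssh -G` prints the effective client config without connecting; its
    `hostname` line holds the real host behind an alias in ~/.ssh/config.
    """
    result = subprocess.run(
        ["ssh", "-G", host], capture_output=True, text=True, check=True
    )
    real_host = host
    for line in result.stdout.splitlines():
        key, _, value = line.partition(" ")
        if key == "hostname":
            real_host = value.strip()
            break
    return socket.gethostbyname(real_host)  # plain IPs pass through unchanged


def local_slots(
    lines: Iterable[str],
    local_ips: Set[str],
    resolve: Callable[[str], str] = ssh_resolve,
) -> Optional[int]:
    """Return the slots of the first hostfile entry that resolves to a local IP."""
    for host, slots in parse_hostfile(lines).items():
        try:
            addr = resolve(host)
        except (OSError, subprocess.CalledProcessError):
            continue  # unresolvable entry: it cannot be this machine
        if addr in local_ips:
            return slots
    return None  # no match: the caller should abort with a clear message
```

The resolver is injectable so the matching logic can be tested without SSH. Returning `None` for an unmatched host lets the caller stop with an explicit error instead of passing an empty value to `--nproc_per_node`.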

Alternatives

Using aliases in the hostfile produces the following error:

  warnings.warn(
usage: launch.py [-h] [--nnodes NNODES] [--nproc-per-node NPROC_PER_NODE]
                 [--rdzv-backend RDZV_BACKEND] [--rdzv-endpoint RDZV_ENDPOINT]
                 [--rdzv-id RDZV_ID] [--rdzv-conf RDZV_CONF] [--standalone]
                 [--max-restarts MAX_RESTARTS]
                 [--monitor-interval MONITOR_INTERVAL]
                 [--start-method {spawn,fork,forkserver}] [--role ROLE] [-m]
                 [--no-python] [--run-path] [--log-dir LOG_DIR] [-r REDIRECTS]
                 [-t TEE] [--node-rank NODE_RANK] [--master-addr MASTER_ADDR]
                 [--master-port MASTER_PORT] [--local-addr LOCAL_ADDR]
                 [--use-env]
                 training_script ...
launch.py: error: argument --nproc-per-node/--nproc_per_node: expected one argument
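The `expected one argument` error is consistent with the slots lookup coming back empty. Assuming the script expands the slots variable unquoted (for example `--nproc_per_node $SLOTS`, where `SLOTS` is a hypothetical name), an empty value disappears during shell word splitting and the flag is followed directly by the next option. The sketch below reproduces the message with a reduced, hypothetical stand-in for the launcher's parser that defines only the options needed here; it is not the real launch.py:

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in: only the options needed to reproduce the error.
    parser = argparse.ArgumentParser(prog="launch.py")
    parser.add_argument("--nproc-per-node", "--nproc_per_node", type=str, default="1")
    parser.add_argument("--nnodes", type=str, default="1")
    parser.add_argument("training_script")
    return parser


def parse_error(argv) -> str:
    """Return argparse's stderr output for argv, or '' if parsing succeeds."""
    stderr = io.StringIO()
    try:
        with contextlib.redirect_stderr(stderr):
            build_parser().parse_args(argv)
    except SystemExit:  # argparse exits after printing usage and the error
        return stderr.getvalue()
    return ""


slots = ""  # what an unmatched alias leaves behind
# An empty, unquoted shell variable vanishes, so the flag gets no value.
argv = ["--nproc_per_node"] + slots.split() + ["--nnodes", "2", "train.py"]
print(parse_error(argv).splitlines()[-1])
# prints: launch.py: error: argument --nproc-per-node/--nproc_per_node: expected one argument
```

If the variable were quoted instead, argparse would accept an empty string as the value and the failure would surface later, so the exact message above points to an unquoted, empty expansion.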

csyourui • Jun 12 '23 09:06

Thank you for your feedback; we will fix it soon.

BAAI-OpenPlatform • Jun 13 '23 01:06

We will update the README: the hostfile currently supports only IP addresses.

Anhforth • Jun 15 '23 06:06

Closing for now. If the problem persists, please reopen the issue. Thanks.

ftgreat • Jun 22 '23 11:06