
Allow specifying which GPU to use

Open knysfh opened this issue 8 months ago • 0 comments

When converting multiple files on multiple GPUs, chunk_convert.sh does not let you choose which GPU IDs to use; it always assigns GPUs 0 through NUM_DEVICES - 1. (Sometimes the GPUs available to me are not contiguously numbered.)

I would recommend having chunk_convert.sh honor a CUDA_VISIBLE_DEVICES environment variable so that the GPU IDs can be specified.

New chunk_convert.sh:

#!/bin/bash

trap 'pkill -P $$' SIGINT

# Check if NUM_DEVICES is set
if [[ -z "$NUM_DEVICES" ]]; then
    echo "Please set the NUM_DEVICES environment variable."
    exit 1
fi

if [[ -z "$NUM_WORKERS" ]]; then
    echo "Please set the NUM_WORKERS environment variable."
    exit 1
fi

# Default to GPUs 0..NUM_DEVICES-1 when CUDA_VISIBLE_DEVICES is not set
if [[ -z "$CUDA_VISIBLE_DEVICES" ]]; then
    CUDA_VISIBLE_DEVICES=$(seq -s ',' 0 $((NUM_DEVICES - 1)))
fi

# Get input folder and output folder from args
if [[ -z "$1" ]]; then
    echo "Please provide an input folder."
    exit 1
fi

if [[ -z "$2" ]]; then
    echo "Please provide an output folder."
    exit 1
fi

INPUT_FOLDER=$1
OUTPUT_FOLDER=$2

# Split the comma-separated GPU list into an array
IFS=',' read -r -a DEVICES <<< "$CUDA_VISIBLE_DEVICES"
if (( ${#DEVICES[@]} < NUM_DEVICES )); then
    echo "CUDA_VISIBLE_DEVICES lists fewer than NUM_DEVICES GPUs."
    exit 1
fi

# Loop from 0 to NUM_DEVICES - 1 and run one marker process per GPU in parallel
for (( i=0; i<NUM_DEVICES; i++ )); do
    DEVICE_NUM=$i
    export DEVICE_NUM
    export NUM_DEVICES
    export NUM_WORKERS
    echo "Running marker chunk $i on GPU ${DEVICES[$i]}"
    # Build the command as an array so paths with spaces survive without eval
    cmd=(marker "$INPUT_FOLDER" "$OUTPUT_FOLDER" --num_chunks "$NUM_DEVICES" --chunk_idx "$DEVICE_NUM" --workers "$NUM_WORKERS")
    [[ -n "$METADATA_FILE" ]] && cmd+=(--metadata_file "$METADATA_FILE")
    [[ -n "$MIN_LENGTH" ]] && cmd+=(--min_length "$MIN_LENGTH")
    CUDA_VISIBLE_DEVICES=${DEVICES[$i]} "${cmd[@]}" &

    sleep 5
done

# Wait for all background processes to finish
wait

Start example:

CUDA_VISIBLE_DEVICES=0,3,5 METADATA_FILE=/xxx/metadata_file.json NUM_DEVICES=3 NUM_WORKERS=1 marker_chunk_convert /xxx/input_folder /opt/xxx/output_folder
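
To make the mapping in this example concrete, here is a minimal sketch of how the script pairs chunk indices with GPU IDs. The function name `print_gpu_mapping` is hypothetical and not part of marker; it only mirrors the list-parsing logic of the script above and does not launch anything.

```shell
#!/bin/bash
# Hypothetical helper (not part of marker): prints which GPU each chunk would use.
print_gpu_mapping() {
    local visible=$1 num_devices=$2
    local -a devices
    local i
    # Fall back to 0..num_devices-1 when no GPU list is given, as the script does
    if [[ -z "$visible" ]]; then
        visible=$(seq -s ',' 0 $((num_devices - 1)))
    fi
    # Split the comma-separated list into an array of GPU IDs
    IFS=',' read -r -a devices <<< "$visible"
    for (( i=0; i<num_devices; i++ )); do
        echo "chunk $i -> GPU ${devices[$i]}"
    done
}

print_gpu_mapping "0,3,5" 3
```

With `CUDA_VISIBLE_DEVICES=0,3,5` and `NUM_DEVICES=3`, chunk 0 goes to GPU 0, chunk 1 to GPU 3, and chunk 2 to GPU 5.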

Posted by knysfh, Jun 04 '24 06:06