Allow specifying which GPUs to use
When converting multiple files on multiple GPUs, there is currently no way to specify which GPU IDs to use. (Sometimes I can't get a contiguous set of GPUs.)
I would recommend adding a CUDA_VISIBLE_DEVICES parameter to chunk_convert.sh so that the GPU IDs can be specified.
New chunk_convert.sh:
#!/bin/bash
trap 'pkill -P $$' SIGINT
# Check if NUM_DEVICES is set
if [[ -z "$NUM_DEVICES" ]]; then
echo "Please set the NUM_DEVICES environment variable."
exit 1
fi
if [[ -z "$NUM_WORKERS" ]]; then
echo "Please set the NUM_WORKERS environment variable."
exit 1
fi
if [[ -z "$CUDA_VISIBLE_DEVICES" ]]; then
CUDA_VISIBLE_DEVICES=$(seq -s ',' 0 $((NUM_DEVICES - 1)))
fi
# Get input folder and output folder from args
if [[ -z "$1" ]]; then
echo "Please provide an input folder."
exit 1
fi
if [[ -z "$2" ]]; then
echo "Please provide an output folder."
exit 1
fi
INPUT_FOLDER=$1
OUTPUT_FOLDER=$2
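IFS=',' read -r -a DEVICES <<< "$CUDA_VISIBLE_DEVICES"
# Make sure there is one GPU ID available for every chunk
if (( ${#DEVICES[@]} < NUM_DEVICES )); then
echo "CUDA_VISIBLE_DEVICES must list at least $NUM_DEVICES GPU IDs."
exit 1
fi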
# Loop from 0 to NUM_DEVICES and run the Python script in parallel
for (( i=0; i<$NUM_DEVICES; i++ )); do
DEVICE_NUM=$i
export DEVICE_NUM
export NUM_DEVICES
export NUM_WORKERS
echo "Running marker chunk $i on GPU ${DEVICES[$i]}"
# Build the command as an array so paths containing spaces survive without eval
cmd=(marker "$INPUT_FOLDER" "$OUTPUT_FOLDER" --num_chunks "$NUM_DEVICES" --chunk_idx "$DEVICE_NUM" --workers "$NUM_WORKERS")
[[ -n "$METADATA_FILE" ]] && cmd+=(--metadata_file "$METADATA_FILE")
[[ -n "$MIN_LENGTH" ]] && cmd+=(--min_length "$MIN_LENGTH")
# Pin this chunk to a single GPU from the list
CUDA_VISIBLE_DEVICES="${DEVICES[$i]}" "${cmd[@]}" &
sleep 5
done
# Wait for all background processes to finish
wait
Example invocation:
CUDA_VISIBLE_DEVICES=0,3,5 METADATA_FILE=/xxx/metadata_file.json NUM_DEVICES=3 NUM_WORKERS=1 marker_chunk_convert /xxx/input_folder /opt/xxx/output_folder
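
To make the chunk-to-GPU mapping in the invocation above easier to check, here is a minimal sketch that isolates only the device-splitting logic from the proposed script. The function name `map_devices` is hypothetical and not part of marker; it does not launch any conversion and only prints which GPU each chunk index would be pinned to.

```shell
#!/bin/bash
# Hypothetical helper (not part of marker): prints the chunk -> GPU mapping.
# $1: comma-separated GPU IDs (may be empty), $2: number of chunks/devices.
map_devices() {
    local visible="$1" num_devices="$2"
    # Fall back to 0..N-1 when no explicit list is given, as the script does.
    if [[ -z "$visible" ]]; then
        visible=$(seq -s ',' 0 $((num_devices - 1)))
    fi
    local devices
    IFS=',' read -r -a devices <<< "$visible"
    # Refuse to continue if there are fewer GPU IDs than chunks.
    if (( ${#devices[@]} < num_devices )); then
        echo "Need $num_devices GPU IDs, got ${#devices[@]}." >&2
        return 1
    fi
    local i
    for (( i = 0; i < num_devices; i++ )); do
        echo "chunk $i -> GPU ${devices[$i]}"
    done
}

map_devices "0,3,5" 3
```

With the values from the example (CUDA_VISIBLE_DEVICES=0,3,5 and NUM_DEVICES=3), chunks 0, 1 and 2 are assigned to GPUs 0, 3 and 5 respectively. Passing an empty list falls back to GPUs 0 through NUM_DEVICES - 1, which matches the behavior when CUDA_VISIBLE_DEVICES is unset.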