
[BFCL] Fix Hanging Inference for OSS Models on GPU Platforms

Open HuanzhiMao opened this issue 1 year ago • 0 comments

This PR addresses issues encountered when running locally-hosted models on GPU-renting platforms (e.g., Lambda Cloud). Specifically, because these models are launched via subprocesses, vLLM's output was not displayed correctly. In addition, some multi-turn functions (such as xargs) themselves rely on subprocesses, which caused inference on certain test entries (such as multi_turn_36) to hang indefinitely, halting the pipeline.

To fix this, the terminal logging logic has been updated to use a separate thread that reads from the subprocess pipe and prints to the terminal.
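The pattern described above can be sketched as follows. This is a minimal illustration, not the actual BFCL code: a dedicated reader thread continuously drains the child process's stdout, so the pipe buffer never fills up and blocks the child, while the main thread stays free to do other work. The `echo` command stands in for the real vLLM server launch.

```python
import subprocess
import threading

def stream_output(pipe):
    # Read the subprocess pipe line by line and echo to the terminal.
    # Running this in its own thread keeps the pipe drained, so the
    # child process never blocks on a full pipe buffer.
    for line in iter(pipe.readline, ""):
        print(line, end="", flush=True)
    pipe.close()

# Hypothetical command standing in for the vLLM server launch.
proc = subprocess.Popen(
    ["echo", "server ready"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)

reader = threading.Thread(target=stream_output, args=(proc.stdout,), daemon=True)
reader.start()

# The main thread is free to run inference; here we just wait for the child.
proc.wait()
reader.join()
```

Making the reader a daemon thread also means a hung child process cannot keep the Python interpreter alive at exit.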

Also, for readability, the _format_prompt function has been moved to the Prompting methods section; this does not change the leaderboard score.

HuanzhiMao · Sep 27 '24 22:09