Yang

Results 2 issues of Yang

### Title: Feat: Implement Streaming Asynchronous Sampling for Zero-Waste Generation

### Description

This pull request introduces a streaming asynchronous sampling mechanism designed to significantly improve the efficiency of sample generation...

This PR adds support for directly extracting logits from vLLM during training. By enabling this feature, we avoid redundant computation of sequence logits during reinforcement learning (RL) fine-tuning, which previously...