JoshonSmith
https://artificialanalysis.ai/text-to-image/arena?tab=Leaderboard
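**Describe the feature**

LitePPO:

1. Robust Advantage Normalization: This technique combines group-level and batch-level statistics. Specifically, the mean of the advantage is computed at the group level, while the standard deviation is computed over the entire batch.
2. Token-Level Loss Aggregation: Similar to the technique used in DAPO, LitePPO advocates summing the loss over all tokens in the batch and dividing by the total number of tokens when computing the overall loss, rather than averaging across sequences.

**Paste any useful information**

https://arxiv.org/abs/2508.08221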
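
To make the request concrete, here is a minimal, framework-free sketch of the two techniques described above. It is not taken from the paper's code or from align-anything: the function names (`lite_ppo_advantages`, `token_level_loss`, `sequence_level_loss`), the `eps` term, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def lite_ppo_advantages(rewards, group_ids, eps=1e-6):
    """Center each reward by its group mean, scale by the batch-wide std.

    rewards:   one scalar reward per sampled response
    group_ids: the prompt (group) each response belongs to
    eps:       assumed small constant guarding against a zero std
    """
    # Group-level mean: one mean per prompt.
    sums, counts = {}, {}
    for r, g in zip(rewards, group_ids):
        sums[g] = sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    group_mean = {g: sums[g] / counts[g] for g in sums}

    # Batch-level standard deviation over all responses.
    # Assumption: population std; a real implementation may use the sample std.
    batch_std = pstdev(rewards)

    return [(r - group_mean[g]) / (batch_std + eps)
            for r, g in zip(rewards, group_ids)]


def token_level_loss(per_token_losses):
    """Sum the loss of every token in the batch, divide by the token count."""
    total = sum(sum(seq) for seq in per_token_losses)
    n_tokens = sum(len(seq) for seq in per_token_losses)
    return total / n_tokens


def sequence_level_loss(per_token_losses):
    """Baseline for comparison: mean per sequence, then mean over sequences."""
    return fmean([fmean(seq) for seq in per_token_losses])


if __name__ == "__main__":
    # Two prompts, two responses each; the second group has identical rewards.
    adv = lite_ppo_advantages([1.0, 0.0, 1.0, 1.0], [0, 0, 1, 1])
    print(adv)

    # One 4-token sequence and one 1-token sequence.
    losses = [[1.0, 1.0, 1.0, 1.0], [3.0]]
    print(token_level_loss(losses))     # 7 / 5 = 1.4
    print(sequence_level_loss(losses))  # (1 + 3) / 2 = 2.0
```

In this toy batch the second group has identical rewards, so a per-group standard deviation would be zero; dividing by the batch-level standard deviation avoids that case, which may be part of what "robust" refers to. The loss example shows how token-level aggregation weights every token equally, so the short sequence no longer counts as much as the long one.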
### Required prerequisites

- [x] I have read the documentation.
- [x] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/align-anything/issues) and [Discussions](https://github.com/PKU-Alignment/align-anything/discussions) that this hasn't already been reported. (+1 or comment...