yu_wang

Results: 2 repositories owned by yu_wang

Logic-RL-Lite

Stars: 49 | Forks: 0 | Watchers: 49

Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".

DeepEnlighten

Stars: 38 | Forks: 0 | Watchers: 38

Post-trains base models with pure RL for social reasoning capabilities. A lightweight replication of DeepSeek-R1-Zero on the Social IQa dataset.