yu_wang
Results: 2 repositories owned by yu_wang
Logic-RL-Lite
49 Stars, 0 Forks, 49 Watchers
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
DeepEnlighten
38 Stars, 0 Forks, 38 Watchers
Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with the Social IQa dataset.