DeepEnlighten
Uses pure RL to post-train base models for social reasoning. A lightweight replication of DeepSeek-R1-Zero on the Social IQa dataset.
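As a rough illustration of what a pure-RL setup in the style of R1-Zero might score, here is a minimal sketch of a rule-based reward for Social IQa's three-option questions. Everything in it is an assumption made for illustration and not taken from this project: the `<think>`/`<answer>` tag layout, the A/B/C option letters, the function names, and the unweighted sum of the two reward terms.

```python
import re

# Assumed completion layout: <think>...</think><answer>X</answer>,
# where X is one of the three option letters A, B, or C.
COMPLETION_PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>\s*([ABC])\s*</answer>\s*$",
    re.DOTALL,
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if COMPLETION_PATTERN.match(completion) else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """Return 1.0 if the tagged answer equals the gold option letter, else 0.0."""
    match = COMPLETION_PATTERN.match(completion)
    return 1.0 if match and match.group(2) == gold else 0.0


def total_reward(completion: str, gold: str) -> float:
    """Unweighted sum of the two terms (the weighting is an assumption)."""
    return format_reward(completion) + accuracy_reward(completion, gold)


if __name__ == "__main__":
    sample = "<think>Alex wanted to help a friend.</think><answer>B</answer>"
    print(total_reward(sample, "B"))
```

Because both terms are checked by simple rules, no learned reward model is needed; an RL trainer would only need the completion text and the gold label for each question.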