I am a senior majoring in Mathematics and Computer Science at the University of Illinois Urbana-Champaign. My research interests are in large language models and reinforcement learning.
Large language models (LLMs) often solve challenging math exercises yet fail to apply the relevant concept correctly when a problem requires genuine understanding. Popular Reinforcement Learning with Verifiable Rewards (RLVR) pipelines reinforce final answers but provide little fine-grained conceptual signal, so models improve at pattern reuse rather than at applying concepts. We introduce CORE (Concept-Oriented REinforcement), an RL training framework that turns explicit concepts into a controllable supervision signal. Starting from a high-quality, low-contamination textbook resource that links verifiable exercises to concise concept descriptions, we run a sanity probe showing that LLMs can restate definitions but fail concept-linked quizzes, which quantifies the conceptual reasoning gap. CORE then (i) synthesizes additional concept-aligned quizzes, (ii) injects concept snippets into rollouts, and (iii) reinforces conceptual reasoning either by swapping in trajectories that apply the concept correctly or by constraining drift with a lightweight divergence penalty; the procedure is compatible with standard policy-gradient methods. On two 7B models, CORE yields consistent gains over both the vanilla baseline and SFT training across in-domain concept–exercise suites and diverse out-of-domain math benchmarks. CORE demonstrates that concept-injected, outcome-regularized rollouts supply the missing fine-grained supervision needed to bridge question-solving competence and true conceptual reasoning, without committing to a particular RL algorithm or to process-based verifiers.
SimWorld: A World Simulator for Scaling Photorealistic Multi-Agent Interactions
Yan Zhuang, Jiawei Ren, Xiaokang Ye, Xuhong He, Zijun Gao, Ryan Wu, Mrinaal Dogra, Cassie Zhang, Kai Kim, Bertt Wolfinger, Ziqiao Ma, Tianmin Shu, Zhiting Hu, and Lianhui Qin.
In the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025), Demo Track.
[webpage]
Education
University of Illinois Urbana-Champaign
B.S. in Mathematics and Computer Science
2024.1 - 2025.12
Beijing Jiaotong University
B.Eng. in Computer Science and Technology
2021.8 - 2023.12 (Transferred)