Hello! I am Zijun Gao. I received my B.S. in Mathematics and Computer Science from the University of Illinois Urbana-Champaign in December 2025. My work sits at the intersection of LLM reasoning, Reinforcement Learning (RL), and AI agents.
I have conducted research on agentic RL at Northwestern University's MLL Lab, advised by Prof. Manling Li, and on improving mathematical reasoning in large language models at Arizona State University's ARC Lab, advised by Prof. Ben Zhou.
Research Interests
My research studies how reinforcement learning can improve the reasoning capabilities of large language models, with an emphasis on long-horizon decision making, interaction with real-world environments, scalable environment generation, and generalization to unseen tasks.
My current interests include:
- Agentic RL: Reinforcement learning for language and multimodal agents, with a focus on tool use, long-context management, credit assignment, and multi-turn interaction.
- Reasoning RL: Reinforcement learning for reasoning-intensive tasks and structured decision-making processes.
- Environments & Sandboxes: Building scalable environments, benchmarks, and sandboxes for agent training and evaluation.
Academic Research
Northwestern University – MLL Lab
May 2025 – Jan 2026
Multi-Agent Collaborative Training: Developed the MAGEN framework, a multi-turn, multi-agent reinforcement learning pipeline for collaborative training of LLMs and VLMs.
Outcome: Co-First Author, Project in Progress. Supervised by Prof. Manling Li.
Arizona State University – ARC Lab
Feb 2025 – Dec 2025
Concept-Oriented Reinforcement Learning (CORE): Proposed the CORE framework, which uses reinforcement learning to bridge concept definitions and mathematical reasoning, achieving consistent improvements on both in-domain and out-of-domain mathematical benchmarks.
Outcome: First Author, ICLR 2026. Supervised by Prof. Ben Zhou. [Paper] [Code]
University of California, San Diego
Jul 2024 – Jan 2025
Photorealistic World Simulator (SimWorld): Built photorealistic 3D environments for multi-agent interaction. Developed automated asset pipelines using UnrealCV and Blender, and contributed to large-scale dataset generation.
Outcome: CVPR 2025 Demo Track. Supervised by Prof. Zhiting Hu and Prof. Lianhui Qin.
Industry Experience
Meituan – LongCat Team
May 2026 – Present
Baidu – ERNIE Team
Apr 2026 – May 2026
Education
University of Illinois Urbana-Champaign (UIUC)
Jan 2024 – Dec 2025
B.S. in Mathematics and Computer Science, Highest Distinction
Beijing Jiaotong University (BJTU)
Aug 2021 – Dec 2023
B.E. in Computer Science and Technology, Top 5% (Transferred)