📝 Publications

🤖 Language Agents, RL for LLM Reasoning

ICLR 2026

Diversity-Incentivized Exploration for Versatile Reasoning
Zican Hu*, Shilin Zhang*, Yafu Li, Jianhao Yan, Xuyang Hu, Leyang Cui, Xiaoye Qu, Chunlin Chen, Yu Cheng, Zhi Wang

Code, HuggingFace

We propose DIVER, which reveals a strong positive correlation between global diversity and reasoning capacity, and introduces global diversity incentives as an intrinsic reward to promote deep exploration in a semantically structured space.

NeurIPS 2025

Learning to Reason under Off-Policy Guidance
Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang

Code, HuggingFace

We introduce LUFFY, a framework that augments zero-RL with off-policy reasoning traces. LUFFY balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training.

ICML 2025

Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
Zican Hu, Wei Liu, Xiaoye Qu, Xiangyu Yue, Chunlin Chen, Zhi Wang, Yu Cheng

Code

We propose GLIDER, a framework that introduces a parameter-efficient and generally applicable hierarchy for training competent LLM policies. GLIDER grounds LLMs for complex interactive planning tasks via offline hierarchical reinforcement learning.

NeurIPS 2025

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision
Shilin Zhang*, Zican Hu*, Wenhao Wu*, Xinyi Xie, Jianxiang Tang, Chunlin Chen, Daoyi Dong, Yu Cheng, Zhenhong Sun, Zhi Wang

Code

We propose Text-to-Decision Agent (T2DA), a simple and scalable pre-training framework for learning generalist policies. T2DA aligns language knowledge with environment dynamics of decision tasks.


🎮 In-Context RL, Multi-Agent RL

NeurIPS 2025

Mixture-of-Experts Meets In-Context Reinforcement Learning
Wenhao Wu, Fuhong Liu, Haoru Li, Zican Hu, Daoyi Dong, Chunlin Chen, Zhi Wang

Code

We propose T2MIR (Token- and Task-wise MoE for In-context RL), a framework that brings the architectural advances of mixture-of-experts (MoE) into transformer-based decision models.

ICLR 2024

Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning
Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, Zhi Wang

Code

We learn compact role representations that capture complex agent behaviors in multi-agent systems. Our method promotes heterogeneity, knowledge transfer, and skillful coordination among agents.