📝 Publications
🤖 Language Agents, RL for LLM Reasoning

Diversity-Incentivized Exploration for Versatile Reasoning
Zican Hu*, Shilin Zhang*, Yafu Li, Jianhao Yan, Xuyang Hu, Leyang Cui, Xiaoye Qu, Chunlin Chen, Yu Cheng, Zhi Wang
We reveal a strong positive correlation between global diversity and reasoning capacity, and propose DIVER, which introduces global diversity incentives as an intrinsic reward to promote deep exploration in a semantically structured space.
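A minimal, hypothetical sketch of the generic idea behind a diversity-based intrinsic reward (not DIVER's actual implementation; the distance metric, `k`, and `beta` are illustrative assumptions): the task reward is augmented with a bonus measuring how far a new reasoning trace lies from previously seen traces in an embedding space.

```python
import math

def euclid(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diversity_bonus(trace_emb, memory, k=3):
    """Mean distance to the k nearest previously seen trace embeddings."""
    if not memory:
        return 1.0  # first trace is maximally novel by convention
    dists = sorted(euclid(trace_emb, m) for m in memory)
    return sum(dists[:k]) / len(dists[:k])

def shaped_reward(task_reward, trace_emb, memory, beta=0.1):
    """Task reward plus a beta-weighted diversity incentive."""
    return task_reward + beta * diversity_bonus(trace_emb, memory)
```

A trace identical to one already in memory earns no bonus, while a semantically distant trace is rewarded, pushing exploration toward novel reasoning patterns.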

Learning to Reason under Off-Policy Guidance
Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang
We introduce LUFFY, a framework augmenting zero-RL with off-policy reasoning traces. LUFFY balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training.
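A toy sketch of the generic mixed-batch idea (not LUFFY's actual training code; the batch composition helper and `off_policy_frac` parameter are illustrative assumptions): each training batch combines off-policy demonstration traces with on-policy rollouts, and the mixing fraction controls the balance between imitation and exploration.

```python
import random

def build_mixed_batch(demos, rollouts, batch_size=8,
                      off_policy_frac=0.5, seed=0):
    """Compose a training batch from off-policy demonstrations and
    on-policy rollouts; off_policy_frac balances imitation vs. exploration."""
    rng = random.Random(seed)
    n_off = int(batch_size * off_policy_frac)
    batch = [(t, "off_policy") for t in rng.choices(demos, k=n_off)]
    batch += [(t, "on_policy") for t in rng.choices(rollouts, k=batch_size - n_off)]
    rng.shuffle(batch)  # interleave the two sources within the batch
    return batch
```

Sweeping `off_policy_frac` from 1.0 toward 0.0 moves training from pure imitation of the demonstrations toward pure on-policy RL.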

Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
Zican Hu, Wei Liu, Xiaoye Qu, Xiangyu Yue, Chunlin Chen, Zhi Wang, Yu Cheng
We propose GLIDER, an innovative framework that introduces a parameter-efficient and generally applicable hierarchy to train competent LLM policies. GLIDER grounds LLMs for complex interactive planning tasks via offline hierarchical reinforcement learning.
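A toy sketch of the generic two-level decision loop that hierarchical RL builds on (not GLIDER's actual architecture; the subgoal scheme and `horizon` are illustrative assumptions): a high-level policy emits a subgoal every few steps, and a low-level policy picks primitive actions conditioned on that subgoal.

```python
def high_level_policy(state):
    """Toy subgoal selector (assumption: subgoal = a target state value)."""
    return state + 4

def low_level_policy(state, subgoal):
    """Toy primitive controller: step toward the current subgoal."""
    return 1 if subgoal > state else -1

def rollout(state=0, steps=8, horizon=4):
    """Run the hierarchy: the high level re-plans every `horizon` steps."""
    subgoal = None
    for t in range(steps):
        if t % horizon == 0:
            subgoal = high_level_policy(state)
        state += low_level_policy(state, subgoal)
    return state
```

The hierarchy decomposes a long-horizon task into short subgoal-reaching segments, which is what makes the low-level learning problem tractable.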

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision
Shilin Zhang*, Zican Hu*, Wenhao Wu*, Xinyi Xie, Jianxiang Tang, Chunlin Chen, Daoyi Dong, Yu Cheng, Zhenhong Sun, Zhi Wang
We propose Text-to-Decision Agent (T2DA), a simple and scalable pre-training framework for learning generalist policies. T2DA aligns language knowledge with environment dynamics of decision tasks.

🎮 In-Context RL, Multi-Agent RL

Mixture-of-Experts Meets In-Context Reinforcement Learning
Wenhao Wu, Fuhong Liu, Haoru Li, Zican Hu, Daoyi Dong, Chunlin Chen, Zhi Wang
We propose T2MIR (Token- and Task-wise MoE for In-context RL), an innovative framework that introduces mixture-of-experts (MoE) architectures into transformer-based decision models.
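A minimal sketch of the generic token-wise MoE routing mechanism that such architectures rely on (not T2MIR's actual router; the dot-product scoring and top-1 selection are illustrative assumptions): each token embedding is scored against every expert, and the token is dispatched to the highest-probability expert.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(token_emb, expert_weights):
    """Token-wise top-1 routing: score each expert by a dot product
    with the token embedding, then pick the most probable expert."""
    scores = [sum(w * x for w, x in zip(we, token_emb))
              for we in expert_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]
```

Task-wise routing follows the same pattern, except the router conditions on a task-level representation instead of a single token, so different tasks activate different expert subsets.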

Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning
Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, Zhi Wang
We learn compact role representations that capture complex agent behaviors in multi-agent systems. Our method promotes heterogeneity, knowledge transfer, and skillful coordination among agents.