News

🚀 Excited to share our latest work at #AAAI2025! 🚀

27 Feb 2025

How can we make Reinforcement Learning from Human Feedback (RLHF) more query-efficient? Introducing DUO: Diverse, Uncertain, On-policy query generation and selection.

Defining reward functions for RL is hard. RLHF helps by learning rewards from human preferences, but querying humans is costly.
Many RLHF methods focus on improving exploration, data augmentation, or training objectives. But what if we improved how we ask queries?

DUO's key idea is to ask the right queries (a small code sketch follows the list):
🎯 On-policy queries: Prioritize queries that are relevant for the current policy.
❓ Uncertain queries: Use epistemic uncertainty to select the most informative ones.
🌈 Diverse queries: Avoid redundancy by clustering and filtering queries.
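
To make the idea concrete, here is a minimal, hypothetical sketch of what such a query-selection step could look like. It is not the paper's implementation: the names (`select_queries`, `reward_ensemble`, `.returns(...)`, `features`) are illustrative assumptions. Candidate segment pairs are assumed to come from on-policy rollouts, epistemic uncertainty is approximated by disagreement within a reward-model ensemble, and clustering is used to keep the selected queries diverse.

```python
# Hypothetical sketch of a DUO-style query selection step (not the paper's code).
# Assumes: candidate pairs collected from on-policy rollouts, an ensemble of
# learned reward models, and a feature embedding of each pair for clustering.
import numpy as np
from sklearn.cluster import KMeans

def select_queries(pairs, reward_ensemble, features, num_queries):
    """Pick diverse, uncertain queries from on-policy candidate pairs.

    pairs:           list of (segment_a, segment_b) from recent policy rollouts
    reward_ensemble: list of reward models, each with a .returns(segment) method
                     (an assumed interface returning a segment's predicted return)
    features:        array of shape (len(pairs), d) embedding each pair
    num_queries:     number of queries to send to the human
    """
    # Epistemic uncertainty: how much the ensemble members disagree about
    # which segment in a pair would be preferred (std of 0/1 predictions).
    prefs = np.array([
        [float(m.returns(a) > m.returns(b)) for m in reward_ensemble]
        for a, b in pairs
    ])
    uncertainty = prefs.std(axis=1)

    # Diversity: cluster the candidates and keep only the most uncertain
    # pair per cluster, so redundant queries are filtered out.
    labels = KMeans(n_clusters=num_queries, n_init="auto").fit_predict(features)
    selected = []
    for c in range(num_queries):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        selected.append(pairs[idx[np.argmax(uncertainty[idx])]])
    return selected
```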

📊 DUO outperforms many baselines across a variety of locomotion and robotic manipulation tasks, while requiring fewer human queries. Check out our results!

Want to learn more? Catch our poster presentation at #AAAI2025:
📅 Thursday, Feb 27
🕛 12:30 PM – 2:30 PM
📍 Paper ID: 16523

🎤 Presented by Xuening Feng, with co-authors Timo Kaufmann, Zhaohui Jiang, Puchen Xu, Eyke Hüllermeier, Paul Weng, and Yifei Zhu. Drop by to chat or reach out on social media!
📄 https://lnkd.in/emwSidVz (temp link)

A figure illustrating the query generation and selection pipeline
© KIML