Learning guarantee of reward modeling using deep neural networks
Published in KDD 2026, 2025
Architecture-dependent, non-asymptotic regret bounds for reward modeling with deep neural networks. The bounds can be sharpened under a simple margin condition, that is, when human preferences are “clearer” than you might think.

Recommended citation: Luo, Y., Ge, Y., Han, R. & Shen, G. (2025). Learning Guarantee of Reward Modeling Using Deep Neural Networks. arXiv preprint arXiv:2505.06601.

