GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping
Published in 2025
This paper introduces a gradient-based early stopping technique that significantly accelerates transformer training while maintaining model performance.
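To illustrate the general idea behind gradient-based early stopping (this is a minimal sketch of the concept, not the paper's actual algorithm; the function names, the dict-based gradient representation, and the fixed threshold are all illustrative assumptions):

```python
import math

def grad_norm(grads):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))

def gradient_early_stop(module_grads, frozen, threshold=1e-3):
    """Freeze modules whose gradient norm fell below threshold.

    module_grads: dict mapping module name -> list of gradient values
    frozen: set of module names already excluded from further updates
    Returns the names newly frozen this step.
    """
    newly_frozen = []
    for name, grads in module_grads.items():
        if name in frozen:
            continue  # skip modules that have already stopped training
        if grad_norm(grads) < threshold:
            frozen.add(name)
            newly_frozen.append(name)
    return newly_frozen
```

In this toy setup, modules whose gradients have become negligible stop receiving updates, so later training steps skip their backward computation and save time, while modules with large gradients keep training.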
Recommended citation: Wen, Q., Zeng, X., Zhou, Z., Liu, S., Hosseinzadeh, M., & Rawassizadeh, R. (2025). "GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping." Submitted to ICLR 2026.
Download Paper
