Training Person-Specific Gaze Estimators from Interactions with Multiple Devices

Xucong Zhang, Michael Xuelin Huang, Yusuke Sugano, Andreas Bulling

Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 1–12, 2018.

Abstract

Learning-based gaze estimation has significant potential to enable attentive user interfaces and gaze-based interaction on the billions of camera-equipped handheld devices and ambient displays. While training accurate person- and device-independent gaze estimators remains challenging, person-specific training is feasible but requires tedious data collection for each target device. To address these limitations, we present the first method to train person-specific gaze estimators across multiple devices. At the core of our method is a single convolutional neural network with shared feature extraction layers and device-specific branches that we train from face images and corresponding on-screen gaze locations. Detailed evaluations on a new dataset of interactions with five common devices (mobile phone, tablet, laptop, desktop computer, smart TV) and three common applications (mobile game, text editing, media center) demonstrate the significant potential of cross-device training. We further explore training with gaze locations derived from natural interactions, such as mouse or touch input.

Links

doi: 10.1145/3173574.3174198

Paper: zhang18_chi.pdf

BibTeX

@inproceedings{zhang18_chi, title = {Training Person-Specific Gaze Estimators from Interactions with Multiple Devices}, author = {Zhang, Xucong and Huang, Michael Xuelin and Sugano, Yusuke and Bulling, Andreas}, year = {2018}, booktitle = {Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)}, doi = {10.1145/3173574.3174198}, pages = {1--12} }