
Multimodal Learning for Head Motion Prediction in Immersive Virtual Reality


Description: Immersive virtual reality (VR) can give users a higher sense of presence than traditional 2D displays and has become an important 3D user interface in recent years. Head motion is of great significance in VR because it is a typical way for users to interact with the virtual world. Head motion prediction has recently become a popular research topic because it matters for a variety of applications, including pre-rendering of huge 3D models and transmission of 360-degree VR videos. However, existing methods typically focus on a single modality (VR content or past motion) and fail to achieve good performance for long-term prediction (> 1 s). This project therefore examines the effectiveness of multiple modalities and aims to predict long-term head motion from multimodal features.

Goal: Explore the effectiveness of different modalities for the task of head motion prediction in VR. Develop deep learning methods to forecast head motion from multimodal features.
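
To compare modalities, any learned predictor needs an error measure and a simple reference point. The following is a minimal sketch of one possible setup, not part of the project specification. It assumes head orientation is logged as (yaw, pitch) in degrees at a fixed sampling rate, and it uses a hypothetical "hold last pose" baseline that predicts the future orientation to be the current one. The function names, the 10 Hz rate, and the synthetic trajectory are all illustrative assumptions.

```python
import math


def to_unit_vector(yaw_deg, pitch_deg):
    """Convert a (yaw, pitch) head direction in degrees to a 3D unit vector."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.cos(yaw),
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
    )


def angular_error_deg(pred, truth):
    """Great-circle angle in degrees between two (yaw, pitch) directions."""
    a, b = to_unit_vector(*pred), to_unit_vector(*truth)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
    return math.degrees(math.acos(dot))


def hold_last_baseline(trajectory, horizon_steps):
    """Mean angular error when the pose at t is used as the prediction for t + horizon."""
    n = len(trajectory) - horizon_steps
    if horizon_steps < 1 or n < 1:
        raise ValueError("trajectory too short for the requested horizon")
    errors = [
        angular_error_deg(trajectory[t], trajectory[t + horizon_steps])
        for t in range(n)
    ]
    return sum(errors) / n


# Synthetic data (assumption): yaw sweeps at 30 deg/s, pitch stays at 0, sampled at 10 Hz.
SAMPLE_RATE_HZ = 10
trajectory = [(30.0 * i / SAMPLE_RATE_HZ, 0.0) for i in range(50)]

short_term = hold_last_baseline(trajectory, horizon_steps=1)   # 0.1 s horizon
long_term = hold_last_baseline(trajectory, horizon_steps=20)   # 2 s horizon
print(f"0.1 s horizon: {short_term:.2f} deg, 2 s horizon: {long_term:.2f} deg")
```

On this synthetic sweep the baseline error grows with the horizon, which illustrates why long-term prediction (> 1 s) is the harder case. A model trained on past motion, VR content, or both could be scored with the same `angular_error_deg` function to measure how much each modality contributes.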

Supervisor: Zhiming Hu

Distribution: 60% Implementation, 20% Literature, 20% Analysis

Requirements: Good knowledge of deep learning and strong programming skills in Python and PyTorch. Preferred: knowledge of multimodal learning.

Literature: Jaegle, A., et al. 2022. Perceiver IO: A general architecture for structured inputs & outputs. International Conference on Learning Representations (ICLR).

Hou, X., et al. 2019. Head and body motion prediction to enable mobile VR experiences with low latency. IEEE Global Communications Conference (GLOBECOM).

Wu, C., et al. 2020. A spherical convolution approach for learning long term viewport prediction in 360 immersive video. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).

Feng, X., et al. 2020. LiveDeep: Online viewport prediction for live virtual reality streaming using lifelong deep learning. IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR).

Ban, Y., et al. 2018. CUB360: Exploiting cross-users behaviors for viewport prediction in 360 video adaptive streaming. IEEE International Conference on Multimedia and Expo (ICME).