
Inferring Other Agents' Goal in Collaborative Environments Using Graphs


Description: Watch-And-Help (WAH) is a challenge for testing social intelligence in agents. In WAH, an AI system has to help a human-like agent perform a complex household task efficiently. The environment, VirtualHome-Social, simulates realistic and rich home environments in which agents can interact with different objects. The challenge consists of two stages. In the first, the Watch stage, an AI agent (Bob) watches a human-like agent (Alice) perform a task once and infers Alice’s goal from her actions. In the second, the Help stage, Bob helps Alice achieve the same goal in a different environment as quickly as possible. To do so, Bob has a module to infer Alice’s goal, a high-level policy to predict a sub-goal, and a low-level policy to decide which actions to take. To encode the state of the environment, the authors use a Transformer. However, because the objects in the scene are connected by different types of relations, a Graph Neural Network might be better suited as the encoder.

Goal: Replace the Transformer in the baseline state encoder with a Graph Neural Network. Run experiments with different types of GNN layers and discuss the results.
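
As a rough illustration of what a relation-aware graph encoder layer does, the sketch below performs one round of message passing in which each relation type has its own weight matrix and messages are averaged per relation. It is written in plain Python for readability; an actual implementation would use PyTorch tensors and learned weights. The function names, the toy scene, the relation labels (INSIDE, CLOSE) and the hand-picked weights are all assumptions made for this example and are not taken from the WAH baseline.

```python
from collections import defaultdict


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def relational_layer(node_feats, edges, rel_weights, self_weight):
    """One round of relation-aware message passing with mean aggregation.

    node_feats:  dict mapping node id -> feature vector
    edges:       list of (source, relation, destination) triples
    rel_weights: dict mapping relation -> weight matrix (hypothetical values)
    self_weight: weight matrix applied to a node's own features
    """
    # Group the transformed messages by destination node and relation type.
    inbox = defaultdict(lambda: defaultdict(list))
    for src, rel, dst in edges:
        inbox[dst][rel].append(matvec(rel_weights[rel], node_feats[src]))

    encoded = {}
    for node, feat in node_feats.items():
        hidden = matvec(self_weight, feat)
        for msgs in inbox[node].values():
            # Average the messages that arrive over the same relation type.
            for i in range(len(hidden)):
                hidden[i] += sum(m[i] for m in msgs) / len(msgs)
        encoded[node] = [max(0.0, v) for v in hidden]  # ReLU
    return encoded


# Toy scene graph (illustrative only): an apple inside a fridge inside a kitchen.
node_feats = {
    "apple": [1.0, 0.0],
    "fridge": [0.0, 1.0],
    "kitchen": [1.0, 1.0],
}
edges = [
    ("apple", "INSIDE", "fridge"),
    ("fridge", "INSIDE", "kitchen"),
    ("apple", "CLOSE", "fridge"),
]
identity = [[1.0, 0.0], [0.0, 1.0]]
rel_weights = {"INSIDE": identity, "CLOSE": [[0.5, 0.0], [0.0, 0.5]]}

encoded = relational_layer(node_feats, edges, rel_weights, identity)
print(encoded)
```

Swapping the aggregation step (mean, sum, attention-weighted) or sharing one weight matrix across all relations gives different layer types, which is the kind of variation the experiments could compare.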

Supervisors: Matteo Bortoletto and Lei Shi

Distribution: 10% literature review, 70% implementation, 20% analysis

Requirements: Good knowledge of deep learning, strong programming skills in Python and PyTorch, good self-management

Literature:

Puig, Xavier et al. 2020. Watch-and-help: A challenge for social perception and human-AI collaboration. arXiv:2010.09890.

Puig, Xavier et al. 2018. Virtualhome: Simulating household activities via programs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Wu, Zonghan et al. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), pp. 4-24.