Collaborative Artificial Intelligence

MSCOCO-EMMA and FigureQA-EMMA

We present a novel method for deep image saliency prediction that leverages a cognitive model of visual attention as an inductive bias. This is in stark contrast to recent purely data-driven models, which have improved performance mainly by increasing model capacity, resulting in high computational costs and the need for large-scale, domain-specific training data. We demonstrate that, by leveraging a cognitive model of visual attention, our method achieves performance competitive with the state of the art across several natural image datasets while requiring only a fraction of the parameters.

Furthermore, we set a new state of the art for saliency prediction on information visualizations, demonstrating the effectiveness of our approach for cross-domain generalization. We further provide large-scale, cognitively plausible synthetic gaze data for the corresponding images in the full MSCOCO and FigureQA datasets, which we used for pre-training. These results are highly promising and underline the significant potential of bridging first-principles cognitive models and data-driven models for computer vision tasks, potentially also beyond saliency prediction and even beyond visual attention.

The full dataset can be requested by contacting us and filling out a license agreement.

Contact: Prof. Andreas Bulling,

The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:

Improving Neural Saliency Prediction with a Cognitive Model of Human Visual Attention

Ekta Sood, Lei Shi, Matteo Bortoletto, Yao Wang, Philipp Müller, Andreas Bulling

Proc. the 45th Annual Meeting of the Cognitive Science Society (CogSci), pp. 3639–3646, 2023.


We present a novel method for saliency prediction that leverages a cognitive model of visual attention as an inductive bias. This approach is in stark contrast to recent purely data-driven saliency models that achieve performance improvements mainly by increased capacity, resulting in high computational costs and the need for large-scale training datasets. We demonstrate that by using a cognitive model, our method achieves competitive performance to the state of the art across several natural image datasets while only requiring a fraction of the parameters. Furthermore, we set the new state of the art for saliency prediction on information visualizations, demonstrating the effectiveness of our approach for cross-domain generalization. We further provide augmented versions of the full MSCOCO dataset with synthetic gaze data using the cognitive model, which we used to pre-train our method. Our results are highly promising and underline the significant potential of bridging between cognitive and data-driven models, potentially also beyond attention.

Paper: sood23_cogsci.pdf

Code: https://git.hcics.simtech.uni-stuttgart.de/public-projects/neural-saliency-prediction-with-a-cognitive-model/

Supplementary Material: sood23_cogsci_sup.pdf

Dataset: https://collaborative-ai.org/research/datasets/MSCOCOEMMAFigureQAEMMA/

@inproceedings{sood23_cogsci,
  author    = {Sood, Ekta and Shi, Lei and Bortoletto, Matteo and Wang, Yao and Müller, Philipp and Bulling, Andreas},
  title     = {Improving Neural Saliency Prediction with a Cognitive Model of Human Visual Attention},
  booktitle = {Proc. the 45th Annual Meeting of the Cognitive Science Society (CogSci)},
  year      = {2023},
  pages     = {3639--3646}
}
