Content-based video indexing and retrieval (CBVIR) is an active area of research that focuses on automating the indexing, retrieval and management of videos. It has a wide spectrum of promising applications, among which assessing the impact of audiovisual productions is a particularly interesting and motivating one. In this paper we present a computational model capable of predicting the impact (i.e. positive or negative) of car advertisement videos upon viewers by using a set of visual saliency descriptors. Visual saliency provides information about the parts of an image perceived as most important, which humans instinctively attend to when looking at a picture or watching a video. For this reason, we propose to exploit visual saliency information, introducing it as a new feature that objectively reflects high-level semantics, to improve video impact categorization results. The proposed salience descriptors are inspired by the mechanisms underlying the attentional abilities of the human visual system and are organized into seven distinct families according to different measurements over the salient areas identified in the video frames: population, size, location, geometry, orientation, movement and photographic composition. The proposed approach starts by computing saliency maps for all video frames, for which two different visual saliency detection frameworks have been considered and evaluated: the popular graph-based visual saliency (GBVS) algorithm and a state-of-the-art DNN-based approach. Then, frame-level salience descriptors are extracted from these maps. Next, pooled statistics are used to collapse the frame-level values into video-level descriptors. Finally, a logistic regression classifier is built upon the subset of video-level features resulting from a feature selection stage.
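The frame-level to video-level pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: saliency maps are assumed to be already computed (e.g. by GBVS or a DNN model) and passed in as 2-D arrays; only four of the seven descriptor families (population, size, location and photographic composition) are covered, each with a simplified hypothetical definition (a fixed binarization threshold, 4-connected regions, a saliency-weighted centroid, and the distance to the nearest rule-of-thirds point); the five pooling statistics are an assumed choice; the feature selection stage is omitted; and the logistic regression is a plain gradient-descent stand-in. All function and constant names are hypothetical.

```python
import numpy as np

# Illustrative sketch only: names and descriptor definitions are hypothetical,
# not the paper's. Rule-of-thirds "power points" in normalized (row, col) coords.
THIRDS_POINTS = np.array([[r, c] for r in (1 / 3, 2 / 3) for c in (1 / 3, 2 / 3)])

# Assumed pooling statistics used to collapse frame-level values per video.
POOLS = {"mean": np.mean, "std": np.std, "min": np.min, "max": np.max, "median": np.median}


def count_regions(mask):
    """Population (hypothetical definition): number of 4-connected salient regions."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        regions += 1
        seen[r0, c0] = True
        stack = [(r0, c0)]
        while stack:  # iterative flood fill of one region
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    stack.append((nr, nc))
    return regions


def frame_descriptors(saliency_map, threshold=0.5):
    """Hypothetical frame-level descriptors from one 2-D saliency map.

    Returns [population, size, centroid_row, centroid_col, thirds_distance].
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must lie in (0, 1]")
    s = np.asarray(saliency_map, dtype=float)
    span = s.max() - s.min()
    # Min-max normalize to [0, 1]; a constant map has no salient area.
    s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    mask = s >= threshold  # binarized salient area (assumed fixed threshold)
    h, w = mask.shape
    population = count_regions(mask)
    size = mask.mean()  # fraction of the frame covered by salient pixels
    if mask.any():
        rows, cols = np.nonzero(mask)
        weights = s[rows, cols]  # all >= threshold > 0
        cy = np.average(rows, weights=weights) / max(h - 1, 1)
        cx = np.average(cols, weights=weights) / max(w - 1, 1)
    else:
        cy = cx = 0.5  # fall back to the frame centre
    # Composition: distance from the centroid to the nearest power point.
    thirds = np.min(np.linalg.norm(THIRDS_POINTS - [cy, cx], axis=1))
    return np.array([population, size, cy, cx, thirds])


def video_descriptor(saliency_maps, threshold=0.5):
    """Pool (T, D) per-frame descriptors into one vector of length len(POOLS) * D."""
    frames = np.stack([frame_descriptors(m, threshold) for m in saliency_maps])
    return np.concatenate([pool(frames, axis=0) for pool in POOLS.values()])


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Unregularized logistic regression trained by batch gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))  # predicted P(positive impact)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def predict(w, X):
    """Label 1 (positive impact) where the linear decision function is above zero."""
    return (np.column_stack([np.ones(len(X)), X]) @ w > 0).astype(int)
```

In a full pipeline one would call `video_descriptor` once per commercial, stack the resulting vectors into a matrix, apply a feature selection step, and train the classifier on the selected columns; in practice a library classifier such as scikit-learn's `LogisticRegression` would replace the hand-written gradient descent.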
Experimental validation, conducted on a publicly available corpus of 138 commercials collected from YouTube, shows that the proposed salience descriptors are indicative of the impact upon viewers and achieve performance similar to that of a method based purely on aesthetics. Moreover, the combined approach, exploiting both saliency and aesthetics, ultimately outperforms either feature set used individually. The seven families of salience descriptors are also compared in terms of classification performance. Finally, a similar study is performed for the distinct pooling techniques used in the video-level feature computation.
Recommended citation: Fernández-Martínez, F., Hernández-García, A., Fernández-Torres, M.A. et al. Exploiting visual saliency for assessing the impact of car commercials upon viewers. Multimed Tools Appl 77, 18903–18933 (2018). https://doi.org/10.1007/s11042-017-5339-9