We are surrounded by a vast amount of visual information, and visual attention allows humans either to highlight the most conspicuous areas in a particular context (e.g. an airport, a highway, a hospital) or to select those that help solve a particular task (e.g. video surveillance, driving, surgery).
In this talk, we will show how a machine can be trained to perform the visual attention task, and what advantages this brings when dealing with large amounts of information in complex and crowded scenarios. To that end, the presentation is divided into two parts.
In the first part of the talk, we will briefly introduce how to model some of the attributes (e.g. color, orientation, motion) and objects that guide attention, using both traditional computer vision techniques and recent Convolutional Neural Networks (CNNs). Then, we will present a model able to learn comprehensible representations of visual attention. Drawing on the attributes listed above and on the information provided by human eye fixations, these representations attempt either to predict where people look or to understand how visual attention works.
In the second part of the talk, we will discuss some of the most prominent video scenarios in which visual attention could be useful for a particular application. In these contexts, modeling visual attention makes it possible to guide subsequent processing toward spatial regions and time segments of special importance. We will put special emphasis on the anomaly detection task performed by CCTV operators in video surveillance scenarios, which involves watching many hours of footage from large arrays of cameras.