Publications

Learning Granger Causal Feature Representations

Published in ICML 2021 Workshop Tackling Climate Change with Machine Learning, 2021

Tackling climate change requires understanding the complex phenomena occurring on the planet. Discovering teleconnection patterns is an essential part of the endeavor. Events like the El Niño Southern Oscillation (ENSO) impact essential climate variables at large distances and influence the underlying Earth system dynamics. However, their automatic identification from the wealth of observational data is still unresolved. Nonlinearities, nonstationarities and the (ab)use of correlation analyses hamper the discovery of true causal patterns. We here introduce a deep learning methodology that extracts, from spatio-temporal Earth data, nonlinear latent functions that are jointly Granger causal with the index. We illustrate its use to study the impact of ENSO on vegetation, which allows for a more rigorous study of impacts on ecosystems globally.
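
As a rough illustration of the Granger-causality notion the abstract relies on, the sketch below implements the classical linear, lag-1 test statistic: a series x is said to Granger-cause y when adding the past of x reduces the prediction error of y beyond what the past of y alone achieves. This is not the deep learning method of the paper. The function names, the toy data and the zero-mean, no-intercept simplification are assumptions made for the example.

```python
import math
import random


def granger_index(x, y):
    """Return ln(var_restricted / var_unrestricted) for 'x Granger-causes y' at lag 1.

    Both series are assumed to be zero-mean, so no intercept is fitted.
    """
    y_now, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    n = len(y_now)

    # Sufficient statistics for the least-squares fits.
    syy = sum(p * p for p in y_lag)
    sxx = sum(q * q for q in x_lag)
    sxy = sum(p * q for p, q in zip(y_lag, x_lag))
    sy = sum(p * t for p, t in zip(y_lag, y_now))
    sx = sum(q * t for q, t in zip(x_lag, y_now))

    # Restricted model: y_t = a * y_{t-1}
    a = sy / syy
    var_r = sum((t - a * p) ** 2 for p, t in zip(y_lag, y_now)) / n

    # Unrestricted model: y_t = a2 * y_{t-1} + b2 * x_{t-1} (2x2 normal equations)
    det = syy * sxx - sxy * sxy
    a2 = (sy * sxx - sx * sxy) / det
    b2 = (sx * syy - sy * sxy) / det
    var_u = sum((t - a2 * p - b2 * q) ** 2
                for p, q, t in zip(y_lag, x_lag, y_now)) / n
    return math.log(var_r / var_u)


def simulate(n=500, seed=0):
    """Toy data (hypothetical): x is white noise and drives y with a one-step delay."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0.0, 0.1))
    return x, y


x, y = simulate()
print("x -> y:", granger_index(x, y))  # expected to be clearly positive
print("y -> x:", granger_index(y, x))  # expected to be close to zero
```

The index is asymmetric by construction, which is what distinguishes it from a correlation analysis: swapping the roles of the two series gives a different value.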

Recommended citation: Varando, G., Fernández-Torres, M. A., & Camps-Valls, G. (2021). Learning Granger Causal Feature Representations. ICML 2021 Workshop Tackling Climate Change with Machine Learning. https://www.climatechange.ai/papers/icml2021/34

Interpretable Global-Local Dynamics for the Prediction of Eye Fixations in Autonomous Driving Scenarios

Published in IEEE Access, 2020

Human eye movements while driving reveal that visual attention largely depends on the context in which it occurs. Furthermore, an autonomous vehicle which performs this function would be more reliable if its outputs were understandable. Capsule Networks have been presented as a great opportunity to explore new horizons in the Computer Vision field, due to their capability to structure and relate latent information. In this article, we present a hierarchical approach for the prediction of eye fixations in autonomous driving scenarios. Context-driven visual attention can be modeled by considering different conditions which, in turn, are represented as combinations of several spatio-temporal features. With the aim of learning these conditions, we have built an encoder-decoder network which merges visual feature information using a global-local definition of capsules. Two types of capsules are distinguished: representational capsules for features and discriminative capsules for conditions. The latter and the use of eye fixations recorded with wearable eye tracking glasses allow the model to learn both to predict contextual conditions and to estimate visual attention, by means of a multi-task loss function. Experiments show how our approach is able to express either frame-level (global) or pixel-wise (local) relationships between features and contextual conditions, allowing for interpretability while maintaining or improving the performance of related black-box systems in the literature. Indeed, our proposal offers an improvement of 29% in terms of information gain with respect to the best performance reported in the literature.

Recommended citation: J. Martínez-Cebrián, M. -Á. Fernández-Torres and F. Díaz-De-María, "Interpretable Global-Local Dynamics for the Prediction of Eye Fixations in Autonomous Driving Scenarios," in IEEE Access, vol. 8, pp. 217068-217085, 2020, doi: 10.1109/ACCESS.2020.3041606. https://doi.org/10.1109/ACCESS.2020.3041606

Probabilistic Topic Model for Context-Driven Visual Attention Understanding

Published in IEEE Transactions on Circuits and Systems for Video Technology, 2019

Modern computer vision techniques have to deal with vast amounts of visual data, which implies a computational effort that often has to be accomplished in broad and challenging scenarios. The interest in efficiently solving these image and video applications has led researchers to develop methods to expertly drive the corresponding processing to conspicuous regions that either depend on the context or are based on specific requirements. In this paper, we propose a general hierarchical probabilistic framework, independent of the application scenario, and founded on the most outstanding psychological studies about attention and eye movements, which support that guidance is not based directly on the information provided by early visual processes but on a contextual representation arising from them. The approach defines the task of context-driven visual attention as a mixture of latent sub-tasks, which are in turn modeled as a combination of specific distributions associated with low-, mid- and high-level spatio-temporal features. Learning from fixations gathered from human observers, we incorporate an intermediate level between feature extraction and visual attention estimation that makes it possible to obtain comprehensive guiding representations. Experiments show how our proposal successfully learns particularly adapted hierarchical explanations of visual attention in diverse video genres, outperforming several leading models in the literature.
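
The "mixture of latent sub-tasks" idea in the abstract can be pictured with a very small sketch: each sub-task proposes its own attention map, and the final map is their weighted average, with weights that sum to 1. This only illustrates the mixture step, not the topic model or its learning procedure. The function names, the 2x2 maps and the weights are hypothetical.

```python
def normalise(saliency_map):
    """Scale a non-negative 2-D map (list of rows) so that it sums to 1."""
    total = sum(sum(row) for row in saliency_map)
    return [[value / total for value in row] for row in saliency_map]


def mixture_attention(subtask_maps, weights):
    """Pixel-wise mixture: p(pixel) = sum_k weights[k] * p(pixel | sub-task k)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    maps = [normalise(m) for m in subtask_maps]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(w * m[r][c] for w, m in zip(weights, maps))
             for c in range(cols)]
            for r in range(rows)]


# Two hypothetical sub-tasks on a 2x2 frame: one focused on the top-left
# pixel, one spreading attention uniformly.
focused = [[1, 0], [0, 0]]
uniform = [[1, 1], [1, 1]]
attention = mixture_attention([focused, uniform], weights=[0.75, 0.25])
print(attention)  # [[0.8125, 0.0625], [0.0625, 0.0625]]
```

Because every sub-task map is normalised before mixing, the result is itself a valid distribution over pixels.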

Recommended citation: M. Fernández-Torres, I. González-Díaz and F. Díaz-de-María, "Probabilistic Topic Model for Context-Driven Visual Attention Understanding," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 6, pp. 1653-1667, June 2020, doi: 10.1109/TCSVT.2019.2909427. https://doi.org/10.1109/TCSVT.2019.2909427

Exploiting visual saliency for assessing the impact of car commercials upon viewers

Published in Multimedia Tools and Applications, 2017

Content-based video indexing and retrieval (CBVIR) is a lively area of research which focuses on automating the indexing, retrieval and management of videos. This area has a wide spectrum of promising applications, among which assessing the impact of audiovisual productions emerges as a particularly interesting and motivating one. In this paper we present a computational model capable of predicting the impact (i.e. positive or negative) of car advertisement videos upon viewers by using a set of visual saliency descriptors. Visual saliency provides information about the parts of the image perceived as most important, which are instinctively targeted by humans when looking at a picture or watching a video. For this reason we propose to exploit visual information, introducing it as a new feature which reflects high-level semantics objectively, to improve the video impact categorization results. The suggested salience descriptors are inspired by the mechanisms that underlie the attentional abilities of the human visual system and are organized into seven distinct families according to different measurements over the identified salient areas in the video frames, namely population, size, location, geometry, orientation, movement and photographic composition. The proposed approach starts by computing saliency maps for all the video frames, where two different visual saliency detection frameworks have been considered and evaluated: the popular graph-based visual saliency (GBVS) algorithm and a state-of-the-art DNN-based approach. Then, frame-level salience descriptors are extracted from these maps. Next, pooled statistics are used to collapse the obtained frame-level values into video-level descriptors. Finally, a logistic regression classifier is built upon the subset of video-level features resulting from a feature selection stage.
Experimental validation, conducted on a publicly available corpus of 138 commercials collected from YouTube, shows that the proposed salience descriptors are indicative of the impact upon viewers and achieve a similar performance when compared to a method purely based on aesthetics. Besides, the combined approach, exploiting both saliency and aesthetics together, ultimately results in better performance than what can be achieved individually. In addition, the seven families of salience descriptors defined are also compared in terms of classification performance. Finally, a similar study is also performed targeting the distinct pooling techniques used in the video-level feature computation.
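
The pooling step described in the abstract, which collapses frame-level values into video-level descriptors, can be sketched as follows. The abstract does not state which statistics were pooled, so the choice of mean, standard deviation, minimum and maximum is an assumption, as are the descriptor names and the function name. The resulting video-level vector is what a feature selection stage and a classifier would then consume.

```python
import statistics


def pool_video_descriptors(frame_descriptors):
    """Collapse a list of per-frame descriptor dicts into one video-level dict."""
    pooled = {}
    for name in frame_descriptors[0]:
        values = [frame[name] for frame in frame_descriptors]
        pooled[f"{name}_mean"] = statistics.fmean(values)
        pooled[f"{name}_std"] = statistics.pstdev(values)  # population std. dev.
        pooled[f"{name}_min"] = min(values)
        pooled[f"{name}_max"] = max(values)
    return pooled


# Hypothetical frame-level descriptors for a three-frame video.
frames = [
    {"salient_area": 0.10, "centroid_x": 0.50},
    {"salient_area": 0.20, "centroid_x": 0.25},
    {"salient_area": 0.30, "centroid_x": 0.75},
]
video = pool_video_descriptors(frames)
for key, value in video.items():
    print(key, value)
```

Pooling makes the descriptor length independent of the number of frames, so videos of different durations can be compared by the same classifier.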

Recommended citation: Fernández-Martínez, F., Hernández-García, A., Fernández-Torres, M.A. et al. Exploiting visual saliency for assessing the impact of car commercials upon viewers. Multimed Tools Appl 77, 18903–18933 (2018). https://doi.org/10.1007/s11042-017-5339-9

Enriched dermoscopic-structure-based cad system for melanoma diagnosis

Published in Multimedia Tools and Applications, 2017

Computer-Aided Diagnosis (CAD) systems for melanoma detection have received a lot of attention during the last decades because of the utmost importance of detecting this type of skin cancer in its early stages. However, despite the many research efforts devoted to this matter, these systems are not yet used in everyday clinical practice. Very likely, this is due to two main reasons: 1) the accuracy of the systems is not high enough; and 2) they simply provide a parallel diagnosis that does not actually help doctors (as long as there is no way to interpret it). In this paper, we propose a novel approach that aims to provide the doctor with an enriched diagnosis. Specifically, we rely on a dermoscopic-structure-based soft segmentation to design a set of structure-specific classifiers. Each individual structure-specific classifier is trained to distinguish benign lesions from melanomas by paying attention to just one type of dermoscopic structure. Then, the outputs of the individual classifiers are combined by means of a Bayesian method that, besides the final diagnosis, provides the doctor with additional valuable information, such as the opinions of the individual structure-specific experts and the uncertainty of the diagnosis. The results in terms of the features selected for the structure-specific classifiers are consistent with the expert insights. Furthermore, regarding the automatic melanoma diagnosis problem, the proposed method has been assessed on two different datasets, and the experimental results reveal that the proposed system clearly outperforms other methods on both datasets and compares well with the official submissions of the ISBI 2016 challenge on melanoma detection. Moreover, the system performance is equivalent to that of a well-known dermoscopy expert, and its combination with the human diagnosis surpasses the human performance.

Recommended citation: López-Labraca, J., Fernández-Torres, M.Á., González-Díaz, I. et al. Enriched dermoscopic-structure-based cad system for melanoma diagnosis. Multimed Tools Appl 77, 12171–12202 (2018). https://doi.org/10.1007/s11042-017-4879-3

A probabilistic topic approach for context-aware visual attention modeling

Published in 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), 2016

The modeling of visual attention has gained much interest during the last few years since it makes it possible to efficiently drive complex visual processes to particular areas of images or video frames. Although the literature concerning bottom-up saliency models is vast, we still lack generic approaches for modeling top-down, task- and context-driven visual attention. Indeed, many top-down models simply modulate the weights associated with low-level descriptors to learn more accurate representations of visual attention than those of the generic fusion schemes in bottom-up techniques. In this paper we propose a hierarchical generic probabilistic framework that decomposes the complex process of context-driven visual attention into a mixture of latent subtasks, each of them being in turn modeled as a combination of specific distributions of low-level descriptors. The inclusion of this intermediate level bridges the gap between low-level features and visual attention and enables more comprehensive representations of the latter. Our experiments on a dataset in which videos are organized by genre demonstrate that, by learning specific distributions for each video category, we can notably enhance the system performance.

Recommended citation: M. Fernández-Torres, I. González-Díaz and F. Díaz-de-María, "A probabilistic topic approach for context-aware visual attention modeling," 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), Bucharest, 2016, pp. 1-6, doi: 10.1109/CBMI.2016.7500272. https://doi.org/10.1109/CBMI.2016.7500272

A Bayesian model for brain tumor classification using clinical-based features

Published in 2014 IEEE International Conference on Image Processing (ICIP), 2014

This paper tackles the problem of automatic brain tumor classification from Magnetic Resonance Imaging (MRI) where, traditionally, general-purpose texture and shape features extracted from the Region of Interest (tumor) have become the usual parameterization of the problem. Two main contributions are made in this context. First, a novel set of clinical-based features that intend to model the intuitions and expert knowledge of physicians is suggested. Second, a system is proposed that is able to fuse multiple individual scores (each based on a particular MRI sequence and a pathological indicator present in that sequence) by using a Bayesian model that produces a global system decision. This approach provides a quite flexible solution able to handle missing data, which is very likely in a realistic scenario where the number of clinical tests varies from one patient to another. Furthermore, the Bayesian model provides extra information concerning the uncertainty of the final decision. Our experimental results show that the use of clinical-based features leads to a significant increase in performance in terms of Area Under the Curve (AUC) when compared to a state-of-the-art reference. Furthermore, the proposed Bayesian fusion model clearly outperforms other fusion schemes, especially when few diagnostic tests are available.
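
One simple way to see why a Bayesian fusion rule copes naturally with missing tests is a naive-Bayes combination in the log-odds domain: each available test adds its evidence, and a test that was not performed adds none. The sketch below is only an illustration under a conditional-independence assumption; it is not the model proposed in the paper. The function names, the test names and the likelihood-ratio values are hypothetical, and binary entropy stands in for the uncertainty measure.

```python
import math


def fuse(likelihood_ratios, prior=0.5):
    """Posterior probability of the positive class from independent tests.

    likelihood_ratios maps a test name to p(score | positive) / p(score | negative),
    or to None when that test was not performed for the patient.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios.values():
        if ratio is not None:  # a missing test simply contributes no evidence
            log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


def entropy_bits(p):
    """Binary entropy of the posterior, used here as a simple uncertainty measure."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Hypothetical patient: two tests available, one missing.
patient = {"sequence_a": 4.0, "sequence_b": None, "sequence_c": 2.0}
posterior = fuse(patient)
print("posterior:", posterior)
print("uncertainty (bits):", entropy_bits(posterior))
```

With no tests at all the posterior falls back to the prior and the uncertainty is maximal, which matches the intuition that fewer diagnostic tests should yield a less confident decision.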

Recommended citation: T. Martínez-Cortés et al., "A Bayesian model for brain tumor classification using clinical-based features," 2014 IEEE International Conference on Image Processing (ICIP), Paris, 2014, pp. 2779-2783, doi: 10.1109/ICIP.2014.7025562. https://doi.org/10.1109/ICIP.2014.7025562