Summary


The volume of data captured worldwide is growing at an exponential rate, which poses challenges for their processing and analysis. Data analysis, regression and prediction/forecasting have played a leading role in gaining insight and extracting useful information from raw data, across a wide range of applications in areas such as biomedicine, econometrics and content preference. Even though data tend to live in high-dimensional spaces, they often exhibit a high degree of redundancy; that is, their useful information can be represented using far fewer attributes than their original dimensionality. Often, this redundancy can be effectively exploited by treating the data in a transformed domain, in which they can be represented by sparse models; that is, models whose parameters are mostly zero, with only a few nonzero values.
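
As a minimal illustration of sparsity in a transformed domain, the sketch below builds a signal that is dense in the time domain but has only two nonzero coefficients under a discrete cosine transform (DCT-II). The choice of transform, the signal, and all names (`dct2`, `signal`, `support`) are assumptions made for illustration and are not taken from the SOL project.

```python
import math

def dct2(x):
    """Unnormalised DCT-II: c[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]

N = 64
# Hypothetical test signal: a weighted sum of two DCT-II basis functions.
signal = [
    math.cos(math.pi * (n + 0.5) * 5 / N) + 0.5 * math.cos(math.pi * (n + 0.5) * 12 / N)
    for n in range(N)
]

coeffs = dct2(signal)
tol = 1e-9  # anything below this is treated as numerically zero

# Number of nonzero samples in the original (time) domain.
dense_count = sum(1 for v in signal if abs(v) > tol)
# Indices of nonzero coefficients in the transformed domain.
support = [k for k, c in enumerate(coeffs) if abs(c) > tol]

print(f"nonzero samples: {dense_count} of {N}")
print(f"nonzero DCT coefficients: {len(support)} at indices {support}")
```

Because the signal is built from two basis functions of the transform, its full content is captured by the coefficients at indices 5 and 12, while most of its time-domain samples are nonzero. Real data are rarely this clean; they are typically only approximately sparse.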

The advent of compressed sensing led to novel theoretical and algorithmic tools that can be efficiently employed for sparsity-aware learning of model parameters. Most of these techniques perform batch processing of the full data set, meaning that all of the data must be collected and stored before processing can begin. In contrast, SOL aims at developing theory and algorithms for sparsity-aware learning in an online fashion: instead of being stored, the data are processed sequentially, “on the fly”, as they become available.
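
The contrast between batch and online processing can be sketched with a generic sparsity-promoting adaptive filter. The example below is not the set-theoretic method developed in SOL; it swaps in a simple least-mean-squares (LMS) update followed by soft thresholding, and every name and parameter value (`online_sparse_lms`, `step`, `shrink`, the synthetic stream) is an assumption chosen for illustration.

```python
import random

def soft_threshold(value, t):
    """Shrink value toward zero by t; anything within [-t, t] becomes exactly 0."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0

def online_sparse_lms(stream, dim, step=0.01, shrink=0.0005):
    """Update the estimate from one (x, y) pair at a time; no sample is stored."""
    w = [0.0] * dim
    for x, y in stream:
        # Prediction error on the sample that has just arrived.
        err = y - sum(wi * xi for wi, xi in zip(w, x))
        # Gradient step on the instantaneous squared error, then shrinkage
        # that pushes small coefficients to zero.
        w = [soft_threshold(wi + step * err * xi, shrink) for wi, xi in zip(w, x)]
    return w

def synthetic_stream(w_true, n, noise=0.01, seed=0):
    """Hypothetical data source: yields noisy linear measurements one by one."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + rng.gauss(0.0, noise)
        yield x, y

# Sparse ground truth: 2 nonzero parameters out of 20.
w_true = [0.0] * 20
w_true[3], w_true[11] = 1.0, -0.5

w_hat = online_sparse_lms(synthetic_stream(w_true, 5000), dim=20)
print([round(v, 2) for v in w_hat])
```

Memory use is fixed by the model dimension rather than by the number of samples seen, which is the property that makes online schemes attractive for long data streams. The shrinkage introduces a small bias toward zero in the nonzero parameters, a known trade-off of this kind of thresholding.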

The objectives of the SOL project are summarized as follows:

  1. To develop sparsity-aware algorithms that belong to the set-theoretic learning framework and offer guarantees on computational complexity, so as to allow real-time operation.
  2. To incorporate into the online learning framework advanced sparsity structures, such as those employed for performance enhancement in the batch learning setting.
  3. To extend the proposed methods to learning from data produced by multiple-sensor devices/topologies, exploiting joint sparsity structures. Moreover, distributed processing performed locally, where the sensors lie, is supported as well.
  4. To develop a platform for the fair and objective performance evaluation of the proposed techniques against other competitors.
  5. To assess the developed techniques in a real-world application and particularly to implement a wireless electrocardiogram (ECG) monitoring system employing 10 sensors/electrodes (12-Lead ECG). This ECG monitoring system serves as a “proof of concept” for many of the techniques developed in SOL.

The original proposal focused on online regression/filtering tasks, and all the critical original objectives have been fulfilled. However, during the project period, significant interest emerged in the research community in more general cases, where sparsity and advanced structures are employed in data matrix factorization and analysis. Accordingly, additional research lines were adopted in order to keep pace with this rapidly expanding research field. In particular, our research efforts focused on extending the experience previously gained on online regression tasks to more general tasks involving robust subspace tracking, online and distributed dictionary learning, and dictionary learning-based matrix factorization dedicated to functional Magnetic Resonance Imaging (fMRI) analysis. These new, more general problems led us to consider alternative mathematical tools, which revolve around the concept of randomized projections. To this end, randomized projections were adopted for dimensionality reduction and were adapted to our previously developed algorithms in order to reduce the computational time of fMRI data analysis. Finally, in the same spirit, a novel robust linear regression method, based on randomized projections and suitable for big data applications, was also developed.
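
The idea behind randomized projections for dimensionality reduction can be sketched as follows. This is a generic Gaussian random projection, which approximately preserves distances between points with high probability; it is not the specific construction used in SOL, and the dimensions and names (`gaussian_projection`, `d`, `k`) are assumptions chosen for illustration.

```python
import math
import random

def gaussian_projection(d, k, seed=0):
    """k x d matrix with i.i.d. N(0, 1/k) entries, mapping d dimensions to k."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(P, x):
    """Matrix-vector product P @ x, written out in plain Python."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

d, k = 500, 200  # original and reduced dimensions (illustrative values)
rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in range(d)]
v = [rng.gauss(0.0, 1.0) for _ in range(d)]

P = gaussian_projection(d, k)
# Ratio of the distance after projection to the distance before it.
ratio = distance(project(P, u), project(P, v)) / distance(u, v)
print(f"distance ratio after projecting {d} -> {k} dimensions: {ratio:.3f}")
```

The ratio is close to 1, so algorithms that depend on distances or inner products can operate on the shorter vectors at a lower computational cost, in exchange for a small, controllable distortion that shrinks as `k` grows.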

The outcome of the SOL project is a set of algorithms and techniques for modelling, analysing and/or reconstructing signals that exploit their inherent parsimonious structure. More importantly, the project offers systematic means for imposing such structures while operating in an online rather than a batch fashion. As a result, the methods developed in the SOL project are of low complexity and capable of coping with large amounts of data and/or with data living in very high-dimensional spaces. This fact, further strengthened by the development of associated distributed algorithmic versions, renders these techniques appropriate for big data applications, which are likely to be a dominant trend in the coming years.

Finally, a wireless ECG monitoring system relying on the techniques developed in the SOL project was built. It achieves reduced power consumption and exhibits high potential for future improvements and developments. A demonstration can be found at sol.di.uoa.gr/.... .

The results of the SOL project have so far been published in 3 book chapters, 2 leading journal publications and 11 conference publications. Moreover, two further journal articles will be submitted for publication soon.