Βιβλιοθήκη ΟΠΑ - Ψηφιακό Αποθετήριο

ΠΥΞΙΔΑ Ιδρυματικό Αποθετήριο
και Ψηφιακή Βιβλιοθήκη

Όνομα χρήστη
Κωδικός πρόσβασης

Συλλογές :	Ιδρυματικό Αποθετήριο ΟΠΑ / AUEB Institutional Repository Σχολή Επιστημών και Τεχνολογίας της Πληροφορίας / School of Informatics Τμήμα Πληροφορικής / Department of Informatics Διδακτορικές διατριβές / PhD Theses

Τίτλος :	Stability of spectral learning algorithms: theory, methodologies and applications

Δημιουργός :	Mavroeidis, Dimitrios

Συντελεστής :	Vazirgiannis, Michalis (Επιβλέπων καθηγητής) Athens School of Economics and Business, Department of Informatics (Degree granting institution)

Τύπος :	Text

Φυσική περιγραφή :	135p.

Γλώσσα :	en

Περίληψη :	In Data Mining an important research problem is the identification and analysis of theoretical properties that characterize and explain the behavior of learning algorithms. Based on such theoretical tools the comparison and analysis of algorithms can be based on rigorous and sound criteria other than simply their empirical behavior. In this context, an important criterion that is widely used for assessing the quality of learning algorithms is stability. Stability essentially evaluates the sensitivity of the output of a learning algorithm with respect to small perturbations of the input. An algorithm is said to be stable if it is insensitive to perturbations and unstable if even small perturbations of the input can significantly alter the algorithm’s output. Based on this definition it is natural to require that an algorithm is stable. A learning paradigm that has been recently developed in the context of Data Mining considers the use of Spectral techniques for addressing several data mining tasks, ranging from Dimensionality Reduction to Clustering and Classification. Spectral algorithms are commonly perceived as approximate solutions to certain combinatorial optimization problems and derive their solutions based on the eigenvectors of appropriate input matrices. The stability of the results depends on certain algebraic properties of these matrices, thus the study of stability reduces to the study of these algebraic properties.In this thesis we focus on the stability of spectral learning algorithms and consider the following research questions: How can we measure the stability of spectral learning algorithms? What is the relation of sample size and (in)stability? What is the relation between stability and efficiency? How can we identify the sufficient sample size for guaranteeing stability? How can we enhance the stability of spectral learning algorithms? How can the concept of stability be employed in distributed learning algorithms? Based on the aforementioned research directions, the contributions of this thesis can be summarized in the following. We define an efficient bootstrap based methodology for measuring the stability of Principal Components Analysis (PCA) [88], Spectral k-Means and Normalized Spectral Clustering [85]. • We propose a Feature Selection Framework that enhances the stability of Principal Components Analysis PCA [88] and Spectral Clustering/Ordering [83]. • We propose a Semi-Supervised Framework that enhances the stability and efficiency of Spectral Ordering[83, 84]. • Motivated by [87] where highly accurate classification accuracy is achieved with very small training data sets, we associate the concept of sampling variability to instability and Sample Size, and propose a Spectral Clustering algorithm that automatically identifies the sufficient sample size that guarantees stability [85]. • We define a Distributed Spectral Clustering framework, that aims in minimizing instability. The proposed framework is demonstrated to obtain highly accurate models with low bandwidth consumption [85].

Περίληψη :

In Data Mining an important research problem is the identification and analysis of theoretical properties that characterize and explain the behavior of learning algorithms. Based on such theoretical tools the comparison and analysis of algorithms can be based on rigorous and sound criteria other than simply their empirical behavior. In this context, an important criterion that is widely used for assessing the quality of learning algorithms is stability. Stability essentially evaluates the sensitivity of the output of a learning algorithm with respect to small perturbations of the input. An algorithm is said to be stable if it is insensitive to perturbations and unstable if even small perturbations of the input can significantly alter the algorithm’s output. Based on this definition it is natural to require that an algorithm is stable. A learning paradigm that has been recently developed in the context of Data Mining considers the use of Spectral techniques for addressing several data mining tasks, ranging from Dimensionality Reduction to Clustering and Classification. Spectral algorithms are commonly perceived as approximate solutions to certain combinatorial optimization problems and derive their solutions based on the eigenvectors of appropriate input matrices. The stability of the results depends on certain algebraic properties of these matrices, thus the study of stability reduces to the study of these algebraic properties.In this thesis we focus on the stability of spectral learning algorithms and consider the following research questions: How can we measure the stability of spectral learning algorithms? What is the relation of sample size and (in)stability? What is the relation between stability and efficiency? How can we identify the sufficient sample size for guaranteeing stability? How can we enhance the stability of spectral learning algorithms? How can the concept of stability be employed in distributed learning algorithms? Based on the aforementioned research directions, the contributions of this thesis can be summarized in the following. We define an efficient bootstrap based methodology for measuring the stability of Principal Components Analysis (PCA) [88], Spectral k-Means and Normalized Spectral Clustering [85]. • We propose a Feature Selection Framework that enhances the stability of Principal Components Analysis PCA [88] and Spectral Clustering/Ordering [83]. • We propose a Semi-Supervised Framework that enhances the stability and efficiency of Spectral Ordering[83, 84]. • Motivated by [87] where highly accurate classification accuracy is achieved with very small training data sets, we associate the concept of sampling variability to instability and Sample Size, and propose a Spectral Clustering algorithm that automatically identifies the sufficient sample size that guarantees stability [85]. • We define a Distributed Spectral Clustering framework, that aims in minimizing instability. The proposed framework is demonstrated to obtain highly accurate models with low bandwidth consumption [85].

Λέξη κλειδί :	Spectral algorithms Stability Learning algorithms Data mining

Ημερομηνία :	30-04-2009

Άδεια χρήσης :

Αρχείο: Mavroeidis_2009.pdf

Τύπος: application/pdf

Είσοδος