Browsing by Author "Pierrakos, Georgios"
Item
Clustering time series data (2020-08-26)
Pierrakos, Georgios; Πιερράκος, Γεώργιος; Athens University of Economics and Business, Department of Management Science and Technology; Ntzoufras, Ioannis; Chatziantoniou, Damianos; Karlis, Dimitrios
The present thesis examines the factors that need to be defined when applying clustering methodologies to time series data. Adapting existing methodologies to time series is not straightforward: the intricacies of high dimensionality and of ordered, correlated observations also need to be addressed. The problem definition amounts to two separate tasks: (i) identifying the criteria for assessing the value of time series clustering methodologies and (ii) identifying which clustering methodology works best. The former task reveals that the problem is one of multi-objective optimization: both accuracy (measured by the Silhouette cluster validity index, CVI) and efficiency (measured by algorithm execution time) need to be maximized so that meaningful methodologies can be proposed. The latter task entails testing a number of methodologies on a sample dataset. One set of methodologies uses static-data clustering approaches (hierarchical, partitioning and fuzzy) combined with time series distance definitions. The literature indicates that the most prominent distance definition is dynamic time warping (DTW). A number of related parameters need to be examined: step pattern, window size and the selection of sample time series (for algorithms that build clusters around representative time series). Another set of methodologies uses a hierarchical algorithm fed with clipped series, Pearson correlation and Lp-norm (Euclidean and Manhattan) distance definitions. The sample dataset consists of the diurnal variation of bike rental starts across the stations of the Capital Bikeshare scheme in Washington, DC, USA.
The results are also examined on the map, to check whether time series clusters correspond to geographical clusters as well. The main conclusions are that (i) no single distance definition is best in all cases, and the semantics of the underlying process need to be very well understood, and (ii) while DTW improves on standard Lp-norm distance definitions, it carries a heavy time cost that reduces scalability.
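The trade-off between Lp-norm distances and dynamic time warping mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `lp_distance` and `dtw_distance`, the absolute-difference local cost, the simple symmetric step pattern and the band-style `window` parameter are all assumptions chosen for illustration.

```python
import math


def lp_distance(a, b, p=2):
    """Lp-norm distance between two equal-length series.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    if len(a) != len(b):
        raise ValueError("Lp-norm distances need series of equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dtw_distance(a, b, window=None):
    """Dynamic time warping distance with an optional band constraint.

    Assumptions (illustrative, not taken from the thesis):
    - local cost is the absolute difference between two observations;
    - the step pattern allows moves (i-1, j), (i, j-1) and (i-1, j-1);
    - `window` limits |i - j|; None means no constraint.
    """
    n, m = len(a), len(b)
    # Widen the band to |n - m| so the final cell stays reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]
```

For two bump-shaped series offset by one time step, `[0, 0, 1, 2, 1, 0, 0]` and `[0, 1, 2, 1, 0, 0, 0]`, the Manhattan distance is 4 and the Euclidean distance is 2, while the unconstrained DTW distance is 0 because the warping path absorbs the shift. Setting `window=0` forces the path onto the diagonal and reproduces the Manhattan value. The nested loop also shows where the time cost comes from: DTW fills a table of up to n × m cells per pair of series, whereas an Lp-norm needs a single pass.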
