Browse by Supervisor "Karlis, Dimitrios"
Now showing 1 - 20 of 58

Item Actuarial modelling of claim counts (31-01-2024)
Γκάζικα, Βασιλική; Gkazika, Vassiliki; Athens University of Economics and Business, Department of Management Science and Technology; Ntzoufras, Ioannis; Chatziantoniou, Damianos; Karlis, Dimitrios
The accident insurance sector is growing considerably, making it one of the fastest-growing fields in the actuarial profession. As the sector expands rapidly, demand for accurate and efficient modelling of accident data is rising sharply. Fortunately, abundant historical data are available, and exploring them can reveal interesting information about the factors that lead to insurance claims, the characteristics of claimants, and many other aspects. The focus of this thesis is the investigation of the most suitable models for analysing claims data and understanding the implications of the findings. We also use key indicators to assess the goodness of fit of these models to the data, in order to ensure a comprehensive analysis. In particular, the information obtained from these models sheds light on past claim patterns and allows us to predict future claims and the associated losses.

Item Adaptive designs in phase II clinical trials (23-09-2013)
Poulopoulou, Stavroula; Πουλοπούλου, Σταυρούλα; Athens University of Economics and Business. Department of Statistics; Karlis, Dimitrios; Dafni, Urania
Clinical trials play a very important role in the development of new therapies. Recently there has been a rapid increase in the research and creation of new molecular agents, which makes it necessary to develop more flexible and adaptive designs for the implementation of clinical trials.
The objective of adaptive designs is to ensure direct and dynamic control of the effectiveness and the safety of a new treatment by allowing elements of the study (e.g. sample size) to be adjusted during the study, without sacrificing elements associated with the credibility of the study (e.g. statistical power) or the ethical characteristics of clinical trials.

Item Analysis with Play-by-Play Data in Euroleague (31-01-2023)
Λουκόπουλος, Ορέστης; Loukopoulos, Orestis; Athens University of Economics and Business, Department of Management Science and Technology; Ntzoufras, Ioannis; Chatziantoniou, Damianos; Karlis, Dimitrios
This thesis presents an analysis of offensive possessions in European basketball. The aim of the study is to identify the main factors that contribute to a successful offensive outcome. Most researchers use box-score data to identify these factors. In this thesis, we used play-by-play data from the 2019-2020 Euroleague season. The main idea was to take into account information about the duration of each possession, as well as information about the progress and outcome of the opponents' previous possession. We analysed these data using descriptive statistics and a logistic regression model. The results provide information about the game characteristics that maximise a team's offensive efficiency.
The findings of this study can be used by coaches to improve strategies and increase their team's offensive effectiveness.

Item Analytics using public medical records (2024)
Vlassis, Georgios-Konstantinos; Βλάσσης, Γεώργιος-Κωνσταντίνος; Athens University of Economics and Business, Department of Management Science and Technology; Papastamoulis, Panagiotis; Chatziantoniou, Damianos; Karlis, Dimitrios
The thesis "Analytics using public medical records" explores the transformative potential of integrating analytics into Greek healthcare by leveraging public medical records. It examines the multidimensional benefits, such as improved disease management, enhanced patient care through personalised treatment plans, and more efficient healthcare delivery. The thesis addresses technical challenges such as database management, data protection, security and interoperability, highlighting solutions and strategies for overcoming them. Ethical aspects, including the handling of sensitive patient data and the safeguarding of privacy, are critically assessed. Through case studies and a review of existing research, the thesis underlines the critical role of predictive, prescriptive, descriptive and diagnostic analytics in revolutionising healthcare.
It stresses the need for a careful synthesis of technical innovation and ethical practice in order to realise the full potential of analytics in healthcare, particularly in the unique context of Greece's healthcare infrastructure and the development of an electronic patient record system.

Item Application of Copula functions in statistics (09-2007)
Nikoloulopoulos, Aristidis; Νικολουλόπουλος, Αριστείδης; Athens University of Economics and Business, Department of Statistics; Karlis, Dimitrios
Studying associations among multivariate outcomes is an interesting problem in statistical science. The dependence between random variables is completely described by their multivariate distribution. When the multivariate distribution has a simple form, standard methods can be used to make inference. On the other hand, one may create multivariate distributions based on particular assumptions, thus limiting their use. Unfortunately, these limitations occur very often when working with multivariate discrete distributions. Some multivariate discrete distributions used in practice have only certain properties; for example, they allow only for positive dependence, or they can have marginal distributions only of a given form. Copulas seem to be a promising solution to this problem. Copulas are a currently fashionable way to model multivariate data, as they account for the dependence structure and provide a flexible representation of the multivariate distribution. Furthermore, for copulas the dependence properties can be separated from the marginal properties, and multivariate models with marginal densities of arbitrary form can be constructed, allowing a wide range of possible association structures. In fact, they allow for flexible dependence modelling that goes beyond simple linear correlation structures.
However, in the application of copulas to discrete data, marginal parameters also affect the dependence structure, and hence the dependence properties are not fully separated from the marginal properties. Introducing covariates to describe the dependence by modelling the copula parameters is of special interest in this thesis. Thus, covariate information can describe the dependence either indirectly through the marginal parameters or directly through the parameters of the copula. We examine the case where covariates are used in the marginal parameters, the copula parameters, or both, aiming to create a highly flexible model that produces very elegant dependence structures. Furthermore, the literature contains many theoretical results and families of copulas with various properties, but few papers compare copula families and discuss model selection among candidate copula models. This leaves open the question of which copulas are appropriate and whether we are able, from real data, to select the true copula that generated the data among a series of candidates with, perhaps, very similar dependence properties. We examined a large set of candidate copula families, taking into account properties like concordance and tail dependence. The comparison is made theoretically using Kullback-Leibler distances between them. We selected this distance because it has a nice relationship with the log-likelihood and thus can provide interesting insight into the likelihood-based procedures used in practice. Furthermore, a goodness-of-fit test based on the Mahalanobis distance, computed through the parametric bootstrap, is provided. Moreover, we adopt a model averaging approach to copula modelling, based on the non-parametric bootstrap. Our intention is not to underestimate variability but to add the additional variability induced by model selection, making the precision of the estimate unconditional on the selected model.
Moreover, our estimates are synthesized from several different candidate copula models and thus they can have a flexible dependence structure. Taking into consideration the extensive literature on copulas for multivariate continuous data, we concentrated our interest on fitting copulas to multivariate discrete data. The applications of multivariate copula models for discrete data are limited. Usually we have to trade off between models with limited dependence (e.g. only positive association) and models with flexible dependence but computational intractabilities. For example, the elliptical copulas provide a wide range of flexible dependence, but do not have closed-form cumulative distribution functions. Thus one needs to evaluate the multivariate copula and, hence, a multivariate integral repeatedly, a large number of times. This can be time consuming and, because of the numerical approach used to evaluate a multivariate integral, it may also produce roundoff errors. On the other hand, multivariate Archimedean copulas, partially-symmetric m-variate copulas with m-1 dependence parameters, and copulas that are mixtures of max-infinitely divisible bivariate copulas have closed-form cumulative distribution functions, so computations are easy, but they allow only positive dependence among the random variables. A bridge between the two above-mentioned problems might be the definition of a copula family that has a simple form for its distribution function while allowing for negative dependence among the variables. We define such a multivariate copula family using a finite mixture of simple uncorrelated normal distributions. Since the correlation vanishes, the cumulative distribution is simply the product of univariate normal cumulative distribution functions. The mixing operation introduces dependence.
Hence we obtain a kind of flexible dependence, and allow for negative dependence.

Item Application of hidden Markov and related models to earthquake studies (2015)
Orfanogiannaki, Aikaterini M.; Ορφανογιαννάκη, Αικατερίνη Μ.; Athens University of Economics and Business, Department of Statistics; Karlis, Dimitrios
Discrete-valued hidden Markov models (HMMs) are used to model time series of event counts in several scientific fields, such as genetics, engineering, seismology and finance. In its general form the model consists of two parts: the observation sequence and an unobserved sequence of hidden states that underlies the data and forms a Markov chain. Each state is characterized by a specific distribution, and the progress of the hidden process from state to state is controlled by a transition probability matrix. We extend the theory of HMMs to the multivariate case and apply them to seismological data from different seismotectonic environments. This extension is not straightforward, and it is achieved gradually by assuming different multivariate distributions to describe each state of the model.

Item Assessing the trajectories of Greek football players' careers (17-06-2024)
Charamaras, Kostas; Χαραμαράς, Κωνσταντίνος; Athens University of Economics and Business, Department of Management Science and Technology; Ntzoufras, Ioannis; Chatziantoniou, Damianos; Karlis, Dimitrios
This thesis uses transfer and performance data to examine the career trajectory of a Greek football player after a transfer. After analysing the market values of 92,811 players across Europe, information on 1,050 Greek players based in Greece and abroad during the period 2013-2023 was used.
By applying regression analysis, we can gain insight into how the change in a player's market value behaves after a transfer; the results suggest that, while the player's on-field performance is clearly important, the intrinsic change in value over previous years can be regarded as the most important factor.

Item Bayesian approaches for maximum tolerated dose estimation in Phase I clinical trials (12/21/2018)
Tsiros, Periklis; Athens University of Economics and Business, Department of Informatics; Πεντελή, Ξανθή; Vassalos, Vasilios; Karlis, Dimitrios
Determining the maximum tolerated dose (MTD) is one of the fundamental goals of phase I clinical trials. In a major class of phase I clinical trials, cancer trials, establishing the MTD has even greater value, since drug efficacy and toxicity are considered to increase monotonically with the administered dose. However, an ethical complication arises in the dose escalation designs currently used in phase I cancer trials: early patients receive subtherapeutic doses due to safety concerns. The present thesis proposed a novel approach to the preclinical determination of the MTD which, in combination with the continual reassessment method (CRM), could help resolve the problem. A dataset consisting of 35 drugs was created using previous publications, and a Bayesian regression model, using predictors derived from physicochemical descriptors of the drugs, was fitted to the data. The predictive ability of the model was evaluated using an appropriate test set.
Following that, the predictions were used to construct a series of simulations using the CRM, with the purpose of demonstrating the benefits and weaknesses of this methodology.

Item Behavioral analysis using retail data for customer segmentation (2022)
Maniatis, Filippos; Μανιάτης, Φίλιππος; Athens University of Economics and Business, Department of Informatics; Vassalos, Vasilios; Zois, Georgios; Karlis, Dimitrios
In this thesis we experiment with the use of a clustering algorithm for customer segmentation, in order to produce a meaningful cluster analysis that will be used for business decision making. The analysis explores the application of the k-means algorithm paired with Principal Component Analysis for dimensionality reduction on real sales data from one of the largest coffee store chains in Greece. Metrics for the goodness of fit of the algorithm and for choosing the optimal number of clusters prior to clustering are also used and explained. After clustering, a Decision Tree classifier is used to interpret the clustering results, and a Random Forest classifier is employed to extract the importance of the features that take part in the clustering process. Finally, we analyze the clusters in order to gain useful insights concerning customer behavior.

Item Big data regression (22-03-2017)
Kontou, Eleni; Athens University of Economics and Business, Department of Statistics; Karlis, Dimitrios
We are entering the era of Big Data. Half a century after computers entered our society, data has begun to be the center of attention. Not only is there a massive amount of information that never existed before, but this information is also proliferating faster and takes a variety of forms. Therefore, Big Data brings new opportunities for the development of society and poses new challenges to data scientists. It has unique features that are not shared by traditional data sets. Specifically, it is characterized by high dimensionality and large sample size.
Due to this, ordinary statistical methods do not work. Thus, we need new effective statistical procedures and computational methods. In the current thesis, we study methods that deal with regression for large datasets. After reviewing existing methodologies, we examine an updating method, which estimates the regression coefficients and is faster than ordinary least squares estimation. We also investigate regression by orthogonalization and regression using QR decomposition. Afterwards, we compare the methods by the running time each requires to estimate the regression coefficients. In the same way, generalised linear models are studied and an updating method for them is attempted. Finally, several proposals for deeper exploration of the previous subjects are discussed.

Item Categorical time series (28-09-2023)
Δασκαλάκη, Βασιλική; Daskalaki, Vasiliki; Athens University of Economics and Business, Department of Statistics; Vrontos, Ioannis; Pedeli, Xanthi; Karlis, Dimitrios
This thesis offers a comprehensive study of time series analysis for categorical data, focusing on the various tools and models that have been extensively researched in this field. Categorical time series data are characterised by discrete states or categories and arise frequently in many fields. Understanding the underlying patterns and dependencies in such data is critical for informed decision-making and prediction. The first part of the thesis presents a thorough overview of the existing techniques and methodologies used for analysing categorical time series. By examining the strengths and limitations of each tool, this research aims to provide a comprehensive overview of the state of the art in categorical time series analysis.
The second part of the thesis focuses on the study and application of Markov chains, hidden Markov models, and their mixtures as powerful modelling techniques for categorical time series data. Markov chains are widely used to model sequential dependencies between discrete states, while hidden Markov models can capture latent or unobservable states underlying the observed categorical data. In addition, mixtures of these models are explored, combining their strengths to create more flexible and adaptable representations. The methodology is demonstrated using real data on women workers.

Item Clustering algorithms analysis on telecommunications data (01-08-2024)
Καϊμενοπούλου, Γλυκερία; Kaimenopoulou, Glykeria; Athens University of Economics and Business, Department of Management Science and Technology; Papastamoulis, Panagiotis; Chatziantoniou, Damianos; Karlis, Dimitrios
The study provides a comprehensive survey of clustering techniques, assessing their suitability for telecommunications data. It analyses each algorithm's scalability, sensitivity to parameters and noise, handling of outliers, clustering quality, interpretability and computational performance. The research also includes practical applications to telecommunications data, evaluating the effectiveness and performance of the selected algorithms.
This work aims to improve the understanding of clustering methodologies and to support data-driven decisions in the telecommunications industry, highlighting opportunities for optimisation.

Item Clustering mixed mode data (2021)
Apostolaki, Eleftheria; Αποστολάκη, Ελευθερία; Athens University of Economics and Business, Department of Management Science and Technology; Karlis, Dimitrios
The research problem addressed in this thesis is the clustering of mixed mode data (e.g. numeric, categorical, etc.), its benefits and applications. Chapter 2 details the literature review for clustering mixed mode data, including the methodologies used in the thesis and additional methodologies available for this type of clustering according to the literature. Chapter 3 presents a detailed overview and analysis of the prostate cancer dataset to which the selected clustering methods (Kamila, K-Prototypes, Latent Variable Model) are applied, while Chapter 4 provides the clustering results and their interpretation. Chapter 5 describes the conclusions drawn from this research, along with the future work required for the clustering of mixed mode data.

Item Clustering mixed mode data (2021)
Tsami, Konstantina; Τσάμη, Κωνσταντίνα; Athens University of Economics and Business, Department of Statistics; Karlis, Dimitrios
The present dissertation focuses on cluster analysis methods for mixed-type data, that is, data that do not consist exclusively of a single type of variable but may combine continuous and categorical variables. This kind of data is common in real-world situations. In this master's thesis, we first present some common clustering techniques for mixed-type data. We then present more thoroughly five methods that have drawn attention in recent years.
Finally, we present an application of three of those methods to a clinical trial data set of mixed-type data and compare their results.

Item Clustering professional basketball players by three-point shooting efficiency & participation (2021)
Kolovos, Konstantinos; Κολοβός, Κωνσταντίνος; Athens University of Economics and Business, Department of Management Science and Technology; Karlis, Dimitrios
A coach's most important duties are talent identification, selection of players for specific teams, determination of player potential, and the design and application of developmental training programs aimed at improving actual play quality. Because players are the bearers of play concepts and the creators of competition results, player selection is a key aim of professional sports club activity. Players must acquire and improve universal technical-tactical skills and knowledge as a result of their duty to assume responsibility for diverse roles in various phases of the game. This master's thesis is part of the general framework of player selection, dealing initially with the question of whether players improve their three-point shooting accuracy over time as they gain experience. We group and analyze players' profiles according to the evolution of their shooting through the years, looking for common characteristics. In addition, it has been observed that players do not participate proportionally in each of the four periods of a basketball game; some play more at the beginning, in the middle, or at the end of a game. Are there any common attributes within these groups of players?
Using a database with each shot attempted in the NBA from 1996 to the 2019-2020 season, approximately 5 million shots, and combining biometric and statistical data for players, we attempt to answer the above questions.

Item Clustering time series data (08/26/2020)
Pierrakos, Georgios; Πιερράκος, Γεώργιος; Athens University of Economics and Business, Department of Management Science and Technology; Ntzoufras, Ioannis; Chatziantoniou, Damianos; Karlis, Dimitrios
The present thesis aims to examine all the factors that need to be defined when applying clustering methodologies to time series data. The adaptation of existing methodologies to time series is not straightforward; the intricacies of high dimensionality and of ordered, correlated observations also need to be addressed. The problem definition amounts to two separate tasks: (i) identifying the criteria to assess the value of time series clustering methodologies and then (ii) identifying which clustering methodology works best. The former task reveals that the problem is one of multi-objective optimization: both accuracy (measured by the Silhouette cluster validity index, CVI) and efficiency (measured by algorithm execution times) need to be maximized so that meaningful methodologies can be proposed. The latter task entails testing a number of methodologies using a sample dataset. One set of methodologies uses static data clustering approaches (hierarchical, partitioning and fuzzy) with time series distance definitions. The literature indicates that the most prominent distance definition is dynamic time warping (DTW). A number of related parameters need to be examined: step pattern, window size and sample time series selection (for algorithms that build clusters around representative time series). Another set of methodologies uses a hierarchical algorithm fed with clipped series, Pearson correlation and Lp-norm (Euclidean and Manhattan) distance definitions.
The sample dataset consists of the diurnal variation of bike rental commencements of the Capital Bikeshare scheme in Washington DC, USA across the various stations. The results obtained are also examined on the map, to check whether time series clusters lead to geographical clusters as well. The main conclusions are that: (i) no single distance definition is best in all cases, and the semantics of the underlying process need to be very well understood, and (ii) while DTW improves on standard Lp-norm distance definitions, it carries a heavy time cost, reducing scalability.

Item Clustering Washington D.C. bike share stations by using time series data (08/31/2020)
Sarac, Burcin; Athens University of Economics and Business, Department of Management Science and Technology; Karlis, Dimitrios
In the current economic era, technological developments are changing consumption habits. This shift in consumption has created the sharing economy, which plays a steadily growing role in the economy. Bike sharing systems, among the pioneers of the sharing economy, now count as a mode of transportation in most big cities, and various studies work on improving the efficiency of these systems. This thesis aims to cluster Washington D.C. bike-share stations and, by identifying stations with similar rental behaviour, to improve the efficiency with which these stations are managed. The dataset includes one year of rental records of shared bikes with their date and time information, as large time series data. Before clustering the bike stations, the first part of the thesis describes various clustering methods. Clustering is an unsupervised learning method that organises a set of observations by their similarity and groups them. How similarity between observations is calculated depends on the selected clustering approach.
This thesis covers the iteration steps and evaluation of several clustering approaches, such as k-means, k-medoids, hierarchical clustering, model-based clustering and DBSCAN. After exploring these common algorithms, the second part implements and evaluates k-means, agglomerative hierarchical clustering and model-based clustering to cluster the bike-share stations.

Item Credit scoring through machine learning and artificial intelligence (07/16/2020)
Papilas, Konstantinos; Vassalos, Vasilios; Ntzoufras, Ioannis; Karlis, Dimitrios
Over the last years, most economic sectors have adopted digital technologies to improve business connections and efficiency. The integration of technologies like artificial intelligence, advanced analytics, and machine learning is revolutionizing many industries' operating models. As technology enables businesses to make data-driven, business-critical decisions, artificial intelligence (AI) will be the number one component of the digital age in the credit risk area in the near future. Once AI becomes the standard in all other sectors, financial institutions will invest in machine learning for risk-related processes, taking advantage of new digitally available data sources that can factor into a credit model. Traditional logistic regression models (probability of default, PD) have been market best practice for years, although they underperform in capturing complex relationships that may be present in real data. The purpose of this thesis is to build alternative models for credit scoring using artificial intelligence and machine learning algorithms that can adapt to incoming new data sources and a changing economic landscape, and to compare their performance with traditional approaches. Modelling includes Random Forest and Extreme Gradient Boosting (XGBoost).
Also, part of this thesis applies the traditional binning process, discretising the continuous data in an appropriate way, and compares this process with the machine learning approach.

Item Data analysis in COVID-19 data for Greece (07/01/2021)
Malliopoulou, Vasiliki; Μαλλιοπούλου, Βασιλική; Athens University of Economics and Business, Department of Management Science and Technology; Karlis, Dimitrios
This dissertation studies the COVID-19 data in depth, analysing them to draw conclusions about what is happening and what the effects of the virus are in Greece. Moreover, the goal was to create a spatio-temporal model that explains the spread of the virus in Greece. The model considers the number of cases in a region in a specific period of time, the number of cases in a neighbouring region in a specific period of time, and the lockdown effect.

Item Demand forecasting for the bike sharing system in Washington DC (07/21/2021)
Tsaka, Matilnta; Τσάκα, Ματίλντα; Athens University of Economics and Business, Department of Management Science and Technology; Karlis, Dimitrios
The aim of this project is to study the demand of bike sharing systems and to understand which other variables influence the total number of rentals. The main objective is to create a predictive model, on an hourly basis, for the demand at 10 stations of Capital Bikeshare, one of the U.S.A.'s largest bicycle sharing systems. At first, I did some research on the topic and on the methodologies that have been used to develop predictive models for the system's demand forecasting. The next step was to acquire the necessary data for our analysis. This required downloading data from the Capital Bikeshare website, Washington D.C.'s local government website (https://www.capitalbikeshare.com/system-data) and weather data from an internet weather service website (www.timeanddate.com).
In this project we use hourly historical data for two years, from 2018 to 2019. The data mentioned above were aggregated into one single dataset. For our analysis, we chose 10 stations of the bike sharing system to study. Once the data for the 10 selected stations were gathered, the R programming language was used to visualize and explore the data. Then, time series forecasting models were built to predict the demand for bikes at each station. For most of the stations the models gave good predictions, but not the best that could be obtained. After fitting the models, it was clear that the data contained more than one seasonal component, which the trained models could not handle well enough to produce better predictions.