You are cordially invited to a lecture in the AI Tech (Akademia Zastosowań Technologii Cyfrowych) lecture series.
Title: Dynamic Incremental Semi-Supervised Fuzzy Clustering for Data Stream Classification
Date: Tuesday, 13.06.2023, 12:00-13:30
Venue: A1-33/34 (Faculty Council Room)
Speaker: Gabriella Casalino, University of Bari, Bari, Italy
Abstract: Data stream mining is a recent methodology for analyzing large volumes of ordered sequences of data samples. Such sequences are continuously produced by sensor networks, e-mail, online transactions, network traffic, weather forecasting, health monitoring, and many other applications made possible by current technology. The main challenges in data stream mining come from dealing with non-stationary and potentially unbounded data, which calls for special-purpose algorithms that process data samples in near real time under tight time and memory constraints. Such algorithms should also detect changes in the data and adapt their models to newly available data accordingly. This talk focuses on semi-supervised methods, since in real-world scenarios labeled samples may be difficult or expensive to obtain, whereas unlabeled data are relatively easy to collect. For example, new sensor data arriving from continuous streams are easy to collect, but manually labeling all of them may be difficult or even impossible. The talk will discuss a data stream classification algorithm based on semi-supervised fuzzy clustering: Dynamic Incremental Semi-Supervised Fuzzy C-Means (DISSFCM). DISSFCM processes non-overlapping chunks of data, incrementally generating informational patterns that provide a synthesized view of all data records analyzed in the past, and progressively evolving the cluster model as new data records become available. Moreover, through splitting and mapping mechanisms, it dynamically adapts the number of clusters and classes to the data, because a fixed number of clusters and/or classes may fail to capture the evolving structure of the data. Given the wide availability of stream data, DISSFCM is a suitable classification algorithm for a broad range of applications.
Case studies from different domains, such as cyber security, energy optimization, learning analytics, and medical applications, will be discussed. Finally, the inherent interpretability of fuzzy logic will be exploited to present a first attempt at model explanation.
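For readers unfamiliar with fuzzy clustering on chunked streams, the Python sketch below shows the standard, unsupervised Fuzzy C-Means (FCM) update applied to one chunk at a time. It is not DISSFCM: it uses no labels and has no splitting or class-mapping mechanism. The way past data are summarized here (appending the previous prototypes to each new chunk as extra points and reusing them as the starting prototypes) is an assumption made for illustration, and all function names are hypothetical.

```python
import math


def memberships(points, centers, m=2.0):
    """Standard FCM memberships: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))."""
    exp = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if any(di == 0.0 for di in d):
            # The point sits exactly on a prototype: assign it crisply to that prototype.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((di / dj) ** exp for dj in d) for di in d])
    return rows


def update_centers(points, U, old_centers, m=2.0):
    """Each prototype becomes the mean of all points weighted by u_ki ** m."""
    new_centers = []
    for i, old in enumerate(old_centers):
        w = [row[i] ** m for row in U]
        total = sum(w)
        if total == 0.0:
            new_centers.append(list(old))  # no support at all: keep the old prototype
            continue
        new_centers.append(
            [sum(wk * x[j] for wk, x in zip(w, points)) / total for j in range(len(old))]
        )
    return new_centers


def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and prototype updates until the prototypes stop moving."""
    for _ in range(max_iter):
        U = memberships(points, centers, m)
        new_centers = update_centers(points, U, centers, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def process_stream(chunks, init_centers, m=2.0):
    """Cluster non-overlapping chunks one at a time (hypothetical helper).

    Assumption: the previous prototypes are appended to each new chunk as extra
    points, acting as a compact summary of past data, and also seed the next run.
    """
    centers = [list(c) for c in init_centers]
    for chunk in chunks:
        centers = fcm(list(chunk) + centers, centers, m)
        yield centers
```

Carrying only the prototypes forward keeps memory bounded by the number of clusters rather than by the length of the stream, which is the kind of constraint the abstract describes.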
Bio: Gabriella Casalino is currently an Assistant Professor at the Computational Intelligence Laboratory (CILab) of the Department of Informatics, University of Bari, where she works on machine learning techniques applied to the Web Economy domain. This position is funded by the Italian Ministry of University and Research (M.U.R.) through a European project. Her research focuses on Computational Intelligence, with a particular interest in data analysis. She is currently working on three main themes: 1) Intelligent Data Analysis, 2) Computational Intelligence for eHealth, and 3) Data Stream Mining. Topics to which she has made original contributions include image analysis, educational data mining, text mining, e-health, bioinformatics, and signal processing. She is active in the computer science community as a reviewer for international journals and conferences, and she is involved in the organizing committees of international conferences such as IEEE EAIS, EUSFLAT, and FUZZ-IEEE. She is an Associate Editor of the Journal of Intelligent and Fuzzy Systems and a Guest Editor of several special issues (IEEE SMC Magazine, IEEE Transactions on Computational Social Systems). She has been a visiting researcher at the Université de Mons (Belgium), the Polish Academy of Sciences and the Warsaw University of Technology (Poland), and the University of Ghent (Belgium). She is a Senior Member of the IEEE and received a FUZZ-IEEE best paper award.