In recent years, there has been a dramatic increase in the amount of digital data. As a consequence, the demand for tools that can automatically learn, analyse, understand and predict data has increased significantly. Shirin Tavara, PhD Student at the University of Skövde, has taken a closer look at methods for this.
During the Covid-19 pandemic, the increase in the amount of data created by digitalisation reached a new record. This was because people worked from home more often, took part in digital education, and used social media and search engines more frequently. As a consequence, the demand for tools that can automatically learn from, analyse, understand, and predict data has grown. Machine Learning (ML) provides tools that can automatically learn from data and make decisions as needed.
New challenges for managing the amount of data
However, as the amount of data increases, so do the computation time and memory needed to solve problems. This means that traditional ML models now face new challenges. Distributed and parallel learning are promising approaches for improving the runtime performance of ML algorithms. Implementing efficient parallel ML is a difficult task, however, and the communication between distributed nodes may raise privacy concerns.
“My thesis aims to highlight efficient parallel methods, including algorithmic approaches and parallel tools, for training a specific ML method called Support Vector Machines (SVM). Furthermore, I investigate how to further improve the performance of some of the parallel SVM methods and make them preserve privacy”.
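For readers unfamiliar with SVMs, the idea can be illustrated with a minimal, library-based sketch. The toy data and the scikit-learn classifier below are illustrative choices, not taken from the thesis:

```python
# Minimal illustrative SVM: fit a classifier on toy data, then predict.
# The data and kernel choice are invented for this sketch.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [1, 0]]  # training points (two features)
y = [0, 1, 0, 1]                      # class labels: class depends on x1
clf = SVC(kernel="linear").fit(X, y)

# New points are classified by which side of the learned boundary they fall on.
print(clf.predict([[0.9, 0.1], [0.1, 0.9]]))
```

An SVM fits the hyperplane that separates the classes with the largest margin; in this toy case the boundary depends only on the first feature.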
In her thesis, Shirin Tavara provides recommendations for the development of an efficient parallel framework for solving SVM problems. These recommendations can help scientists and developers choose efficient approaches based on both their demands and preferences. In addition, they could support the development of a framework that brings all the effective approaches together with privacy recommendations.
“My research is important because SVM is used in important application areas such as health care and medicine design. My research also improves the performance of SVM in terms of training time, accuracy and data privacy, which is important for such applications.”
No simple recommendations
The results of Shirin Tavara's research thus show that there are no simple recommendations that fit all situations and circumstances; it is up to the users to make wise choices. Many issues arising from big data can be addressed using distributed learning with decentralised communication between nodes. This has been shown empirically to be an effective approach for high-performance learning of SVMs, even when dealing with very large amounts of data.
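As a hedged illustration of the data-parallel idea (not the decentralised algorithms studied in the thesis), one simple scheme trains an independent SVM on each node's shard of the data and combines the local models by majority vote. The shard count, dataset, and classifier below are invented for the sketch:

```python
# Hypothetical data-parallel sketch: train one linear SVM per data shard
# (a stand-in for distributed worker nodes), then vote on predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_shards = 4  # illustrative number of "nodes"
models = []
for shard_X, shard_y in zip(np.array_split(X_tr, n_shards),
                            np.array_split(y_tr, n_shards)):
    # Each node trains only on its local shard of the data.
    models.append(LinearSVC(dual=False).fit(shard_X, shard_y))

# Combine the local models' predictions by majority vote.
votes = np.stack([m.predict(X_te) for m in models])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble accuracy:", (y_pred == y_te).mean())
```

Real distributed SVM solvers exchange intermediate results between nodes rather than simply voting, but the sketch shows why sharding reduces per-node computation and memory.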
The research presented in the thesis addresses some issues in parallel and distributed computing of SVMs through four research questions. Its most important contribution is to provide answers to these questions through five research articles.
The focus of the thesis has mainly been on binary classification. Future research in the field could implement different multi-class classification techniques.
Shirin Tavara believes that the communication pattern between nodes can be further improved, for example by refining the cooperation between nodes to eliminate unnecessary communication and data exchange. Privacy protection can also be strengthened by using different strategies to perturb the original data.
“Finally, I think my research can be extended to Deep Neural Networks (DNN), for example through a comprehensive comparison of DNN and SVM, and by studying the effect of network topology on distributed learning of DNN.”
Shirin Tavara has been offered a Post Doc position focusing on research concerning AI-related medicine design.
Shirin Tavara defends her thesis, "Distributed and Federated Learning of Support Vector Machines and Applications" on Friday 30 September at the University of Skövde.