Tehran Institute for Advanced Studies (TEIAS)

Symposium

Data Science Day

November 22, 2023
(1 Azar 1402)

09:00 - 16:00

Venue

Khatam University Amphitheater

Registration Deadline

November 21, 2023 (30 Aban 1402)

+982189174612

Overview

This one-day event features talks on a range of computer science topics, with a focus on data science and machine learning, given by faculty from TEIAS at Khatam University and from other universities. The talks cover both theoretical and practical topics.

Speakers

Fatemeh Ghassemi

Tehran University

Title: An Enhanced Encrypted Traffic Classifier via Combination of Deep Learning and Automata Learning

Abstract: Characterizing network traffic and identifying running applications play an instrumental role in several network administration tasks, such as protecting against malicious behavior, firewalling, and balancing bandwidth usage. This classification is complex due to recent developments in the Internet, such as encryption-based security protocols, Tor networks, and virtual private networks. We propose a general traffic classifier that addresses all of these challenges. We use automata learning to derive a behavioral packet-based model of each application in terms of a k-testable language. The corresponding automaton captures the temporal relations among the packets and can be learned automatically from a set of application traces. As some packets always appear together, we apply machine learning techniques to automatically identify packets with similar timing and statistical features in order to increase the granularity of the alphabets of the learned languages. This leads to smaller models that are still precise enough to characterize applications. However, the learned models are very precise and overly sensitive to packet order, which leads to overfitting and makes them intolerant of noise, event loss, and reordering in a distributed setting. We therefore classify the packets with a deep neural network trained on feature vectors extracted from the learned languages. Results on real traffic indicate that our approach outperforms state-of-the-art methods in both application identification and traffic characterization tasks. The proposed approach is also resilient to noise stemming from the simultaneous execution of multiple applications.

Sadegh Aliakbary

Shahid Beheshti University

Title: An Introduction to Process Mining

Abstract: Process mining is an interdisciplinary research area between “process management” and “data science”. With process mining, we analyze operational processes based on event logs in order to turn event data into knowledge, insights, and actions. This talk presents an introduction to the field of process mining with a focus on “Predictive Business Process Monitoring” (PBPM). PBPM is a sub-discipline of process mining that deals with predicting the future state of business processes. The presentation gives a brief overview of the state of the art in process mining and its recent techniques and applications.

Behnam Bahrak

Title: Diversity dilemmas: uncovering gender and nationality biases in graduate admissions across top North American computer science programs

Abstract: Although different organizations have defined policies towards diversity in academia, many argue that minorities are still disadvantaged in university admissions due to biases. Extensive research has been conducted on detecting partiality patterns in the academic community. However, in the last few decades, limited research has focused on assessing gender and nationality biases in the graduate admission results of universities. In this presentation, we discuss how we collected a novel and comprehensive dataset containing information on approximately 14,000 graduate students majoring in computer science (CS) at the top 25 North American universities, and how we used statistical tests to determine whether the admission processes show a preference based on students’ gender and nationality. In addition to partiality patterns, we discuss the relationship between gender/nationality diversity and the scientific achievements of research teams.

Dr. Mahmoody

Title: On Privacy Implications of Data Deletion

Abstract: Perhaps motivated by legal requirements and/or privacy goals, deleting data records from machine learning models has received more attention in recent years. In this work, we formally study the privacy implications of such updates on machine learning models, while the adversary has continuous access to the model. We discuss both definitions as well as attacks on concrete algorithms using those definitional frameworks.

Yadollah Yaghoobzadeh

Title: From Language Models to ChatGPT: Breakthroughs and Limitations

Abstract: In this talk, I start by briefly talking about the recent shifts in natural language processing. Then, I’ll cover the history of how language models have grown to become tools like ChatGPT. I’ll explore the big steps forward they have made, especially how they have changed the way we talk to machines and solve problems. But it’s not all perfect. I’ll also talk about the problems these models face. This presentation aims to give a view of how these language models have grown, what they can do now, and what challenges they still face.

Zahra Sadat Delbari

Graduate student

Title: Spanning the Spectrum of Hatred Detection: A Persian Multi-Label Hate Speech Dataset with Annotator Rationales

Abstract: With the alarming rise of hate speech in online communities, the demand for effective natural language processing (NLP) models that identify offensive language has reached a critical point. However, the development of such models relies heavily on the availability of annotated datasets, which are scarce, particularly for less-studied languages. To bridge this gap for the Persian language, we present a novel dataset specifically tailored to multi-label hate speech detection. The dataset consists of over 7k Persian tweets, each annotated with the rationale behind the chosen label and the target of hate. In this talk, I will discuss the procedure we followed to construct this dataset, the challenges we faced during construction, and the analysis of the resulting dataset.

Schedule

8:30 - Breakfast and registration
9:20 - Introduction
9:30-10:00 - Dr. Bahrak
10:00-10:45 - Dr. Aliakbary
10:45-11:15 - Break
11:15-12:00 - Dr. Ghassemi
12:00-13:15 - Lunch
13:15-13:45 - Dr. Yaghoobzadeh
13:45-14:00 - Miss Delbari
14:00-14:30 - Dr. Mahmoody
14:30-15:00 - Break
15:00-15:50 - Panel