Frontiers in Computer and Data Sciences

Symposium

April 16 to 18, 2024
(28 to 30 Farvardin 1403)

09:00 - 16:00

Venue

Khatam University Amphitheater

Registration Deadline

April 12, 2024 (24 Farvardin 1403)

[email protected]

+982189174612

Overview

In this event, researchers across different areas of computer and data sciences will gather to present their recent research findings. The event consists of invited talks as well as poster presentations by students.

Target Audience

Faculty members and students of computer and data science.

Speakers

Dr. Ehsan Asgari

Qatar Computing Research Institute

Title and Abstract

Title: Exploring Recent Advances in Iranian Language Processing and Digital Humanities

Abstract:

Language technologies play a crucial role in enabling machines to understand and generate human languages, transforming digital communication and information access. They have also proven to be invaluable tools for exploring rich literatures and cultural heritages, offering significant potential to address longstanding challenges in the humanities. In this talk, we present some of our recent work on developing language technologies for Iranian languages and digital humanities, with a focus on 1) data mining of the Quran and Hadith, 2) the study of Iranian languages, and 3) AI for media arts, showcasing how these innovations contribute to both technological progress and cultural understanding.

Biography

Ehsanoddin Asgari is a scientist at the Qatar Computing Research Institute (QCRI), focusing on natural language processing, multimodal models, digital humanities, and the language modeling of biological sequences. Ehsan earned his Ph.D. from the University of California, Berkeley, his master’s from the École Polytechnique Fédérale de Lausanne (EPFL), and his bachelor’s from the Sharif University of Technology. Before joining QCRI, he led NLP technical efforts at the Volkswagen Group and conducted part-time postdoctoral research at the Helmholtz Research Center for Infection Research. His previous roles include research positions at MIT’s CSAIL, MIT Brain and Cognitive Sciences, the NLP group at LMU Munich, ABB Research, and the University of Illinois at Urbana-Champaign’s Singapore research center (ADSC).

Dr. Sharareh Alipour

Khatam University, TEIAS

Title and Abstract

Title: Partial Coloring Complex, Vertex Decomposability and Tverberg’s Theorem with Constraints

Abstract:

We present a novel family of simplicial complexes associated with the graph coloring problem. They include many well-known simplicial complexes such as chessboard complexes. We then study conditions under which these complexes become vertex decomposable and hence shellable. The connectivity of these complexes is also investigated. We apply these results to Tverberg’s theorem with constraints. Notably, we prove a conjecture of Engström and Norén on Tverberg graphs.

Biography

Sharareh Alipour is an assistant professor at the Tehran Institute for Advanced Studies. She received her B.Sc., M.Sc. and Ph.D. from Sharif University, Iran, in 2009, 2011 and 2016, respectively. She was a postdoc at the Institute for Research in Fundamental Sciences (IPM) and the Institute of Science and Technology Austria (ISTA). Her main research areas are theoretical computer science, algorithms and discrete math.

Dr. Omid Etesami

Institute for Research in Fundamental Sciences (IPM)

Title and Abstract

Title: High-dimensional algorithmic optimal transport

Abstract:

Optimal transport is the problem of moving a mass of objects from an initial mass distribution to a final mass distribution with minimum cost. The input to the problem is the initial and final distributions, as well as the distance metric or cost of transportation between each initial position and each final position. We should match points in the initial mass with points in the final mass so as to minimize the total cost. This problem was first proposed in the context of economics and has recently been used in machine learning to compare and transform probability distributions.
We will explain a new method for computing transportations between distributions in high dimensions. This approach differs from previous work in that it does not assume that the distributions are given explicitly. Rather, it assumes that we can only query or sample the distribution (through what is often called an oracle in the computer science literature). This allows the method to work for inputs that may be exponentially larger than explicitly given inputs. The simple main technique behind our method is to change the components of the initial vector point one by one and as little as possible, while making sure we attain the final distribution. Our method can be turned into a specific algorithm for some parameterized classes of distributions and distance metrics. For each class, our method only guarantees that the transportation cost is no worse, up to a constant multiplicative factor, than the optimal transport cost of a worst-case instance in that class. We also mention the relationship of this work to previous work on computational concentration of measure, which appeared in the context of adversarial machine learning.
Based on ongoing work with Salman Beigi, Amir Najafi, and Mohammad Mahmoody.

Biography

Omid Etesami graduated from Sharif University of Technology in 2004 with a B.S. in Computer Engineering, and from the University of California, Berkeley in 2010 with a Ph.D. in Computer Science under the supervision of Luca Trevisan. During his Ph.D., he won the Microsoft graduate fellowship and worked at Microsoft Research, supervised by Jennifer Chayes. He then held postdoctoral positions at EPFL, Switzerland, under Amin Shokrollahi, and at IPM (Institute for Research in Fundamental Sciences, Tehran), where he subsequently joined the faculty and is now an associate professor at the School of Mathematics. Among his honors is co-authoring a paper that was selected as a best paper of 2014 by ACM Computing Surveys. His research interests are in the theory of computing, especially as related to probability theory, in different application domains including machine learning, cryptography, coding theory, pseudorandomness, auctions, and the role of information in games.

Dr. Rezvan Farahibozorg

Oxford University

Title and Abstract

Title: Next Generation Machine Learning Techniques for Brain Function Mapping

Abstract:

Functional brain imaging from large populations, such as that made available by the UK Biobank with an expected 100,000 participants, provides unprecedented resources to examine the brain at population scale. This holds great promise for addressing fundamental questions in brain health: how do differences in brain function result in individual variability in cognition and brain disorders? To leverage this potential, we need new machine learning techniques that can scale to these data and extract accurate, meaningful, clinically relevant characteristics for populations and individuals. In this talk, I will first present an overview of the latest advances in this field. I will then provide details of our proposed framework, Probabilistic Functional Modes, which uses hierarchical Bayesian models to model brain function in big populations and individuals simultaneously. I will show the model’s utility for: a) capturing cross-individual variability in brain function; b) capturing multiscale information processing in the brain; c) making predictions about individual traits. I will finally present an overview of some of the outstanding challenges and future directions in this field.

Biography

Rezvan Farahibozorg is a principal investigator at the Wellcome Centre for Integrative Neuroimaging, Oxford University. She holds a PhD in Brain Imaging Methods from Cambridge University and completed her BSc and MSc at Amirkabir University of Technology. Her research interests include developing new data analysis techniques for non-invasive brain imaging, such as magnetoencephalography and functional MRI, and their application in neuroscience and brain health. Her current research programme aims to design new machine learning tools that can use the power of big data to yield personalized models of how brain function varies from one person to another, and how this information can be used to make predictions about traits (e.g., IQ) and disease (e.g., dementia).

Dr. Amir Goharshady

Hong Kong University of Science and Technology

Title and Abstract

Title: Scalable Program Analysis via Parameterization

Abstract:

Many classical tasks in compiler optimization, program analysis and formal verification are formalized in terms of graph problems, usually over the control-flow or call graphs of programs. Examples include data-flow analyses (such as null-pointer and reaching definitions), register allocation, and the entire framework of algebraic program analysis (APA). The resulting graph problems often end up being NP-hard. Even when a PTIME solution exists, it is not usually linear-time and fails to scale up to handle modern software systems with hundreds of millions of lines of code.
As it turns out, control-flow and call graphs of programs are often sparse and exhibit certain desirable structures, such as tree-likeness, which can be exploited to obtain much faster algorithms for these classical tasks. In this talk, we formalize the sparsity of graphs arising in programs in terms of treewidth, pathwidth and treedepth and present new bounds and algorithms that scale lightweight formal methods to billions of lines of code.

Biography

Amir Goharshady is an Assistant Professor of Computer Science and Mathematics at the Hong Kong University of Science and Technology. His research focuses on formal program verification, parameterized algorithms, algebro-geometric and martingale-based methods in computer science and, most recently, verification of blockchain protocols and smart contracts. See https://amir.goharshady.com/ for more details.

Dr. Ramtin Khosravi

University of Tehran

Title and Abstract

Title: Efficient Construction of Family-Based Behavioral Models

Abstract:

Family-based behavioral models capture the behavior of a software product line in a single model, incorporating the variability among the products. Such a model is sometimes constructed by merging the behavioral models of individual products, which may be obtained from a model learning process. To make this construction more efficient, one may improve the efficiency of model learning and/or the merging of the models. In this presentation, we give a brief overview of our previous results on how to make model learning faster, and discuss in more depth our more recent work on how to merge product models into a family-based model more efficiently. An important step in model merging is to identify which pairs of states from different models are similar enough to be merged. We show that computing the similarity of state pairs based on local information improves efficiency while keeping the merged model reasonable.

Biography

Ramtin Khosravi is an assistant professor at the School of ECE, University of Tehran. His research interests are mainly in modeling, verification, and testing of asynchronous distributed systems and modeling and analysis of software product lines. He has been active in the software development industry for 20 years in several domains, including automotive, eLearning, and finance. He received his Ph.D. in 2005 from Sharif University of Technology.

The talk will be co-presented with Shaghayegh Tavassoli.

Bio of co-speaker: Shaghayegh Tavassoli is a Ph.D. candidate in Software Engineering at the School of ECE, University of Tehran. Her research interests are model learning and constructing the behavioral models of software product lines. She is also an invited lecturer at the University of Tehran.

Dr. Babak Majidi

Khatam University

Title and Abstract

Title: Data Science of Digital Twin Earth

Abstract:

In the next few decades, the impact of human activities amplified by climate change will create significant challenges for various Earth ecosystems. These challenges will produce harmful feedback loops that intensify the stress on biodiversity and natural habitats and endanger critical resources such as food, fresh water, healthy air and livable areas for humans. Descriptive, predictive and prescriptive modeling of the Earth sub-systems, as well as of the impact of human activities on these systems, can provide solutions to address these environmental challenges. A Digital Twin of Earth is one of the methods to address these challenges by visualizing, monitoring and forecasting natural systems and human activities on Earth. This presentation provides an introduction to methods for modeling Earth sub-systems using data science and machine learning. The discussed frameworks and models can help researchers provide intelligent solutions for smart agriculture, food security, water resource management and air pollution management, as well as regenerative solutions for improving natural ecosystems.

Biography

Babak Majidi is an Associate Professor of Computer Engineering at Khatam University, Tehran, Iran. He received his B.Sc. and M.Sc. degrees in Computer Engineering from the University of Tehran, Tehran, Iran, and his Ph.D. degree from the Swinburne University of Technology, Melbourne, Australia. He joined Khatam University in 2014. His research interests include applications of machine learning and digital twins in smart agriculture and regenerative environmental management, and applications of these technologies and extended reality in Education 6.0. He is currently the director of the Smart Digital Reality Laboratory at Khatam University. He is a co-author of more than 70 research articles.

Dr. Mohammad Taher Pilehvar

Khatam University, TEIAS

Title and Abstract

Title: Interpreting Transformer Decisions

Abstract:

Deep learning models, often perceived as inscrutable black boxes, offer predictions without insights into their decision-making processes. This talk addresses the critical need for model interpretability, providing an overview of some of the recent techniques designed to shed light on how these models arrive at their conclusions. I will specifically delve into recent advancements in backward and forward methods for evaluating token attribution and context mixing, with a particular emphasis on Transformer models due to their prevalent role in current deep learning research. This exploration not only aims to demystify the operational intricacies of Transformers but also to highlight the importance of transparency in the development and deployment of AI systems.

Biography

Mohammad Taher Pilehvar is an Assistant Professor at the Tehran Institute for Advanced Studies. Taher’s research is mainly in the areas of Lexical Semantics and Interpretability, where his work has been recognized by two best paper award nominations at ACL 2013 and 2017, and an AIJ 2023 Prominent Paper award. He is the lead author of a synthesis book on embeddings in NLP and has served as Program/General Chair of *SEM 2022/2023.

Dr. Mohammad Hossein Rohban

Sharif University of Technology

Title and Abstract

Title: Robust Out-of-distribution Detection

Abstract:

Out-of-distribution (OOD) detection refers to the problem of identifying samples that significantly differ from training samples at inference time. Recently, several effective methods have been suggested to solve this problem in the area of Computer Vision. However, such methods are typically fragile when they face imperceptible adversarial perturbations in their inputs. This fragility defeats their purpose, which is to enhance trustworthiness through input monitoring and rejection of OOD samples for which the model might not produce a valid response. Here, we aim to address this problem through a well-known technique called Outlier Exposure, which utilizes samples believed to be OOD during training. We show, both theoretically and empirically, that such samples need to satisfy certain conditions to be effective for this purpose: 1. Near-distribution (training OOD samples should be close to the normal samples); 2. Diversity of the OOD samples; 3. Semantic deviation of OOD samples from the normal ones. We leverage simple text-to-image models along with a text description of normal samples to generate training OOD samples that adhere to the mentioned criteria. When trained with regular adversarial training, the proposed OOD detection method exhibits significantly improved robustness against strong adversarial attacks. We showcase the method's performance on a wide variety of datasets in Computer Vision, demonstrating its generality.

Biography

Mohammad Hossein Rohban is an assistant professor in Computer Engineering at Sharif University of Technology. His research interests lie in trustworthiness in machine learning and medical imaging.

Dr. Mehrnoush Shamsfard

Shahid Beheshti University

Title and Abstract

Title: Evaluating Large Language Models

Abstract:

The emergence of large language models (LLMs) has revolutionized natural language processing (NLP) applications across various domains. The widespread adoption of these models necessitates a thorough understanding of their capabilities and limitations. However, assessing their performance and capabilities presents a myriad of challenges. In this talk, we delve into the intricate process of evaluating language models from various perspectives and explore the diverse metrics, benchmarks, and methodologies employed. These range from traditional measures like perplexity and accuracy to more nuanced evaluations of fluency and coherence, hallucination and bias, and knowledge and reasoning capabilities, as well as consideration of ethical implications and societal impact. We will then look at the evaluation results of some English LLMs and talk about available Persian LLMs and their preliminary evaluation results.

Biography

Dr. Mehrnoush Shamsfard received her BS and MSc, both in computer software engineering, from Sharif University of Technology and her PhD in Computer Engineering (Artificial Intelligence) from Amirkabir University of Technology, Tehran, Iran.

She has been with Shahid Beheshti University since 2004. She is currently an associate professor in the Faculty of Computer Science and Engineering and the head of the faculty's NLP Research Laboratory. Her main fields of interest are natural language processing with a special focus on the Persian language, evaluating NLP resources and products, developing intelligent assistants and chatbots, knowledge engineering (ontologies and knowledge graphs), text mining, and the semantic and intelligent web.

Schedule

Day 1 – April 16

Day 2 – April 17

Day 3 – April 18

Tehran Institute for Advanced Studies (TEIAS)