There is a great deal of interest in analyzing data that is best represented as a graph.
Examples include the WWW, social networks, biological networks, communication networks,
transportation networks, energy grids, and many others. These graphs are typically
multi-modal, multi-relational and dynamic. In the era of big data, the importance of being
able to effectively mine and learn from such data is growing, as more and more structured
and semi-structured data is becoming available. The workshop serves as a forum for
researchers from a variety of fields working on mining and learning from graphs to share and
discuss their latest findings.
There are many challenges involved in effectively mining and learning from this kind of
data, including:
Understanding the different techniques applicable, including graph mining algorithms,
network embeddings, graphical models, latent variable models, matrix factorization
methods and more.
Dealing with the heterogeneity of the data.
The common need for information integration and alignment.
Handling dynamic and changing data.
Addressing each of these issues at scale.
Traditionally, a number of subareas have contributed to this space: communities in graph
mining, learning from structured data, statistical relational learning, inductive logic
programming, and, moving beyond subdisciplines in computer science, social network analysis,
and, more broadly network science.
Schedule
Long Beach Convention & Entertainment Center - Grand B
Morning Sessions
8:00-8:15am
Opening Remarks
8:15-9:00am
Keynote:
Jie Tang
Self-supervised Learning and Pre-training on Graphs
9:00-9:30am
Contributed Talks
Qingkai Zeng et al.
Completing Taxonomies with Relation-Aware Mutual Attentions
Rishabh Jain et al.
Neural Priority Queues for Graph Neural Networks (GNNs)
Panel
Liang Zhao (Emory), Zhangyang Wang (UT Austin), Wei Cheng (NEC lab), Meng Jiang (Notre Dame)
11:30-12:00pm
Poster Session 1
12:00-1:00pm
Lunch Break
Afternoon Sessions
1:00-1:45pm
Keynote:
Karthik Subbian
Practical Challenges in Graph Representation Learning
1:45-2:15pm
Contributed Talks
Prasita Mukherjee et al.
OCTAL: Graph Representation Learning for LTL Model Checking
Jing Zhu et al.
SpotTarget: Rethinking the Effect of Target Edges for Link Prediction in GNNs
2:15-3:00pm
Poster Session 2
3:00-3:30pm
Coffee Break
3:30-4:15pm
Keynote:
Leman Akoglu
Expressive, Scalable, and Interpretable Graph Embeddings
4:15-4:40pm
Closing Remarks
Keynote Speakers
Jie Tang
Professor Tsinghua University
Wei Wang
Professor UCLA
Leman Akoglu
Associate Professor CMU
Karthik Subbian
Senior Principal Scientist Amazon
Accepted Papers
Active Learning for Graphs with Noisy StructuresPDF Hongliang Chi, Cong Qi, Suhang Wang and Yao Ma
Abstract: Graph Neural Networks (GNNs) have seen significant success in tasks like node classification, primarily dependent on the availability of abundant labeled nodes. However, the excessive cost of labeling large-scale graphs led to a focus on active learning methods on graphs, which aim for efficient data selection. While most methods assume reliable graph topology, real-world scenarios often present noisy graphs. Designing an active learning framework for noisy graphs is challenging, as selecting data for labeling and obtaining a clean graph are naturally interdependent: selecting high-quality data requires clean graph structure while cleaning noisy graph structure needs adequate labeled data. Considering the challenge mentioned above, we propose a robust active learning framework named GALClean that adopts an iterative approach to achieve data selection and graph purification simultaneously. Furthermore, we summarize GALClean as an instance of the Expectation-Maximization (EM) algorithm, which provides a theoretical understanding of the design and mechanisms in GALClean. Extensive experiments have demonstrated the effectiveness and robustness of our proposed method.
Keywords: Graph Neural Networks, Active Learning, Noisy Learning
@inproceedings{mlg2023_9,
title={Active Learning for Graphs with Noisy Structures},
author={Hongliang Chi, Cong Qi, Suhang Wang and Yao Ma},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
SpotTarget: Rethinking the Effect of Target Edges for Link Prediction in GNNsPDF Jing Zhu, Yuhang Zhou, Vassilis Ioannidis, Shengyi Qian, Wei Ai, Xiang Song and Danai Koutra
Abstract: Graph Neural Networks (GNNs) have demonstrated promising outcomes across various tasks, including node classification and link prediction. However, despite their remarkable success in various high-impact applications, we have identified three common pitfalls in message passing for link prediction, especially within industrial settings. Particularly, in prevalent GNN frameworks (e.g., DGL and PyTorch-Geometric), the target edges (i.e., the edges being predicted) consistently exist as message passing edges in the graph during training. Consequently, this results in overfitting and distribution shift, both of which adversely impact the generalizability to test the target edges. Additionally, during test time, the failure to exclude the test target edges leads to implicit test leakage caused by neighborhood aggregation. In this paper, we analyze these three pitfalls and investigate the impact of including or excluding target edges on the performance of nodes with varying degrees during training and test phases. Our theoretical and empirical analysis demonstrates that low-degree nodes are more susceptible to these pitfalls. These pitfalls can have detrimental consequences when GNNs are implemented in production systems. To systematically address these pitfalls, we propose SppotTarget, an effective and efficient GNN training framework. During training, SpotTarget leverages our insight regarding low-degree nodes and excludes train target edges connected to at least one low-degree node. During test time, it emulates real-world scenarios of GNN usage in production and excludes all test target edges. Our experiments conducted on diverse real-world datasets, including e-commerce graphs, demonstrate that SpotTarget significantly enhances GNNs, achieving up to a 15 times increase in accuracy in sparse graphs. Furthermore, SpotTarget consistently and dramatically improves the performance of GNN models for low-degree nodes in dense graphs.
Keywords: Link Prediction, Graph Neural Network, Overfitting, Distribution Shift, Data Leakage
@inproceedings{mlg2023_3,
title={SpotTarget: Rethinking the Effect of Target Edges for Link Prediction in GNNs},
author={Jing Zhu, Yuhang Zhou, Vassilis Ioannidis, Shengyi Qian, Wei Ai, Xiang Song and Danai Koutra},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Completing Taxonomies with Relation-Aware Mutual AttentionsPDF Qingkai Zeng, Zhihan Zhang, Jinfeng Lin and Meng Jiang
Abstract: Taxonomies serve many applications with a structural representation of knowledge. To incorporate emerging concepts into existing taxonomies, the task of taxonomy completion aims to find suitable positions for emerging query concepts. Previous work captured homogeneous token-level interactions inside a concatenation of the query concept term and definition using pre-trained language models. However, they ignored the token-level interactions between the term and definition of the query concepts and their related concepts. In this work, we propose to capture heterogeneous token-level interactions between the different textual components of concepts that have different types of relations. We design a relation-aware mutual attention module (RAMA) to learn such interactions for taxonomy completion. Experimental results demonstrate that our new taxonomy completion framework based on RAMA achieves the state-of-the-art performance on six taxonomy datasets. This paper belongs to "Application and analysis - Knowledge Graph Construction", and in the "Novel research paper" category.
@inproceedings{mlg2023_10,
title={Completing Taxonomies with Relation-Aware Mutual Attentions},
author={Qingkai Zeng, Zhihan Zhang, Jinfeng Lin and Meng Jiang},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
FiGURe: Simple and Efficient Unsupervised Node Representations with Filter AugmentationsPDF Chanakya Ekbote, Ajinkya P. Deshpande, Arun Iyer, Ramakrishna Bairi and Sundararajan Sellamanickam
Abstract: Unsupervised node representations learnt using contrastive learning-based methods have shown good performance on downstream tasks. However, these methods rely on augmentations that mimic low-pass filters, limiting their performance on tasks requiring different eigen-spectrum parts. This paper presents a simple filter-based augmentation method to capture different parts of the eigen-spectrum. We show significant improvements using these augmentations. Further, we show that sharing the same weights across these different filter augmentations is possible, reducing the computational load. In addition, previous works have shown that good performance on downstream tasks requires high dimensional representations. Working with high dimensions increases the computations, especially when multiple augmentations are involved. We mitigate this problem and recover good performance through lower dimensional embeddings using simple random Fourier feature projections. Our method, FiGURe, achieves an average gain of up to 4.4%, compared to the state-of-the-art unsupervised models, across all datasets in consideration, both homophilic and heterophilic.
@inproceedings{mlg2023_15,
title={FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations},
author={Chanakya Ekbote, Ajinkya P. Deshpande, Arun Iyer, Ramakrishna Bairi and Sundararajan Sellamanickam},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Seq-HyGAN: Sequence Classification via Hypergraph Attention NetworkPDF Khaled Mohammed Saifuddin, Corey May, Farhan Tanvir, Muhammad Ifte Khairul Islam and Esra Akbas
Abstract: Extracting meaningful features from sequences and devising effective similarity measures are vital for sequence data mining tasks, particularly sequence classification. While Neural Network models are commonly used to learn features of sequence automatically, they are limited to capturing adjacent structural connection information and ignore global, higher-order information between the sequences. To address these challenges, we propose a novel Hypergraph Attention Network model, namely Seq-HyGAN, for sequence classification problems. To capture the complex structural similarity between sequence data, we create a novel hypergraph model by defining higher-order relations between subsequences extracted from sequences. Subsequently, we introduce a Sequence Hypergraph Attention Network that learns sequence features by considering the significance of subsequences and sequences to one another. Through extensive experiments, we demonstrate the effectiveness of our proposed Seq-HyGAN model in accurately classifying sequence data, outperforming several state-of-the-art methods by a significant margin. This paper belongs to "Algorithms and methods- Graph neural networks and graph representation learning" and is in the "Novel research papers" category.
@inproceedings{mlg2023_17,
title={Seq-HyGAN: Sequence Classification via Hypergraph Attention Network},
author={Khaled Mohammed Saifuddin, Corey May, Farhan Tanvir, Muhammad Ifte Khairul Islam and Esra Akbas},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphsPDF Ahmad Naser Eddin, Jacopo Bono, David Aparício, Hugo Ferreira, João Ascensão, Pedro Ribeiro and Pedro Bizarro
Abstract: Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks.
Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints, a general purpose feature extraction framework for continuous-time-dynamic-graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher latency models. To achieve this, a streaming, low latency approximation to the random-walk based features is proposed. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). Simultaneously, graph-sprints significantly reduce inference latencies, achieving up to 9 times faster inference in our experimental setting.
@inproceedings{mlg2023_29,
title={From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs},
author={Ahmad Naser Eddin, Jacopo Bono, David Aparício, Hugo Ferreira, João Ascensão, Pedro Ribeiro and Pedro Bizarro},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
A Heterogeneous Graph-based Framework for Scalable Fraud DetectionPDF Phanindra Reddy Madduru and Naveed Janvekar
Abstract: The rise of online marketplaces has led to increased concerns regarding the presence of bad actors involved in counterfeit or engage in fraudulent activities. While efforts are being made by organizations to monitor and address these issues, bad actors persistently find new ways to engage in fraudulent behavior, including creating new accounts using different credentials, account hijacking etc. To combat this issue, our study proposes the use of Heterogeneous Relational Graph Convolutional Networks (HRGCN) to identify risky relationships among entities like sellers or customers. By leveraging this advanced graph-based approach, we aim to enhance the detection and mitigation of fraudulent behavior on the e-commerce marketplaces. The HRGCN model is designed to detect sellers with risky associations with other known bad sellers by analyzing various connecting edges such as encrypted device and identity credentials. With the rapid growth of e-commerce stores, the number of sellers has witnessed an exponential increase, leading to a significant expansion in their social networks formed by sharing various relationships such as digital contact information, communication channels and devices. This has made it challenging to process the data with the direct implementation of HRGCN. This highlights the importance of model scalability in handling large datasets. To address this issue, we have introduced a novel mini-batch version of HRGCN variant that works in tandem with a neighborhood sampler, which is optimized to run on GPUs, significantly reducing the training time by 70%. This mini-batch version of HRGCN maintains and/or improves the performance of the model while addressing the scalability issue, making it an efficient solution for handling large datasets. In this paper, we compare the performance of three models: a benchmark model based on Random Forest trained on seller node features alone, HRGCN trained on Full batch, and HRGCN with mini-batch implementation. The findings of our experiments reveal that the HRGCN models outperform the benchmark model with a significant improvement in both F1-score and Recall. Specifically, the HRGCN models show an impressive increase in recall by approximately 115% compared to the baseline model. Moreover, the mini-batch HRGCN model demonstrated substantial improvement in performance over the full batch HRGCN model, achieving a 16% higher F1 score and an 8% higher PR AUC score. These results emphasize the effectiveness of using a mini-batch approach to handle large datasets and detecting related bad sellers.
@inproceedings{mlg2023_4,
title={A Heterogeneous Graph-based Framework for Scalable Fraud Detection},
author={Phanindra Reddy Madduru and Naveed Janvekar},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Editable Graph Neural Network for Node ClassificationsPDF Zirui Liu, Zhimeng Jiang, Shaochen Zhong, Kaixiong Zhou, Li Li, Rui Chen, Soo-Hyun Choi and Xia Hu
Abstract: Despite Graph Neural Networks (GNNs) have achieved prominent success in many graph-based learning problems, such as credit risk assessment in financial networks and fake news detection in social networks. However, the trained GNNs still make errors and these errors may cause serious negative impact on society. \textit{Model editing}, which corrects the model behavior on wrongly predicted target samples while leaving model predictions unchanged on unrelated samples, has garnered significant interest in the fields of computer vision and natural language processing. However, model editing for graph neural networks (GNNs) is rarely explored, despite GNNs' widespread applicability.
To fill the gap, we first observe that existing model editing methods significantly deteriorate prediction accuracy (up to $50\%$ accuracy drop) in GNNs while a slight accuracy drop in multi-layer perception (MLP). The rationale behind this observation is that the node aggregation in GNNs will spread the editing effect throughout the whole graph. This propagation pushes the node representation far from its original one.
Motivated by this observation, we propose \underline{E}ditable \underline{G}raph \underline{N}eural \underline{N}etworks (EGNN), a neighbor propagation-free approach to correct the model prediction on misclassified nodes. Specifically, EGNN simply stitches an MLP to the underlying GNNs, where the weights of GNNs are frozen during model editing. In this way, EGNN disables the propagation during editing while still utilizing the neighbor propagation scheme for node prediction to obtain satisfactory results.
Experiments demonstrate that EGNN outperforms existing baselines in terms of effectiveness (correcting wrong predictions with lower accuracy drop), generalizability (correcting wrong predictions for other similar nodes), and efficiency (low training time and memory) on various graph datasets.
@inproceedings{mlg2023_7,
title={Editable Graph Neural Network for Node Classifications},
author={Zirui Liu, Zhimeng Jiang, Shaochen Zhong, Kaixiong Zhou, Li Li, Rui Chen, Soo-Hyun Choi and Xia Hu},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Topological Representation Learning for E-commerce Shopping BehaviorsPDF Yankai Chen, Quoc-Tuan Truong, Xin Shen, Ming Wang, Jin Li, Jim Chan and Irwin King
Abstract: Learning compact representation from customer shopping behaviors is at the core of web-scale E-commerce recommender systems. At Amazon, we put great efforts into learning embedding of customer engagements in order to fuel multiple downstream tasks for better recommendation services. In this work, we define the notion of shopping trajectory that consists of customer interactions at the categorical level of products, then construct an end-to-end model namely C-STAR which is capable of learning rich embedding for representing the variable-length customer trajectory. C-STAR explicitly captures the inter-trajectory distribution similarity and intra-trajectory semantic correlation, providing a coarse-to-fine trajectory representation learning paradigm both structurally and semantically. We evaluate the model on Amazon proprietary data as well as four public datasets, where the learned embeddings have shown to be effective for customer-centric tasks including customer segmentation and shopping trajectory completion. These tasks empower multiple personalized shopping experiences for our customers. This paper belongs to "Application and analysis: Large-scale analysis and modeling", and in the “Novel research papers” category.
@inproceedings{mlg2023_13,
title={Topological Representation Learning for E-commerce Shopping Behaviors},
author={Yankai Chen, Quoc-Tuan Truong, Xin Shen, Ming Wang, Jin Li, Jim Chan and Irwin King},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
GEANN: Scalable Graph Augmentations for Multi-Horizon Time Series ForecastingPDF Sitan Yang, Malcolm Wolff, Shankar Ramasubramanian, Ronak Mehta and Michael Mahoney
Abstract: Encoder-decoder deep neural networks have been increasingly studied for multi-horizon time series forecasting especially in real-world applications. However, these sophisticated neural forecasters typically rely on a large number of time series examples with substantial history to forecast accurately. A rapidly growing topic of interest is forecasting time series which lack sufficient historical data---often referred to as the ``cold start'' problem. In this paper, we introduce a novel yet simple method to address this problem by leveraging graph neural networks (GNNs) as a data augmentation for enhancing the encoder used by such forecasters. These GNN-based features can capture complex inter-series relationships and their generation process is optimized end-to-end with the forecasting task. We show that our architecture can use either data-driven or domain knowledge defined graphs, scaling to jointly incorporate information from multiple very large graphs with millions of nodes. In our target application of demand forecasting for a large e-commerce retailer, we demonstrate on both a small dataset of 100K products and a large dataset with over 2 million products that our method improves overall performance over competitive baseline models. More importantly, we show that it brings substantially more gains to ``cold start'' products such as those newly launched or recently out-of-stock.
Keywords: Time Series, Multi-Horizon Forecasting, Graph Neural Networks, Data Augmentation
@inproceedings{mlg2023_16,
title={GEANN: Scalable Graph Augmentations for Multi-Horizon Time Series Forecasting},
author={Sitan Yang, Malcolm Wolff, Shankar Ramasubramanian, Ronak Mehta and Michael Mahoney},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Fair Online Dating Recommendations for Sexually Fluid Users via Leveraging Opposite Gender Interaction RatioPDF Yuying Zhao, Yu Wang, Yi Zhang, Pamela Wisniewski, Charu Aggarwal and Tyler Derr
Abstract: Novel research paper:
Online dating platforms have gained widespread popularity as a means for individuals to seek potential romantic relationships. While recommender systems have been designed to improve the user experience in dating platforms by providing personalized recommendations, increasing concerns about fairness have encouraged the development of fairness-aware recommender systems from various perspectives (e.g., gender and race). However, sexual orientation, which plays a significant role in finding a satisfying relationship, is under-investigated. To fill this crucial gap, we propose a novel metric, Opposite Gender Interaction Ratio (OGIR), as a way to investigate potential unfairness for users with varying/fluid preferences towards the opposite gender. We empirically analyze a real online dating dataset and observe existing recommender algorithms could suffer from group unfairness according to OGIR. We further investigate the potential causes for such gaps in recommendation quality, which lead to the challenges of group data imbalance and group calibration imbalance. Ultimately, we propose a fair recommender system based on re-weighting and re-ranking strategies to respectively mitigate these associated imbalance challenges. Experimental results demonstrate both strategies improve fairness while their combination achieves the best performance towards maintaining model utility while improving fairness.
Keywords: Fair Recommendations, Online Dating Networks, Social Network Analysis
@inproceedings{mlg2023_22,
title={Fair Online Dating Recommendations for Sexually Fluid Users via Leveraging Opposite Gender Interaction Ratio},
author={Yuying Zhao, Yu Wang, Yi Zhang, Pamela Wisniewski, Charu Aggarwal and Tyler Derr},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Compact Interpretable Tensor Graph Multi-Modal News EmbeddingsPDF Dawon Ahn, William Shiao, Andrew Bauer, Arindam Khaled, Stefanos Poulis and Evangelos Papalexakis
Abstract: Online news articles encompass a variety of modalities such as text and images. How can we learn a representation that incorporates information from all those modalities in a compact and interpretable manner, while also being useful in a variety of downstream tasks? Recent advances in Large Language and Vision Models have made it possible to represent image and text data as embeddings, which can then be used to perform downstream tasks. Despite these developments, these embedding models tend to generate high-dimensional embeddings, making them problematic in terms of compactness and interpretability.
In this paper, we propose CITEM (Compact Interpretable Tensor graph multi-modal news EMbedding), which is a novel framework for multi-modal news representations via tensor decomposition in a compact and interpretable way. CITEM generates a tensor graph consisting of a news similarity graph for each modality and employs a tensor decomposition to produce compact and interpretable embeddings, each dimension of which is a heterogeneous co-cluster of news articles and corresponding modalities. Interpretability and compactness are key, since our proposed embeddings contain few dimensions which lend themselves to inspection and explanation. Traditional tensor analysis has so far been restricted to transductive learning scenarios (e.g., in the form of semi-supervised learning), but CITEM includes two variants for inductive learning, which essentially enables us to represent unseen news articles. We extensively validate CITEM compared to baselines on two news classification tasks: misinformation news detection and news categorization. The experimental results show that CITEM performs within the same range of AUC as state-of-the-art baselines while producing 7× to 10.5× more compact embeddings. In addition, each embedding dimension of CITEM is interpretable, representing a latent co-cluster of articles.
@inproceedings{mlg2023_24,
title={Compact Interpretable Tensor Graph Multi-Modal News Embeddings},
author={Dawon Ahn, William Shiao, Andrew Bauer, Arindam Khaled, Stefanos Poulis and Evangelos Papalexakis},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Spectral Clustering of Attributed Multi-relational Graphs*PDF Ylli Sadikaj, Yllka Velaj, Sahar Behzadi and Claudia Plant
Abstract: Graph clustering aims at discovering a natural grouping of the nodes such that similar nodes are assigned to a common cluster. Many different algorithms have been proposed in the literature: for simple graphs, for graphs with attributes associated to nodes, and for graphs where edges represent different types of relations among nodes. However, complex data in many domains can be represented as both attributed and multi-relational networks.
In this paper, we propose SpectralMix, a joint dimensionality reduction technique for multi-relational graphs with categorical node attributes. SpectralMix integrates all information available from the attributes, the different types of relations, and the graph structure to enable a sound interpretation of the clustering results. Moreover, it generalizes existing techniques: it reduces to spectral embedding and clustering when only applied to a single graph and to homogeneity analysis when applied to categorical data.
Experiments conducted on several real-world datasets enable us to detect dependencies between graph structure and categorical attributes, moreover, they exhibit the superiority of SpectralMix over existing methods.
The full version of this paper has also appeared in the Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery
Data Mining, KDD’21, and received the Student Best Paper Award.
∗This paper won the Student Best Paper Award at KDD ’21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery Data Mining, August 2021, Pages 1431–1440, https://doi.org/10.1145/3447548.34673
@inproceedings{mlg2023_27,
title={Spectral Clustering of Attributed Multi-relational Graphs*},
author={Ylli Sadikaj, Yllka Velaj, Sahar Behzadi and Claudia Plant},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Data Sampling using Locality Sensitive Hashing for Large Scale Graph LearningPDF Sarath Shekkizhar, Neslihan Bulut, Mohamed Farghal, Sasan Tavakkol, Mohammadhossein Bateni and Animesh Nandi
Abstract: An important step in graph-based data analysis and processing is the construction of similarity graphs. Recent works, such as [7, 23 ], have focused on the semi-supervised setting to learn an optimal similarity function for constructing a task-optimal graph. However, in many scenarios with billions of data points and trillions of potential edges, the run-time and computational requirements for training the similarity model make these approaches impractical. In this work, we consider data sampling as a means to overcome this issue. Unlike typical sampling use-cases which only seek diversity, the similarity-learning for graph construction problem requires data samples that are both diverse and representative of highly similar data points. We present an efficient sampling approach by taking an adaptive partition view of locality sensitive hashing. Theoretically, we show that, though the samples obtained are correlated with sampling probabilities that do not sum to one, the training loss estimated for learning the graph similarity model using our approach is unbiased with a smaller variance compared to random sampling. Experiments on public datasets demonstrate the superior generalization of similarity models learned via our sampling. In a real large-scale industrial abuse-detection example, we observe ≈10× increase in identifying abusive items while having a lower false positive rate compared to the baseline.
@inproceedings{mlg2023_2,
title={Data Sampling using Locality Sensitive Hashing for Large Scale Graph Learning},
author={Sarath Shekkizhar, Neslihan Bulut, Mohamed Farghal, Sasan Tavakkol, Mohammadhossein Bateni and Animesh Nandi},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Multi-Task Learning on Heterogeneous Graph Neural Network for Substitute RecommendationPDF Tianchen Zhou, Michinari Momma, Chaosheng Dong, Fan Yang, Chenghuan Guo, Jin Shang and Jia Liu
Abstract: Substitute recommendation in e-commerce has attracted increasing
attention in recent years, to help improve customer experience. In
this work, we propose a multi-task graph learning framework that
jointly learns from supervised and unsupervised objectives with
heterogeneous graphs. Particularly, we propose a new contrastive
method that extracts global information from both positive and
negative neighbors. By feeding substitute signal data from different
sources to learning tasks with different focuses, our model learns
the representation of products that can be applied for substitute
identification under different substitutable criteria. We conduct ex-
periments on Amazon datasets, and the experiment results demon-
strate that our method outperforms all existing baselines in terms
of comprehensive performance among all metrics of interest.
@inproceedings{mlg2023_6,
title={Multi-Task Learning on Heterogeneous Graph Neural Network for Substitute Recommendation},
author={Tianchen Zhou, Michinari Momma, Chaosheng Dong, Fan Yang, Chenghuan Guo, Jin Shang and Jia Liu},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
DyG2Vec: Representation Learning for Dynamic Graphs with Self-SupervisionPDF Mohammad Ali Alomrani, Mahdi Biparva, Yingxue Zhang and Mark Coates
Abstract: Temporal graph neural networks have shown promising results in learning inductive representations by automatically extracting temporal patterns. However, previous works often rely on complex memory modules or inefficient random walk methods to construct temporal representations. In addition, the existing dynamic graph encoders are non-trivial to adapt to self-supervised paradigms, which prevents them from utilizing unlabeled data. To address these limitations, we present an efficient yet effective attention-based encoder that leverages temporal edge encodings and window-based subgraph sampling to generate task-agnostic embeddings. Moreover, we propose a joint-embedding architecture using non-contrastive SSL to learn rich temporal embeddings without labels. Experimental results on 7 benchmark datasets indicate that on average, our model outperforms SoTA baselines on the future link prediction task by 4.23% for the transductive setting and 3.30% for the inductive setting while only requiring 5-10x less training/inference time. Additionally, we empirically validate the SSL pre-training significance under two probings commonly used in language and vision modalities. Lastly, different aspects of the proposed framework are investigated through experimental analysis and ablation studies.
@inproceedings{mlg2023_14,
title={DyG2Vec: Representation Learning for Dynamic Graphs with Self-Supervision},
author={Mohammad Ali Alomrani, Mahdi Biparva, Yingxue Zhang and Mark Coates},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
A Large Scale Synthetic Graph Dataset Generation FrameworkPDF Sajad Darabi, Piotr Bigaj, Dawid Majchrowski, Artur Kasymov, Pawel Morkisz and Alex Fit-Florea
Abstract: Recently there has been increasing interest in developing and deploying deep graph learning algorithms for many graph analysis tasks such as node and edge classification, link prediction, and clustering with numerous practical applications such as fraud detection, drug discovery, or recommender systems. Albeit there is a limited number of publicly available graph-structured datasets, most of which are tiny compared to production-sized applications with trillions of edges and billions of nodes or are limited in their application domain.
In this work, we tackle this shortcoming by proposing a scalable synthetic graph generation tool. This tool can be used to learn a set of parametric models from proprietary datasets that can subsequently be released to researchers to study various graph methods on the synthetic data increasing prototype development and novel applications. Finally, the performance of the graph learning algorithms depends not only on the size but also on the graph datasets structure. We show how our framework generalizes across a set of datasets, mimicking both structural and feature distributions and the ability to scale them across varying dataset sizes. Code can be found at \href{https://github.com/}{https://github.com}
Keywords: Large Scale Dataset, Graphs, Synthetic Data, Generative Modeling
@inproceedings{mlg2023_19,
title={A Large Scale Synthetic Graph Dataset Generation Framework},
author={Sajad Darabi, Piotr Bigaj, Dawid Majchrowski, Artur Kasymov, Pawel Morkisz and Alex Fit-Florea},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
GraphBoost: Adaptive Boosting Node Generation for Class-Imbalanced GraphsPDF Yuhe Gao, Sheng Zhang and Rui Song
Abstract: Classification in imbalanced data, where the majority class has a much larger representation than the minority class, has been a significant topic in recent decades. Two popular approaches for handling this issue are (1) rebalancing the sizes of classes through reweighting, resampling, or synthetic nodes generating, and (2) focusing on the data points that are hard to classify to enhance the classifier performance. In graphical data, several methods, such as GraphSMOTE and GraphENS, from the first type have been developed recently for class-imbalanced node classification tasks, but few adaptations of the second approach have been proposed. In response to this gap , we present a novel multi-stage boosting framework inspired by the second approach. In particular, the framework proposed in this research paper jointly generates the topological structure and features of synthetic nodes by minimizing the distance of synthetic nodes and misclassified nodes from previous training stages. Our experiments on class-imbalanced graphs show that our novel framework outperforms standard graph neural networks. Furthermore, our framework can be combined with existing methods (such as GraphENS) resulting in further performance enhancements.
@inproceedings{mlg2023_11,
title={GraphBoost: Adaptive Boosting Node Generation for Class-Imbalanced Graphs},
author={Yuhe Gao, Sheng Zhang and Rui Song},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Effect of Deception in Influence Maximization and Polarization on Social Networks: A Sheaf Laplacian ApproachPDF Mehmet Aktas, Esra Akbas, Ashley Hahn and Mehmet Ahsen
Abstract: In the contemporary era of social media and online communication, comprehending the dynamics of information diffusion in social networks has become crucial. This research article investigates the effects of deception on information diffusion, specifically focusing on influence maximization and polarization in social networks. We propose an analytic model of deception within social networks. Building upon the sheaf Laplacian diffusion model derived from algebraic topology, we examine opinion dynamics in the presence of deception. Next, we redefine the Laplacian centrality, an influential node detection method originally designed for regular graphs, to quantify the influence of deception in influence maximization using the sheaf Laplacian. Additionally, we employ the sheaf Laplacian to model polarization in networks and investigate the impact of deception on polarization using two distinct polarization measures. Through extensive experiments conducted on synthetic and real-world networks, our findings suggest that deceptive individuals wield more influence than honest users within social networks. Furthermore, we demonstrate that deception amplifies polarization in networks, with influential individuals playing a significant role in deepening the polarization phenomenon.
Keywords: information diffusion, sheaf Laplacian, influence maximization, polarization, deception
@inproceedings{mlg2023_23,
title={Effect of Deception in Influence Maximization and Polarization on Social Networks: A Sheaf Laplacian Approach},
author={Mehmet Aktas, Esra Akbas, Ashley Hahn and Mehmet Ahsen},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Semi-Supervised Embedding of Attributed Multiplex Networks*PDF Ylli Sadikaj, Justus Rass, Yllka Velaj and Claudia Plant
Abstract: Complex information can be represented as networks (graphs) characterized by a large number of nodes, multiple types of nodes, and multiple types of relationships between them, i.e. multiplex networks. Additionally, these networks are enriched with different types of node features.
We propose a Semi-supervised Embedding approach for Attributed Multiplex Networks (SSAMN), to jointly embed nodes, node attributes, and node labels of multiplex networks in a low dimensional space. Network embedding techniques have garnered research attention for real-world applications. However, most existing techniques solely focus on learning the node embeddings, and only a few learn class label embeddings. Our method assumes that we have different classes of nodes and that we know the class label of some, very few nodes for every class. Guided by this type of supervision, SSAMN learns a low-dimensional representation incorporating all information in a large labeled multiplex network. SSAMN integrates techniques from Spectral Embedding and Homogeneity Analysis to improve the embedding of nodes, node attributes, and node labels. Our experiments demonstrate that we only need very few labels per class in order to have a final embedding that preserves the information of the graph. To evaluate the performance of SSAMN, we run experiments on four real-world datasets. The results show that our approach outperforms state-of-the-art methods for downstream tasks such as semi-supervised node classification and node
clustering.
This paper has also appeared in the Proceedings of the ACM
Web Conference 2023, former WWW.
∗This paper has appeared in WWW ’23: Proceedings of the ACM Web Conference
2023, April 2023, Pages 578–587, https://doi.org/10.1145/3543507.3583
@inproceedings{mlg2023_26,
title={Semi-Supervised Embedding of Attributed Multiplex Networks*},
author={Ylli Sadikaj, Justus Rass, Yllka Velaj and Claudia Plant},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
UGGS: A Unified Graph Generation Framework Based on Self-Supervised LearningPDF Sajad Ramezani and Soroor Motie
Abstract: Deep learning on graphs has gained interest in recent years. The applicability of graphs to model problems in various domains, such as chemical molecules, financial transactions, parse trees, etc., has encouraged researchers to develop and extend machine learning methods from other data modalities, such as text and image, to graphs. Generative models have been used extensively in recent years and have achieved significant milestones, especially in text and image generation. However, graph generative models have not been developed as extensively, and fundamental problems are still in the discussion phase. This work addresses some of these problems, such as the lack of an integrated framework and interpretable evaluation metric, by introducing a unified framework for the graph generation task. The base of the proposed framework is on the appropriate graph and node embeddings to estimate graphs' distribution. Hence it composes of graph neural networks to embed the nodes and graphs and also enhances the quality of graph embeddings via the introduction of pseudo tasks in a self-supervised fashion. Self-supervised techniques have proven useful in enhancing generative models to be more robust and generalizable. This work proposes several pseudo tasks and evaluates their performance on common graph datasets. It also emphasizes the problem of graph decoding and speculates that graph generation strategy matters, and one can establish more complex graph generation models to generate higher-quality graphs. It also proposes a distance metric in embedding space for generated graphs to filter out poorly generated data. In the end, the proposed framework achieves competitive results compared to previously proposed models while having fewer parameters and a shorter training time. We have also made our framework implementation available.
@inproceedings{mlg2023_20,
title={UGGS: A Unified Graph Generation Framework Based on Self-Supervised Learning},
author={Sajad Ramezani and Soroor Motie},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Graph Model Explainer ToolPDF Yudi Zhang, Phanindra Reddy Madduru, Naveed Janvekar and Nitika Bhaskar
Abstract: Graph Neural Networks (GNNs) have gained popularity in various fields, such as recommendation systems, social network analysis and fraud detection. However, despite their effectiveness, the topological nature of GNNs makes it challenging for users to understand the model predictions. To address this challenge, we built a user-friendly UI to visualize the most important relationships for both homogeneous and heterogeneous static graphs models, which a post-hoc explanation technique called GNNExplainer is implemented. This UI can be applied to a wide range of applications that use graph models. It offers an intuitive and interpretable way for users to understand the complex relationships within a graph and how they influence the model's predictions.
Keywords: GNN, GNNExplainer, Visualization
@inproceedings{mlg2023_5,
title={Graph Model Explainer Tool},
author={Yudi Zhang, Phanindra Reddy Madduru, Naveed Janvekar and Nitika Bhaskar},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Computation of Node Distances on HypergraphsPDF Enzhi Li and Bilal Fadlallah
Abstract: A hypergraph is a generalization of a graph that arises naturally when we consider attribute-sharing among entities. Although a hypergraph can be converted into a graph by expanding its hyperedges into fully connected subgraphs, going the reverse way is computationally complex and NP-complete. We hence hypothesize that a hypergraph contains more information than a graph. Moreover, it is more convenient to manipulate a hypergraph directly, rather than expanding it into a graph. An open problem in hypergraphs is how to accurately and efficiently calculate their node distances. Once node distances are defined, we can find a node's nearest neighbors, and perform label propagation on hypergraphs using a K-nearest neighbors (KNN) approach. In this paper, we propose two methods to achieve this. In the first, we compute expected hitting times of random walks on a hypergraph as node distances; in the second, we generalize the DeepWalk method to hypergraphs. We note that simple random walks (SRW) cannot accurately describe highly complex real-world hypergraphs, which motivates us to introduce frustrated random walks (FRW) to better describe them. Using real-world datasets, we show that FRW and DeepWalk can beat SRW with a large margin. For large and sparse hypergraphs, our method for computing the expected hitting times of random walks is approximately linear in time complexity, rendering it superior to the DeepWalk method.
@inproceedings{mlg2023_8,
title={Computation of Node Distances on Hypergraphs},
author={Enzhi Li and Bilal Fadlallah},
booktitle={Proceedings of the 19th International Workshop on Mining and Learning with Graphs (MLG)},
year={2023}
}
Call for Papers
This workshop is a forum for exchanging ideas and methods for mining and learning with graphs, developing new common understandings of the problems at hand, sharing of data sets where applicable, and leveraging existing knowledge from different disciplines. The goal is to bring together researchers from academia, industry, and government, to create a forum for discussing recent advances in graph analysis. In doing so, we aim to better understand the overarching principles and the limitations of our current methods and to inspire research on new algorithms and techniques for mining and learning with graphs.
To reflect the broad scope of work on mining and learning with graphs, we encourage submissions that span the spectrum from theoretical analysis to algorithms and implementation, to applications, empirical studies and reflection papers. As an example, the growth of user-generated content on blogs, microblogs, discussion forums, product reviews, etc., has given rise to a host of new opportunities for graph mining in the analysis of social media. More recently, the advent of neural methods for learning graph representations has spurred numerous works in embedding network entities for diverse applications including ranking and retrieval, traffic routing and drug-discovery. We encourage submissions on theory, methods, and applications focusing on a broad range of graph-based approaches in various domains.
Topics of interest include, but are not limited to:
Theoretical aspects:
Computational or statistical learning theory related to graphs
Theoretical analysis of graph algorithms or models
Sampling and evaluation issues in graph algorithms
Analysis of dynamic graphs
Algorithms and methods:
Graph mining
Probabilistic and graphical models for structured data
Heterogeneous/multi-model graph analysis
Graph neural networks and graph representation learning
Statistical models of graph structure
Combinatorial graph methods
Semi-supervised learning, active learning, transductive inference, and transfer
learning in the context of graph
Applications and analysis:
Analysis of social media
Analysis of biological networks
Knowledge graph construction
Large-scale analysis and modeling
We welcome many kinds of papers, such as, but not limited to:
Novel research papers
Demo papers
Work-in-progress papers
Visionary papers (white papers)
Appraisal papers of existing methods and tools (e.g., lessons learned)
Evaluatory papers which revisit validity of domain assumptions
Relevant work that has been previously published
Work that will be presented at the main conference
Authors should clearly indicate in their abstracts the kinds of submissions
that the papers belong to, to help reviewers better understand their contributions.
All papers will be peer reviewed, single-blinded.
Submissions must be in PDF, no more than 8 pages long — shorter papers are
welcome — and formatted according to the standard double-column ACM
Proceedings Style.
The accepted papers will be published on the workshop’s website and will not be considered
archival for resubmission purposes.
Authors whose papers are accepted to the workshop will have the opportunity to participate
in a spotlight and poster session, and some set will also be chosen for oral presentation.
To receive updates about the current and future workshops and the Graph Mining community,
please join the Mailing
List, or follow the Twitter
Account.
Important Dates
Paper Submission Deadline: May 30, 2023
Author Notification: June 23, 2023
Camera Ready: July 10, 2023
Workshop: August 7, 2023
Workshop Organizers
Neil Shah
Lead Research Scientist Snap Inc.
Shobeir Fakhraei
Senior Applied Scientist Amazon
Da Zheng
Senior Applied Scientist Amazon
Bahare Fatemi
Research Scientist Google Research
Leman Akoglu
Associate Professor CMU
Program Committee
Anton Tsitsulin (Google)
Evangelos Papalexakis (University of California Riverside)
Elena Zheleva (University of Illinois at Chicago)
Yozen Liu (University of Southern California)
Mohammad Hasan (IUPUI)
Stefan Wrobel (Fraunhofer IAIS & Univ. of Bonn)
Jan Ramon (INRIA)
David Gleich (Purdue University)
John Palowitch (Google Research)
Sami Abu-El-Haija (Google)
Ali Pinar (Sandia National Laboratories)
Puja Trivedi (University of Michigan)
Hocine Cherifi (University of Burgundy)
Boris Knyazev (Samsung)
Tong Zhao (Snap Inc.)
Aris Anagnostopoulos (Sapienza University of Rome)
Zhongfei Zhang (SUNY Binghamton)
Shichang Zhang (University of California, Los Angeles)
Perouz Taslakian (McGill University)
Ivan Brugere (University of Illinois at Chicago)
William Shiao (UC Riverside)