There is a great deal of interest in analyzing data that is best represented as a graph.
Examples include the WWW, social networks, biological networks, communication networks,
transportation networks, energy grids, and many others. These graphs are typically
multi-modal, multi-relational and dynamic. In the era of big data, the importance of being
able to effectively mine and learn from such data is growing, as more and more structured
and semi-structured data is becoming available. The workshop serves as a forum for
researchers from a variety of fields working on mining and learning from graphs to share and
discuss their latest findings.
There are many challenges involved in effectively mining and learning from this kind of
data, including:
Understanding the different techniques applicable, including graph mining algorithms,
network embeddings, graphical models, latent variable models, matrix factorization
methods and more.
Dealing with the heterogeneity of the data.
The common need for information integration and alignment.
Handling dynamic and changing data.
Addressing each of these issues at scale.
Traditionally, a number of subareas have contributed to this space: communities in graph
mining, learning from structured data, statistical relational learning, inductive logic
programming, and, moving beyond subdisciplines in computer science, social network analysis,
and, more broadly network science.
All accepted papers will be presented in the poster session.
Based on scheduling constraints, some papers are selected as contributed talks or spotlight
presentations.
-Held jointly with International Workshop on Deep Learning on Graphs (KDD-DLG)-
Morning Sessions
8:00 am
Opening Remarks
8:15 am
Keynote:
Jure Leskovec
Graph Structure of Neural Networks: Good Neural Networks Are Alike
8:45 am
Keynote:
Philip Yu
Broad Learning Via Heterogenous Information Networks
9:15 am
Parallel Contributed Talks
Junchen Jin
Understanding and Evaluating Structural Node Embeddings
Caleb Belth
Mining Persistent Activity in Continually Evolving Networks
9:45 am
Keynote:
Fei Wang
Deep Graph Mining for Healthcare
10:15 am
Coffee Break/Social Networking
10:30 am
Keynote:
Danai Koutra
The Power of Summarization in Network Analysis Video
11:00 am
Keynote:
Petar Veličković
Algorithmic Inductive Biases
11:30 am
Parallel Poster Session (Spotlight Talks + Live Q/A) Breakout Z-rooms for both DLG
and MLG
12:00 pm
Lunch Break
Afternoon Sessions
1:00 pm
Keynote:
Jimeng Sun
Deep Learning for Drug Development
1:30 pm
Parallel Contributed Talks
William Shiao
BRGAN: Generating Graphs of Bounded Rank
Christopher Tran
Heterogeneous Threshold Estimation for Linear Threshold Modeling
2:00 pm
Keynote:
Tyler Derr
Self-supervised Learning on Graphs: Deep Insights and New Directions
2:30 pm
Coffee Break/Social Networking
2:45 pm
Keynote:
Muhan Zhang
Learning Graph Strcuture Features for Inductive Link Prediction and Matrix Completion
3:15 pm
Keynote:
Le Song
Cybersecurity with Graph Neural Networks
3:45 pm
Best Paper Award Ceremony + Closing Remarks
4:00 pm
Parallel Poster Session (Spotlight Talks + Live Q/A) Breakout Z-rooms for both DLG
and MLG
Keynote Speakers
Jointly with the International Workshop on Deep Learning
on Graphs (KDD-DLG)
Danai Koutra
University of Michigan Ann Arbor
Jure Leskovec
Stanford University
Fei Wang
Cornell University
Philip Yu
University of Illinois at Chicago
Le Song
Georgia Institute of Technology
Jimeng Sun
University of Illinois Urbana-Champaign
Petar Velickovic
Deepmind
Muhan Zhang
Facebook AI
Tyler Derr
Vanderbilt University
Accepted Papers
A Scalable Parallel Hypergraph Generator (HyGen)PDFVideo S.M.Shamimul Hasan, Neena Imam and Ramakrishnan Kannan
Abstract: Graphs are extensively used to model real-world complex systems. An
edge in a graph can model pairwise relationships. However, multiway relationships (connections
between three or more vertices) are common in many complex systems such as cellular process,
image segmentation, and circuit design. A graph edge cannot model multiway relationships. A
hypergraph, which can connect more than two vertices, is thus a better option to model multiway
relationships. A large-scale hypergraph analysis has the potential to find useful insights from
a complex system and assist in knowledge discovery. Currently a limited number of hypergraphs
exists that are representative of real-world datasets. Moreover, real-world hypergraph datasets
are small in size and inadequate to incorporate future needs. A graph generator that can produce
large-scale synthetic hypergraphs can solve the above mentioned problems. In this paper, we
present a scalable parallel hypergraph generator (HyGen) based on the Message Passing Interface
(MPI) standard. To generate hypergraphs, HyGen takes the following parameter values as inputs:
i) number of vertices, ii) number of hyperedges, iii) number of clusters, iv) vertex
distribution, v) hyperedge distribution, vi) local cluster cardinality, and vii) global cluster
cardinality. We have demonstrated that HyGen can generate hypergraphs of various sizes in a
scalable fashion. HyGen takes approximately four minutes to generate a hypergraph with 4.8
million vertices, 1.6 million hyperedges, and 800 clusters using 1,024 processes on a leadership
class computing platform. Our strong and weak scaling experiments on supercomputers demonstrate
that HyGen can quickly create large-scale hypergraphs in a parallel manner, thus providing a
useful capability for hypergraph analysis.
Keywords: Hypergraph, MPI, C++, OLCF, Rhea
@inproceedings{mlg2020_5,
title={A Scalable Parallel Hypergraph Generator (HyGen)},
author={S.M.Shamimul Hasan, Neena Imam and Ramakrishnan Kannan},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Active Learning on Graphs with Geodesically Convex ClassesPDFVideo Maximilian Thiessen and Thomas Gärtner
Abstract: We study the problem of actively learning the vertex labels of a
graph, assuming the classes form geodesically convex subgraphs, which is related to linear
separability in the Euclidean setting. The main result of this paper is a novel query-efficient
active learning algorithm with label-independent upper bounds on the number of queries needed to
learn all labels. For that, we use shortest path covers and provide a logarithmic approximation
for the subproblem of computing a shortest path cover of minimum size. We extend the approach to
arbitrarily labeled graphs using a convexity-based selection criterion. Finally, we discuss
whether the convexity assumption holds on real-world data and give some first preliminary
results on citation and image benchmark datasets.
@inproceedings{mlg2020_40,
title={Active Learning on Graphs with Geodesically Convex Classes},
author={Maximilian Thiessen and Thomas Gärtner},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Adaptive Granularity in Time Evolving Graphs as TensorsPDFVideo Ravdeep Pasricha, Ekta Gujral and Evangelos Papalexakis
Abstract: Data collected at very frequent intervals is usually extremely sparse
and has no structure that is exploitable by modern tensor decomposition algorithms. Thus the
utility of such tensors is low, in terms of the amount of interpretable and exploitable
structure that one can extract from them. In this paper, we introduce the problem of finding a
tensor of adaptive aggregated granularity that can be decomposed to reveal meaningful latent
concepts (structures) from datasets that, in their original form, are not amenable to tensor
analysis. Such datasets fall under the broad category of sparse point processes that evolve over
space and/or time. To the best of our knowledge, this is the first work that explores adaptive
granularity aggregation in tensors. Furthermore, we formally define the problem and discuss what
different definitions of “good structure” can be in practice, and show that optimal solution is
of prohibitive combinatorial complexity. Subsequently, we propose an efficient and effective
greedy algorithm which follows a number of intuitive decision criteria that locally maximize the
“goodness of structure”, resulting in high-quality tensors. We evaluate our method on synthetic
and semi-synthetic data where ground truth is known. Our proposed method constructs tensors that
have very high structure quality.
@inproceedings{mlg2020_35,
title={Adaptive Granularity in Time Evolving Graphs as Tensors},
author={Ravdeep Pasricha, Ekta Gujral and Evangelos Papalexakis},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Adversarial Learning for Debiasing Knowledge Graph EmbeddingsPDFVideo Mario Arduini, Lorenzo Noci, Federico Pirovano, Ce Zhang, Yash Raj Shrestha and Bibek
Paudel
Abstract: Knowledge Graphs (KG) are gaining increasing attention in both
academia and industry. Despite their diverse benefits, recent research have identified social
and cultural biases embedded in the representations learned from KGs.
Such biases can have detrimental consequences on different population and minority groups as
applications of KG begin to intersect and interact with social spheres.
This paper describes our work-in-progress, which aims at identifying and mitigating such biases
in Knowledge Graph (KG) embeddings.
As a first step, we explore popularity bias --- the relationship between node popularity and
link prediction accuracy.
In case of node2vec graph embeddings, we find that prediction accuracy of the embedding is
negatively correlated with the degree of the node.
However, in case of knowledge-graph embeddings (KGE), we observe an opposite trend.
As a second step, we explore gender bias in KGE, and a careful examination of popular KGE
algorithms suggest that sensitive attribute like the gender of a person can be predicted from
the embedding.
This implies that such biases in popular KGs is captured by the structural properties of the
embedding.
As a preliminary solution to debiasing KGs, we introduce a novel framework to filter out the
sensitive attribute information from the KG embeddings, which we call FAN (Filtering Adversarial
Network).
We also suggest the applicability of FAN for debiasing other network embeddings which could be
explored in future work.
@inproceedings{mlg2020_39,
title={Adversarial Learning for Debiasing Knowledge Graph Embeddings},
author={Mario Arduini, Lorenzo Noci, Federico Pirovano, Ce Zhang, Yash Raj Shrestha and Bibek
Paudel},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
BRGAN: Generating Graphs of Bounded RankPDFVideo William Shiao and Evangelos Papalexakis
Abstract: Graph generation is a task that has been explored with a wide variety
of methods. Recently, several papers have applied Generative Adversarial Networks (GANs) to this
task, but most of these methods result in graphs of full or unknown rank. Many real-world graphs
have low rank, which roughly translates to the number of communities in that graph. Furthermore,
it has been shown that taking the low rank approximation of a graph can defend against
adversarial attacks. This suggests that testing models against graphs of different rank may be
useful.
However, current methods provide no way to control the rank of generated graphs. In this paper,
we propose BRGAN: a GAN architecture that generates synthetic graphs, which in addition to
having realistic graph features, also have bounded (low) rank. We extensively evaluate BRGAN and
show that it is able to generate synthetic graphs competitive with state-of-the-art models, with
rank equal to or lower than the desired rank.
@inproceedings{mlg2020_3,
title={BRGAN: Generating Graphs of Bounded Rank},
author={William Shiao and Evangelos Papalexakis},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
CONE-Align: Consistent Network Alignment with Proximity-Preserving Node
EmbeddingPDFVideo Xiyuan Chen, Mark Heimann, Fatemeh Vahedian and Danai Koutra
Abstract: Network alignment, the process of finding correspondences between
nodes in different graphs, has many scientific and industrial applications. Existing
unsupervised network alignment methods find suboptimal alignments that break up node
neighborhoods, i.e. do not preserve matched neighborhood consistency. To improve this, we
propose CONE-Align, which models intra-network proximity with node embeddings and matches nodes
across networks by comparing the embeddings after aligning their subspaces. Experiments on
diverse, challenging datasets show that CONE-Align is robust and obtains up to 49% greater
accuracy than the state-of-the-art graph alignment algorithms.
@inproceedings{mlg2020_4,
title={CONE-Align: Consistent Network Alignment with Proximity-Preserving Node Embedding},
author={Xiyuan Chen, Mark Heimann, Fatemeh Vahedian and Danai Koutra},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Characterising the atomic structure of mono-metallic nanoparticles from x-ray scattering
data using conditional generative modelsPDFVideo Andy S. Anker, Emil T. S. Kjær, Erik B. Dam, Simon J. L. Billinge, Kirsten M. Ø. Jensen and
Raghavendra Selvan
Abstract: The development of new nanomaterials for energy technologies is
dependent on understanding the intricate relation between material properties and atomic
structure. It is, therefore, crucial to be able to routinely characterise the atomic structure
in nanomaterials, and a promising method for this task is Pair Distribution Function (PDF)
analysis. The PDF can be obtained through Fourier transformation of x-ray total scattering data,
and represents a histogram of all interatomic distances in the sample. Going from the distance
information in the PDF to a chemical structure is an unassigned distance geometry problem
(uDGP), and solving this is often the bottleneck in nanostructure analysis. In this work, we
propose to use a Conditional Variational Autoencoder (CVAE) to automatically solve the uDGP to
obtain valid chemical structures from PDFs. We use a simple model system of hypothetical
mono-metallic nanoparticles containing up to 100 atoms in the face centered cubic (FCC)
structure as a proof of concept. The model is trained to predict the assigned distance matrix
(aDM) from a simulated PDF of the structure as the conditional input. We introduce a novel
representation of structures by projecting them inside a unit sphere and adding additional
anchor points or satellites to help in the reconstruction of the chemical structure. The
performance of the CVAE model is compared to a Deterministic Autoencoder (DAE) showing that both
models are able to solve the uDGP reasonably well. We further show that the CVAE learns a
structured and meaningful latent embedding space which can be used to predict new chemical
structures.
Keywords: generative modeling, mono-metallic nanoparticles, CVAE,
Pair Distribution Function
@inproceedings{mlg2020_22,
title={Characterising the atomic structure of mono-metallic nanoparticles from x-ray scattering
data using conditional generative models},
author={Andy S. Anker, Emil T. S. Kjær, Erik B. Dam, Simon J. L. Billinge, Kirsten M. Ø. Jensen
and Raghavendra Selvan},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Collective Bio-Entity Recognition in Scientific Documents using Hinge-Loss Markov Random
FieldsPDFVideo Alexander Miller, Naum Markenzon, Varun Embar and Lise Getoor
Abstract: Identifying biological entities such as genes and proteins from
scientific documents is crucial for further downstream tasks such as question answering and
information retrieval. This task is challenging because the same surface text can refer either
to a gene or a protein based on the context. Traditional approaches such as Huang et al.
consider the words present in the surrounding text to infer the context. However, they fail to
consider the semantics of these words which are better represented by contextual word embeddings
such as BERT. Deep learning based approaches, on the other hand, fail to make use of the
relational structure of scientific documents. We introduce a novel probabilistic approach that
jointly classifies all entity references using a class of undirected graphical models called
hinge-loss Markov random fields. Our approach can combine relational information with
embedding-based word semantics. Further, our approach can be easily extended to incorporate new
sources of information. Our initial evaluation on the JNLPBA shared task corpus shows that our
joint classification approach outperforms both traditional machine learning approaches and
semantic models based on word embeddings by up to 7.5% on F1 score.
Keywords: probabilistic graphical models, embeddings, information
extraction, word sense disambiguation, graphs, collective inference
@inproceedings{mlg2020_28,
title={Collective Bio-Entity Recognition in Scientific Documents using Hinge-Loss Markov Random
Fields},
author={Alexander Miller, Naum Markenzon, Varun Embar and Lise Getoor},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Comparison of Graph Generation Models focusing on Accuracy and VariationPDFVideo Mei Fukuda, Kazuki Nakajima and Kazuyuki Shudo
Abstract: Generation models of graphs have been used to compare and analyze the
properties of graph structures and to produce graphs that resemble real-world networks. When
using a generation model to mimic a real-world network, it is desirable for the error in the
properties between the target graph and the generated graph and the variation of the errors
between generated graphs are small. However, since many existing generation models generate
graphs by adding edges at random, the extent of the error and its variation for each generated
graph is unclear.
This paper studies the error and the variation of properties of graphs generated using the
dK-series framework, which has been proposed to analyze the topology of a network based on the
degree of nodes. In addition, we propose a new graph generation model that takes the degree
distribution and degree-dependent clustering coefficient as inputs. We show that the proposed
model is able to reduce the error to a greater extent than other generation models.
Keywords: network analysis, graph generation models, estimation,
sampling, social networks
@inproceedings{mlg2020_2,
title={Comparison of Graph Generation Models focusing on Accuracy and Variation},
author={Mei Fukuda, Kazuki Nakajima and Kazuyuki Shudo},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Decoupled Smoothing in Probabilistic Soft LogicPDFVideo Yatong Chen, Bryan Tor, Eriq Augustine and Lise Getoor
Abstract: Node classification in networks is a common graph mining task. In
this paper, we examine how separating identity(a node’s attribute)and preference(the kind of
identities to which a node prefers to link)is useful for node classification in social networks.
Building upon recent work by Chin et al., where the separation of identity and preference is
accomplished through a technique called “decoupled smoothing”, we show how models that
characterize both identity and preference are able to capture the underlying structure in a
network, which leads to performance improvement in node classification tasks. Specifically, we
use probabilistic soft logic (PSL), a flexible and declarative statistical reasoning framework,
to model identity and preference. We compare our method with the original decoupled smoothing
method and other node classification methods implemented in PSL, and show that our approach
outperforms the state-of-the-art decoupled smoothing method as well as the other node
classification methods across all evaluation metrics on a real-world Facebook Dataset.
@inproceedings{mlg2020_31,
title={Decoupled Smoothing in Probabilistic Soft Logic},
author={Yatong Chen, Bryan Tor, Eriq Augustine and Lise Getoor},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Effectiveness of Sampling Strategies for One-shot Active Learning from Relational
DataPDFVideo Ragib Ahsan and Elena Zheleva
Abstract: Relational classification exploits structural information in network
data to improve predictive performance. However, the large size of real-world networks causes
two main scalability issues for relational classification. First, training supervised models on
large networks is computationally expensive. Second, label acquisition for large samples can be
costly and unrealistic. The goal of Active learning is to query informative labels and reduce
labeling cost. However, state-of-the-art active learning strategies require multiple iterations
of learning, in order to pick the best labels at each iteration, which incurs higher
computational cost. In this work, we focus on a constrained version of the problem, named
one-shot active learning where the active learner has to decide which nodes to sample in one
shot, rather than iteratively. We consider several simple and network-based sampling strategies
as potential solutions to this problem. In our experiments, we show a comprehensive evaluation
of eleven different sampling methods on four real world network datasets using four relational
classifiers (wvRN, ICA, SGC, GraphSage), offering the first comparison between collective
classification and neural network approaches for one-shot active learning. Moreover, we propose
a novel sampling method based on Weisfeiler-Lehman graph labeling algorithm which shows overall
best performance across all classifiers and datasets. We empirically show that some of the
computationally cheaper one-shot active learning approaches can achieve comparable Micro-F1
scores to existing active learning methods that require multiple iterations.
Keywords: Graph sampling, Active learning, Relational
classification
@inproceedings{mlg2020_30,
title={Effectiveness of Sampling Strategies for One-shot Active Learning from Relational
Data},
author={Ragib Ahsan and Elena Zheleva},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Efficient Algorithms to Mine Maximal Span-Trusses From Temporal GraphsPDFVideo Quintino Francesco Lotito and Alberto Montresor
Abstract: Over the last decade, there has been an increasing interest in
temporal graphs, pushed by a growing availability of temporally-annotated network data coming
from social, biological and financial networks.
Despite the importance of analyzing complex temporal networks, there is a huge gap between the
set of definitions, algorithms and tools available to study large static graphs and the ones
available for temporal graphs.
An important task in temporal graph analysis is mining dense structures, i.e., identifying
high-density subgraphs together with the span in which this high density is observed.
In this paper, we introduce the concept of (k, \Delta)-truss (span-truss) in temporal graphs, a
temporal generalization of the k-truss, in which k captures the information about the density
and \Delta captures the time span in which this density holds. We then propose novel and
efficient algorithms to identify maximal span-trusses, namely the ones not dominated by any
other span-truss neither in the order k nor in the interval \Delta, and evaluate them on a
number of public available datasets.
Keywords: temporal graphs, graph mining, dense structures,
community detection, social networks analysis
@inproceedings{mlg2020_11,
title={Efficient Algorithms to Mine Maximal Span-Trusses From Temporal Graphs},
author={Quintino Francesco Lotito and Alberto Montresor},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Examining COVID-19 Forecasting using Spatio-Temporal GNNsPDFVideo Amol Kapoor, Xue Ben, Luyang Liu, Bryan Perozzi, Matt Barnes, Martin Blais and Shawn
O'Banion
Abstract: In this work, we examine a novel forecasting approach for COVID-19
that uses Graph Neural Networks and mobility data. In contrast to existing time series
forecasting models, the proposed approach learns from a single large-scale spatio-temporal
graph, where nodes represent the region-level human mobility, spatial edges represent the human
mobility based inter-region connectivity, and temporal edges represent node features through
time. We evaluate initial experiments on the US county level COVID-19 dataset, and show that the
rich spatial and temporal information unified by the graph neural network allows the model to
learn complex dynamics and make more accurate forecasts compared to autoregressive statistical
learning and sequence deep learning approaches. We believe this novel source of information
combined with graph based deep learning approaches can be a powerful tool to understand the
spread and evolution of COVID-19. We encourage others to further develop a novel modeling
paradigm for infectious disease based on this high resolution mobility data.
@inproceedings{mlg2020_26,
title={Examining COVID-19 Forecasting using Spatio-Temporal GNNs},
author={Amol Kapoor, Xue Ben, Luyang Liu, Bryan Perozzi, Matt Barnes, Martin Blais and Shawn
O'Banion},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
First and Higher-Order Bipartite EmbeddingsPDFVideo Justin Sybrandt and Ilya Safro
Abstract: Typical graph embeddings may not capture type-specific bipartite
graph features that arise in such areas as recommender systems, data visualization, and drug
discovery. Machine learning methods utilized in these applications would be better served with
specialized embedding techniques. We propose two embeddings for bipartite graphs that decompose
edges into sets of indirect relationships between node neighborhoods. When sampling higher-order
relationships, we reinforce similarities through algebraic distance on graphs. We also introduce
ensemble embeddings to combine both into a "best of both worlds" embedding. The proposed methods
are evaluated on link prediction and recommendation tasks and compared with other
state-of-the-art embeddings. Our embeddings are found to perform better on recommendation tasks
and equally competitive in link prediction. Although all considered embeddings are beneficial in
particular applications, we demonstrate that none of those considered is clearly superior (in
contrast to what is claimed in many papers). Therefore, we discuss the trade offs among them,
noting that the methods proposed here are robust for applications relying on same-typed
comparisons.
Reproducibility: Our code, data sets, and results are all publicly available online at:
sybrandt.com/2020/fobe_hobe.
Keywords: bipartite graphs, hypergraphs, graph embedding, algebraic
distance on graphs, recommendation, link prediction
@inproceedings{mlg2020_15,
title={First and Higher-Order Bipartite Embeddings},
author={Justin Sybrandt and Ilya Safro},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
GATCheck: A Detailed Analysis of Graph Attention NetworksPDFVideo Lovish Madaan and Siddhant Arora
Abstract: Graph Attention Networks (GATs) are widely used for Representation
Learning in Graphs, but there is no proper study highlighting on what tasks GATs perform better
than other models and why. In this appraisal paper, we aim to improve our understanding of GATs
on a variety of tasks, including link prediction, multi-class node classification, and pairwise
node classification on benchmark datasets. We also perform ablation studies on the various
hyperparameters of GATs and try to reason about the importance of each of these in node
classification and link prediction tasks. Our study offers insights into the effectiveness of
GATs as compared to other techniques, and we make our code public so as to facilitate future
exploration.
Keywords: representation learning, node embeddings in graph
networks, machine learning for graphs, graph attention networks
@inproceedings{mlg2020_21,
title={GATCheck: A Detailed Analysis of Graph Attention Networks},
author={Lovish Madaan and Siddhant Arora},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Graph Clustering with Graph Neural NetworksPDFVideo Anton Tsitsulin, John Palowitch, Bryan Perozzi and Emmanuel Müller
Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art results
on many graph analysis tasks such as node classification and link prediction. However, important
unsupervised problems on graphs, such as graph clustering, have proved more resistant to
advances in GNNs. In this paper, we study unsupervised training of GNN pooling in terms of their
clustering capabilities.
We start by drawing a connection between graph clustering and graph pooling: intuitively, a good
graph clustering is what one would expect from a GNN pooling layer. Counterintuitively, we show
that this is not true for state-of-the-art pooling methods, such as MinCut pooling. To address
these deficiencies, we introduce Deep Modularity Networks (DMoN), an unsupervised pooling method
inspired by the modularity measure of clustering quality, and show how it tackles recovery of
the challenging clustering structure of real-world graphs. In order to clarify the regimes where
existing methods fail, we carefully design a set of experiments on synthetic data which show
that DMoN is able to jointly leverage the signal from the graph structure and node attributes.
Similarly, on real-world data, we show that DMoN produces high quality clusters which correlate
strongly with ground truth labels, achieving state-of-the-art results.
@inproceedings{mlg2020_42,
title={Graph Clustering with Graph Neural Networks},
author={Anton Tsitsulin, John Palowitch, Bryan Perozzi and Emmanuel Müller},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Graph Frequency Analysis of COVID-19 Incidence in the United StatesPDFVideo Yang Li and Gonzalo Mateos
Abstract: The COVID-19 pandemic markedly changed the way of life in the United
States (US). From early isolated regional outbreaks to ongoing country-wise spread, the
contagion exhibits different patterns at various timescales and locations. Thus, a close study
of the COVID-19 spread patterns can offer valuable insights on how counties were affected by the
virus. In the present work, a graph frequency analysis was conducted to investigate the spread
pattern of COVID-19 in the US. A geographical graph was constructed by computing the geodesic
distance between 3142 US counties. The numbers of daily confirmed COVID-19 cases per county were
collected and represented as graph signals, then mapped into the frequency domain via the graph
Fourier transform. The concept of graph frequency in Graph Signal Processing (GSP) enables the
decomposition of graph signals (i.e. daily confirmed cases) into modes with smooth or rapid
variations with respect to the underlying graph connectivity. Follow-up analysis revealed the
relationship between graph frequency components and the COVID-19 spread pattern within and
across counties. Specifically, our preliminary graph frequency analysis mined (and learned from)
confirmed case counts to unveil spatio-temporal contagion patterns of COVID-19 incidence for
each US county. Overall, results here support the promising prospect of using GSP tools for
epidemiology knowledge discovery on graphs.
Keywords: Graph data mining, graph signal processing, frequency
analysis, contagion pattern recognition
@inproceedings{mlg2020_24,
title={Graph Frequency Analysis of COVID-19 Incidence in the United States},
author={Yang Li and Gonzalo Mateos},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Graph Summarization and Graph Embeddings: Towards A Spectral ConnectionPDFVideo Arpit Merchant and Michael Mathioudakis
Abstract: The graph summarization problem is to define a compressed data
structure that can concisely describe the original graph. A standard class of techniques for
summarization involves grouping nodes into supernodes via aggregation or clustering such that
the lp-reconstruction error, i.e. the p-norm between the original adjacency matrix and the
adjacency matrix recovered from the compressed summary, is minimized. Our main result shows that
graph summarization can be reformulated as a trace maximization problem, the relaxed version of
which can be solved exactly by all the eigenvectors of the adjacency matrix. We also prove a
lower bound on the optimal solution which uses k eigenvectors for a summary with k supernodes.
Our results motivate a simple spectral clustering algorithm that can yield excellent summaries.
Our experiments validate the quality of the resultant summaries.
Keywords: graph summarization, graph embeddings, spectral graph
theory
@inproceedings{mlg2020_17,
title={Graph Summarization and Graph Embeddings: Towards A Spectral Connection},
author={Arpit Merchant and Michael Mathioudakis},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Graph-based State Representation for Deep Reinforcement LearningPDFVideo Vikram Waradpande, Daniel Kudenko and Megha Khosla
Abstract: Deep RL approaches build much of their success on the ability of the
deep neural network to generate useful internal representations. Nevertheless, they suffer from
a high sample-complexity and starting with a good input representation can have a significant
impact on the performance. In this paper, we exploit the fact that the underlying Markov
decision process (MDP) represents a graph, which enables us to incorporate the topological
information for effective state representation learning.
Motivated by the recent success of node representations for several graph analytical tasks we
specifically investigate the capability of node representation learning methods to effectively
encode the topology of the underlying MDP in Deep RL. To this end, we perform a comparative
analysis of several models chosen from different classes of representation learning algorithms
for policy learning in grid-world navigation tasks, which are representative of a large class of
RL problems. We find that all embedding methods outperform the commonly used matrix
representation of grid-world environments in all of the studied cases. Moreoever, graph
convolution based methods are outperformed by simpler random walk based methods and graph linear
autoencoders.
Keywords: Deep Reinforcement Learning, Deep Q-Learning, Markov
Decision Processes, Unsupervised Node Embeddings, Graph Neural Networks, Graph representation
@inproceedings{mlg2020_20,
title={Graph-based State Representation for Deep Reinforcement Learning},
author={Vikram Waradpande, Daniel Kudenko and Megha Khosla},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Heterogeneous Threshold Estimation for Linear Threshold ModelingPDFVideo Christopher Tran and Elena Zheleva
Abstract: Social networks play a central role in the spread of diseases, ideas,
and beliefs. The Linear Threshold Model (LTM) is a prominent model which describes the process
of diffusion through the network and how nodes become "infected" based on a threshold of number
of neighbors who are already "infected." LTM is often used with the assumption that node
thresholds are globally unique or randomly distributed. In many cases, however, thresholds can
differ between individuals, and knowing individual-level thresholds can lead to better diffusion
predictions. In this work, we propose a causal inference approach for estimating node
thresholds. We develop a Structural Causal Model to show the identifiability of causal effects
in the Linear Threshold Model, and map the threshold estimation problem to heterogeneous
treatment effect estimation. Through experimental results on real-world and synthetic datasets,
we show that individualized thresholds play an important part in reliable long-term diffusion
prediction.
Keywords: heterogeneous treatment effects, linear threshold model,
causal inference, information diffusion
@inproceedings{mlg2020_23,
title={Heterogeneous Threshold Estimation for Linear Threshold Modeling},
author={Christopher Tran and Elena Zheleva},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Hop Sampling: A Simple Regularized Graph Learning for Non-Stationary
EnvironmentsPDFVideo Young-Jin Park, Kyuyong Shin and Kyungmin Kim
Abstract: Graph representation learning is gaining popularity in a wide range
of applications, such as social networks analysis, computational biology, and recommender
systems. However, different with positive results from many academic studies, applying graph
neural networks (GNNs) in a real-world application is still challenging due to non-stationary
environments. The underlying distribution of streaming data changes unexpectedly, resulting in
different graph structures (a.k.a., concept drift). Therefore, it is essential to devise a
robust graph learning technique so that the model does not overfit to the training graphs. In
this work, we present Hop Sampling, a straightforward regularization method that can effectively
prevent GNNs from overfishing. The hop sampling randomly selects the number of propagation steps
rather than fixing it, and by doing so, it encourages the model to learn meaningful node
representation for all intermediate propagation layers and to experience a variety of plausible
graphs that are not in the training set. Particularly, we describe the use case of our method in
recommender systems, a representative example of the real-world non-stationary case. We
evaluated hop sampling on a large-scale real-world LINE dataset and conducted an online A/B/n
test in LINE Coupon recommender systems of LINE Wallet Tab. Experimental results demonstrate
that the proposed scheme improves the prediction accuracy of GNNs. We observed hop sampling
provides 7.97% and 16.93% improvements for NDCG and MAP compared to non-regularized GNN models
in our online service. Furthermore, models using hop sampling alleviate the oversmoothing issue
in GNNs enabling a deeper model as well as more diversified representation.
Keywords: Graph neural networks, Graph representation,
Non-stationary graphs, Recommender System, Mobile Coupon Service
@inproceedings{mlg2020_43,
title={Hop Sampling: A Simple Regularized Graph Learning for Non-Stationary Environments},
author={Young-Jin Park, Kyuyong Shin and Kyungmin Kim},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Influence of Asymmetry and Structural Roles on Triad Patterns in Undirected
NetworksPDFVideo Milos Kudelka, Eliska Ochodkova, Jakub Plesnik and Sarka Zehnalova
Abstract: Triads, i.e., variously interconnected triplets of nodes,
significantly affect the network structure. Closed triads, for instance, are the building blocks
of communities. Our study focuses on the analysis of triads in which the ego is connected to two
alters, with the alters not having to be connected; therefore, the triads that are studied need
not be closed. The analysis uses two approaches based on asymmetric relationships between pairs
of nodes. In the first approach, we work with three different node roles, in which the ego and
its alters can appear in triads. We get a total of eighteen different role-based triad patterns.
The second approach allows us to work with a total of four different types of ties and six
different alter-pair patterns. In experiments with four different types of real-world networks,
we show how the properties of these networks differ in terms of role-based triad patterns. In
some of these networks, we further show that the triad-based properties remain stable during
network growth. The main contribution of our paper is the use of asymmetric relations for the
definition of four types of dependency-based tie strengths between nodes and the analysis of
their influence on the occurrence of different triad-based patterns in networks.
@inproceedings{mlg2020_18,
title={Influence of Asymmetry and Structural Roles on Triad Patterns in Undirected
Networks},
author={Milos Kudelka, Eliska Ochodkova, Jakub Plesnik and Sarka Zehnalova},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Karate Club: An API Oriented Open-Source Python Framework for Unsupervised Learning on
GraphsPDFVideo Benedek Rozemberczki, Olivér Kiss and Rik Sarkar
Abstract: Graphs encode important structural properties of complex systems.
Machine learning on graphs has therefore emerged as an important technique in research and
applications. We present Karate Club - a Python framework combining more than 30
state-of-the-art graph mining algorithms. These unsupervised techniques make it easy to identify
and represent common graph features. The primary goal of the package is to make community
detection, node and whole graph embedding available to a wide audience of machine learning
researchers and practitioners. Karate Club is designed with an emphasis on a consistent
application interface, scalability, ease of use, sensible out of the box model behaviour,
standardized dataset ingestion, and output generation. This paper discusses the design
principles behind the framework with practical examples. We show Karate Club's efficiency in
learning performance on a wide range of real world clustering problems and classification tasks
along with supporting evidence of its competitive speed.
@inproceedings{mlg2020_8,
title={Karate Club: An API Oriented Open-Source Python Framework for Unsupervised Learning on
Graphs},
author={Benedek Rozemberczki, Olivér Kiss and Rik Sarkar},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Learning Distributed Representations of Graphs with Geo2DRPDFVideo Paul Scherer and Pietro Lio
Abstract: We present Geo2DR (Geometric to Distributed Representations), a GPU
ready Python library for unsupervised learning on graph-structured data using discrete
substructure patterns and neural language models. It contains efficient implementations of
popular graph decomposition algorithms and neural language models in PyTorch which can be
combined to learn representations of graphs using the distributive hypothesis. Furthermore,
Geo2DR comes with general data processing and loading methods to bring substantial speed-up in
the training of the neural language models. Through this we provide a modular set of tools and
building blocks to quickly construct methods capable of learning distributed representations of
graphs. This is useful for replication of existing methods, modification, and development of
completely new methods. This paper serves to present the Geo2DR library and perform a
comprehensive comparative analysis of existing methods re-implemented using Geo2DR across widely
used graph classification benchmarks. Geo2DR displays a high reproducibility of results of
published methods and interoperability with other libraries useful for distributive language
modelling, making it a useful addition to the graph representation learning toolkit.
Keywords: Graph Representation Learning, Research Toolkit,
Reproducibility, Distributed Representations, Neural Language Model, Graph Kernel
@inproceedings{mlg2020_10,
title={Learning Distributed Representations of Graphs with Geo2DR},
author={Paul Scherer and Pietro Lio},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Learning Generic Representations for Dynamic Social InteractionPDFVideo Yanbang Wang, Pan Li, Chongyang Bai, V.S. Subrahmanian and Jure Leskovec
Abstract: Social interaction, such as eye contact, speaking and listening, are
ubiquitous in our life and carries important clues of human's social status and psychological
state. With evolving dynamics fundamentally different from people's social relationships, the
complex interactions among a group of people are another informative resource to analyze
patterns of humans' social behaviors and characteristics. Despite the great importance, previous
approaches on extracting patterns from such dynamic social interactions are still underdeveloped
and overly task-specific. We fill this gap by proposing a temporal network formulation of the
problem, combined with a novel representation learning framework, temporal network-diffusion
convolution networks (TNDCN) that accommodates the many downstream tasks with a unified
structure: we creatively propagate people's fast-changing descriptive traits among their
evolving gazing networks with specially designed (1) network diffusion scheme and (2)
hierarchical pooling to learn high-quality embeddings for downstream tasks using a consistent
structure and minimal feature engineering. Analysis show that (1) can not only capture patterns
from existed interactions but also people's avoidance of interactions that turn out just as
critical. (2) allows us to flexibly collect different fine-grained critical interaction features
scattered over an extremely long time span, which is also key to success but essentially fails
all the previous temporal GNNs based on recurrent structures. We evaluate our model over three
different prediction tasks, detecting deception, dominance and nervousness. Our model not only
consistently outperforms previous baselines but also provides good interpretation by implying
two important pieces of social insight derived from the learned coefficients.
Keywords: Social Interaction, Dynamic Network, Representation
Learning, Deception Detection
@inproceedings{mlg2020_6,
title={Learning Generic Representations for Dynamic Social Interaction},
author={Yanbang Wang, Pan Li, Chongyang Bai, V.S. Subrahmanian and Jure Leskovec},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Link Predictions in an Online Health Community for Smoking CessationPDFVideo Sulyun Lee, Hankyu Jang, Kang Zhao, Michael S. Amato and Amanda L. Graham
Abstract: Effective link predictions in online social networks can help to
improve user experience and engagement, which are often associated with better health outcomes
for users of online health communities (OHCs). However, limited attention has been paid to
predicting social network links in OHCs. This paper explores link predictions in an OHC for
smoking cessation by considering it as a multi-relational social network that incorporates
multiple types of social relationships. We demonstrate that leveraging information from multiple
networks built based on different types of relationships is superior to using only information
from a single network or the aggregated network. In addition, adding community structures and
nodal similarities based on network embedding can help link predictions in different ways. Our
work has implications for the design and management of a successful online health community.
@inproceedings{mlg2020_25,
title={Link Predictions in an Online Health Community for Smoking Cessation},
author={Sulyun Lee, Hankyu Jang, Kang Zhao, Michael S. Amato and Amanda L. Graham},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Little Ball of Fur: A Python Library for Graph SamplingPDFVideo Benedek Rozemberczki, Olivér Kiss and Rik Sarkar
Abstract: Sampling graphs is an important task in data mining. In this paper,
we describe Little Ball of Fur a Python library that includes more than twenty graph sampling
algorithms. Our goal is to make node, edge, and exploration-based network sampling techniques
accessible to a large number of professionals, researchers, and students in a single streamlined
framework. We created this framework with a focus on a coherent application public interface
which has a convenient design, generic input data requirements, and reasonable baseline settings
of algorithms. Here we overview these design foundations of the framework in detail with
illustrative code snippets. We show the practical usability of the library by estimating various
global statistics of social networks and web graphs. Experiments demonstrate that Little Ball of
Fur can speed up node and whole graph embedding techniques considerably with mildly
deteriorating the predictive value of distilled features.
@inproceedings{mlg2020_9,
title={Little Ball of Fur: A Python Library for Graph Sampling},
author={Benedek Rozemberczki, Olivér Kiss and Rik Sarkar},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams PDFVideo Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin and Christos Faloutsos
Abstract: Given a stream of graph edges from a dynamic graph, how can we assign
anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior,
using constant time and memory? Existing approaches aim to detect individually surprising edges.
In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly
arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of
service attacks in network traffic data. MIDAS has the following properties: (a) it detects
microcluster anomalies while providing theoretical guarantees about its false positive
probability; (b) it is online, thus processing each edge in constant time and constant memory,
and also processes the data 162-644 times faster than state-of-the-art approaches; (c) it
provides 42%-48% higher accuracy (in terms of AUC) than state-of-the-art approaches. Category:
Relevant work that has been previously published at AAAI '20.
@inproceedings{mlg2020_41,
title={MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams },
author={Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin and Christos Faloutsos},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Mining Persistent Activity in Continually Evolving NetworksPDFVideo Caleb Belth, Xinyi Zheng and Danai Koutra
Abstract: Work that will be presented at the main conference. Frequent pattern
mining is a key area of study that gives insights into the structure and dynamics of evolving
networks, such as social or road networks.
However, not only does a network evolve, but often the way that it evolves, itself evolves.
Thus, knowing, in addition to patterns' frequencies, for how long and how regularly they have
occurred—i.e., their persistence—can add to our understanding of evolving networks. In this
work, we propose the problem of mining activity that persists through time in continually
evolving networks—i.e., activity that repeatedly and consistently occurs. We extend the notion
of temporal motifs to capture activity among specific nodes, in what we call activity snippets,
which are small sequences of edge-updates that reoccur. We propose axioms and properties that a
measure of persistence should satisfy, and develop such a persistence measure. We also propose
PENminer, an efficient framework for mining activity snippets' Persistence in Evolving Networks,
and
design both offline and streaming algorithms. We apply PENminer to numerous real, large-scale
evolving networks and edge streams, and find activity that is surprisingly regular over a long
period of time, but too infrequent to be discovered by aggregate count alone, and bursts of
activity exposed by their lack of persistence. Our findings with PENminer include neighborhoods
in NYC where taxi traffic persisted through Hurricane Sandy, the opening of new bike-stations,
characteristics of social network users, and more. Moreover, we use PENminer to identify subtle
anomalies in a bike network and catch attacks in an IP-network, outperforming baselines by 8% in
AUC.
@inproceedings{mlg2020_14,
title={Mining Persistent Activity in Continually Evolving Networks},
author={Caleb Belth, Xinyi Zheng and Danai Koutra},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Network Embedding with Attribute RefinementPDFVideo Tong Xiao, Yuan Fang, Hongxia Yang and Vincent W. Zheng
Abstract: Network embedding has been an active research area given its
effectiveness in mapping nodes to low-dimensional representations.
While previous studies mostly focus on network topology, recent
advances have shown that rich node-level information, known as
attributes, often exist and can substantially benefit embedding based
on the assumption of homophily: nodes often connect to other nodes
similar to themselves. However, we find that inconsistencies often
occur in node attributes in real-world data, which can undermine the
homophily assumption and thus degrade the performance of attributed network embedding. To
address this drawback, in this paper,
we present a novel framework for unsupervised network embedding
with attribute refinement. In particular, we propose a learnable filter
to automatically refine the individual attributes of every node. To
overcome the challenge of no supervision, we leverage homophily to
guide the refinement—attributes should be fine-tuned in a way to reinforce the correlation with
topology. Finally, we conduct extensive
experiments on three benchmark real-world datasets, which show
that our model significantly outperforms state-of-the-art methods
on both node classification and link prediction tasks. Furthermore,
we perform model analysis to demonstrate that our framework can
effectively refine attributes.
@inproceedings{mlg2020_19,
title={Network Embedding with Attribute Refinement},
author={Tong Xiao, Yuan Fang, Hongxia Yang and Vincent W. Zheng},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Network Experiment Design for Estimating Direct Treatment EffectsPDFVideo Zahra Fatemi and Elena Zheleva
Abstract: Network experiment design refers to the design of controlled
experiments for interacting units with the goal of estimating a causal effect of interest.
Estimating the effect of treatment alone on units' outcome, known as direct treatment effect, in
network experiments is challenging due to information spillover between peers through shared
edges. Prominent methods for network experiment design mostly focus on estimating total
treatment effects, the combination of peer effects and direct treatment effects. Less focus has
been given to approaches that provide an unbiased estimation of direct treatment effect.
We present a framework that takes advantage of independent sets and assigns treatment and
control only to a set of non-adjacent nodes in a graph, in order to disentangle peer effects
from direct treatment effect estimation. Randomizing over independent set nodes removes peer
effects between nodes in the experiment while canceling out the peer effects from nodes outside
the experiment.
Through a series of simulated experiments on synthetic and real-world network datasets, we show
that our framework significantly increases the accuracy of direct treatment effect estimation in
network experiments.
@inproceedings{mlg2020_34,
title={Network Experiment Design for Estimating Direct Treatment Effects},
author={Zahra Fatemi and Elena Zheleva},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
On Structural vs. Proximity-based Temporal Node EmbeddingsPDFVideo Puja Trivedi, Alican Büyükçakır, Yin Lin, Yinlong Qian, Di Jin and Danai Koutra
Abstract: We investigate the representation power of static node embeddings in
dynamic or temporal settings. To this end, we introduce a framework that incorporates different
design options for extending static node embeddings to temporal settings: temporal combination
schemes to introduce dynamics in otherwise static approaches, alignment methods that lead to
comparability of embedding dimensions across time steps, and different edge operators for
generating edge embeddings from node embeddings. In our empirical analysis, we evaluate the
performance of both proximity-based and structural node embedding methods in a temporal link
prediction task over four time-evolving networks. Our results show that proper choice over these
designs yields up to 20% absolute improvement over baselines that do not leverage temporal
combination and embedding alignment. We further present broad trends to guide design decisions
for embedding methods in temporal settings.
Keywords: temporal graphs, graph embeddings, temporal link
prediction
@inproceedings{mlg2020_37,
title={On Structural vs. Proximity-based Temporal Node Embeddings},
author={Puja Trivedi, Alican Büyükçakır, Yin Lin, Yinlong Qian, Di Jin and Danai Koutra},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Robust Unsupervised Mining of Dense Sub-Graphs at Multiple ResolutionsPDFVideo Neil Gupta, Gunjan Gupta, Joydeep Ghosh, Sheshank Shankar and Alex Mallen
Abstract: Category: Novel research paper. Whereas in traditional partitional
clustering, each data point belongs to a cluster, there are several applications where only some
of the points form relatively homogenous or "dense" groups, and points that don't seem to belong
to any cluster need to be ignored. Moreover, different clusters may emerge at different scales
or density levels. This makes it difficult to identify them using a single density threshold,
especially if we also want to ignore the non-clustering data. If data is represented in a metric
space, then recent extensions of a classical approach called Hierarchical Mode Analysis (HMA)
are able to identify clusters at multiple resolutions, while ignoring "non-dense" areas.
However, this approach does not apply when the relations between pairs of data points can only
be represented as a (sparse) similarity or affinity graph. Motivated by two complex, real-life
applications where one needs to identify dense subgraphs at multiple resolutions, while ignoring
nodes that are not well connected in the similarity graph, we introduce a novel algorithm called
HIMAG (Hierarchical Incremental Mode Analysis for Graphs) that provides capabilities analogous
to HMA based methods but applicable to graphs. We also provide a powerful multi-resolution
visualization tool customized for the new algorithm. We present results on the two motivating
real-world applications as well as two standard benchmark social graph datasets, to show the
power of our approach and compare it with some standard graph partitioning algorithms that were
retrofitted to produce dense clusters by pruning non-dense data in a non-trivial manner. We are
also open-sourcing the new dense graph datasets and tools to the community.
@inproceedings{mlg2020_36,
title={Robust Unsupervised Mining of Dense Sub-Graphs at Multiple Resolutions},
author={Neil Gupta, Gunjan Gupta, Joydeep Ghosh, Sheshank Shankar and Alex Mallen},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
SNAPSKETCH: Graph Representation Approach for Intrusion Detection in a Streaming
GraphPDFVideo Ramesh Paudel and William Eberle
Abstract: In this paper, we propose a novel unsupervised graph representation
approach in a graph stream called SNAPSKETCH that can be used for anomaly detection. It first
performs a fixed-length random walk from each node in a network and constructs n-shingles from a
walk path. The top discriminative n-shingles identified using a frequency measure are projected
into a dimensional projection vector chosen uniformly at random. Finally, a graph is sketched
into a low-dimensional sketch vector using a simplified hashing of the projection vector and the
cost of shingles. Using the learned sketch vector, anomaly detection is done using the
state-of-the-art anomaly detection approach called RRCF [1]. SNAPSKETCH has several advantages,
including fully unsupervised learning, constant memory space usage, entire-graph embedding, and
real-time anomaly detection.
@inproceedings{mlg2020_1,
title={SNAPSKETCH: Graph Representation Approach for Intrusion Detection in a Streaming
Graph},
author={Ramesh Paudel and William Eberle},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Scalable and Consistent Estimation in Continuous-time Networks of Relational
EventsPDFVideo Makan Arastuie, Subhadeep Paul and Kevin S. Xu
Abstract: In many application settings involving networks, such as messages
between users of an on-line social network or transactions between traders in financial markets,
the observed data consist of timestamped relational events, which form a continuous-time
network. We propose the Community Hawkes Independent Pairs (CHIP) generative model for such
networks. We show that applying spectral clustering to adjacency matrices constructed from
relational events generated by the CHIP model provides consistent community detection for a
growing number of nodes. We also develop consistent and computationally efficient estimators for
the model parameters. We demonstrate that our proposed CHIP model and estimation procedure
scales to large networks with tens of thousands of nodes and provides superior fits than
existing continuous-time network models on several real networks.
This submission is a novel research paper.
Keywords: Community Hawkes Independent Pairs model, event-based
network, continuous-time network, timestamped network, relational events, spectral clustering,
Hawkes process
@inproceedings{mlg2020_27,
title={Scalable and Consistent Estimation in Continuous-time Networks of Relational
Events},
author={Makan Arastuie, Subhadeep Paul and Kevin S. Xu},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Scale-Free, Attributed and Class-Assortative Graph Generation to Facilitate
Introspection of Graph Neural NetworksPDFVideo Neil Shah
Abstract: Semi-supervised node classification on graphs is a complex interplay
between graph structure, node features and class-assortative (homophily) properties, and the
flexibility of a model to capture these nuances. Modern datasets used to push the frontier for
these tasks exhibit diverse properties across these fronts, making it challenging to study how
these properties individually and jointly influence performance of modern embedding-based
methods like graph neural networks (GNNs) for this task. In this work-in-progress, we propose an
intuitive and flexible scale-free graph generation model, CaBaM, which enables simulation of
class-assortative and attributed graphs via the well-known Barabasi-Albert model. We show
empirically and theoretically how our model can easily describe a variety of graph types, while
imbuing the generated graphs with the necessary ingredients for attribute, topology, and
label-aware semi-supervised node-classification.
@inproceedings{mlg2020_33,
title={Scale-Free, Attributed and Class-Assortative Graph Generation to Facilitate Introspection
of Graph Neural Networks},
author={Neil Shah},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Substitution Techniques for Grocery Fulfillment and Assortment Optimization Using
Product GraphsPDFVideo Amit Pande, Aparupa Das Gupta, Kai Ni, Rahul Biswas and Sayon Majumdar
Abstract: Identifying substitutable pairs or groups of products is key to
building relevant product assortment for brick-and-mortar stores as well as to efficiently
handle out of stock scenarios. In this work, we describe the unique challenges with a retailer’s
data to identify substitutable product pairs from a large catalog and nationwide store
transactions. We apply some of the well established approaches in data mining and machine
learning to customer store purchase data and product attributes data to generate networks of
substitutable products. This paper presents a novel application of substitutable product
networks integrated with an assortment optimization engine that was developed in-house to select
the optimal assortment of products for the stores. The outcomes from a large scale experiment
conducted across 120 stores within United States demonstrates a unit sale lift in excess of 11%.
In another set of tests, we analyze the performance of various algorithms for product
substitution and fulfillment when customers encounter Out of Stock (OOS) scenario while shopping
groceries for same day delivery.
Keywords: Product Substitutes, Data Mining, Graph, Assortment
Optimization
@inproceedings{mlg2020_7,
title={Substitution Techniques for Grocery Fulfillment and Assortment Optimization Using Product
Graphs},
author={Amit Pande, Aparupa Das Gupta, Kai Ni, Rahul Biswas and Sayon Majumdar},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At
FacebookPDFVideo Nima Noorshams, Saurabh Verma and Aude Hofleitner
Abstract: Since its inception, Facebook has become an integral part of the
online social community. People rely on Facebook to connect with others and build communities.
As a result, it is paramount to protect the integrity of such a large network in a fast and
scalable manner. In this paper, we present our efforts to protect various social media entities
at Facebook from people who try to abuse our platform. We present a novel Temporal Interaction
EmbeddingS (TIES) model that is designed to capture rogue social interactions and flag them for
further suitable actions. TIES is a supervised, deep learning, production ready model at
Facebook-scale networks. Prior works on integrity problems are mostly focused on capturing
either only static or certain dynamic features of social entities. In contrast, TIES can capture
both these variant behaviors in a unified model owing to the recent strides made in the domains
of graph embedding and deep sequential pattern learning. To show the real-world impact of TIES,
we present a few applications especially for preventing spread of misinformation, fake account
detection, and reducing ads payment risks in order to enhance Facebook platform’s integrity.
Keywords: temporal embeddings, sequence modeling, social media
integrity
@inproceedings{mlg2020_38,
title={TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At
Facebook},
author={Nima Noorshams, Saurabh Verma and Aude Hofleitner},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Understanding and Evaluating Structural Node EmbeddingsPDFVideo Junchen Jin, Mark Heimann, Di Jin and Danai Koutra
Abstract: While most network embedding techniques model the proximity between
nodes in a network, recently there has been significant interest in structural embeddings that
are based on node equivalences, a notion rooted in sociology: equivalences or positions are
collections of nodes that have similar roles—i.e., similar functions, ties or interactions with
nodes in other positions—irrespective of their distance or reachability in the network. Unlike
the proximity-based methods that are rigorously evaluated in the literature, the evaluation of
structural embeddings is less mature. or real networks with labels that are arbitrarily defined,
and its connection to sociological equivalences has hitherto been vague and tenuous. To fill in
this gap, we set out to understand what types of equivalences structural embeddings capture. We
are the first to contribute rigorous intrinsic and extrinsic evaluation methodology for
structural embeddings, along with carefully-designed, diverse datasets of varying sizes. We
observe a number of different evaluation variables that can lead to different results (e.g.,
choice of similarity measure or label definitions). We find that degree distributions within
nodes’ local neighborhoods can lead to simple yet effective baselines. We hope that our findings
can influence the design of further node embedding methods and also pave the way for future
evaluation of existing methods. **Category: appraisal paper**
@inproceedings{mlg2020_29,
title={Understanding and Evaluating Structural Node Embeddings},
author={Junchen Jin, Mark Heimann, Di Jin and Danai Koutra},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Unsupervised Hierarchical Graph Representation Learning by Mutual Information
MaximizationPDFVideo Fei Ding, Xiaohong Zhang, Justin Sybrandt and Ilya Safro
Abstract: Graph representation learning based on graph neural networks (GNNs)
can greatly improve the performance of downstream tasks, such as node and graph classification.
However, the general GNN models do not aggregate node information in a hierarchical manner, and
can miss key higher-order structural features of many graphs. The hierarchical aggregation also
enables the graph representations to be explainable. In addition, supervised graph
representation learning requires labeled data, which is expensive and error-prone. To address
these issues, we present an unsupervised graph representation learning method, Unsupervised
Hierarchical Graph Representation (UHGR), which can generate hierarchical representations of
graphs. Our method focuses on maximizing mutual information between "local" and high-level
"global" representations, which enables us to learn the node embeddings and graph embeddings
without any labeled data. To demonstrate the effectiveness of the proposed method, we perform
the node and graph classification using the learned node and graph embeddings. The results show
that the proposed method achieves comparable results to state-of-the-art supervised methods on
several benchmarks. In addition, our visualization of hierarchical representations indicates
that our method can capture meaningful and interpretable clusters.
@inproceedings{mlg2020_13,
title={Unsupervised Hierarchical Graph Representation Learning by Mutual Information
Maximization},
author={Fei Ding, Xiaohong Zhang, Justin Sybrandt and Ilya Safro},
booktitle={Proceedings of the 16th International Workshop on Mining and Learning with Graphs
(MLG)},
year={2020}
}
Call for Papers
Due to public health concerns in light of the unfolding COVID-19 outbreak. We will follow
exactly the option that ACM SIGKDD and the KDD 2020 organizing committee will suggest and
will follow the style that the KDD conference adopts. Please check this website regularly
for updates.
This workshop is a forum for exchanging ideas and methods for mining and learning with
graphs, developing new common understandings of the problems at hand, sharing of data sets
where applicable, and leveraging existing knowledge from different disciplines. The goal is
to bring together researchers from academia, industry, and government, to create a forum for
discussing recent advances in graph analysis. In doing so, we aim to better understand the
overarching principles and the limitations of our current methods and to inspire research on
new algorithms and techniques for mining and learning with graphs.
To reflect the broad scope of work on mining and learning with graphs, we encourage
submissions that span the spectrum from theoretical analysis to algorithms and
implementation, to applications and empirical studies. As an example, the growth of
user-generated content on blogs, microblogs, discussion forums, product reviews, etc., has
given rise to a host of new opportunities for graph mining in the analysis of social media.
We encourage submissions on theory, methods, and applications focusing on a broad range of
graph-based approaches in various domains.
Topics of interest include, but are not limited to:
Theoretical aspects:
Computational or statistical learning theory related to graphs
Theoretical analysis of graph algorithms or models
Sampling and evaluation issues in graph algorithms
Analysis of dynamic graphs
Algorithms and methods:
Graph mining
Probabilistic and graphical models for structured data
Heterogeneous/multi-model graph analysis
Network embedding models
Statistical models of graph structure
Combinatorial graph methods
Semi-supervised learning, active learning, transductive inference, and transfer
learning in the context of graph
Applications and analysis:
Analysis of social media
Analysis of biological networks
Knowledge graph construction
Large-scale analysis and modeling
We welcome many kinds of papers, such as, but not limited to:
Novel research papers
Demo papers
Work-in-progress papers
Visionary papers (white papers)
Appraisal papers of existing methods and tools (e.g., lessons learned)
Relevant work that has been previously published
Work that will be presented at the main conference
Authors should clearly indicate in their abstracts the kinds of submissions
that the papers belong to, to help reviewers better understand their contributions.
All papers will be peer reviewed, single-blinded.
Submissions must be in PDF, no more than 8 pages long — shorter papers are
welcome — and formatted according to the standard double-column ACM
Proceedings Style.
The accepted papers will be published on the workshop’s website and will not be considered
archival for resubmission purposes.
Authors whose papers are accepted to the workshop will have the opportunity to participate
in a spotlight and poster session, and some set will also be chosen for oral presentation.
To receive updates about the current and future workshops and the Graph Mining community,
please join the Mailing
List, or follow the Twitter
Account.
Important Dates
Paper Submission Deadline:June 15, 2020
Author Notification:July 15, 2020
Camera Ready:August 1, 2020
Workshop: August 24, 2020
Workshop Organizers
Shobeir Fakhraei
Machine Learning Scientist Amazon
Aude Hofleitner
Research Scientist Manager Facebook
Julian McAuley
Associate Professor University of California San Diego
Bryan Perozzi
Research Scientist Google Research
Tim Weninger
Associate Professor University of Notre Dame
Program Committee
Aris Anagnostopoulos (Sapienza University of Rome)
Ana Paula Appel (IBM Research Brazil)
Christian Bauckhage (Fraunhofer)
Austin Benson (Cornell University)
Siddharth Bhatia (National University of Singapore)
Aleksandar Bojchevski (Technical University of Munich)
Ulf Brefeld (Leuphana Universität Lüneburg)
Marco Bressan (Sapienza University of Rome)
Ivan Brugere (University of Illinois at Chicago)
Ting Chen (University of California, Los Angeles)
Hocine Cherifi (University of Burgundy)
Aaron Clauset (University of Colorado Boulder)
Alessandro Epasto (Google)
Dhivya Eswaran (Amazon)
Yuan Fang (Singapore Management University)
William Hamilton (Stanford University)
Larry Holder (Washington State University)
Bryan Hooi (National University of Singapore)
Jin Kyu Kim (Facebook)
Danai Koutra (University of Michigan)
Stefano Leucci (University of L'Aquila)
Jundong Li (University of Virginia)
Fred Morstatter (University of Southern California)
Blaz Novak (Jozef Stefan Institute)
John Palowitch (Google)
Evangelos Papalexakis (University of California Riverside)
Ali Pinar (Sandia National Laboratories)
Jan Ramon (INRIA)
Sucheta Soundarajan (Syracuse University)
Acar Tamersoy (NortonLifeLock Research Group)
Hanghang Tong (University of Illinois at UC)
Stefan Wrobel (Fraunhofer IAIS & Univ. of Bonn)
Xin-Zeng Wu (Information Sciences Institute)
Zhongfei Zhang (SUNY Binghamton)