There is a great deal of interest in analyzing data that is best represented as a graph. Examples include the WWW, social networks, biological networks, communication networks, transportation networks, energy grids, and many others. These graphs are typically multi-modal, multi-relational and dynamic. In the era of big data, the importance of being able to effectively mine and learn from such data is growing, as more and more structured and semi-structured data is becoming available. The workshop serves as a forum for researchers from a variety of fields working on mining and learning from graphs to share and discuss their latest findings.
There are many challenges involved in effectively mining and learning from this kind of data, including:
Traditionally, a number of subareas have contributed to this space: communities in graph mining, learning from structured data, statistical relational learning, inductive logic programming, and, moving beyond subdisciplines in computer science, social network analysis and, more broadly, network science.
Morning Sessions | |
---|---|
8:50 am | Opening Remarks |
9:00 am | Keynote: Lars Backstrom, "Serving a Billion Personalized News Feeds" |
9:40 am | Paper Spotlights 1 |
10:00 am | Coffee Break |
10:30 am | Paper Spotlights 2 |
10:50 am | Keynote: Leman Akoglu, "Communities and Anomalies in Attributed Networks" |
11:30 am | Poster Session |
12:00 pm | Lunch (+ Poster Session) |
Afternoon Sessions | |
---|---|
1:10 pm | Keynote: Tamara Kolda, "Correctly Modeling Networks" |
1:50 pm | Scaling Overlapping Clustering; Incremental Method for Spectral Clustering of Increasing Orders |
2:10 pm | Keynote: Yizhou Sun, "Node Representation in Mining Heterogeneous Information Networks" |
2:50 pm | Distance-Based Influence in Networks: Computation and Maximization; Measuring Graph Proximity with Blink Model |
3:10 pm | Coffee Break |
3:40 pm | Keynote: Jennifer Neville, "Statistical Methods for Modeling Network Distributions" |
4:20 pm | Keynote: SVN Vishwanathan, "Exploiting the Computation Graph for Large Scale Distributed Machine Learning" |
5:00 pm | Closing Remarks |
Leman Akoglu: Communities and Anomalies in Attributed Networks
Assistant Professor
Stony Brook University
Given a network in which nodes are associated with a list of attributes, how can we define and characterize communities? How can we spot anomalous communities and anomalies within communities?
Networks have long been studied, and focus has recently shifted to 'networks with content'. Long-studied network questions, such as ranking, clustering, and similarity, are being reconsidered for such networks, as new information such as node/edge attributes and types helps enrich the formulations and deepen our understanding of real-world networks.
In this talk, I will introduce our work on spotting anomalies in networks with node attributes. Our main approach to anomaly mining in attributed networks is through communities. In particular, we quantify the degree to which a community can be characterized by (a subset of) attributes on which its members 'click'. We then use this quantity as a 'normality' score, which lets us identify both individual anomalous nodes inside communities and communities that, as a group of nodes, are anomalous due to their low normality.
Lars Backstrom: Serving a Billion Personalized News Feeds
Director of Engineering
Facebook
Feed ranking's goal is to provide people with over a billion personalized experiences. We strive to provide the most compelling content to each person, personalized to them so that they are most likely to see the content that is most interesting to them. Similar to a newspaper, putting the right stories above the fold has always been critical to engaging customers and interesting them in the rest of the paper. In feed ranking, we face a similar challenge, but on a grander scale. Each time a person visits, we need to find the best piece of content out of all the available stories and put it at the top of feed where people are most likely to see it. To accomplish this, we do large-scale machine learning to model each person, figure out which friends, pages and topics they care about and pick the stories each particular person is interested in. In addition to the large-scale machine learning problems we work on, another primary area of research is understanding the value we are creating for people and making sure that our objective function is in alignment with what people want.
Tamara Kolda: Correctly Modeling Networks
Distinguished Member
Sandia National Labs
Understanding and modeling go hand in hand: we develop models not only to make predictions but also to see where the models fail and where there is more to do. Large-scale networks are immensely challenging to model mathematically. In this talk, we present our arguments for which features are important to measure and reproduce. In the undirected case, we show that graphs with high clustering coefficients (i.e., many triangles) must have dense Erdős-Rényi subgraphs. This is a key theoretical finding that may yield clues to understanding network structure. Following this line of reasoning, we propose the Block Two-level Erdős-Rényi (BTER) model because it reproduces a given degree distribution and clustering coefficient profile (i.e., the triangle distribution), scales linearly in the number of edges, and is easily parallelized. We also consider the extension of this work to bipartite graphs, where we consider bipartite four-cycles, and propose a bipartite BTER (biBTER) model. These models can be used to generate artificial graphs that capture salient features of real graphs. We compare the artificial and real-world graphs so that we can understand where the models are accurate and where they are not. Time permitting, we also explain how these models can be specified with very few parameters, which is useful for benchmarking purposes. We close with open questions for future investigations. This is joint work with S. Aksoy, A. Pinar, T. Plantenga, and C. Seshadhri.
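The abstract uses the clustering coefficient as its measure of triangle density. The sketch below is our own illustration of the standard definitions and is not the BTER code. A node's local clustering coefficient is the fraction of pairs of its neighbors that are themselves connected. The global coefficient, also called transitivity, is three times the number of triangles divided by the number of connected triples (wedges). The adjacency-set input format and the function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj):
    """Map each node to its local clustering coefficient.

    adj: dict mapping node -> set of neighbors (undirected, no self-loops).
    """
    cc = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            cc[v] = 0.0  # common convention for nodes of degree 0 or 1
            continue
        # edges among v's neighbors = triangles that contain v
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cc[v] = closed / (d * (d - 1) / 2)
    return cc


def global_clustering(adj):
    """Transitivity: 3 * (#triangles) / (#wedges)."""
    # a wedge is a path of length two centered at some node
    wedges = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    # summing closed wedges over all centers counts each triangle 3 times
    closed = sum(
        1
        for n in adj.values()
        for a, b in combinations(n, 2)
        if b in adj[a]
    )
    return closed / wedges if wedges else 0.0


# Triangle {0, 1, 2} plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

Consider a triangle with one pendant node attached. Nodes 0 and 1 each have coefficient 1, node 2 has 1/3, and node 3 has 0. The graph has one triangle and five wedges, which gives a global coefficient of 3/5.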
Jennifer Neville: Statistical Methods for Modeling Network Distributions
Associate Professor
Purdue University
The recent interest in analyzing the network structure of complex systems has fueled a large body of research on both models of network structure and algorithms to automatically discover patterns for use in predictive models. However, robust statistical models, which can accurately represent distributions over graph populations and sample efficiently from those distributions, are critical for evaluating the performance of analytic algorithms and the significance of discovered patterns. Yet, unlike metric spaces, the space of graphs exhibits a combinatorial structure that poses significant theoretical and practical challenges to accurate estimation and efficient sampling/inference. In this talk, I will discuss our recent work on modeling distributions of networks, both attributed and unattributed, and outline how the methods can be used for inference and evaluation.
Yizhou Sun: Node Representation in Mining Heterogeneous Information Networks
Assistant Professor
University of California, Los Angeles
One of the challenges in mining information networks is the lack of an intrinsic metric for representing nodes in a low-dimensional space, which is essential in many mining tasks, such as recommendation and anomaly detection. Moreover, when it comes to heterogeneous information networks, where nodes belong to different types and links represent different semantic meanings, it is even more challenging to represent nodes properly. In this talk, we will focus on two mining tasks, (1) content-based recommendation and (2) anomaly detection in heterogeneous categorical events, and introduce (1) how to represent nodes when different types of nodes and links are involved and (2) how heterogeneous links play different roles in these tasks. Our results demonstrate both the superiority and the interpretability of these new methodologies.
SVN Vishwanathan: Exploiting the Computation Graph for Large Scale Distributed Machine Learning
Professor | Principal Scientist
UCSC | Amazon
Many machine learning algorithms minimize a regularized risk. It is well known that stochastic optimization algorithms are both theoretically and practically well motivated for regularized risk minimization. Unfortunately, stochastic optimization is not easy to parallelize. In this talk, we take a radically new approach and show that the saddle-point problem arising from the Lagrangian has a very specific computational graph structure, which can be exploited to partition the parameters naturally across multiple processors. This allows us to derive a new parallel stochastic optimization algorithm for regularized risk minimization. Joint work with Inderjit Dhillon, Cho-Jui Hsieh, Shihao Ji, Shin Matsushima, Parameshwaran Raman, Hsiang-Fu Yu, and Hyokun Yun.
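To make the setting concrete, the sketch below shows the serial baseline the abstract starts from: stochastic gradient descent on a regularized risk. We assume L2-regularized logistic loss, R(w) = (λ/2)‖w‖² + (1/m) Σᵢ log(1 + exp(−yᵢ w·xᵢ)). It does not implement the saddle-point reformulation or the parameter partitioning described in the talk. The loss, the step-size schedule, and all function names are our own assumptions. Each update reads and overwrites the full shared vector w in sequence, which suggests why this loop does not parallelize trivially.

```python
import math
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wj * xj for wj, xj in zip(w, x))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def sigmoid(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def regularized_risk(w, data, lam):
    """(lam/2)*||w||^2 + mean logistic loss; labels y are in {-1, +1}."""
    reg = 0.5 * lam * dot(w, w)
    loss = sum(softplus(-y * dot(w, x)) for x, y in data) / len(data)
    return reg + loss


def sgd(data, lam=0.01, epochs=50, eta0=0.5, seed=0):
    """Plain serial SGD on the regularized risk above (hypothetical defaults)."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # visit examples in a random order each epoch
        for i in order:
            t += 1
            eta = eta0 / (1.0 + eta0 * lam * t)  # decaying step size
            x, y = data[i]
            # d/dw softplus(-y w.x) = -y * sigmoid(-y w.x) * x
            g = -y * sigmoid(-y * dot(w, x))
            # one-sample gradient of the regularized risk: lam*w + g*x
            w = [wj - eta * (lam * wj + g * xj) for wj, xj in zip(w, x)]
    return w
```

As an example, call `sgd` on a small labeled list such as `[([1.0, 2.0], 1), ([1.0, -1.0], -1)]`, where the first feature acts as a bias term. It should return a weight vector whose regularized risk is below that of the all-zero starting point, which is log 2.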
A Graph Analytics Framework for Ranking Authors, Papers and Venues
Arindam Pal and Sushmita Ruj.
Adaptive Neighborhood Graph Construction for Inference in Multi-Relational Networks
Shobeir Fakhraei, Dhanya Sridhar, Jay Pujara and Lise Getoor.
Adding Structure: Social Network Inference with Graph Priors
Han Liu, Stratis Ioannidis, Smriti Bhagat and Chen-Nee Chuah.
Detecting Concept Drift in Classification Over Streaming Graphs
Yibo Yao and Lawrence Holder.
Distance-Based Influence in Networks: Computation and Maximization
Edith Cohen, Daniel Delling, Thomas Pajor and Renato Werneck.
Distributed Community Detection on Edge-labeled Graphs using Spark
San-Chuan Hung, Miguel Araujo and Christos Faloutsos.
Efficient Comparison of Massive Graphs Through The Use Of ‘Graph Fingerprints’
Stephen Bonner, John Brennan, Georgios Theodoropoulos, Stephen McGough and Ibad Kureshi.
Entity Resolution in Familial Networks
Pigi Kouki, Christopher Marcum, Laura Koehly and Lise Getoor.
Entity Typing: A Critical Step for Mining Structures from Massive Unstructured Text
Xiang Ren, Wenqi He, Ahmed El-Kishky, Clare R. Voss, Heng Ji, Meng Qu and Jiawei Han.
Fast Patchwork Bootstrap for Quantifying Estimation Uncertainties in Sparse Random Networks
Yulia Gel, Vyacheslav Lyubchich and Leticia Ramirez Ramirez.
IGLOO: Integrating global and local biological network alignment
Lei Meng, Joseph Crawford, Aaron Striegel and Tijana Milenkovic.
Identifying Anomalies in Graph Streams Using Change Detection
William Eberle and Lawrence Holder.
Incremental Method for Spectral Clustering of Increasing Orders
Pin-Yu Chen, Baichuan Zhang, Mohammad Hasan and Alfred Hero.
Investigating the impact of graph structure and attribute correlation on collective classification performance
Giselle Zeno and Jennifer Neville.
Local Spectral Diffusion for Robust Community Detection
Kun He, Pan Shi, John E. Hopcroft and David Bindel.
Measuring Graph Proximity with Blink Model
Haifeng Qian, Hui Wan, Mark N. Wegman, Luis A. Lastras and Ruchir Puri.
Perseus3: Visualizing and Interactively Mining Large-Scale Graphs
Di Jin, Christos Faloutsos, Danai Koutra and Ticha Sethapakdi.
Predicting risky behavior in social communities
Olivia Simpson and Julian McAuley.
Real-Time Community Detection in Large Social Networks on a Laptop
Ben Chamberlain and Marc Deisenroth.
Reducing Million-Node Graphs to a Few Structural Patterns: A Unified Approach
Yike Liu, Tara Safavi, Neil Shah and Danai Koutra.
Relational Similarity Machines
Ryan Rossi, Rong Zhou and Nesreen Ahmed.
Scaling Overlapping Clustering
Kyle Kloster, Merrielle Spain and Stephen Kelley.
Sparse Network Inference using the k-Support Norm
Aman Gupta, Haohan Wang and Rama Kumar Pasumarthi.
Subgraph2vec: Learning Distributed Representations of Rooted Sub-graphs from Large Graphs
Annamalai Narayanan, Mahinthan Chandramohan, Lihui Chen, Yang Liu and Santhosh Kumar Saminathan.
The Infinity Mirror Test for Analyzing the Robustness of Graph Generators
Salvador Aguinaga and Tim Weninger.
Training Iterative Collective Classifiers with Back-Propagation
Shuangfei Fan and Bert Huang.
User Action Prediction for Computational Advertisement Using Local Graph Algorithms
Hongxia Yang, Yada Zhu and Jingrui He.
Using MapReduce for Impression Allocation in Online Social Networks
Inzamam Rahaman and Patrick Hosein.
Within-network classification with label-independent features and latent linkages
Christopher Ryther, Jakob Simonsen and Andreas Koch.
This workshop is a forum for exchanging ideas and methods for mining and learning with graphs, developing new common understandings of the problems at hand, sharing data sets where applicable, and leveraging existing knowledge from different disciplines. The goal is to bring together researchers from academia, industry, and government to create a forum for discussing recent advances in graph analysis. In doing so, we aim to better understand the overarching principles and the limitations of our current methods, and to inspire research on new algorithms and techniques for mining and learning with graphs.
To reflect the broad scope of work on mining and learning with graphs, we encourage submissions that span the spectrum from theoretical analysis to algorithms and implementation, to applications and empirical studies. As an example, the growth of user-generated content on blogs, microblogs, discussion forums, product reviews, etc., has given rise to a host of new opportunities for graph mining in the analysis of social media. We encourage submissions on theory, methods, and applications focusing on a broad range of graph-based approaches in various domains.
Topics of interest include, but are not limited to:
We invite the submission of regular research papers (6-8 pages) as well as position papers (2-4 pages).
We recommend that papers be formatted according to the standard double-column ACM Proceedings Style.
All papers will be peer-reviewed (single-blind).
Authors whose papers are accepted to the workshop will have the opportunity to participate in a poster session, and a subset may also be chosen for oral presentation.
The accepted papers will be published online and will not be considered archival.
For paper submission, please proceed to the submission website.
Please send enquiries to chair@mlgworkshop.org.
To receive updates about the current and future workshops and the Graph Mining community, please join the Mailing List, or follow the Twitter Account.
Paper Submission Open: March 16, 2016
Paper Submission Deadline: May 27, 2016 (extended from May 16)
Author Notification: June 13, 2016
Final Version: June 25, 2016
Workshop: August 14, 2016
University of Maryland, College Park
University of California, Santa Cruz
University of Michigan, Ann Arbor
University of California, San Diego
Facebook, Menlo Park
Leman Akoglu (Stony Brook University)
Aris Anagnostopoulos (Sapienza University of Rome)
Arindam Banerjee (University of Minnesota)
Christian Bauckhage (University of Bonn)
Hendrik Blockeel (K.U. Leuven)
Ulf Brefeld (Leuphana University of Lüneburg)
Aaron Clauset (University of Colorado Boulder)
Seshadhri Comandur (University of California Santa Cruz)
Bing Tian Dai (Singapore Management University)
Thomas Gärtner (University of Nottingham)
David Gleich (Purdue University)
Mohammad Hasan (Indiana University Purdue University)
Jake Hofman (Microsoft Research)
Larry Holder (Washington State University)
Bert Huang (Virginia Tech)
Kristian Kersting (Technical University of Dortmund)
Jennifer Neville (Purdue University)
Ali Pinar (Sandia National Laboratories)
Jan Ramon (K.U. Leuven)
Jiliang Tang (Yahoo Labs)
Hanghang Tong (Arizona State University)
Chris Volinsky (AT&T Labs-Research)
Stefan Wrobel (University of Bonn)
Xifeng Yan (University of California Santa Barbara)
Mohammed Zaki (Rensselaer Polytechnic Institute)
Elena Zheleva (Vox Media)
Zhongfei Zhang (Binghamton University)
2013, Chicago, USA (co-located with KDD)
2012, Edinburgh, Scotland (co-located with ICML)
2011, San Diego, USA (co-located with KDD)
2010, Washington, USA (co-located with KDD)
2009, Leuven, Belgium (co-located with SRL and ILP)
2008, Helsinki, Finland (co-located with ICML)
2007, Firenze, Italy
2006, Berlin, Germany (co-located with ECML and PKDD)
2005, Porto, Portugal, October 7, 2005
2004, Pisa, Italy, September 24, 2004
2003, Cavtat-Dubrovnik, Croatia