The analysis of graphs has become important in many domains such as social media, marketing, the life sciences, telecommunication, counter-terrorism and crime fighting. Here the graphs are often so big that specialized techniques and algorithms are necessary to compute the analysis, for example by distributing them in a Spark-cluster. Data analysis tasks on real-world web-scale datasets often involve analysing properties of the graphs represented by those datasets. This includes computing graph queries such as Shortest Path, PageRank, Mutual Neighbours (finding mutual neighbours of two nodes),Connected Components (finding maximal sets of nodes that are all directly or indirectly connected), Triangles (finding all sets of three nodes that are all connected), Clustering Coefficients and Betweenness Centrality.
The research in WISE focuses on
- Large-scale community detection: Specifically the method called the Louvain method is investigated and how it can be implemented in a distributed manners, on for example Spark, such that it can deal with large scale graphs.
- Generating benchmark graphs: For testing and comparing algorithms for graph analytics it is crucial that we can generate large graphs of a desired size that have the same properties as real-world graphs. Two important properties are scale-free property and the small-world property. The first says that the distribution of degrees of the nodes follows a power law, and the second says that the the average distance between two nodes is roughly logarithmic in the number of nodes. WISE investages the possibility to generate such graphs, even for very large sizes, in a distributed fashion that optimally uses the characteristics of frameworks such as Spark.