rpicompsci

Research Blog for Department of Computer Science @ Rensselaer Polytechnic Institute

Finding prominent people in large scale social networks

Xiaohui Lu, Graduate Student at Rensselaer Polytechnic Institute, Department of Computer Science


My name is Xiaohui Lu. I am a PhD student in computer science working with Prof. Sibel Adali. My main area of research is ranking actors in social networks. I also work closely with researchers from the Social Cognitive Network Academic Research Center. I received my BS in computer science from SUNY at Albany in 2008 and my MS in computer science from Rensselaer Polytechnic Institute in 2011.

Who are the prominent people in a social network? Prominent people are those viewed as important by others in the network. What that means depends on the underlying network. For example, girls with popular fashion tastes on YouTube are targeted by many advertisers for their potential influence over others' fashion choices; a charismatic person on Twitter or Facebook can easily organize a campaign for a specific cause; in academic networks, prominent people create research areas that their peers recognize as important. However, it is not trivial to identify these "prominent" people even in small social networks: terrorism networks are a good example, since their "important" actors usually disguise themselves carefully. In other networks, it is important to understand how prominence manifests itself in network activity.

We develop a method to identify prominent people in large-scale collaborative social networks. In these networks, people collaborate to create objects such as academic papers or movies. Our method rests on the following assumptions:

  1. Important people create important objects. Not every object a prominent person creates is important, but their best creations are generally more important than the other objects in the network. Objects are usually created in collaboration with other members.
  2. Good objects belong to pools that bring together related objects of similar importance. Not everyone can participate in these pools, though; for example, conferences have selection criteria for papers.

We find these pools using machine learning (clustering) algorithms that group objects into clusters whose members are more similar to each other than to other objects. Our algorithm then propagates prominence iteratively among people, their objects, and the object groups until the scores converge.
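The iterative propagation can be illustrated with a minimal sketch. This is not the published algorithm: the update rules (pool score = mean score of its papers, paper score = mean score of its authors plus its pool's score, author score = mean of the author's `top_k` best papers), the seed of each author's paper count, the max-normalization, and the names `propagate`, `paper_pool`, and `top_k` are all assumptions chosen to mirror the two assumptions listed above. Pools are taken as precomputed cluster labels rather than learned here.

```python
from collections import defaultdict


def normalize(scores):
    """Divide every score by the largest one so the maximum is 1.0."""
    top = max(scores.values())
    return {k: v / top for k, v in scores.items()}


def propagate(paper_authors, paper_pool, top_k=2, max_iter=200, tol=1e-9):
    """Hypothetical mutual-reinforcement sketch; not the authors' exact update rules.

    paper_authors: paper id -> list of author names
    paper_pool:    paper id -> pool (cluster) label, assumed precomputed
    """
    # Assumed seed: each author's paper count (a uniform seed would be a fixed point).
    seed = defaultdict(float)
    for auths in paper_authors.values():
        for a in auths:
            seed[a] += 1.0
    author_score = dict(seed)
    paper_score = {p: 1.0 for p in paper_authors}
    pool_score = {}
    for _ in range(max_iter):
        # Pool score: average score of the papers in the pool.
        totals, counts = defaultdict(float), defaultdict(int)
        for p, pool in paper_pool.items():
            totals[pool] += paper_score[p]
            counts[pool] += 1
        pool_score = normalize({c: totals[c] / counts[c] for c in totals})
        # Paper score: average score of its authors plus the score of its pool.
        paper_score = normalize({
            p: sum(author_score[a] for a in auths) / len(auths) + pool_score[paper_pool[p]]
            for p, auths in paper_authors.items()
        })
        # Author score: average of the author's top_k papers ("best creations").
        by_author = defaultdict(list)
        for p, auths in paper_authors.items():
            for a in auths:
                by_author[a].append(paper_score[p])
        new_author = normalize({
            a: sum(sorted(s, reverse=True)[:top_k]) / min(top_k, len(s))
            for a, s in by_author.items()
        })
        delta = max(abs(new_author[a] - author_score[a]) for a in new_author)
        author_score = new_author
        if delta < tol:  # stop once author scores no longer change
            break
    return author_score, paper_score, pool_score


if __name__ == "__main__":
    # Invented toy input: three papers in pool "A", one paper in pool "B".
    authors, papers, pools = propagate(
        {"p1": ["alice", "bob"], "p2": ["alice"], "p3": ["alice", "carol"], "p4": ["dave"]},
        {"p1": "A", "p2": "A", "p3": "A", "p4": "B"},
    )
    for name in sorted(authors, key=authors.get, reverse=True):
        print(name, round(authors[name], 3))
```

Averaging only the top papers keeps a prolific author's weaker work from dragging their score down, which is one way to encode "their best creations are generally more important".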

We test our method on a large-scale data set, the DBLP data set. DBLP is a collection of author, paper, and conference information in computer science. In this data set, a paper reflects strong collaboration among its authors.
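As a rough illustration of how DBLP-style data can be organized for this kind of analysis, the sketch below indexes papers by author and by venue and counts how often each pair of authors co-wrote a paper. The records are invented toy data, not real DBLP entries, and the record layout `(paper key, venue, authors)` and the name `build_indexes` are assumptions; the real DBLP dump is distributed as XML and would need to be parsed first.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented toy records shaped like (paper key, venue, authors); not real DBLP data.
RECORDS = [
    ("p1", "venueA", ["alice", "bob"]),
    ("p2", "venueA", ["alice", "carol"]),
    ("p3", "venueB", ["alice", "bob", "dave"]),
]


def build_indexes(records):
    """Return author -> papers, venue -> papers, and co-authorship pair counts."""
    author_papers = defaultdict(list)
    venue_papers = defaultdict(list)
    coauthor = Counter()  # (author_a, author_b) with a < b -> number of shared papers
    for key, venue, authors in records:
        venue_papers[venue].append(key)
        unique = sorted(set(authors))
        for a in unique:
            author_papers[a].append(key)
        # Every pair of authors on the same paper counts as one collaboration.
        for a, b in combinations(unique, 2):
            coauthor[(a, b)] += 1
    return dict(author_papers), dict(venue_papers), coauthor


if __name__ == "__main__":
    by_author, by_venue, pairs = build_indexes(RECORDS)
    print(by_author["alice"], by_venue, pairs.most_common(1))
```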

We show that our method is superior to several ranking methods, such as degree centrality, PageRank, and HITS. Moreover, we observe that grouping objects provides valuable information for identifying prominent actors; in a way, it reduces the noise inherent in such data. Interestingly, though, our method works better if we disregard the natural groupings of objects, such as conferences and journals, and substitute our own groupings instead. The figure on the left shows the distribution of scores for the DBLP dataset. Because the DBLP collection of publications was originally started by researchers from the database community, this area appears central in the underlying data set. We show a number of the research areas represented in this dataset.
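For reference, two of the baselines mentioned above, degree and PageRank, can be computed on an undirected co-authorship graph with a few lines of plain Python. This is a generic textbook sketch on an invented toy graph, not the configuration used in the experiments; the damping factor 0.85, the adjacency-set representation, and the function names are assumptions. HITS is omitted for brevity.

```python
def degree_scores(adj):
    """Degree centrality: number of distinct co-authors."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def pagerank(adj, damping=0.85, max_iter=100, tol=1e-12):
    """Power-iteration PageRank on an undirected graph given as adjacency sets."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                # Split v's rank evenly among its neighbors.
                share = damping * rank[v] / len(adj[v])
                for u in adj[v]:
                    new[u] += share
            else:
                # Node without links: spread its rank uniformly.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


# Invented toy co-authorship graph (undirected, so each edge is listed both ways).
ADJ = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}

if __name__ == "__main__":
    print(degree_scores(ADJ))
    print({v: round(r, 3) for v, r in pagerank(ADJ).items()})
```

Both baselines look only at the link structure, which is one reason they can be noisier than a method that also uses the quality of the pools an author's work appears in.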

To validate our algorithm, we use several external measures of prominence based on citations, such as citation counts, h-index, and the number of citations of each author's top 10 papers. We look at the top-scoring individuals from our algorithm, and we compare the ordering of all individuals returned by our algorithm with the true ordering under a given measure of prominence using the Kendall tau rank correlation. Our algorithm outperforms all the other algorithms across all the performance measures we study. The code for this algorithm can be found here.
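The validation step can be sketched as follows: compute a citation-based measure per author (here h-index, plus the total citations of an author's top 10 papers), then compare the algorithm's ordering with the measure's ordering using Kendall's tau. The scores and citation counts are invented, and the `kendall_tau` below is the simple tau-a variant, which divides by all pairs even when some are tied; `scipy.stats.kendalltau` computes the tie-adjusted tau-b by default.

```python
from itertools import combinations


def h_index(citations):
    """Largest h such that the author has h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)


def top10_citations(citations):
    """Total citations of the author's 10 most-cited papers."""
    return sum(sorted(citations, reverse=True)[:10])


def kendall_tau(x, y):
    """Kendall tau-a between two score dicts over the same keys (tied pairs count as neither)."""
    keys = sorted(x)
    n = len(keys)
    concordant = discordant = 0
    for a, b in combinations(keys, 2):
        s = (x[a] - x[b]) * (y[a] - y[b])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Invented example: algorithm scores and per-paper citation counts.
ALGO = {"alice": 0.9, "bob": 0.5, "carol": 0.7}
CITES = {"alice": [10, 8, 5, 4, 3], "bob": [2, 1], "carol": [6, 5, 3]}

if __name__ == "__main__":
    truth = {a: h_index(c) for a, c in CITES.items()}
    print(truth, kendall_tau(ALGO, truth))
```

A tau of 1.0 means the two orderings agree on every pair of authors, and -1.0 means they disagree on every pair.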


This entry was posted on August 13, 2012 by in Machine Learning, Social Networking.