This is the second post on the WWW2011 conference. The first post is here.
Social Network Algorithms
The Network Bucket Testing paper is from Facebook. It solves the interesting problem of how to select a set of users who are connected (friends) but still form a sample representative of the overall population. This is really important when you want to do A/B (bucket) testing of a social feature. The paper uses a novel walk-based method to do this.
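The paper's actual algorithm is more sophisticated, but the core idea of growing a connected sample by walking the friendship graph can be sketched in a few lines. This is a minimal illustration with my own function and variable names, not the paper's method:

```python
import random

def random_walk_sample(graph, start, length, seed=None):
    """Collect a connected sample of users via a simple random walk.

    graph: dict mapping user -> list of friends (adjacency list).
    Returns the visited users in order; the sample is connected by
    construction, since each new user is a friend of one already sampled.
    """
    rng = random.Random(seed)
    current = start
    visited = [current]
    for _ in range(length):
        neighbors = graph[current]
        if not neighbors:
            break
        current = rng.choice(neighbors)
        if current not in visited:
            visited.append(current)
    return visited
```

The hard part the paper tackles, which this sketch ignores, is making such a connected sample statistically representative of the whole population despite the bias a walk introduces toward well-connected users.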
Limiting the Spread of Misinformation in Social Networks deals with the problem of identifying nodes in a social graph that can help stop the "bad campaign" and save the remaining nodes. This paper also handles the case where the state of a node (affected/unaffected) is not known.
Information Credibility on Twitter is another paper that Yahoo! Research is part of. It tries to automatically classify tweets as credible or not credible. They use lots of features around the characteristics of the tweet (length, URLs, hashtags, etc.), the network (author, friends, followers), propagation (retweets, number of tweets), popularity, and so on. There are lots of useful ideas here that could be applied to any kind of user-generated content.
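To make the feature groupings concrete, here is a small sketch of the kind of feature extraction such a classifier might run on. The specific features and the tweet dict layout are my own illustration, loosely following the tweet/network/propagation categories the paper describes, not its actual feature set:

```python
def tweet_features(tweet):
    """Extract simple credibility-oriented features from a tweet dict.

    tweet is assumed to have keys: text, followers, friends, retweets.
    The returned dict would feed a standard supervised classifier.
    """
    text = tweet["text"]
    return {
        # tweet-level features
        "length": len(text),
        "has_url": "http" in text,
        "num_hashtags": text.count("#"),
        "num_mentions": text.count("@"),
        # network-level feature (guard against division by zero)
        "follower_ratio": tweet["followers"] / max(tweet["friends"], 1),
        # propagation-level feature
        "num_retweets": tweet["retweets"],
    }
```

With labeled examples (credible / not credible), these feature vectors can be fed to any off-the-shelf classifier; the interesting contribution of the paper is which features turn out to be predictive.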
Who Says What to Whom on Twitter is from Cornell and Yahoo! Research. Some interesting claims: 50% of URLs consumed on Twitter are generated by 20K elite users. The URLs broadcast by different categories of users have different lifespans. Most users get their content from other ordinary users (who are well connected and follow the elite) in a two-step process. News URLs are short-lived, blog URLs are long-lived, and music/video URLs persist nearly forever.
Information Spreading in Context has a surprising conclusion: how many people a user forwards information to, and the total coverage the information reaches, can be captured by a simple stochastic branching model, largely independent of context.
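A stochastic branching model of this kind is easy to simulate. Below is a minimal Galton-Watson-style sketch, with parameter names and values of my own choosing rather than the paper's fitted model: each recipient independently forwards to a random number of others, and the cascade's total coverage is the total number of people reached.

```python
import random

def branching_cascade(forward_prob, max_fanout, seed=None, max_size=10_000):
    """Simulate one information cascade as a branching process.

    Each person in the current frontier forwards to k others, where k
    is drawn as binomial(max_fanout, forward_prob). Returns the total
    number of people reached, capped at max_size to guarantee termination.
    """
    rng = random.Random(seed)
    frontier = 1  # the original poster
    total = 1
    while frontier and total < max_size:
        # number of new recipients generated by the whole frontier
        children = sum(
            sum(rng.random() < forward_prob for _ in range(max_fanout))
            for _ in range(frontier)
        )
        total += children
        frontier = children
    return total
```

When the mean fanout (max_fanout * forward_prob) is below 1, cascades die out quickly, which is consistent with the observation that a context-free branching model suffices to describe typical forwarding behavior.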
I'll cover the posters in the next blog.