What is twInfluence?

twInfluence is a simple tool that uses the Twitter API to measure the combined influence of twitterers and their followers, with a few social network statistics thrown in as a bonus.

We know that "A-List" Twitterers like Scoble, LeoLaporte, and BarackObama have a lot of influence on Twitter, because they have tens of thousands of followers. However, social network analysis teaches us that there is a "horizon of communication" that extends beyond your own direct contacts, and this is demonstrated whenever somebody "retweets" a message. The significance is that not all followers are equal.

Imagine Twitterer1, who has 10,000 followers - most of whom are bots and inactives with no followers of their own. Now imagine Twitterer2, who has only 10 followers - but each of them has 5,000 followers. Who has more real "influence"? Twitterer2, of course.

As of right now, 20,055 twitterers have profiles analyzed on twinfluence.com.

 

Privacy and Caches

The Twitter API requires authentication to accomplish a couple of the tasks that twinfluence performs. However, I DO NOT record, cookie, or cache your Twitter login credentials or follower lists. The only exception is when you request that I tweet you once a particular user's profile has been updated offline; as soon as that is done, I delete any credentials I have for you. I DO store completed search results and statistics in order to calculate rankings. TwInfluence mostly automates what you could do by surfing twitterers and their followers manually... if you were really, really determined and were willing to add up tens of thousands of network values.

Once a twitterer is analyzed, scores are cached for 48 hours before they will be recalculated in order to keep repeat requests to the Twitter API to a minimum (the cache is increased to two weeks for particularly large twitterers with 30,000 or more followers). Until then, cached values will be displayed even if a new request is made. I try to manually refresh the cache for all top 50 users every two weeks or so.
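
The caching rule above can be sketched in a few lines. This is only an illustration, not the actual twinfluence code: the 48-hour, two-week, and 30,000-follower figures come from the paragraph above, while the function and constant names are made up for the example.

```python
from datetime import datetime, timedelta

# Figures taken from the text; the names are hypothetical.
STANDARD_TTL = timedelta(hours=48)
LARGE_TTL = timedelta(weeks=2)
LARGE_FOLLOWER_THRESHOLD = 30_000


def cache_ttl(follower_count: int) -> timedelta:
    """How long a cached score is kept before it may be recalculated."""
    if follower_count >= LARGE_FOLLOWER_THRESHOLD:
        return LARGE_TTL
    return STANDARD_TTL


def is_stale(analyzed_at: datetime, follower_count: int, now: datetime) -> bool:
    """True once the cached score is old enough to be recalculated."""
    return now - analyzed_at >= cache_ttl(follower_count)
```

Until is_stale returns True, a new request would simply be served the cached values.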

 

About The Rankings

Twinfluence calculates two types of ranks - one that assigns an absolute rank (#1!) among all twitterers analyzed to date, and one that assigns a value and category relative to other twitterers who have roughly the same number of followers as you.

Reach Ranking. Reach rankings take the form of "Rank: #XXX (YY%)". The #XXX score is your overall rank compared to all other twitterers that have been analyzed by twinfluence. If your rank is #400, that means there are 399 other twitterers in the system who have higher reach scores than you. The (YY%) score is your grade; if you have a grade of 75%, it means that you have a higher reach than 75% of the other twitterers we have analyzed.
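
As a rough sketch of the rank and grade described above (assuming ties are not given special treatment, which the text does not specify, and using hypothetical function names):

```python
from typing import Sequence


def reach_rank(my_reach: int, other_reaches: Sequence[int]) -> int:
    """1 plus the number of other analyzed twitterers with a higher reach."""
    return 1 + sum(1 for r in other_reaches if r > my_reach)


def reach_grade(my_reach: int, other_reaches: Sequence[int]) -> float:
    """Percentage of the other analyzed twitterers with a lower reach."""
    if not other_reaches:
        return 0.0
    lower = sum(1 for r in other_reaches if r < my_reach)
    return 100.0 * lower / len(other_reaches)


others = [100, 200, 300, 400]
print(f"Rank: #{reach_rank(250, others)} ({reach_grade(250, others):.0f}%)")
```

With two of the four other twitterers ahead and two behind, the example prints "Rank: #3 (50%)".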

Relative Scores. There are three derived statistics twinfluence calculates - velocity, social capital, and centralization - that don't really make sense without additional context. For each of these, we compare your values to those of other twitterers whose follower counts are within plus or minus 50% of yours - so this is essentially a comparison against others with about the same number of followers as you. They take the form of ±X.X Description. The number is how far from average you score (in standard deviations); the description is a rough category for how positively or negatively you compare. For example, if you had a social capital score of "+2.1 Very High", you would know that you scored much higher than other people like you for social capital.
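
Here is a minimal sketch of that comparison. It assumes each stored profile is a dict with a "followers" entry plus one entry per statistic, and it uses the population standard deviation; both are assumptions for the example, not confirmed details of twinfluence. The cutoffs for descriptions such as "Very High" are not given above, so the sketch only produces the number.

```python
from statistics import mean, pstdev


def peer_group(my_followers, profiles):
    """Profiles whose follower count is within plus or minus 50% of mine."""
    low, high = 0.5 * my_followers, 1.5 * my_followers
    return [p for p in profiles if low <= p["followers"] <= high]


def relative_score(my_value, my_followers, profiles, stat):
    """Standard deviations between my value and the peer-group average."""
    values = [p[stat] for p in peer_group(my_followers, profiles)]
    if len(values) < 2:
        return None  # not enough peers to compare against
    spread = pstdev(values)
    if spread == 0:
        return 0.0  # every peer has the same value
    return (my_value - mean(values)) / spread


profiles = [
    {"followers": 100, "social_capital": 10.0},
    {"followers": 120, "social_capital": 20.0},
    {"followers": 80, "social_capital": 30.0},
    {"followers": 1000, "social_capital": 999.0},  # outside the peer range
]
score = relative_score(40.0, 100, profiles, "social_capital")
print(f"{score:+.1f}")
```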

 

About The Code

Twinfluence is not a crawler or spider. Having a "real time" network analyzer is simply not possible using Twitter's API. Twinfluence only collects information on a twitterer's followers, and the raw follower counts for each of them. It does not crawl out and collect each follower's actual network, comparing all of their connections to create a true bidirectional social network graph.

Here are some quick figures to demonstrate our limitations. Let's take an average target user that has 462 followers (that's average in the twinfluence.com database). Each of their followers has an average of 462 followers, too - creating a total of 213,444 twitterers (462 x 462) we'd have to spider and examine in realtime. For larger users, it's much worse - for BarackObama, for example, we would have to essentially spider the entire twitterverse of 5+ million twitterers to measure his second-order network accurately. Neither you nor Twitter wants to wait for that many pages to load.

That's why twinfluence reach is best considered an estimate of potential, not actual, connections. It also explains why some top twitterers have reach values that may exceed recent estimates of the size of the entire twitterverse. I'm working on an "offline" twinfluence engine to spider actual twitter networks, which would allow me to generate some really interesting network metrics like betweenness, equivalence, and clustering / communities.

 

Definitions

First and Second Order Networks: From the perspective of graph theory, a Twitterer's followers would be considered their first-order network, and their "followers count" the same as their "degree". "Degree" is a simple form of centrality measurement that equates to "prestige" or "popularity"; different types of centrality can measure connectivity, authority, and control in a network. The following diagram demonstrates the different "neighborhoods" in a network. The Twitterer is the primary node (shown in red); its first-order neighbors (shown in green) surround it, and its second-order neighbors (shown in blue) surround the outside.

Reach: Reach is the number of followers a Twitterer has (first-order followers), plus all of their followers (second-order followers). In the diagram above, the reach would be 27 (there are 28 nodes, including the Twitterer). This is by necessity a crude maximum estimate, since there will definitely be duplicates and overlaps that could only be eliminated by up to thousands of API calls. Reach is a measurement of potential audience and listeners, a best estimate of the number of people that a given Twitterer could quickly get a message to.
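
In code, the reach estimate boils down to one line. The sketch below assumes the only input is a list holding the follower count of each first-order follower (the function name is made up), and it replays the Twitterer1 / Twitterer2 comparison from the introduction, treating all of Twitterer1's followers as having no followers of their own for simplicity.

```python
def reach(follower_counts):
    """First-order followers plus the sum of their own follower counts.

    follower_counts has one entry per first-order follower. Overlaps
    between networks are not removed, so this is a maximum estimate.
    """
    return len(follower_counts) + sum(follower_counts)


twitterer1 = [0] * 10_000   # 10,000 followers with no followers themselves
twitterer2 = [5_000] * 10   # 10 followers with 5,000 followers each

print(reach(twitterer1))
print(reach(twitterer2))
```

Twitterer1 ends up with a reach of 10,000, while Twitterer2 gets 50,010 from only ten followers.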

Velocity: Velocity merely averages the number of first- and second-order followers attracted per day since the Twitterer first established their account. The larger the number is, the faster that Twitterer has accumulated their influence. Of course, this number could jump significantly with the addition of a few high-profile followers. Velocity is scored from "very slow" to "very fast" relative to other twitterers at your network size.
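
A minimal sketch of that average, assuming the reach value has already been computed and that the account's creation date is known (the function name, and the choice to count a brand-new account as one day old, are assumptions for the example):

```python
from datetime import date


def velocity(reach_value, account_created, today):
    """Average first- and second-order followers gained per day."""
    # Treat a same-day account as one day old to avoid dividing by zero.
    days = max((today - account_created).days, 1)
    return reach_value / days


# A reach of 50,010 built up over 100 days.
print(velocity(50_010, date(2008, 1, 1), date(2008, 4, 10)))
```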

The following chart is generated dynamically and shows that as twitterers build their follower network, their velocity tends to increase. In other words, the more followers you get, the faster you get them, and the faster your reach builds through sort of a "snowball" effect! It tails off at the end, which could mean there's a sort of saturation point where everyone who is interested in what you have to say is already following you, or that we still need to collect more profiles.

Social Capital: OK, I'm abusing the academic term "social capital" a little to indicate the average first-order network of a Twitterer's followers. It's essentially a measure of how influential a twitterer's followers are. A high value indicates that most of that Twitterer's followers have a lot of followers themselves. Social Capital is scored from "very low" to "very high" relative to other twitterers at your network size.
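
As a sketch, the raw social capital value is just an average over the followers' own follower counts (the function name and the zero result for an empty network are assumptions for the example):

```python
def social_capital(follower_counts):
    """Average follower count of a Twitterer's first-order followers."""
    if not follower_counts:
        return 0.0
    return sum(follower_counts) / len(follower_counts)


print(social_capital([5_000] * 10))      # ten well-connected followers
print(social_capital([0, 0, 0, 1_200]))  # mostly followers with no network
```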

The following chart is generated dynamically and shows that as twitterers build their follower network, their social capital tends to start very high, build for a while, then slowly decrease. This is probably because as most people start tweeting, they follow a few high-profile twitterers who may reciprocate. Over time, however, they attract more and more people - and that means more and more people with few followers, including bots, spammers, and silent followers.

Centralization: This is a measure of how much a Twitterer's influence (reach) is invested in a small number of followers. Centralization scores range from 0% (completely decentralized) to a theoretical 100% (completely dependent on one Twitterer). In social network analysis, a high centralization indicates dependency of the network on just a few nodes to maintain the connectivity of the entire network. Twitterers with low-centralization networks would not have their reach greatly reduced if a few high-profile people stopped following them. Centralization is scored from "very fragile" to "very resilient" relative to other twitterers at your network size, implying that a network with only a few high-profile followers is very sensitive to collapsing if those followers leave. Conversely, a network with low centralization is not very dependent upon any few followers for its collective reach.

I use a simple algorithm based upon Freeman's degree centrality for graphs to calculate centralization. Lin's not on twitter, but he is a friend of mine!
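
The exact formula is not spelled out here, so the following is only a Freeman-style sketch under my own assumptions, not the calculation twinfluence actually runs: take the gap between the best-connected follower and every other follower, add the gaps up, and divide by the largest total that could occur for that maximum (everyone else having zero followers).

```python
def centralization(follower_counts):
    """Freeman-style centralization of followers' follower counts, in percent.

    Assumed normalization: 0 when all followers are equally connected,
    100 when a single follower supplies all of the second-order network.
    """
    n = len(follower_counts)
    top = max(follower_counts, default=0)
    if n < 2 or top == 0:
        return 0.0
    gaps = sum(top - count for count in follower_counts)
    return 100.0 * gaps / ((n - 1) * top)


print(centralization([100, 100, 100]))   # evenly spread
print(centralization([300, 100, 200]))   # somewhat uneven
print(centralization([5_000, 0, 0, 0]))  # dependent on one follower
```

The three examples give 0%, 50%, and 100% respectively, matching the two ends of the scale described above.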

The following chart is generated dynamically and shows that as twitterers build their follower network, their network decentralizes. Again, this is probably because the typical beginning twitterer has only one or two influential followers in their network.

Efficiency: What happened to the 'efficiency' score? After discussions with some other social media experts, we decided that the idea of twitter efficiency is an interesting one, but there really isn't any way to effectively measure behavior in the context of how a twitterer actually keeps on top of their tweetstream. I've retired this concept for now, but leave the definition below for the sake of discussion.

The more people you follow, the more time you have to spend reading and filtering tweets. I've heard it argued that nobody can effectively and consistently follow more than a couple hundred Twitterers and actually keep up with the tweetstream. Efficiency was simply my measure of how many people a Twitterer had to follow in order to build up their reach, as a percentage. Highly "efficient" people like Wil Wheaton follow only a few dozen people even though they themselves have thousands of followers! Ideally, this would measure a Twitterer's Friends' collective "number of updates" relative to reach.

 

Version History

2008-10-11: Version 1.0

2008-10-14: Version 2.0

  • Fixed SQL bug happening to twitterers with an apostrophe in their name.
  • Added top-fifty lists for centralization, social capital, velocity and reach.
  • Added scoring and ranking system for all major statistics.
  • Got API whitelisted by the Twitter.com admins, removing API limitations causing incorrect (too small) statistics for twitterers with more than 7,000 followers. Thanks guys!
  • Added the "send as tweet" option.
  • Added slick charts!
  • Dropped "efficiency" stat.

2008-10-20: Version 2.1

  • Implemented faster, more CPU-friendly group centralization measure suggested by my fellow social network analyst, the ever-insightful Dr. Ulrik Brandes of the University of Konstanz.
  • Implemented new interface.

2008-11-10: Version 2.2

  • Addressed a server time-out issue caused occasionally by long searches.
  • Added dynamic progress bar during calculations.

2008-11-24: Version 2.2

  • Added recordkeeping for social network analysis.
  • Added better error handling and maintenance alerts - less downtime!

2009-01-05: Version 2.3

  • By request, permalinks!
  • Added offline social network analysis link spidering.