Politwitter is now indexing Canadian political hashtags on Google+. Here is an example of #cdnpoli http://politwitter.ca/page/googleplus/hash/cdnpoli
With Politwitter you can search, filter & sort the posts. For example, here are the most +1'd posts from #cdnpoli in 2012.
Let me know which other popular hashtags are used on G+ and I will add them to the Politwitter hashtag directory.
Politwitter has already been indexing G+ posts from MPs since 2011. But there aren't many MPs using G+ yet, so if you know of any, let me know!
I plan on creating more statistics for Google+ on Politwitter; right now there isn't much. There is a "Top Contributors" block on the right side of the page when viewing a G+ hashtag.
The backend workings of Politwitter have changed and evolved since its launch in April 2009. The APIs that Politwitter uses to aggregate data are always changing, so it's always a bit of work keeping up with them. Also, with the ever-growing amount of data and the increase in volume, I've had to change things over this time to handle the capacity.
Politwitter doesn't load any of the social media data, like tweets & Facebook posts, on demand; that would be very slow, and with services like Twitter it would very quickly hit their rate limits. Instead, I store all of this content locally in the Politwitter database, where it is fetched by background processes, so frontend page loads never have to wait for content. This also lets me generate all the fun stats on this data and provide a permanent historical archive of elected officials' social media usage.
At first I used the Twitter REST API to fetch tweets with background processes that ran every couple of minutes. It was always a challenge dealing with the Twitter rate limiting, so in the fall of 2009 I rewrote the backend to use the new Twitter Streaming API alpha. Since then Politwitter gets 99% of its Twitter data in realtime using the streaming API, and rate limits are not an issue. But I still use the REST API to do "backup scrapes" to pick up any tweets the streaming API might have missed; if the streaming process crashes for some reason, I can still recover the missed tweets for the archive.
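To illustrate how the stream and the backup scrape fit together, here is a small sketch (not the actual Politwitter code, which is PHP; the names are invented for the example). The key ideas are that stores are idempotent, keyed on tweet id, so the two sources can safely overlap, and the REST scrape only ever fills in ids the archive is missing:

```python
class TweetArchive:
    def __init__(self):
        self.tweets = {}  # id -> tweet dict, stands in for the DB table

    def store(self, tweet):
        # Idempotent insert: storing the same id twice is a no-op,
        # so the stream and the scrape can safely overlap.
        self.tweets.setdefault(tweet["id"], tweet)

    def backfill(self, rest_results):
        # "Backup scrape": only ids missing from the archive are new.
        missed = [t for t in rest_results if t["id"] not in self.tweets]
        for t in missed:
            self.store(t)
        return len(missed)

archive = TweetArchive()
# The stream delivered ids 1 and 2; id 3 arrived while the stream was down.
for t in [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]:
    archive.store(t)
recovered = archive.backfill([{"id": 2, "text": "b"}, {"id": 3, "text": "c"}])
print(recovered, sorted(archive.tweets))
```

Because the backfill is driven entirely by what's already in the archive, it doesn't matter whether the gap came from a dropped connection or a crashed process.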
Here is a short video of the two background streaming processes and one REST scraper process running.
Politwitter displays a lot of content & statistics on every page and has many detailed statistics pages. Since the database has grown so large, most of this data is processed in the background on a periodic basis rather than on demand.
For example, the "Today's Federal Stats" block on the homepage: this data takes a long time to generate and would push page load times past a minute. So a background process runs every 10 minutes, generates the data and stores it in the database. The data is kept over time so trends can be produced and historical stats retrieved. The same goes for a lot of the stats, like follower counts or sentiment.
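The pattern is simple: a scheduled job does the expensive work and appends a timestamped snapshot, and the frontend only ever reads the latest row. A minimal sketch (the function and field names here are assumptions, not the real schema):

```python
import time

stats_history = []  # stands in for a stats table in the database

def generate_daily_stats():
    # Placeholder for the slow aggregation queries (minutes of work).
    return {"tweets_today": 4200, "active_mps": 95}

def background_job(now=None):
    # Run by a scheduler every 10 minutes; each run appends a snapshot,
    # so historical values accumulate for trend charts.
    snapshot = {"generated_at": now or time.time(), **generate_daily_stats()}
    stats_history.append(snapshot)

def render_homepage_block():
    # The frontend just reads the latest snapshot -- no heavy queries.
    return stats_history[-1] if stats_history else None

background_job(now=1000)
background_job(now=1600)
latest = render_homepage_block()
print(latest["generated_at"], len(stats_history))
```

Keeping every snapshot rather than overwriting one row is what makes the historical trend charts essentially free.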
Content that isn't generated in the background is still almost always cached; there are over 20,000 cache files stored. Almost all the "side blocks" are cached every 5-10 minutes, and some pages, like the statistics pages, are fully cached.
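A time-based file cache like this is only a few lines. This sketch assumes a layout and TTLs for illustration; the real cache files and intervals differ:

```python
import os
import tempfile
import time

CACHE_DIR = tempfile.mkdtemp()

def cached(key, ttl_seconds, generate):
    path = os.path.join(CACHE_DIR, key + ".html")
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < ttl_seconds:
        with open(path) as f:   # fresh enough: serve straight from disk
            return f.read()
    html = generate()           # stale or missing: regenerate and store
    with open(path, "w") as f:
        f.write(html)
    return html

calls = 0
def build_block():
    global calls
    calls += 1                  # counts how often the expensive build runs
    return "<div>top contributors</div>"

first = cached("top_contributors", 300, build_block)
second = cached("top_contributors", 300, build_block)  # served from cache
print(calls)
```

Within the 5-minute TTL, every page view after the first reads a small file instead of running the queries behind the block.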
All of the other social media content besides Twitter is fetched on a periodic basis, as those services don't have an equivalent streaming API. For example, Facebook posts are fetched every hour. There are a lot of these background processes running 24/7 to keep Politwitter updated, generating stats and keeping page load times to a minimum.
I have a backend page to monitor how long these various background processes are taking to run, to watch for potential problems.
Even with all of this background pre-processing and caching, there are still many pages on Politwitter that are slow to load, and the site can slow down under high traffic. I have done a lot of optimizing of database queries, but the tweets database has just grown so large that it's always a challenge.
The tweets database is now over 5.3 million tweets!
The major issue I have with the database is table locking. The main tweets MySQL table uses the MyISAM storage engine, which makes most of the read queries fast and the writing very fast. But the major problem is that MyISAM only has table-level locking, so a backlog of queries can form if a complex query takes a long time. If a page is taking a long time to load on Politwitter, this is most likely the reason.
In the winter of 2011 I greatly improved this situation by adding a second staging tweets table that the streaming API writes to; a second process then grabs tweets from this staging table and moves them into the main tweets table. This allows a high volume of incoming tweets without slowing down the site and separates the reading from the writing, greatly reducing the table locking. It also means the streaming API won't miss tweets because a write was blocked by a table lock. I've tested over 3,000 tweets a minute during the Royal Wedding.
But since the main tweets table is getting so massive, some of the read queries take longer to execute and table locking is often an issue, especially when bots from Google or Bing crawl weird URLs that produce complex queries. I've tried to reduce some of these queries and block bots from some of these pages, but table locking is still my evil nemesis.
I have investigated converting the tweets table to InnoDB, which has row-level locking that could potentially resolve the table locking issue, but it's not that simple. The overall performance of InnoDB compared to MyISAM in this use case is slower, and many of the queries I use are VERY slow in InnoDB. Queries with functions like COUNT() are deadly slow in InnoDB, and many of the index optimizations I've made for MyISAM work differently in InnoDB.
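One common workaround for slow COUNT() on InnoDB (an option I'd be considering, not something Politwitter does today) is to maintain running counts in a small summary table that is updated on insert, instead of scanning millions of rows at read time. A sketch with a dict standing in for the summary table:

```python
tweet_counts = {}  # stands in for a tiny (hashtag -> count) summary table

def record_tweet(hashtags):
    # Would be incremented in the same transaction as the tweet insert.
    for tag in hashtags:
        tweet_counts[tag] = tweet_counts.get(tag, 0) + 1

def count_for(tag):
    # O(1) lookup instead of a COUNT(*) over millions of rows.
    return tweet_counts.get(tag, 0)

record_tweet(["cdnpoli", "elxn41"])
record_tweet(["cdnpoli"])
print(count_for("cdnpoli"), count_for("elxn41"))
```

The trade-off is an extra write per insert and the need to keep the counters in sync, but reads become constant time regardless of table size.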
I'm planning on taking another look at converting to InnoDB, which basically means I have to check every query on the site and rewrite or find alternate methods for many of them. This is very time consuming, but it's something that will probably have to happen as the database and the site's traffic continue to grow. There are over 2,000 average pageviews per day.
I have considered other database solutions like MongoDB, but that would require major reworking of the Politwitter code base. Still, if InnoDB turns out not to be viable, this may be the next step.
If you want to help with the continued development & maintenance of Politwitter, you can donate here.
gov.politwitter.ca is a companion tool to Politwitter that tracks social media activity by Canadian government departments, institutions & organizations.
Many of the features & statistics tracking from Politwitter are now available to help track & analyze how the Canadian government is using social media and how citizens are interacting with it.
Back in 2009, after launching Politwitter.ca, I thought about creating sub-sites for local government, news media & government. In 2009 I started creating newstweets.ca but soon realised there was not much interest. I created local.politwitter.ca to start collecting municipal Twitter data, but also realised I didn't have the time to maintain a database of all the municipalities in Canada. Keeping Politwitter going with federal & provincial politics is already a big undertaking.
I started on a government site, but with federal & provincial elections on the go I was focused on Politwitter.ca. Then in March 2012 a new site, zegov.ca, created a directory of government Twitter & Facebook accounts, which reminded & prompted me to finish the Politwitter GOV site. Keeping a directory of government social media accounts is manageable; the numbers aren't huge. And as with the main Politwitter site, the directory can be updated by anyone.
Politwitter not only lists government Twitter & Facebook accounts but also includes YouTube, Flickr & indexing of photos & links. Politwitter also aggregates all of this government social media data for permanent archival and analysis, with the features & tools people have loved on the political side of Politwitter.
Politwitter allows sorting, filtering, searching and statistics of the Government social media. More statistics will become available once more data is collected.
I don't mean to step on Zegov's toes, but I've had this on the backburner for 2 years and already had all the infrastructure built for Politwitter. Using the Politwitter platform gives much more functionality than exists on the Zegov website. I've also seen several projects like these pop up over the years that fizzle out or aren't maintained. Politwitter has a proven track record which media, politicians and government already rely on. I can also share data between the government & political sites for deeper analysis going forward.
I'm always open to working with others or having people help develop Politwitter, but for the most part no one takes me up on that offer and it's been a solo undertaking. Of course if you're not a web developer you can always help by keeping the directory updated, telling others about Politwitter or donating.
If you have any suggestions for the new Government Politwitter let me know! If you see a government twitter account missing you can add it here.
I thought it would be fun to track the tweets during the Royal Wedding, while also testing some backend changes I've made to Politwitter to better handle a high volume of tweets on election day.
I ran into a problem though! The Twitter Streaming API seemed to cap out at 51 tweets/sec (3,060 tweets/min). I know from my tests that my database can handle a MUCH higher insert rate than that, so it seems to be a ceiling on the Twitter API. If you know more about this, please let me know! I doubt Canadian political Twitter will ever reach that volume though. The stats are not totally accurate because for 2 hours the tweets/sec were greater than this limit, as you can see in the chart below. Let's get to the stats I was able to collect!
From 3:00am - 6:00pm (London time) there were 1,242,321 tweets!
Top Retweeted Tweets
The Royal Wedding dominated the Twitter top trends
Just a quick post with a chart showing the number of tweets by candidates broken down by province.
The NDP had an online town hall today with questions from callers and people on Twitter using the #asklayton hashtag. The live video stream had many problems in the first half: low audio volume that suddenly spiked high, and video that would often freeze or cut out completely, though these issues seemed to fade. It appeared that most questions came from callers on the phone rather than Twitter, though.
So how popular was this on Twitter? There were 752 tweets during the town hall. Nothing mind-blowing, and it didn't break into the Twitter.com top trends, but a clever use of Twitter nonetheless. Also interesting to note that they were moderating the tweets displayed on the NDP website via the hashtag, probably a good idea.
3 MPs and 7 candidates tweeted with the hashtag.
Here are some of the most tweeted topics: promises, jobs, childcare, reform, postsecondary, internet
Here is a short video clip of a few of the always-running processes on the Politwitter server during the French-language leaders' debate.
Along with a few processes that run 24/7, there are 40 or so tasks that run in the background on a periodic basis to keep the site updated, refresh stats and handle other things. Doing all this processing in the background allows the website to load fast, which has been a priority from day one.
Politwitter also caches the output of a lot of content, like the sidebar blocks; many are cached anywhere from 1 minute to 30 minutes. There are over 11,000 files in the cached content folder right now. Gzip is also used to speed up delivery.
Politwitter runs on a quad-core Xeon Dell PowerEdge server running Windows Web Server 2008. A second Dell PowerEdge server is used as a dedicated database server. Both servers have 4GB of RAM, which gives MySQL plenty of room for in-memory caching. Politwitter runs on IIS7 using PHP. The website is custom coded by Trevor May and doesn't use any 3rd-party framework or CMS. The site uses jQuery along with other 3rd-party platforms like the Google Visualization API for charts.
That's all for now. I might write a post detailing some of the ways Politwitter deals with large amounts of data and how the Twitter APIs are used.
I am excited to announce that Politwitter is teaming up with Global News to help provide them with insights into the election campaign on Twitter & other social media. Politwitter is providing content 'widgets' that will soon show up on www.globalnews.ca, along with data used for articles like this. In return, Politwitter is getting some great national exposure, and we have some additional things planned for debate and election nights.
I also wanted to mention that Politwitter and I remain completely independent from Global, and they have no control over the content or direction of what I do on politwitter.ca. Additionally, while some of the widgets and data I am providing Global will be exclusive, anything on Politwitter.ca is open for media to source (credit and a link to politwitter.ca are greatly appreciated). I also remain open to talking with other news media; get in touch if you have questions or are interested in an interview.
Thanks to Global for helping promote Politwitter, stay tuned for more!
As the first week of the election campaign wraps up, Twitter has shown that it has become a powerful tool during political campaigns. There have already been some high-profile uses of Twitter, such as the clear evidence of public support for the Green Party's Elizabeth May after her debate exclusion, Harper and Ignatieff trading challenges for a one-on-one debate, and @senatorjake taking a shot at reporters.
Twitter has seen a surge of activity in the Canadian political discussion; there have already been over 130,000 tweets in the first week of the campaign. The media has been highlighting the impressive Twitter volume - 30,000 tweets in the first 4-5 days of the campaign. This is not the whole picture though: those numbers only reflect tweets using the election hashtags like #elxn41. Politwitter.ca has a much larger data pool to sample from, including a large directory of MPs, candidates, riding associations and more; over 600 accounts are tracked in the federal list. Politwitter also indexes numerous hashtags, including per-riding hashtags, and tracks a list of keywords. This enables Politwitter to provide a much broader picture of the Canadian political twitterverse.
There were over 130,000 tweets in the first week (18,000 tweets a day), compared to around 44,000 tweets in the week before the campaign began. However, only around 43% of these tweets are original content; the rest are retweets. So far, Liberal MPs and candidates have been out-tweeting their competitors by almost 2:1, but the Conservatives have more MPs tweeting each day. I suspect these numbers will even out as the parties get more of their candidates on Twitter. There are currently 170 MPs on Twitter, and despite my assumptions I was surprised to find that nearly 90% of their accounts are currently active! This number is much higher than during the pre-election period. MPs and candidates have only contributed 4,000 tweets (3% of the total). Liberals are retweeted the most, and Liberals and Conservatives are about even for replies to their tweets.
The now widely known #elxn41 hashtag has accounted for more than half the tweets during the first week. The French language version #fed2011 has only around 10% of #elxn41's volume.
One of the intriguing things Politwitter has been doing with all of this data is sentiment tracking. Tweets are marked as positive, negative or neutral using a method developed at Stanford University called "Sentiment Classification using Distant Supervision." Only around 20% of tweets can be classified; the remainder don't have enough emotional context. But even with that low percentage we can produce interesting results because of the volume of tweets during this election. For example, you can see the Green Party's positive-to-negative percentage shoot up on Wednesday, when Elizabeth May's exclusion from the debates was the hot topic. Sentiment for the Liberals has stayed more positive than for the Conservatives and the NDP for most of the campaign so far; the Bloc Quebecois has enjoyed high sentiment throughout the whole period.
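For readers curious what "distant supervision" means, here is a toy illustration of the idea (the training data, features and scoring here are invented for the example, and are far simpler than the real classifier): tweets containing emoticons label themselves, and those noisy labels train a word-based model that is then applied to unlabeled tweets.

```python
from collections import Counter

# "Distant" labels: the emoticon acts as the label and is stripped from the text.
raw = [(":)", "great debate tonight"), (":)", "love this promise"),
       (":(", "awful answer on jobs"), (":(", "terrible debate")]
counts = {"pos": Counter(), "neg": Counter()}
for emo, text in raw:
    label = "pos" if emo == ":)" else "neg"
    counts[label].update(text.split())

def classify(text):
    # Score a tweet by how often its words appeared in each training class.
    words = text.split()
    pos = sum(counts["pos"][w] for w in words)
    neg = sum(counts["neg"][w] for w in words)
    if pos == neg:
        return "neutral"  # not enough emotional context to call it
    return "pos" if pos > neg else "neg"

print(classify("great tonight"))   # matches positive training words
print(classify("terrible jobs"))   # matches negative training words
print(classify("the election"))    # matches nothing: neutral
```

The "neutral" branch also shows why only a fraction of tweets can be classified: most simply never use words with any emotional signal in the training data.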
Sentiment is also tracked for each of the party leaders. These numbers have moved around a fair bit through the week, but at week's end the ranking is Gilles Duceppe, Elizabeth May, Michael Ignatieff, Stephen Harper, Jack Layton. Harper has the greatest volume of positive tweets, but the ranking above, which compares the ratio of positive to negative, is more insightful. Elizabeth May has been tweeting more than the other leaders, which makes sense since she declared the Greens would be using social media heavily and she needs to get her message out more than the others.
Politwitter is also tracking the popularity and sentiment of the parties' election promises. In the first week, the promises by the NDP received the most positive response, followed by the Liberals, then the Conservatives. The Liberal promises have been the most popular on Twitter, with the most tweets and retweets mentioning them.
The first week of the "Twitter Election" started strong, and I predict the numbers will continue to grow, with huge spikes during the debate and on election day, May 2nd. What further surprises and insights will Twitter provide during the rest of the campaign? If the first week is any indication there will be more, and it seems like the thing to be watching. The great unknown is whether any of this actually affects voters and election results; time may tell.
On March 15th I started my push for riding-level hashtags, with the promotion of a "riding hashtag ID lookup tool". A week later a blogger posted suggestions for alpha-acronym-based riding hashtags. A couple of people have suggested those would be better than the EDID numeric hashtags I am using; this is my response to those queries.
I created the riding EDID hashtags before that blog post was made, and the EDID hashtags have since really started to gather attention and usage. The hashtag lookup tool has been retweeted by over 100 people http://politwitter.ca/page/edid_lookup and the riding hashtags have been used 220 times already.
I explained most of my reasoning on the lookup page, but here is some more detail.
Why do we use hashtags? To categorize tweets so others can find tweets on the same topic, so the uniqueness of a hashtag is paramount. This is one of the main reasons I chose not to pursue acronyms: many acronyms or short forms of riding names would produce hashtags already in wide use on Twitter for things unrelated to Canadian politics. Even if they are not in use now, there is a chance they could be taken over by some obscure foreign topic in the future. The author of the hashtag blog list didn't take much care in choosing hashtags; he just used obvious short forms or acronyms, and so many of those result in hashtags that would be flooded with non-relevant tweets, making following those hashtags rather useless for those ridings.
I decided several months ago that I wanted to capture riding-level tweets, so I considered several options and consulted with several others about it. Having unique hashtags is important for the user experience, but also for Politwitter, since I want to index all these riding-level tweets so I can do riding-by-riding analysis of things like sentiment and volume.
Since riding hashtags are meant for tweets about a specific riding, it's not that important that I know what the riding hashtag for Oakville is; all I really need to know is my own riding. For people interested in other ridings, I have made it easier to see what a riding hashtag represents on Politwitter by creating a hover box with details about the hashtag when you place your mouse over it. I am also working on a couple of new features to make remembering riding hashtags easier for people who use the site to tweet.
People can also use other, more human-readable hashtags in their tweets outside the Politwitter site, but the EDID tag just makes it so much easier to index riding-related tweets. Of the tweets that contain hashtags, many contain multiple hashtags anyway. So I encourage people to use both.
I am also indexing tweets that contain the riding name, so riding tweets can be picked up that way too.
I cannot endorse or promote that list of hashtags from the blog; not enough thought was given to their uniqueness. I started down the acronym road over a month ago but quickly found it was not practical: trying to find acronyms that weren't already in use resulted in hashtags no more recognizable than the EDID number. However, if a particular riding acronym hashtag is found to be in wide use, I will of course add it to the Politwitter database, like I do for any Canadian political hashtag, as long as it doesn't have a large percentage of non-relevant tweets also using it. And I encourage anyone to contact me via email or a tweet with suggestions of riding hashtags they think are in wide use. But I will continue to promote the riding EDID hashtags as the de facto standard; their usage is growing each day.
But like I said, if there is another hashtag growing in unique usage in a riding, I will add it to the Politwitter database and the riding hashtag lookup tool, which will suggest both.
This action requires you to be logged into Politwitter. No registration is required; just authenticate with your Twitter account.