A lot of academic work that draws on tweets as primary data uses hashtag archives as the basis of the study. What’s nice about that is that you can use tools that capture the data and present it to you in a usable format (e.g. a CSV file). If you’re doing something a little different, like reviewing tweets from a group of individuals, that’s a little harder.
I’ve been working with my BCU colleague Inger-Lise Bore on some research into fan fiction written on Twitter (it started with this blog post – we’re presenting it at MeCCSA 2011 tomorrow). There’s no hashtag used to label the tweets we want to study – we were looking instead at the entire output of a few dozen Twitter accounts. We found a few web services that promised ways of capturing and archiving this type of Twitter data for us, but they didn’t work. At all. So instead we had to use some pretty unsophisticated means to grab the data.