Big Data, Small Data, and Little Data

Big Data has won. For years we’ve been hearing about all the things Big Data would do to make our lives better. Then, around two years ago, people started predicting that Small Data would be the next Big Thing. But now, in 2015, we see the President of the United States appointing a US Chief Data Scientist. The person he appointed, DJ Patil, is tasked to “act as an evangelist for new applications of big data across all areas of government.”

Small data, meanwhile, has an article on Wikipedia in which the descriptive text occupies less space than the warning box listing all the problems with the article, mostly just that there isn’t much there. There is something else out there, though. Alongside all those giant repositories of big data are little websites and applications that pick through it and provide little, interesting results.

Since the term small data has been used and used up, let’s call this thing “little data.” This is not the next big thing, and it won’t supplant Big Data. In fact, these websites serve big data by carving off sections of it and making them fun and useful.

One example is the big, big database called IMDb, which is a treasure trove of user-created information about nearly every movie and television show in existence. It’s big, and you can spend hours in there looking around.

But for those who just want a little bit of data, a little fun with some of the stuff in there without drowning, there are other sites, like this one, which lets you graph the ratings of various TV shows over time. The site is more fun than it is useful, but it provides some surprising results. For instance, it’s common to say that shows like Lost and Battlestar Galactica lost popularity in the later seasons. But the data says that users rated the final seasons of Battlestar Galactica most highly.

[Screenshot: TV show ratings graph]

Lost does show a decline towards the last season, but not nearly as dramatic as you would think from hearing people talk about how the show went off the rails.

The Simpsons, on the other hand, does seem to reflect the popular notion that the best seasons were some time ago. The overall trend doesn’t go down, but notice that after the 14th season, there are no longer any super-highly rated episodes.

[Screenshot: TV show ratings graph]

If your interest in The Simpsons is more about which episode that quote stuck in your head comes from, this tool will help you look it up quickly.

The website known as Reddit is also a huge trove of data. There are over 7 billion pages, and over 150 million people visit it each month. But what if you, as an individual user, just want to know about your own activity? How can you grab that data and put it into nice, easy-to-read charts? The website Snoopsnoo does exactly that.

Many Reddit users reference a web comic known as XKCD. If you want to know which XKCD comics have been referenced most often on Reddit, and by whom, the place to go is this website. It also lets you look up any comic and see how often it has been mentioned. You can even find out which Reddit users are referencing them, and which subreddits have the most XKCD references. Why did someone build a website to gather this particular information? It’s not particularly useful, but it is fun.

Instagram is a giant stream of user-generated content. The system already has many built-in ways to explore that content, but someone thought you needed a way to just look at pictures coming from a particular location. Actually, more than one person thought that would be a good tool to build. You can also look at social media feeds by location, here.

What is the point of all these little data sites? Often, there isn’t one. But the truth is that as we build bigger and bigger troves of data, we often find ourselves standing outside something like a super-super-super-market when we are really more in the mood for a little hole-in-the-wall store that just sells bracelets.

Here are a few more little data websites:



