Earlier this year, the largest mathematics proof ever was announced, presenting a solution to a simple-sounding problem: colour every integer either red or blue, with the restriction that no Pythagorean triple can be all the same colour. The computer-generated proof, a 200 terabyte file that took two days of solid processing on the University of Texas's supercomputer to produce, showed that this isn't possible. There's been plenty of commentary about how large this proof is, with comparisons to the size of the Library of Congress, but in the modern world of Big Data, is 200 terabytes really anything special?
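To make the problem concrete, here's a minimal Python sketch of what it means for a colouring to be "valid". The function names are purely illustrative; the actual proof encodes the question as an enormous SAT problem rather than checking colourings one by one like this.

```python
import math

def pythagorean_triples(n):
    """Yield all Pythagorean triples (a, b, c) with a <= b and c <= n."""
    for a in range(1, n + 1):
        for b in range(a, n + 1):
            c = math.isqrt(a * a + b * b)
            if c <= n and c * c == a * a + b * b:
                yield (a, b, c)

def is_valid_colouring(colour, n):
    """True if no Pythagorean triple within 1..n is all one colour.

    `colour` is a dict mapping each integer 1..n to 'red' or 'blue'.
    """
    return not any(colour[a] == colour[b] == colour[c]
                   for a, b, c in pythagorean_triples(n))

# Example: colouring everything red fails immediately because of (3, 4, 5).
all_red = {i: 'red' for i in range(1, 26)}
print(is_valid_colouring(all_red, 25))   # False
```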

The proof is essentially a brute-force argument: the team checked every possible colouring of the integers up to 7,825 and showed that each one contains a Pythagorean triple in a single colour, so no colouring of all the integers can work. According to IBM, the world produces around 2.5 quintillion bytes of data per day, making this proof equivalent to just 6.9 seconds of global data. With that amount of data being produced every day, there is plenty of potential for other conundrums to be solved as well.
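For anyone who wants to check the 6.9-second figure, here's the back-of-the-envelope arithmetic, assuming decimal terabytes (10^12 bytes):

```python
proof_bytes = 200e12                 # 200 terabytes
daily_bytes = 2.5e18                 # IBM's figure: 2.5 quintillion bytes per day
seconds_per_day = 24 * 60 * 60

print(proof_bytes / daily_bytes * seconds_per_day)   # ~6.9 seconds
```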

The difficulty is that while Big Data has been used here to answer the question posed, it hasn't actually increased our understanding of the problem or of the area in general. On its own, the usefulness of Big Data is limited to the questions we think of asking, and the answers provided aren't going to advance the state of human knowledge. The exciting part comes when you combine Big Data with AI, so that the entity asking the questions is scaled up to match the data set.