Beelzebub Jones' Blog: A Sprinkling of Pixel Dust

A Sprinkling of Pixel Dust

OK, so it has been a long time since my last post. A lot has happened between then and now. The biggest of which is that I quit my job at a major Australian Bank to undertake a doctorate in mathematical science. I have to admit that I love the student life. It lets me work on things I find fascinating and there is not the stress associated with working a full time job in corporate land.

But that's not what I wanted to blog about. I wanted to share something I have been working on - for fun. First a little math lesson - I'll try to keep it high level so you can get a appreciation for what I am doing without getting bogged down in too much notation.

A Short Math Lesson

Here is a picture of a puppy.

I know many of us will look at it and say Awwww, but I want to look at this picture as a mathematician would. At it's most basic this picture is a matrix. It just so happens that the numbers that make up this matrix represent shades of grey. That's what the legend on the right is telling us. We could completely describe this image by knowing all of those numbers and the dimensions of the matrix. And, because it is a matrix, you can do all sorts of mathy things to it and I am going to describe one of them here.

It just so happens that to get a good approximation of this matrix we don't need to know all of the numbers that sit behind every pixel by using a bit of matrix algebra called the Singular Value Decomposition (SVD). What this does is break a matrix into three components. Recombining those components will give you your original matrix back. Like this:

However, recombining just a portion of those three components, as shown in red below, will give you an approximation of the original matrix. The greater the portion of those components used the better that approximation. It works like this:

So, you can capture an approximation of a matrix with a smaller set of numbers than those contained in the original matrix. That, boys and girls, is how jpeg compression works. It's why .jpg files are usually smaller than .bmp files. It's how a rover on Mars can transmit a high definition picture of the Martian surface without sending every pixel. Let's try this on our puppy matrix.

Left to Right: 2 features, 5 features, 10 features, 20 features, 50 features

As we move from 2 to 50 features the approximation gets better. In order to recover the full image we would need 300 features but you can see how just 50 features was able to get a good approximation of the image. It is also worth noting that the higher order features contain less and less information about the matrix. Put simply, most of the relevant information you need about a matrix is captured by the lower order ("rank" in matrix language) features. A mathematician would say that the image on the left was a rank 2 approximation and the image on the right was a rank 50 approximation. Now, as long as a matrix has some structure in it, the the mathematics of the SVD will find it. The more obvious the structure the fewer features required to describe it. Images are matrices with structure so it works very well with them but this idea will work just as well on any matrix.

Ok, the math lesson is over. If you want to know more about the SVD though you can check out a lecture by Dr Gilbert Strang from MIT here

I'm now going to give you a flavour of what I have been playing around with. Not because its part of my doctoral research, but because it is very cool.

Zombie Matrices

I first wrote the code I am about the describe to use on a Kaggle data set. I used it on a data challenge being faced by Amazon but, because it took me so long to figure it out, I missed the submission deadline. This particular problem was one which could be expressed as a sparsely populated matrix. This sparseness meant that I could not apply the SVD method I described above but I still needed to know what the features of the matrix were so that I could make a good prediction about the missing values. I needed to bring the matrix back from the dead - and I found that I could with some math voodoo.

After a discussion at a workshop related to my doctoral program I was fired up to revisit this code again. I wanted to make it less tailored to the Amazon Challenge and give it a more general application. I'm using images because they contain structure that everyone can see, but remember this can work on any matrix.

I'll start by sending a random 20% of our puppy image to pixel heaven. Same puppy, but this time he looks like he's in the snow.

So how do we clean this image up? This is the really cool part. It turns out that the answer is related to the SVD we discussed earlier. Instead of deriving the features from the original image as the SVD does I had to write some code to learn the features from those points that were available and then use these to try to recover the original matrix. My approach is based on (but not exactly the same as) the approach discussed by Simon Funk here. He used it when competing in the Netflix prize. My code has a better initialisation approach and a configurable regularisation parameter to help protect against overfitting.

I used my code to train 10 features from the snowy puppy image above and merged the results back into the image. I got the image below. Notice how we've got the detail back on his nose, his whiskers are visible again and his fur is more easily discernible.

What if we remove 50% of the data? Here's the before picture...

...and after there's still enough information left to estimate the original picture. Again I only trained 10 features and we've still got the detail on his nose and eyelids, we're still able to make out his whiskers and, while it's not as crisp as it was before, we can still make out his fur. He now looks like an impressionist painting.

Ok, one more time. Let's blitz 90% of the pixels so there's not much left at all.

After the features of this pixel dust are learned by my code we get something that is clearly recognisable as the dog from the original image at the top, albeit a bit blurry. I used 10 features again but if more are trained then a better approximation could possibly be obtained.

So what are the applications of zombie matrices?

I have shown that one application is in image processing. This could potentially be of benefit to law enforcement or enhancing satellite imagery. I also know from experience that real world data often contains missing values. For a variety of predictive modelling tasks having a more complete data set is better.

Direct examination of the features extracted gives also meaningful information. For example in the Netflix challenge it gave groups of customers that have similar taste in movies and groups of movies that had similar attributes. In the Amazon challenge it gave me system access rules that could be applied to the different job roles at Amazon. So this approach has a role to play in recommender systems.

My code still needs tweaking so that it runs more efficiently. That's why I limited the features trained to 10. For me, writing the code and seeing the results was just fun.

2 comments:

Unknown14 July 2014 at 01:35
Great article! I was looking at sparse matrix recommendation engines for use in a site to recommend jokes to people. I'd like to talk about implementing your code and seeing if the results are funny--in a good way. :-) The site is not up yet but I'm writing it in Django, which is in Python. Would that be compatible with your code?
(I think my first post disappeared while posting.)
Beelzebub Jones14 July 2014 at 10:21
Thanks Kurt,

At the moment the code is written in R. I am using Python for my research so I have been considering porting it into Python 3.4.

Since posting the article I have vectorised the code so that it runs a whole heap faster.

You have an interesting application. I assume that users would rate a joke on some scale, say 1 to 5, and based on their response other jokes they might like would be offered to them. At the moment my code works on interval data (the difference between two values has a consistent meaning). While ordinal data will also work it will give a fractional recommendation (3.6 say, instead of 4 or 3). I have a little trick for dealing with such data that I stumbled across in an old book by statistician J W Tukey which transforms ordinal data into the continuous space which I think would work best. For the Netflix challenge Simon Funk and the winners Bellkor's Pragmatic Chaos used a trimming approach to map to the ordinal rating. I do not believe that this would take a users rating behavior into account. For example some users are polarised in that they only give 1 or 5, others usually give 3 and sometime will give other ratings. I know when I rate things I tend to avoid the extremes. The transformation that I am thinking of would allow for such differences in rating behaviour.

Beelzebub Jones' Blog

Wednesday, 9 July 2014

A Sprinkling of Pixel Dust

A Sprinkling of Pixel Dust

A Short Math Lesson

Zombie Matrices

2 comments:

About Me

Blog Archive