As a first stab at the Galaxy Zoo problem, I ran least squares regression on images compressed to a variety of dimensions.
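A minimal sketch of what this kind of experiment might look like, not the exact code behind these results. It assumes square grayscale images that are shrunk by block averaging, a synthetic random dataset standing in for the real one, and RMSE on a held-out split as the score. The helper names (`downsample`, `fit_least_squares`, `predict`, `rmse`) are made up for illustration.

```python
import numpy as np

def downsample(images, size):
    """Shrink square grayscale images to size x size by block averaging."""
    n, h, w = images.shape
    assert h == w and h % size == 0, "side length must be a multiple of size"
    f = h // size
    return images.reshape(n, size, f, size, f).mean(axis=(2, 4))

def fit_least_squares(X, Y):
    """Ordinary least squares with a bias column; returns the weight matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # lstsq returns the minimum-norm solution when there are more
    # features than training samples.
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict(X, W):
    """Apply the fitted weights, adding the same bias column as in fitting."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def rmse(Y_true, Y_pred):
    """Root mean squared error over all samples and outputs."""
    return float(np.sqrt(np.mean((Y_true - Y_pred) ** 2)))

# Synthetic stand-in data (assumption): 300 images of 96x96, 3 targets each.
rng = np.random.default_rng(0)
images = rng.random((300, 96, 96))
targets = rng.random((300, 3))
train, val = slice(0, 200), slice(200, 300)

# Fit one model per image size and record the validation error.
scores = {}
for size in (8, 16, 24, 32):
    X = downsample(images, size).reshape(len(images), -1)
    W = fit_least_squares(X[train], targets[train])
    scores[size] = rmse(targets[val], predict(X[val], W))
```

On random data the scores are meaningless; the point is only the shape of the loop, with one fit per image size and a single validation number to compare.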
Here we can see that the best results came from 32×32 images, but that 24×24 performed about the same. This is good news, since the training data size grows quadratically with the image side length: 1,024 pixels per channel at 32×32 versus 576 at 24×24. My next move is to try matrix factorization/PCA to get a better least squares result, and then try regularization.
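One possible shape for that next step, sketched under assumptions: PCA done with a plain SVD, followed by ridge (L2) regression as the regularized least squares. The number of components `k`, the penalty `lam`, and the function names are all placeholders chosen for illustration, and the data is again synthetic.

```python
import numpy as np

def pca_fit(X, k):
    """Return the feature mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T

def ridge_fit(Z, Y, lam):
    """Closed-form ridge regression; the intercept is not penalized."""
    z_mean, y_mean = Z.mean(axis=0), Y.mean(axis=0)
    Zc, Yc = Z - z_mean, Y - y_mean
    A = Zc.T @ Zc + lam * np.eye(Z.shape[1])
    W = np.linalg.solve(A, Zc.T @ Yc)
    b = y_mean - z_mean @ W
    return W, b

# Synthetic stand-in (assumption): 200 flattened 24x24 images, 3 targets.
rng = np.random.default_rng(0)
X = rng.random((200, 24 * 24))
Y = rng.random((200, 3))

mean, components = pca_fit(X, k=20)       # k=20 is an arbitrary choice
Z = pca_transform(X, mean, components)    # 576 features reduced to 20
W, b = ridge_fit(Z, Y, lam=1.0)           # lam=1.0 is an arbitrary choice
Y_hat = Z @ W + b
```

Fitting on 20 components instead of 576 pixels keeps the solve small, and `lam` then controls how strongly the weights are shrunk toward zero. Both would need tuning on a validation split.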
I’ve also noted that some form of data augmentation will be important in my final submissions, but I’m working with the hypothesis that what works well without data augmentation will also work well with it. This allows me to avoid its computational costs during this more experimental period.