Kaggle competition

Archive for March, 2014


Learning curve for least squares

March 29th, 2014 by mjd2

From the learning curve for least squares (training and validation error converging to a similarly high plateau is the classic signature of underfitting), we can see that the feature set I'm using is simply not rich enough to capture the information required for a competitive score on the leaderboard.
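For reference, here is a minimal Octave/MATLAB sketch of how a learning curve of this kind can be produced. It assumes X (an n-by-d design matrix) and Y (an n-by-37 target matrix) are already loaded; the variable names, split ratio, and subset sizes are placeholders, not my actual setup.

% Learning curve for ordinary least squares: fit on growing subsets
% of the training data and track RMSE on a fixed validation split.
n      = size(X, 1);
idx    = randperm(n);                     % shuffle before splitting
ntrain = floor(0.8 * n);
Xtr  = X(idx(1:ntrain), :);      Ytr  = Y(idx(1:ntrain), :);
Xval = X(idx(ntrain+1:end), :);  Yval = Y(idx(ntrain+1:end), :);

sizes    = round(linspace(50, ntrain, 20));
rmse_tr  = zeros(size(sizes));
rmse_val = zeros(size(sizes));
for i = 1:numel(sizes)
    m = sizes(i);
    W = Xtr(1:m, :) \ Ytr(1:m, :);        % least-squares fit on first m rows
    rmse_tr(i)  = sqrt(mean(mean((Xtr(1:m,:)*W - Ytr(1:m,:)).^2)));
    rmse_val(i) = sqrt(mean(mean((Xval*W - Yval).^2)));
end
plot(sizes, rmse_tr, 'b-', sizes, rmse_val, 'r-');
legend('training RMSE', 'validation RMSE');
xlabel('number of training examples');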

I'm also having a bit of trouble with my implementation of ridge regression, and I cannot figure out what I've done wrong in my code.
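For comparison, this is the closed-form ridge solution I would expect. A common bug is penalizing the intercept along with the other weights, so the sketch below leaves the bias term unregularized; the lambda value and the assumption that the first column of X is the all-ones intercept column are mine, not necessarily how my code is laid out.

% Closed-form ridge regression: W = (X'X + lambda*R) \ (X'Y),
% where R is the identity except for a zero in the intercept slot.
lambda = 1.0;
d = size(X, 2);
R = lambda * eye(d);
R(1, 1) = 0;                      % do not penalize the intercept column
W    = (X' * X + R) \ (X' * Y);   % works for multi-column Y as well
Yhat = X * W;                     % predictions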

Quick Description of Galaxy Zoo 2 (GZ2) data

March 29th, 2014 by mjd2

This information was obtained by reading the official GZ2 paper: http://arxiv.org/abs/1308.3496.

The GZ2 dataset comes from citizen scientists who voluntarily classify galaxies through a guided process, namely a multi-step decision tree. The dataset used in the Kaggle competition is the result of several debiasing procedures that produce likelihoods from the raw classifications. ML applications of this data tend to interpret these likelihoods as probabilistic weights instead.

Each row of the Y matrix we wish to predict therefore consists of probabilistic weights. This does not mean that each row sums to 1. Instead, each of the 37 columns corresponds to one of the 37 individual responses, and the responses are grouped under 11 questions whose weights sum to 1 within each group. For example, the question "Is the galaxy simply smooth and rounded, with no sign of a disk?" has three responses (smooth, features, star), corresponding to Y(:,1), Y(:,2), and Y(:,3), which together sum to 1.
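As a quick sanity check (a sketch, assuming Y is the n-by-37 training matrix with columns in the Kaggle ordering), the weights for question 1 should sum to 1 in every row:

% Columns 1:3 hold the three responses to question 1;
% their weights should sum to 1 for every galaxy.
q1sum = sum(Y(:, 1:3), 2);
max(abs(q1sum - 1))               % should be (numerically) zero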

Not all questions are answered in every individual classification; a volunteer sees only the parts of the decision tree that follow from their previous answers.