Monday, February 24, 2014

AIWorld6 - The Joy of Small Data - Follow up

I got some great feedback from people. Props go to rolisz for suggesting a Stack Exchange conversation that led me to kernel density estimation (KDE).

You can see from the plot of the evaluated KDE and the plot of its derivative that it'll be easy to pick out what the species are. A naive parser that treats each local maximum as a species and each local minimum as the boundary between two species did quite well on this data. I would have guessed [14900, 15050, 15300, 15500, 15650, 15900, 16100], and the KDE with this naive parsing went with [14732, 14861, 15054, 15674, 15725, 16087]. With the exception of 15674 and 15725 being too close to each other, I think it was a total win. I'll be implementing this algorithm shortly, and perhaps even showing the histogram so a viewer can second-guess the automatic stats gathering if they feel the need.
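For anyone curious what the naive parsing looks like, here is a minimal sketch in plain Python. It is not the AIWorld6 code: the Gaussian kernel, the `bandwidth` value, the grid size, and the function names `gaussian_kde` and `find_species` are all assumptions made for illustration. It evaluates the density on an evenly spaced grid, then reports local maxima as species and local minima as the boundaries between them.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x (Gaussian kernel)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def find_species(samples, bandwidth, steps=500):
    """Naive parse: local maxima are species, local minima are boundaries."""
    density = gaussian_kde(samples, bandwidth)
    # Pad the grid a little so peaks near the edges are not cut off.
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    ys = [density(x) for x in xs]

    peaks, valleys = [], []
    for i in range(1, steps - 1):
        if ys[i - 1] < ys[i] >= ys[i + 1]:
            peaks.append(xs[i])      # the slope changes from rising to falling
        elif ys[i - 1] > ys[i] <= ys[i + 1]:
            valleys.append(xs[i])    # the slope changes from falling to rising
    return peaks, valleys

# Made-up sample values, just to show the call.
peaks, valleys = find_species(
    [98, 99, 100, 101, 102, 198, 199, 200, 201, 202], bandwidth=5.0
)
```

The bandwidth is the knob to watch: a wider one smooths more and merges nearby peaks, while a narrower one splits them apart.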

