13 Jan 2016

Packages, Linear Models, and S4

I have been taking Andrew Ng's Machine Learning course on Coursera, and I am interested in implementing machine learning algorithms in R, as well as getting some practice with the S4 system. So here we have it: a very simple implementation of linear regression using an S4 class.

Read more

08 Dec 2015

k-NN Classification

I've been meaning for a while to start writing my own statistical learning algorithms. My motivation isn't to reinvent the wheel, but to gain a proper understanding of how various techniques work. Here is the easiest of them all, the k-nearest-neighbor classifier, where the input is numerical.

Read more

7 Dec 2015

Simple clustering

On the train the other day, I set myself a little challenge to write a compact k-nearest neighbors clustering algorithm. I wrote it in R, so it definitely wouldn't stand up to a C++ equivalent (as there is lots of looping involved). However, it is a nice little demonstration of how functional programming with the [*]apply family can make things neat and concise.

Read more

1 Dec 2015

Coursera Data Manipulation at Scale: Systems and Algorithms

What a fantastic course! I really enjoyed this one, and for anyone interested in a well-rounded introduction to 'Big' data science, I wholeheartedly recommend it.

Read more

22 Nov 2015

Bootstrapping made simple

I've dug up some experimenting I did a while ago, in the hope that it may be an enlightening illustration for others. I'd been using the R caret package for a while, and taking for granted the simple and easy resampling available when training models. This is a nice, simple example of how resampling using the bootstrap can be performed.

Read more

1 Nov 2015

Coursera Mathematical Biostatistics 2

Tough, but interesting. Really interesting.

Read more

10 Oct 2015

The Magic of Rcpp

A quick and easy R package, with some initial mucking around integrating compiled C++ with R.

Read more

01 Oct 2015

Churning with Caret: Linear Models

Predictive modeling on the churn data set with linear models. This is the first in a series of blog posts where I investigate the data set. I will also be writing about non-linear and tree-based models, and then I plan to get carried away considering topics such as model ensembling and feature selection.

Read more

30 Sep 2015

Hadley Wickham's R Packages

A pretty comprehensive guide to creating R packages, which apparently we should all be doing!

Read more

30 Sep 2015

Coursera Johns Hopkins Data Science Specialisation

After many months, I have finished the taught courses in the Johns Hopkins Data Science Specialisation! Time for a bit of reflection...

Read more