15 Jan 2018
A primer for online learning
In my role as a data scientist, I have encountered a number of scenarios that I think would be suitable for online machine learning algorithms. This post contains some basic examples to give some intuition about where an online learning algorithm may be suitable.
31 Oct 2017
A random forest of trees
I implemented a decision tree classifier in a previous post. Here, I extend the model to create a random forest model as an ensemble of trees.
30 Sep 2017
a home grown tree
I recently had a project at work that, although it wasn't a typical classification problem, turned out to have a nice solution involving a recursive partition tree. This got me thinking that I have never taken the time to implement a classification tree model from scratch.
31 Jul 2017
Machine learning pipelines with Scikit-Learn
I've been aware of Scikit-Learn's Pipeline class for a long time, and for some reason have never really got around to having a play with it. Turns out you can do some pretty powerful things, and I will certainly be using it a lot more in future.
29 Jun 2017
Bundling Python Packages for PySpark Applications
I've been using Spark for a fair few months now via the Python API. As with any rapidly developing technology with few experts around to learn from, I've found getting up and running to require a fair amount of effort. After a little bit of searching around, I found how to bundle Python packages up to distribute around the cluster when submitting an application, and thought some clear instructions could be of use...
24 Mar 2017
pycaret: a Python framework for classification and regression training
As a constructive way to improve my Python skills and understanding of supervised machine learning, I wrote a Python framework for classification and regression training, inspired by Max Kuhn's R caret package.
15 Aug 2016
My Predictive Modeling Workflow
I've seen quite a few blogs in which people describe their workflow for predictive modelling, from data preparation through to model evaluation. While I am adamant that there is no one-size-fits-all approach, I thought I would share the template that I find serves as a good starting point.
12 Jul 2016
Class imbalance in classification models
Class imbalance can have a massively negative impact on classification models. I investigated several means to remedy the problem, using the Adult data set as an example.
06 June 2016
Quick, easy, secure file sharing
I wanted a way to share reasonably large files that is secure, free, and no hassle. I've found a solution using transfer.sh and gpg that is good up to 10 GB!
20 May 2016
Model Ensembles: Regression
Having become very interested in the subject, I have started developing an R package to create model ensembles. Here, I focus on ensembles for regression modelling.