Some Notes on ML in Haskell following ML Week
From Monday 2/11 to Thursday 5/11 I attended ML Week, a training session led by Jeff Abrahamson, who also organizes the Nantes ML Meetup. This session was great, not only because of its content, mostly an overview of the main techniques available and a hands-on dive into Python’s eco-system for data analysis, but also because of the discussions we had with other attendees and with Jeff, whose depth of knowledge on the subject is truly amazing.
However, I was a bit frustrated at not being able to explore the topic using my language of choice, namely Haskell. So I took the opportunity of this training to start collecting links and ideas on how to do data analysis and ML in Haskell. Here are a few links and comments on my attempts to map the tools we were using in Python to their equivalents in the Haskell eco-system:
- All the hands-on codelabs for the training were provided in the form of IPython Notebooks, so I went to install IHaskell, which provides a Haskell kernel for notebooks. It works great straight out of the box; I only had some minor glitches with Charts display. I must say that the community for IHaskell is very responsive!
- Kronos provides a packaged interactive data visualization tool that contains everything that’s needed to run IHaskell out of the box when you don’t want to bother with installing the Haskell eco-system,
- cassava provides type-safe parsing of CSV data,
- There is a base statistics package for Haskell: http://hackage.haskell.org/package/statistics, which is maintained by Bryan O’Sullivan, who is also behind wreq, the one-stop shop for making HTTP clients. Among many other things, this package provides linear regression,
- HLearn is an ambitious project to provide efficient pure Haskell implementations of various standard ML algorithms,
- Basic matrix operations are provided by hmatrix, which is built on the efficient routines of LAPACK, BLAS, and GSL,
- hstatistics is another statistics package based on hmatrix,
- There is a very interesting series of posts from Dominic Steinitz. The ones I have been particularly interested in are on linear and logistic regression using automatic differentiation. There has been some code drift in AD since the posts were written, so they don’t compile as-is with the latest versions of the libraries, but the required modifications are minor,
- I thus turned to the ad package by Edward Kmett, which happens to contain a routine for directly computing approximations of functions through gradient descent techniques,
- chatter implements some “standard” NLP algorithms which we had to deal with to implement a spam detector,
- Support Vector Machines are implemented in a couple of Haskell packages:
  - There are Haskell bindings to the (apparently) state-of-the-art library libsvm, which is what scikit-learn uses,
  - svm is another package, which seems a bit oldish and unmaintained,
- I don’t think there is a compelling implementation of general purpose neural networks of any kind, although there appear to be quite a few packages dealing with those beasts on Hackage,
- There are two libraries for computing K-means, both pretty recent:
- For Principal Component Analysis, there is hstatistics or hmatrix-nipals.
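
To give a feel for what the linear regression and gradient descent items above boil down to, here is a minimal sketch that uses only base. It is not the API of statistics or ad, and it uses a hand-written gradient rather than automatic differentiation. The names Params, cost, gradient, descend and fit, the learning rate, the number of steps and the sample data are all made up for illustration.

```haskell
module Main where

import Data.List (foldl')

-- | Hypothetical model parameters for y = intercept + slope * x.
data Params = Params { intercept :: Double, slope :: Double }
  deriving Show

-- | Mean squared error of the model over the samples.
cost :: [(Double, Double)] -> Params -> Double
cost samples (Params b w) =
  sum [ (b + w * x - y) ^ (2 :: Int) | (x, y) <- samples ]
    / fromIntegral (length samples)

-- | Analytic gradient of the mean squared error with respect to
-- (intercept, slope). A library such as ad would derive this for us.
gradient :: [(Double, Double)] -> Params -> (Double, Double)
gradient samples (Params b w) = (2 * sb / n, 2 * sw / n)
  where
    n = fromIntegral (length samples)
    (sb, sw) = foldl' step (0, 0) samples
    step (accB, accW) (x, y) =
      let err = b + w * x - y
      in (accB + err, accW + err * x)

-- | One gradient descent update with learning rate alpha.
descend :: Double -> [(Double, Double)] -> Params -> Params
descend alpha samples p@(Params b w) =
  let (gb, gw) = gradient samples p
  in Params (b - alpha * gb) (w - alpha * gw)

-- | Run a fixed number of updates, starting from (0, 0).
fit :: Double -> Int -> [(Double, Double)] -> Params
fit alpha steps samples =
  iterate (descend alpha samples) (Params 0 0) !! steps

main :: IO ()
main = do
  -- Made-up samples lying exactly on y = 1 + 2 x (an assumption for the demo).
  let samples = [ (x, 1 + 2 * x) | x <- [0, 1, 2, 3, 4] ]
      fitted  = fit 0.05 5000 samples
  print fitted
  print (cost samples fitted)
```

With these made-up samples the fitted intercept and slope should end up close to 1 and 2, and the cost close to 0. The learning rate has to be small enough for the updates to converge; 0.05 works for this data but is not a general recommendation.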