Haskell-based Development Environment

Posted on May 23, 2016

In a previous post I described the overall design and architecture of Capital Match’s core system. I now turn to our development and operations environment, which uses mostly Haskell tools and code. As there are quite a lot of moving parts, this large topic will be covered in two posts: the present one focuses on basic principles, build tools and the development environment; it will be followed by another post on configuration management, deployment and monitoring. I consider the development and production environments to be a single integrated system because there is a porous membrane between the two, especially in a small company with 4 developers. Although I have been interested in this topic since my first systems programming course at university, some 18 years ago, I do not consider myself a genuine systems administrator, and I made a lot of mistakes while building the Capital Match platform. But I do believe in the “you build it, you run it” motto, and this is all the more true for a small startup team. Hence I have tried to pay attention to building a flexible yet robust system.

Principles

When we started to set up this environment, we were guided by a few principles:

Automate

Automate all the things!

Deployment to production should be as automated as possible, involving as few manual steps as possible. The end goal is to reach a state of continuous deployment where pushed changes are built, verified and deployed continuously throughout the day. This implies that all the steps involved in getting a feature delivered to end-users, apart from the actual coding of the feature itself, should be identified and linked into a coherent process that is implemented in code. There should be no fiddling with SSH on a production machine to fix some configuration script, no manual migration process when upgrading a data schema, no copying of binaries from the development environment to production…

Everything Docker

Containerize all the things!

docker is still a controversial technology, especially among system and cloud specialists, and a topic of heated debate, which is a sure sign it is a game changer. Back in 2014, when we started developing Capital Match’s platform, docker was in its infancy. I had some past experience working with VServer and LXC, and containers are definitely a great way to package (parts of) a system. Using docker allows us to:

Note that we stuck to the initial docker “philosophy” of one process per container, except for some very specific needs (e.g. Selenium testing): it is not possible to ssh into our application containers.

Single Source of Authority

Version all the things!

This means we should of course version our application’s code, but also the system’s configuration and as many dependencies as possible. Ideally, we should be able to reconstruct the whole system from a handful of commands:

In practice this is quite a bit more complicated as there are some glue parts missing to ensure the whole system can be rebuilt from scratch, but still we came quite close to that ideal. We are using 2 different repositories, one for the application code and one for the environment, mostly for technical reasons related to how our configuration management software works. The only unversioned part is the description of the “hardware” and the provisioning part which is still done “manually”.

Everything Haskell

Typecheck all the things!

There is no dearth of tools when it comes to configuration management, systems provisioning, deployment and builds. When starting small you usually don’t want to invest a lot of time in learning new tools, hence a common choice is simply to begin with shell scripts. But tools usually exist for a reason: scripts quickly become a tangled maze of scattered knowledge. Yet we have at our disposal a powerful tool: Haskell itself, the language and its ecosystem. Hence we decided to stick to Haskell-based tools as much as possible. Besides the obvious simplification this brings us (one compiler, one toolchain, one language…), the advantages Haskell provides over other languages (type safety, lazy evaluation, immutable data structures) seemed to be as valuable at the system level as at the application level.

Overview

The above figure gives a high-level overview of the system:

Build & Toolchain

Build

Cabal

For building the Haskell part, we obviously started with Cabal, which is the de facto build system and package manager in Haskell. The structure of the GHC+Cabal package system makes it quite hard to create isolated and reproducible build environments, as there are various interactions between what’s installed globally (with GHC) and what’s installed per user and per project. There was no stack two years ago so we had to roll our own. Here are some key features of our cabal-based build:

Stack

stack represented a huge improvement for managing our build but it took us a few months to ensure it built consistently.

Leiningen

leiningen is (was?) the prominent build tool for Clojure and ClojureScript. We chose ClojureScript for the UI mostly because this allowed us to develop it using the excellent Om wrapper over React. It took us quite a lot of time to get the project’s build into a comfortable state, and it did not evolve as quickly as the Haskell one.

Javascript

When we introduced the mobile UI for Capital Match, we had to integrate its build into our process. This caused some headaches, as this part of the system is developed in pure JavaScript using Ember.js and relies on various tools in the JS ecosystem I was not familiar with. It also used sass to write CSS, which means we needed ruby to run the compiler.

Shake

Given the diversity of tools and components we are building, we needed a way to orchestrate the build of the full solution that could easily be run as part of continuous integration. We settled on shake, a Haskell-based tool similar to make.
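To give an idea of what such an orchestration can look like, here is a minimal Shake script. It is a sketch under stated assumptions, not the actual Capital Match build: the directory layout (server/ and ui/), the _build marker files and the exact stack and lein invocations are hypothetical and only serve the example.

```haskell
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions { shakeFiles = "_build" } $ do
  -- Default target: a marker file that depends on every component.
  want ["_build/all.done"]

  "_build/all.done" %> \out -> do
    need ["_build/server.done", "_build/ui.done"]
    writeFile' out "ok\n"

  -- Haskell services (hypothetical server/ directory), built with stack.
  "_build/server.done" %> \out -> do
    srcs <- getDirectoryFiles "" ["server//*.hs", "stack.yaml"]
    need srcs
    cmd_ "stack" ["build", "--test"]
    writeFile' out "ok\n"

  -- ClojureScript UI (hypothetical ui/ directory), built with leiningen.
  "_build/ui.done" %> \out -> do
    srcs <- getDirectoryFiles "" ["ui/src//*.cljs", "ui/project.clj"]
    need srcs
    cmd_ (Cwd "ui") "lein" ["cljsbuild", "once"]
    writeFile' out "ok\n"
```

Because each rule declares its inputs with need, Shake only reruns the stack or lein step when one of the tracked source files has changed, which is what makes a single entry point usable both on a developer machine and in CI.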

Development Environment

Haskell

There have been various attempts at providing an IDE for Haskell:

* leksah is a GTK-based Haskell IDE, itself written in Haskell,
* Haskell for Mac,
* FPComplete used to provide a web-based environment.

Having used emacs for years, I feel comfortable with it, and besides there are real benefits to using a plain-text tool for coding when you are part of a distributed team: it allows you to easily set up a distributed pairing environment with minimal latency. Yet configuring a proper Haskell development environment in Emacs can be a challenging task, and it seems to be a moving target.

Here is my current .emacs content:

(eval-after-load "haskell-mode"
  '(progn
     (setq haskell-stylish-on-save t)
     (setq haskell-tags-on-save t)

     (setq haskell-process-type 'stack-ghci)
     (setq haskell-process-args-stack-ghci '("--test"))
     
     (define-key haskell-mode-map (kbd "C-,") 'haskell-move-nested-left)
     (define-key haskell-mode-map (kbd "C-.") 'haskell-move-nested-right)
     (define-key haskell-mode-map (kbd "C-c v c") 'haskell-cabal-visit-file)
     (define-key haskell-mode-map (kbd "C-c C-t") 'ghc-show-type)
     (define-key haskell-mode-map (kbd "C-x C-d") nil)
     (setq haskell-font-lock-symbols t)

     ;; Do this to get a variable in scope
     (auto-complete-mode)

     ;; from http://pastebin.com/tJyyEBAS
     (ac-define-source ghc-mod
       '((depends ghc)
         (candidates . (ghc-select-completion-symbol))
         (symbol . "s")
         (cache)))
     
     (defun my-ac-haskell-mode ()
       (setq ac-sources '(ac-source-words-in-same-mode-buffers
                          ac-source-dictionary
                          ac-source-ghc-mod)))
     (add-hook 'haskell-mode-hook 'my-ac-haskell-mode)
     
  
     (defun my-haskell-ac-init ()
       (when (member (file-name-extension buffer-file-name) '("hs" "lhs"))
         (auto-complete-mode t)
         (setq ac-sources '(ac-source-words-in-same-mode-buffers
                            ac-source-dictionary
                            ac-source-ghc-mod))))
     (add-hook 'find-file-hook 'my-haskell-ac-init)))

(add-hook 'haskell-mode-hook 'turn-on-haskell-decl-scan)
(add-hook 'haskell-mode-hook 'turn-on-haskell-indentation)
(add-hook 'haskell-mode-hook 'interactive-haskell-mode)

(add-hook 'haskell-interactive-mode-hook 'turn-on-comint-history)

(eval-after-load "which-func"
  '(add-to-list 'which-func-modes 'haskell-mode))

(eval-after-load "haskell-cabal"
    '(define-key haskell-cabal-mode-map (kbd "C-c C-c") 'haskell-compile))

Thanks to discussions with Simon and Amar I am now using the REPL much more than I used to. My current workflow when working on Haskell code looks like:

Clojurescript

The nice thing when using non-modern languages like Haskell and Clojure is that you only need to be able to edit text files to develop software, hence the choice of Emacs to develop both is kind of obvious. There is very good support for Clojure in emacs through nrepl and Cider but it seems having the same level of support for Clojurescript is still challenging.

Devbox

I already discussed in a previous blog post how we managed to do pair programming with a distributed team. One of the virtual machines we configured was our devbox which we used to do remote pairing and run experiments.

Discussion

Build process

In retrospect I think the biggest issue we faced while developing the platform and working on the dev and prod infrastructure was fighting the increase in build time as we added new features and services. Building a deployable container from scratch, including creation of the build machine, configuration of build tools, creation of the needed containers, download and build of dependencies, testing and packaging, would take about 2 hours. Here is the breakdown of time for some of the build stages according to the CI:

Stage            Mean
IntegrationTest  7m51s
EndToEndTest     7m05s
Compile          6m05s
ParallelDeploy   1m12s
UITest           53.46s

Even though tests are run in parallel, this means it takes more than 10 minutes to get to the point where we can deploy code. In fact, CI tells us our mean time to deployable is about 30 minutes, which is clearly an issue we need to tackle. To reduce build time there is no better way than splitting the system into smaller chunks, something the team has been working on for a few months now and which is paying off, at least by ensuring we can add features without increasing build time! The next step would be to split the core application, which currently contains more than 80 files, into more services and components.
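The "more than 10 minutes" figure can be checked with a small back-of-the-envelope computation over the CI breakdown above. The sketch below assumes that compilation runs first and that the three test stages then start together; that ordering is my simplification, not something the CI data states. ParallelDeploy is left out because it belongs to the deployment itself.

```haskell
module Main where

-- Mean stage durations in seconds, copied from the CI breakdown.
compile, integrationTest, endToEndTest, uiTest :: Double
compile         = 6 * 60 + 5    -- 6m05s
integrationTest = 7 * 60 + 51   -- 7m51s
endToEndTest    = 7 * 60 + 5    -- 7m05s
uiTest          = 53.46

testStages :: [Double]
testStages = [integrationTest, endToEndTest, uiTest]

-- Assumption: compile first, then all tests in parallel, so the
-- wall-clock time is the compile time plus the slowest test stage.
parallelTime :: Double
parallelTime = compile + maximum testStages

-- For comparison: every stage run one after the other.
sequentialTime :: Double
sequentialTime = compile + sum testStages

-- Render seconds as "XmYs", rounded to the nearest second.
render :: Double -> String
render s = show m ++ "m" ++ show r ++ "s"
  where (m, r) = (round s :: Int) `divMod` 60

main :: IO ()
main = do
  putStrLn ("parallel tests:   " ++ render parallelTime)
  putStrLn ("sequential tests: " ++ render sequentialTime)
```

Under these assumptions the parallel path comes out at 13m56s against 21m54s when everything runs sequentially, so the slowest test stage, not the number of stages, is what bounds the time to a deployable build.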

On the positive side:

Development Environment

The single feature I miss from my former Java development environment is refactoring: the ability to safely rename, move and extract code fragments across the whole code base with a couple of keystrokes lowers the practical and psychological barrier to improving your code right now. GHC (especially with the -Wall -Werror flags on) of course catches a whole lot more errors than javac or gcc, but the process of fixing compiler errors after refactoring a deeply nested core function is time-consuming. On the other hand, the lack of global refactoring capabilities is a strong incentive to modularize and encapsulate your code in small packages which can be compiled and even deployed independently.

To be continued…