Anatomy of a Haskell-based Application

Posted on November 16, 2015

This is the first post of a series I am planning to write about my experience developing software as CTO of Capital Match, a Singapore-based startup providing a peer-to-peer lending marketplace for Small and Medium Businesses and private and corporate investors.

This post is about the design and architecture of the system itself, the choices and tradeoffs that were made, whether good or bad. In the conclusion I try to provide an assessment of the current situation and reflect on those choices.

Fundamental Design Choices

Haskell

Basing Capital Match’s tech stack on Haskell was an obvious choice for me from the outset, even though I had had only limited professional experience with Haskell before that:

Event Sourcing

The system was designed from the outset as an event-sourced application: the source of truth in the system is a sequence of events where each event defines a transition between two states. At any point in time, the state of the system is thus whatever state the current sequence of events leads to. Among the motivations behind using ES are:

Architecture

The main interface to the system is a REST-like API providing various resources and actions over those resources. Most exchanges with the outside world are done using JSON representations of resources, with some CSV. The User Interface is merely a client of the API and is (morally if not to the letter) a single-page application. There is also a command-line client which offers access to the complete API and is used for administrative purposes.

Models

The core of the application is purely functional and made of several loosely coupled BusinessModel instances (think Aggregates in DDD parlance) that each manage a specific sub-domain: Accounting manages accounts and transactions, Facility manages facilities lifecycle, Investor and Borrower manage profiles and roles-dependent data, User manages basic registration, authentication and settings for users (e.g. features)…

A BusinessModel is defined as:

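class BusinessModel a where
  data Event a :: *                 -- events the model can emit
  data Command a :: *               -- commands the model accepts
  init :: a                         -- initial (empty) state of the model
  act :: Command a -> a -> Event a  -- decide which event a command produces in a given state
  apply :: Event a -> a -> a        -- fold an event into the state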

The state of each BusinessModel instance is computed upon application startup by loading all the events and applying each stored event to an initialised model. Models are then kept in memory while events are stored persistently. This initial startup process takes a couple of seconds given the small scale at which we operate.
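To make the startup replay concrete, here is a minimal sketch of one instance and of the fold that rebuilds its state. The `Account` model, its commands and events, and the `replay` helper are hypothetical names made up for illustration; they are not taken from the Capital Match code base. The class is repeated (without kind annotations) so the block compiles on its own.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Prelude hiding (init)
import Data.List (foldl')

-- The class from the post, repeated so this sketch is self-contained.
class BusinessModel a where
  data Event a
  data Command a
  init  :: a
  act   :: Command a -> a -> Event a
  apply :: Event a -> a -> a

-- Hypothetical model: a single account balance, in cents.
newtype Account = Account { balance :: Integer }
  deriving (Eq, Show)

instance BusinessModel Account where
  data Event Account
    = Deposited Integer
    | Withdrawn Integer
    | Rejected String
    deriving (Eq, Show)

  data Command Account
    = Deposit Integer
    | Withdraw Integer
    deriving (Eq, Show)

  init = Account 0

  -- Pure decision: validate the command against the current state.
  act (Deposit n) _
    | n > 0     = Deposited n
    | otherwise = Rejected "deposit must be positive"
  act (Withdraw n) (Account b)
    | n > 0 && n <= b = Withdrawn n
    | otherwise       = Rejected "insufficient funds"

  -- Pure state transition: events never fail to apply.
  apply (Deposited n) (Account b) = Account (b + n)
  apply (Withdrawn n) (Account b) = Account (b - n)
  apply (Rejected _)  acct        = acct

-- Rebuild the in-memory state from stored events, oldest first.
replay :: BusinessModel a => [Event a] -> a
replay = foldl' (flip apply) init
```

Because `act` and `apply` are pure, replaying the same event list always yields the same state, which is what makes loading the models at startup a plain left fold.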

Each model demarcates transactional boundaries and is the unit of consistency within the system. Commands and events on a single model are assumed to occur sequentially.

Services

Above BusinessModels are Services, which provide the interface to the system. Services orchestrate the interactions of one or more Models. At the simplest level, a Service simply consists of the application of a single Command to some BusinessModel, but it can be more complex, synchronizing the application of commands over several models. Based on the ideas presented in Life Beyond Distributed Transactions, a Service represents the state of the interaction between a single user of the system, e.g. a request, and one or more pieces of data maintained by the system.

Because they are the agents of the outside world in the system, Services operate in an impure context, hence in a dedicated monad called WebM. Services typically return some representable type, when they are queries, or an Event denoting the outcome of the request. WebM is actually an instance of a monad transformer, WebStateM, over IO, hence it is impure. It has access to two pieces of state.

Here is the definition of WebStateM:

newtype WebStateM shared local m a = WebStateM { runWebM :: TVar shared -> local -> m a }

type WebM a = forall s . EventStore s => WebStateM SharedState LocalState s a
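For the newtype above to be usable in do-notation it needs Functor, Applicative and Monad instances. The following is a minimal sketch of what they could look like; the instance bodies and the helpers `askShared`, `askLocal` and `liftWeb` are assumptions written for illustration, not the actual Capital Match code.

```haskell
import Control.Concurrent.STM (TVar, newTVarIO, readTVarIO)

newtype WebStateM shared local m a =
  WebStateM { runWebM :: TVar shared -> local -> m a }

-- Each instance simply threads the two environment values through.
instance Functor m => Functor (WebStateM shared local m) where
  fmap f (WebStateM g) = WebStateM $ \tv l -> fmap f (g tv l)

instance Applicative m => Applicative (WebStateM shared local m) where
  pure x = WebStateM $ \_ _ -> pure x
  WebStateM f <*> WebStateM g = WebStateM $ \tv l -> f tv l <*> g tv l

instance Monad m => Monad (WebStateM shared local m) where
  WebStateM g >>= k = WebStateM $ \tv l -> g tv l >>= \a -> runWebM (k a) tv l

-- Hypothetical accessors for the two pieces of environment.
askShared :: Applicative m => WebStateM shared local m (TVar shared)
askShared = WebStateM $ \tv _ -> pure tv

askLocal :: Applicative m => WebStateM shared local m local
askLocal = WebStateM $ \_ l -> pure l

-- Run an action of the underlying monad, ignoring the environment.
liftWeb :: m a -> WebStateM shared local m a
liftWeb action = WebStateM $ \_ _ -> action

-- Example service: read the shared state and combine it with local data.
example :: WebStateM Int String IO String
example = do
  tv  <- askShared
  n   <- liftWeb (readTVarIO tv)
  who <- askLocal
  pure (who ++ " sees " ++ show n)

-- Run the example against a fresh shared variable.
runExample :: IO String
runExample = do
  tv <- newTVarIO 42
  runWebM example tv "alice"
```
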

This is simply a Reader monad with two different pieces of data:

The vast majority of services use the generic applyCommand function which is the critical part of the system. This function is responsible for:

Web

The REST interface is provided by scotty, which is a simple framework based on WAI and warp [3]. Most action handlers are pretty simple:

On top of REST endpoints sit some Middlewares which check or apply transformations to requests and/or responses:

Lost in Translation

Executing a user-triggered action is in a sense a series of translations occurring between different levels of language:

Conceptually, we have this hierarchy of monads, expressed in types:

model :: Command Model -> StateT Model STM (Event Model)
service :: WebM (Event Model)
web :: ActionT ()

This hierarchy of monads delineates, somewhat obviously, the following languages:

Cross-cutting Concerns

Concurrency

Concurrency is mostly handled at the REST layer through Warp and Scotty: Each request is handled concurrently by separate threads which are very lightweight in Haskell/GHC. On top of that we have a couple more threads in the application:

We used to run threads directly with forkIO and friends but eventually moved to something simpler and much more robust: the async package. Concurrent updates to the model are handled through Software Transactional Memory: a TVar (transactional variable) holds a reference to the state and all operations on the state are thus transactional.

The initial goal was to enforce a strict separation between the various Business Models with an eye towards being able to deploy them as independent services exchanging messages. But this rule was broken more than once, and a few months later we ended up with a monolith with uncontrolled dependencies across domains and layers. We then undertook the necessary refactoring steps to get back to a “saner” state where STM transactions operate at the level of a single Command through the applyCommand function.
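As an illustration of an STM transaction scoped to a single command, here is a hedged sketch. `applyCommandWith` is a hypothetical, simplified stand-in for the real applyCommand (whose full responsibilities are not reproduced here): it takes the decision and transition functions as plain arguments, and the `Cmd`/`Ev` types and the `Int` state are invented for the example.

```haskell
import Control.Concurrent.STM
  (TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)

-- Decide on an event and update the state within one atomic transaction,
-- so two concurrent commands on the same model can never interleave.
applyCommandWith
  :: (cmd -> s -> ev)   -- pure decision function (like `act`)
  -> (ev -> s -> s)     -- pure transition function (like `apply`)
  -> TVar s             -- in-memory state of the model
  -> cmd
  -> IO ev
applyCommandWith decide evolve tv cmd = atomically $ do
  s <- readTVar tv
  let ev = decide cmd s
  writeTVar tv $! evolve ev s
  pure ev

-- Hypothetical toy model: a non-negative counter.
data Cmd = Add Int | Sub Int
data Ev  = Added Int | Subtracted Int | Refused
  deriving (Eq, Show)

decideCounter :: Cmd -> Int -> Ev
decideCounter (Add n) _ = Added n
decideCounter (Sub n) s
  | n <= s    = Subtracted n
  | otherwise = Refused

evolveCounter :: Ev -> Int -> Int
evolveCounter (Added n)      s = s + n
evolveCounter (Subtracted n) s = s - n
evolveCounter Refused        s = s

-- Apply three commands in sequence and return the events and final state.
demo :: IO ([Ev], Int)
demo = do
  tv  <- newTVarIO 0
  evs <- mapM (applyCommandWith decideCounter evolveCounter tv) [Add 10, Sub 3, Sub 50]
  final <- readTVarIO tv
  pure (evs, final)
```

In this sketch the returned event is what a caller would then hand over to persistence; keeping I/O outside `atomically` matters because STM transactions may be retried.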

Persistence and Storage

Persistence is managed through a dedicated event bus: Events are first packaged into a more opaque StoredEvent object containing metadata useful for traceability of the system:

Then StoredEvents are pushed to a dedicated Driver thread which stores events in the underlying events file. Physical storage is a simple append-only file which contains the sequence of applied events serialized to some binary format (Kafka-like). We are in the process of moving to a much more robust storage solution:

Event Versioning

Events are stored with a version number which is a monotonically increasing number. This version number is bumped each time we change the structure of our events, e.g. adding a field to some record, changing a field’s type… When an event is persisted, it contains the current version number that’s defined in the application at that time. When the event is read back (i.e. deserialized) to rebuild the state of the system, this version number is used to select the correct read function.
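The version-based selection of a read function can be sketched as follows. Everything here is an assumption made for illustration: the `Stored` wrapper, the `Transfer` event, the whitespace-separated text payload (the real system uses a binary format) and the default currency filled in for old events.

```haskell
import Text.Read (readMaybe)

newtype Version = Version Int
  deriving (Eq, Ord, Show)

-- Bumped every time the structure of an event changes.
currentVersion :: Version
currentVersion = Version 2

-- A persisted event: the version it was written with, plus its payload.
data Stored = Stored { storedVersion :: Version, payload :: String }
  deriving (Eq, Show)

-- Hypothetical event. Version 1 had no currency field.
data Transfer = Transfer { amount :: Integer, currency :: String }
  deriving (Eq, Show)

-- Always write using the current version.
encode :: Transfer -> Stored
encode (Transfer n c) = Stored currentVersion (show n ++ " " ++ c)

-- Pick the reader matching the version the event was stored with.
decode :: Stored -> Either String Transfer
decode (Stored (Version 1) p) =
  case words p of
    -- Old layout: amount only; the default currency is an assumption.
    [a] | Just n <- readMaybe a -> Right (Transfer n "SGD")
    _                           -> Left ("malformed v1 payload: " ++ p)
decode (Stored (Version 2) p) =
  case words p of
    [a, c] | Just n <- readMaybe a -> Right (Transfer n c)
    _                              -> Left ("malformed v2 payload: " ++ p)
decode (Stored v _) = Left ("unsupported version: " ++ show v)
```

With this shape, old events are never rewritten on disk: each historical layout keeps its own reader, and only the writer tracks the latest structure.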

Hence modifying the structure of events always entails the following steps in development:

This mechanism adds some interesting properties to our underlying storage:

User Interface

UI code still lives partly in the server and partly as pure client-side code:

But the bulk of the UI work is done on the client with Om. Om is a ClojureScript interface to Facebook’s React [6]. We treat the UI-backend interaction as a pure client-server relationship: the UI maintains its own state and interacts with the server through the usual Ajax calls, updating its state accordingly. The interface a user sees is a single-page application.

Logging and Monitoring

There is a Log structure, which is a queue consuming logging events and handling them according to some function. We log all queries, all commands issued and various other events occurring in the system: application startup/stop, heartbeat, I/O errors, storage events… In order to prevent sensitive data from leaking into the logs, we have a redact function that rewrites commands to remove such data before passing them to the logging system.
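A queue-backed logger with redaction could be sketched like this. The `Logger` record, the `Command` type and all function names except `redact` are hypothetical, and the sketch uses a bare `forkIO` and a `TQueue` for brevity where a production version would more likely supervise the consumer thread (for instance with the async package).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM

-- Hypothetical command type carrying one sensitive field.
data Command
  = Register { login :: String, password :: String }
  | Ping
  deriving (Eq, Show)

-- Rewrite a command so that sensitive fields never reach the log.
redact :: Command -> Command
redact (Register l _) = Register l "<redacted>"
redact c              = c

-- A queue of log lines; Nothing tells the consumer to stop.
data Logger = Logger
  { queue    :: TQueue (Maybe String)
  , finished :: MVar ()
  }

-- Start a consumer thread that passes every queued line to `sink`.
startLogger :: (String -> IO ()) -> IO Logger
startLogger sink = do
  q    <- newTQueueIO
  done <- newEmptyMVar
  let loop = do
        msg <- atomically (readTQueue q)
        case msg of
          Nothing   -> putMVar done ()
          Just line -> sink line >> loop
  _ <- forkIO loop
  pure (Logger q done)

-- Redact first, then enqueue: callers never block on the log backend.
logCommand :: Logger -> Command -> IO ()
logCommand lg = atomically . writeTQueue (queue lg) . Just . show . redact

-- Ask the consumer to stop and wait until the queue has been drained.
stopLogger :: Logger -> IO ()
stopLogger lg = do
  atomically (writeTQueue (queue lg) Nothing)
  takeMVar (finished lg)

-- Collect logged lines in memory instead of writing them anywhere.
demo :: IO [String]
demo = do
  seen <- newTVarIO []
  lg   <- startLogger (\line -> atomically (modifyTVar' seen (line :)))
  logCommand lg (Register "bob" "hunter2")
  logCommand lg Ping
  stopLogger lg
  reverse <$> readTVarIO seen
```

Swapping the `sink` function is what would allow several backends to share the same queueing logic.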

We currently have two different log backends:

At application startup we also notify the dev team by sending an email with the configuration. This is useful to check the startup of the production environment, as this email contains among other things the version of the application and the command line with which it has been started.

Reflection

It’s been a bit over a year since I started working on Capital Match’s platform. I – we – have made mistakes, not everything went as smoothly as we would have liked and it is still just the beginning of a hopefully long adventure. One year is a good time to stop - or slow down a bit - and reflect on what’s been accomplished, what went wrong and what went well. In the next sections I try to provide a more or less objective assessment of the architecture we have put in place, and what our next steps will be.

The Good, the Bad and the Ugly

We have been live since March 2015, serving more than S$3 million - and counting - in facilities for SMEs in Singapore without any major interruption of service. This in itself is an extremely positive fact: it works and it supports continuous improvements in a smooth way [7].

Here are some benefits I see in our approach, encompassing both the technology used (Haskell, Om/Clojurescript) and the architecture:

Here are some mistakes I made:

What’s Next?

Within the span of a single year, much has happened in the Haskell ecosystem and things that were experimental or unwieldy one year ago are now mature and could easily make their way to production: ghcjs is now much easier to build and work with, there are more mature solutions in the front-end like reflex, building has never been easier thanks to stack, GHC 7.10 has brought a number of improvements (and controversial breaking changes like the Foldable/Traversable in Prelude (FTP) proposal)… Gabriel Gonzalez maintains a State of Haskell Ecosystem page that provides an interesting overview of what’s hot and what’s not in the Haskell world.

Here are some major challenges that lie ahead of us to improve our system:

Conclusion

This article is already quite long yet it is only a brief overview of our system: It is hard to summarize one year of intense work! In future installments of this blog post series I plan to address other aspects of the system that were not covered here: Development and production infrastructure, user interface development from the point of view of a backend developer, development process.

As a final note, my deepest gratitude goes to the following people, without whose help this adventure would not have been possible: Pawel Kuznicki, Chun Dong Chau, Pete Bonee, Willem van den Ende, Carlos Cunha, Guo Liang “Sark” Oon, Amar Potghan, Konrad Tomaszewski and all the great people at Capital Match. I would also like to thank Corentin Roux-dit-Buisson, Neil Mitchell and Joey Hess for their support and feedback.


  1. I have been using RDBMS since the 90’s, developed a point-of-sale application in Access, have been using PostgreSQL through its various versions since 1998, and recently worked on integrating a DB migration process into a very large system. I am not an expert but I have had quite extensive experience of relational databases over a significant number of years and I have always found that writing to a DB quickly becomes a painful thing.

  2. Although one could argue that there exist “languages” like Excel that allow you to write complex queries and explore data in a very sophisticated way without the use of SQL.

  3. Servant is definitely on our roadmap.

  4. This thread is pretty much redundant with the storage thread for the moment. The plan is to use it for serialising applyCommand operations on the models.

  5. We use QuickCheck to generate a bunch of events for the type of interest.

  6. It looks like Om will soon be superseded by om.next.

  7. I plan to provide more insights on our development and operations process in another blog post, but to give a rough idea, we have deployed our application about a hundred times in the past 6 months.

  8. Having a strong type system is no replacement for a decent test suite however, because obviously a lot of bugs happen at the boundaries of the system, e.g. when invoking the REST API.

  9. Obviously, this works as long as your data fits in memory. My bet is this will be the case for quite a long time. Should this ever become a problem, we will most probably be in a position to handle it.

  10. This is the approach I took in hdo because the external representation was already defined, and in the end it makes encoding more explicit and easier to work with.