Thursday, September 29, 2016
I work for Barclays, in London, working on a brand new Haskell project. We're looking for nine additional Haskell programmers to come and join the team.
What we offer
A permanent job, writing Haskell, using all the tools you know and love – GHC/Cabal/Stack etc. In the first two weeks in my role I've already written parsers with attoparsec, both Haskell and HTML generators and am currently generating a binding to C with lots of Storable/Ptr stuff. Since it's a new project you will have the chance to help shape the project.
The project itself is to write a risk engine – something that lets the bank calculate the value of the trades it has made, and how things like changes in the market change their value. Risk engines are important to banks and include lots of varied tasks – talking to other systems, overall structuring, actual computation, streaming values, map/reduce.
We'll be following modern but lightweight development principles – including nightly builds, no check-ins to master which break the tests (enforced automatically) and good test coverage.
These positions have attractive salary levels.
What we require
We're looking for the best functional programmers out there, with a strong bias towards Haskell. We have a range of seniorities available to suit even the most experienced candidates. We don't have anything at the very junior end; instead we're looking for candidates that are already fluent and productive. That said, a number of very good Haskell programmers think of themselves as beginners even after many years, so if you're not sure, please get in touch.
We do not require any finance knowledge.
The role is in London, Canary Wharf, and physical presence in the office on a regular basis is required – permanent remote working is not an option.
How to apply
To apply, email neil.d.mitchell AT barclays.com with a copy of your CV. If you have any questions, email me.
The best way to assess technical ability is to look at code people have written. If you have any packages on Hackage or things on GitHub, please point me at the best projects. If your best code is not publicly available, please describe the Haskell projects you've been involved in.
Sunday, August 14, 2016
Summary: Last year I made a list of four flaws with Haskell. Most have improved significantly over the last year.
No language/compiler ecosystem is without its flaws. A while ago I made a list of the flaws I thought might harm people using Haskell in an industrial setting. These are not flaws that impact beginners, or flaws that stop people from switching to Haskell, but those that might harm a big project. Naturally, everyone would come up with a different list, but here is mine.
Package Management: Installing a single consistent set of packages used across a large code base used to be difficult. Upgrading packages within that set was even worse. On Windows, anything that required a new network package was likely to end in tears. The MinGHC project solved the network issue. Stackage solved the consistent set of packages, and Stack made it even easier. I now consider Haskell package management a strength for large projects, not a risk.
IDE: The lack of an IDE certainly harms Haskell. There are a number of possibilities, but whenever I've tried them they have come up lacking. The fact that every Haskell programmer has an entrenched editor choice doesn't make it an easier task to solve. Fortunately, with Ghcid there is at least something near the minimum acceptable standard for everyone. At the same time various IDE projects have appeared, notably the Haskell IDE Engine and Intero. With Ghcid the lack of an IDE stops being a risk, and with the progress on other fronts I hope the Haskell IDE story continues to improve.
Space leaks: As Haskell programs get bigger, the chance of hitting a space leak increases, becoming an inevitability. While I am a big fan of laziness, space leaks are the big downside. Realising space leaks were on my flaws list, I started investigating methods for detecting space leaks, coming up with a simple detection method that works well. I've continued applying this method to other libraries and tools. I'll be giving a talk on space leaks at Haskell eXchange 2016. With these techniques space leaks don't go away, but they can be detected with ease and solved relatively simply - no longer a risk to Haskell projects.
Array/String libraries: When working with strings/arrays, the libraries that tend to get used are vector, bytestring, text and utf8-string. While each is individually a nice project, they don't work seamlessly together. The utf8-string package provides UTF8 semantics for bytestring, which provides pinned byte arrays. The text package provides UTF16 encoded unpinned Char arrays. The vector package provides mutable and immutable vectors which can be either pinned or unpinned. I think the ideal situation would be a type that was either pinned or unpinned based on size, where the string was just a UTF8 encoded array with a newtype wrapping. Fortunately the foundation library provides exactly that. I'm not brave enough to claim a 0.0.1 package released yesterday has derisked Haskell projects, but things are travelling in the right direction.
It has certainly been possible to use Haskell for large projects for some time, but there were some risks. With the improvements over the last year the remaining risks have decreased markedly. In contrast, the risks of using an untyped or impure language remain significant, making Haskell a great choice for new projects.
Thursday, August 04, 2016
I'm delighted to announce that I'll be giving a talk/hack session on Shake as part of the relatively new "Haskell Hacking London" meetup.
Title: Writing build systems with Shake
Date: Tuesday, August 16, 2016. 6:30 PM
Location: Pusher Office, 28 Scrutton Street, London
Abstract: Shake is a general purpose library for expressing build systems - forms of computation, with caching, dependencies and more besides. Like all the best stuff in Haskell, Shake is generic, with details such as "files" written on top of the generic library. Of course, the real world doesn't just have "files", but specifically has "C files that need to be compiled with gcc". In this hacking session we'll look at how to write Shake rules, what existing functions people have already layered on top of Shake for compiling with specific compilers, and consider which rules are missing. Hopefully by the end we'll have a rule that people can use out-of-the-box for compiling C++ and Haskell.
To put it another way, it's all about layering up. Haskell is a programming language. Shake is a Haskell library for dependencies, minimal recomputation, parallelism etc. Shake also provides a layer on top (but inside the same library) for writing rules about files, and ways to run command line tools. Shake doesn't yet provide a layer that compiles C files, but it does provide the tools with which you can write your own. The aim of this talk/hack session is to figure out what the next layer should be, and write it. It is definitely an attempt to move into the territory of build systems like SCons, which know how to build C etc. out of the box.
Monday, July 25, 2016
Summary: I'm looking for a maintainer to take over Derive. Interested?
The Derive tool is a library and set of definitions for generating fragments of code in a formulaic way from a data type. It has a mechanism for guessing the pattern from a single example, plus a more manual way of writing out generators. It supports 35 generators out of the box, and is depended upon by 75 libraries.
The tool hasn't seen much love for some time, and I no longer use it myself. It requires somewhat regular maintenance to upgrade to new versions of GHC and haskell-src-exts. There are lots of directions the tool could be taken: more derivations, integration with the GHC Generic derivation system etc. There are a few generic pieces that could be broken off (translation between Template Haskell and haskell-src-exts, the guessing mechanism).
Anyone who is interested should comment on the GitHub ticket. In the absence of any volunteers I may continue to do the regular upgrade work, or may instead have it taken out of Stackage and stop upgrading it.
Monday, July 18, 2016
Summary: Stack originally used Shake. Now it doesn't. There are reasons for that.
The Stack tool originally used the Shake build system, as described on the page about Stack's origins. Recently Edward Yang asked why Stack doesn't still use Shake - a very interesting question. I've taken the information shared in that mailing list thread and written it up, complete with my comments and distortions/inferences.
Stack is all about building Haskell code, in ways that obey dependencies and perform minimal rebuilds. Already in Haskell the dependency story is somewhat muddied. GHC (as available through ghc --make) does advanced dependency tracking, including header includes and custom Template Haskell dependency directives. You can also run ghc in single-shot mode, compiling a file at a time, but the result is about 3x slower and GHC will still do some dependency tracking itself anyway. Layered on top of ghc --make is Cabal which is responsible for tracking dependencies with .cabal files, configured Cabal information and placing things into the GHC package database. Layered on top of that is Stack, which has multiple projects and needs to track information about which Stackage snapshot is active and shared build dependencies.
Shake is good at taking complex dependencies and hiding all the messy details. However, for Stack many of these messy details were the whole purpose of the project. When Michael Snoyman and Chris Done were originally writing Stack they didn't have much experience with Shake, and opted to go for simplicity and directly managing the pieces, which they viewed to be less risky.
Now that Stack is written, and works nicely, the question changes to whether it is worth changing existing working code to make use of Shake. Interestingly, at the heart of Stack there is a "Shake-lite" - see Control.Concurrent.Execute. This piece could certainly be replaced by Shake, but what would the benefit be? Looking at it with my Shake implementer's hat on, there are a few things that spring to mind:
This existing code is O(n^2) in lots of places. For the size of Stack projects, compared to the time required to compile Haskell, that probably doesn't matter.
Shake persists the dependencies, but the Stack code does not seem to. Would that be useful? Or is the information already persisted elsewhere? Would Shake persisting the information make stack builds which had nothing to do go faster? (The answer is almost certainly yes.)
Since the code is only used on one project it probably isn't as well tested as Shake, which has a lot of tests. On the other hand, it has far fewer features, so far less scope for bugs.
The code makes a lot of assumptions about the information fed to it. Shake doesn't make such assumptions, and thus invalid input is less likely to fail silently.
Shake has a lot of advanced dependency forms such as resources. Stack currently blocks when simultaneous configures are tried, whereas Shake would schedule other tasks to run.
Shake has features such as profiling that are not worth creating for a single project, but that when bundled in the library can be a useful free feature.
In some ways Stack as it stands avoids a lot of the best selling points about Shake:
If you have lots of complex interdependencies, Shake lets you manage them nicely. That's not really the case for Stack, but is in large heterogeneous build systems, e.g. the GHC build system.
If you are writing things quickly, Shake lets you manage exceptions/retries/robustness quickly. For a project which has the effort invested that Stack does, that's less important, but for things like MinGHC (something Stack killed), it was critically important because no one cared enough to do all this nasty engineering.
If you are experimenting, Shake provides a lot of pieces (resources, parallelism, storage) that help explore the problem space without having to do lots of work at each iteration. That might mean Shake is more of a benefit at the start of a project than in a mature project.
If you are writing a version of Stack from scratch, I'd certainly recommend thinking about using Shake. I suspect it probably does make sense for Stack to switch to Shake eventually, to simplify ongoing maintenance, but there's no real hurry.
Tuesday, July 05, 2016
Summary: Alex and Happy had three space leaks, now fixed.
Using the techniques described in my previous blog post I checked happy and alex for space leaks. As expected, both had space leaks. Three were clear and unambiguous space leaks, two were more nuanced. In this post I'll describe all five, starting with the obvious ones.
1: Happy - non-strict accumulating fold
Happy contains the code:
indexInto :: Eq a => Int -> a -> [a] -> Maybe Int
indexInto _ _ [] = Nothing
indexInto i x (y:ys)
    | x == y = Just i
    | otherwise = indexInto (i+1) x ys
This code finds the index of an element in a list, always being first called with an initial argument of 0. However, as it stands, the first argument is a classic space leak - it chews through the input list, building up an equally long chain of +1 applications, which are only forced later.
The fix is simple, change the final line to:
let j = i + 1 in j `seq` indexInto j x ys
Or (preferably) switch to using the space-leak free Data.List.elemIndex. Fixed in a pull request.
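As a minimal sketch of the two options (indexIntoStrict and indexViaLibrary are illustrative names, not identifiers from Happy), a bang pattern on the counter should have the same effect as the seq version, and elemIndex covers the usual case of counting from 0:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (elemIndex)

-- | Strict variant of indexInto (hypothetical name). The bang pattern forces
--   the counter on every call, so no chain of (+1) thunks can build up.
indexIntoStrict :: Eq a => Int -> a -> [a] -> Maybe Int
indexIntoStrict !_ _ [] = Nothing
indexIntoStrict !i x (y:ys)
    | x == y    = Just i
    | otherwise = indexIntoStrict (i + 1) x ys

-- | The library alternative (hypothetical wrapper); it always counts from 0.
indexViaLibrary :: Eq a => a -> [a] -> Maybe Int
indexViaLibrary = elemIndex
```

For a start index of 0 both functions should return the same answer, for example when looking up 'b' in "abc".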
2: Happy - sum using foldr
Happy also contained the code:
foldr (\(a,b) (c,d) -> (a+c,b+d)) (0,0) conflictList
The first issue is that the code is using foldr to produce a small atomic value, when foldl' would be a much better choice. Even after switching to foldl' we still have a space leak because foldl' only forces the outer-most value - namely just the pair, not the Int values inside. We want to force the elements inside the pair so are forced into the more painful construction:
foldl' (\(a,b) (c,d) -> let ac = a + c; bd = b + d in ac `seq` bd `seq` (ac,bd)) (0,0) conflictList
Not as pleasant, but it does work. In some cases people may prefer to define the auxiliary:
let strict2 f !x !y = f x y in foldl' (\(a,b) (c,d) -> strict2 (,) (a+c) (b+d)) (0,0) conflictList
Fixed in a pull request.
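Another way to write the same idea, sketched here with bang patterns on the accumulator (sumPairs is an illustrative name, and the pairs are assumed to hold Int values):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- | Sum both components of a list of pairs (hypothetical helper).
--   foldl' forces the accumulator pair at each step, and the bang patterns
--   then force the previous totals before the next pair is built, so at
--   most one pending addition per component exists at any time.
sumPairs :: [(Int, Int)] -> (Int, Int)
sumPairs = foldl' step (0, 0)
  where
    step (!a, !b) (c, d) = (a + c, b + d)
```

The trade-off against the seq version is purely stylistic: the forcing happens when the accumulator is consumed by the next step instead of when it is produced.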
3: Alex - lazy state in a State Monad
Alex features the code:
N $ \s n _ -> (s, addEdge n, ())
N roughly corresponds to a state monad with 2 fields, s and n. In this code n is a Map, which operates strictly, but the n itself is not forced until the end. We solve the problem by forcing the value before returning the triple:
N $ \s n _ -> let n' = addEdge n in n' `seq` (s, n', ())
Fixed in a pull request.
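The effect can be reproduced outside Alex with a small stand-in. The sketch below does not use Alex's real N type; stepLazy, stepStrict and runSteps are hypothetical names, and the state is left polymorphic where Alex uses a Map:

```haskell
-- | Lazy step: the new state is stored as an unevaluated thunk.
stepLazy :: (n -> n) -> s -> n -> (s, n, ())
stepLazy update s n = (s, update n, ())

-- | Strict step: the new state is forced before the triple is returned.
stepStrict :: (n -> n) -> s -> n -> (s, n, ())
stepStrict update s n = let n' = update n in n' `seq` (s, n', ())

-- | Run a step function k times, threading both values through.
--   The case expression forces each triple, which is what triggers the
--   seq inside stepStrict.
runSteps :: Int -> (s -> n -> (s, n, ())) -> s -> n -> (s, n)
runSteps 0 _ s n = (s, n)
runSteps k f s n = case f s n of
    (s', n', ()) -> runSteps (k - 1) f s' n'
```

Both step functions compute the same result. With stepLazy the state ends up as a chain of k pending updates that is only collapsed when the final value is demanded, whereas stepStrict keeps it evaluated throughout.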
4: Alex - array freeze
Alex calls the Data.Array.MArray.freeze function, to convert an STUArray (unboxed mutable array in the ST monad) into a UArray (unboxed immutable array). Unfortunately the freeze call in the array library uses an amount of stack proportional to the size of the array. Not necessarily a space leak, but not ideal either. Looking at the code, it's also very inefficient, constructing and deconstructing lots of intermediate data. Fortunately under normal optimisation a rewrite rule fires for this type to replace the call with one to freezeSTUArray, which is much faster and has bounded stack, but is not directly exported.
Usually I diagnose space leaks under -O0, on the basis that any space leak problems at -O0 may eventually cause problems anyway if an optimisation opportunity is lost. In this particular case I had to -O1 that module.
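For readers unfamiliar with the pattern, here is a small illustration of the kind of freeze call involved. It is not Alex's code: doubledArray and doubled are made-up names, and the array contents are arbitrary.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, freeze, newListArray, readArray, writeArray)
import Data.Array.Unboxed (UArray, elems)

-- | Build a mutable unboxed array, double every element in place, then
--   freeze it into an immutable UArray (hypothetical example function).
doubledArray :: [Int] -> UArray Int Int
doubledArray xs = runST $ do
    let hi = length xs - 1
    marr <- newListArray (0, hi) xs :: ST s (STUArray s Int Int)
    mapM_ (\i -> readArray marr i >>= writeArray marr i . (* 2)) [0 .. hi]
    freeze marr

-- | Convenience wrapper that returns the frozen elements as a list.
doubled :: [Int] -> [Int]
doubled = elems . doubledArray
```

When the mutable array is not needed after freezing, runSTUArray from Data.Array.ST is another option that sidesteps the explicit freeze call.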
5: Happy - complex fold
The final issue occurs in a function fold_lookahead, which when given lists of triples does an mconcat on all values that match in the first two components. Using the extra library that can be written as:
map (\((a,b),cs) -> (a,b,mconcat cs)) . groupSort . map (\(a,b,c) -> ((a,b),c))
We first turn the triple into a pair where the first two elements are the first component of the pair, call groupSort, then mconcat the result. However, in Happy this construction is encoded as a foldr doing an insertion sort on the first component, followed by a linear scan on the second component, then individual mappend calls. The foldr construction uses lots of stack (more than 1Mb), and also uses an O(n^2) algorithm instead of O(n log n).
Alas, the algorithms are not identical - the resulting list is typically in a different order. I don't believe this difference matters, and the tests all pass, but it does make the change more dangerous than the others. Fixed in a pull request.
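To try the groupSort formulation without installing extra, here is a sketch that uses only base. groupSortLocal is a stand-in written for this example, not the library's implementation, and foldLookaheadSketch is an illustrative name, not Happy's function:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- | Base-only stand-in for groupSort: sort by key, then collect the values
--   of equal keys. sortOn is stable, so values keep their input order
--   within each group.
groupSortLocal :: Ord k => [(k, v)] -> [(k, [v])]
groupSortLocal kvs =
    [ (k, v : map snd rest)
    | (k, v) : rest <- groupBy ((==) `on` fst) (sortOn fst kvs) ]

-- | Combine the third components of all triples that agree on the first two.
foldLookaheadSketch :: (Ord a, Ord b, Monoid c) => [(a, b, c)] -> [(a, b, c)]
foldLookaheadSketch =
    map (\((a, b), cs) -> (a, b, mconcat cs))
        . groupSortLocal
        . map (\(a, b, c) -> ((a, b), c))
```

The output is ordered by the (a,b) key, which is one way the result can differ in order from an insertion-based fold.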
Thanks to Simon Marlow for reviewing and merging all the changes. After these changes Happy and Alex use < 1Kb of stack on the sample files I tested them with. In practice the space leaks discovered here are unlikely to materially impact any real workflows, but the tools probably run a bit faster as a result.
Saturday, June 11, 2016
I've just released version 3.0.0, following on from jQuery 3.0.0 a few days ago. This release breaks compatibility with IE6-8, so if that's important to you, insert an upper bound on the package.