Wednesday, November 19, 2014

Announcing Shake 0.14

Summary: Shake 0.14 is now out. The *> operator became %>. If you are on Windows, the FilePath operations work differently.

I'm pleased to announce Shake 0.14, which has one set of incompatible changes, one set of deprecations and some new features.

The *> operator and friends are now deprecated

The *> operator has been the Shake operator for many years, but in GHC 7.10 the *> operator is exported by the Prelude, which causes name clashes. To fix that, I've added %> as an alias for *>. I expect to keep exporting *> for the foreseeable future, but you should use %> going forward, and will likely have to switch when you move to GHC 7.10. All the Shake documentation has been updated. The &*> and |*> operators have been renamed &%> and |%> to remain consistent.
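
For instance, a rule that used to start "*.o" *> ... now reads as below. This is a hypothetical compile rule of my own, not one from the post:

import Development.Shake
import Development.Shake.FilePath

-- %> is the new spelling of the old *> rule operator
rules :: Rules ()
rules = "*.o" %> \out -> do
    let src = out -<.> "c"  -- -<.> swaps the file extension
    need [src]
    unit $ cmd "gcc -c" [src] "-o" [out]

main :: IO ()
main = shakeArgs shakeOptions rules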

Development.Shake.FilePath now exports System.FilePath

Previously the module Development.Shake.FilePath mostly exported System.FilePath.Posix (even on Windows), along with some additional functions, but modified normalise to remove /../ and made </> always call normalise. The reason is that if you pass non-normalised FilePath values to need, Shake can end up with two names for the same on-disk file and everything goes wrong, so the module tried to avoid creating non-normalised paths in the first place.

As of 0.14 I now fully normalise the values inside need, so arguments passed to need no longer have to be normalised already. This change allows a simplification: directly exporting System.FilePath and only adding to it. If you are not using Windows, the changes are:

  • normalise doesn't eliminate /../, but normaliseEx does.
  • </> no longer calls normalise. It turned out that calling normalise was pretty harmful for FilePattern values, leading to fairly easy-to-make but hard-to-debug issues. (Both behaviours are sketched below.)
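
A quick sketch of those two behaviours, assuming Posix separators - the expected values are mine, not output from the post:

import Development.Shake.FilePath

main :: IO ()
main = do
    putStrLn $ normalise "bob/../foo.txt"    -- "bob/../foo.txt": /../ survives
    putStrLn $ normaliseEx "bob/../foo.txt"  -- "foo.txt": /../ is collapsed
    putStrLn $ "dir" </> "file.txt"          -- "dir/file.txt": no normalisation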

If you are using Windows, you'll notice all the operations now use \ instead of /, and properly cope with Windows-specific aspects like drives. The function toStandard (convert all separators to /) might be required in a few places. The reasons for this change are discussed in bug #193.
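
For example (a sketch, with expected values as my assumption rather than quoted output):

import Development.Shake.FilePath

main :: IO ()
main = do
    putStrLn $ "dir" </> "file.txt"               -- "dir\file.txt" on Windows
    putStrLn $ toStandard ("dir" </> "file.txt")  -- "dir/file.txt" everywhere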

New features

This release has lots of new features since 0.13. You can see a complete list in the change log, but the most user-visible ones are:

  • Add getConfigKeys to Development.Shake.Config.
  • Add withTempFile and withTempDir for the Action monad (see the sketch after this list).
  • Add -j to run with one thread per processor.
  • Make |%> matching with simple files much faster.
  • Use far fewer threads, with correspondingly less stack usage.
  • Add copyFileChanged.
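
As a sketch of how a couple of these fit together - the generate command is made up - a rule might write to a temporary file and only touch the output when the contents change:

import Development.Shake

rules :: Rules ()
rules = "output.txt" %> \out ->
    -- the temporary file is deleted when the continuation finishes
    withTempFile $ \tmp -> do
        unit $ cmd "generate" "-o" [tmp]
        -- only copy over the output if the bytes actually differ
        copyFileChanged tmp out

main :: IO ()
main = shakeArgs shakeOptions rules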

Plans

I'm hoping the next release will be 1.0!

Thursday, November 13, 2014

Operators on Hackage

Summary: I wrote a script to list all operators on Hackage, and which packages they are used by.

In GHC 7.10 the *> operator will be moving into the Prelude, which means the Shake library will have to find an alternative operator (discussion on the mailing list). In order to pick a sensible operator, I wanted to list all operators in all Hackage packages so I could be aware of clashes.

Note that the operators a package exports are more than just those it defines, e.g. Shake exports the Eq class, so == is counted as being exported by Shake. However, in most cases, operators exported by a package are defined by that package.

Producing the file

First I downloaded the Hoogle databases from Hackage, and extracted them to a directory named hoogle. I then ran:

ghc --make Operators.hs && operators hoogle operators.txt

And uploaded the resulting operators.txt. The code for Operators.hs is:

import Control.Exception.Extra
import Control.Monad
import Data.List.Extra
import System.Directory.Extra
import System.Environment
import System.FilePath
import System.IO.Extra

main = do
    [dir,out] <- getArgs
    -- recursively find every extracted Hoogle database file
    files <- listFilesRecursive dir
    xs <- forM files $ \file -> do
        -- try UTF8 first, then the default encoding, then give up
        src <- readFileUTF8' file `catch_` \_ -> readFile' file `catch_` \_ -> return ""
        -- a line starting with ( declares an operator; pair it with the package name
        return [("(" ++ takeWhile (/= ')') x ++ ")", takeBaseName file) | '(':x <- lines src]
    -- collate to (operator, [packages]) and write one line per operator
    writeFileUTF8 out $ unlines [unwords $ a : nub b | (a,b) <- groupSort $ concat xs]

This code relies on the normal packages distributed with GHC, plus the extra package.

Code explanation

The script is pretty simple. I first get two arguments, which are where to find the extracted files and where to write the result. I then use listFilesRecursive to recursively find all extracted files, and forM to loop over them. For each file I read it in (trying first UTF8, then the normal encoding, then giving up). For each line I look for ( as the first character, and form a list of [(operator-name, package)].

After producing the list, I use groupSort to produce [(operator-name, [package])] then writeFileUTF8 to produce the output. Running the script takes just over a minute on my ancient computer.

Writing the code

Writing the code to produce the operator list took about 15 minutes, and I made some notes as I was going.

  • I started by loading up ghcid for the file with the command line ghcid -t -c "ghci Operators.hs". Now every save immediately resulted in a list of warnings/errors, so I never bothered opening the file in ghci - I just compiled it to test.
  • I started by inserting take 20 files so I could debug the script faster and could manually check the output was plausible.
  • At first I wrote takeBaseName src rather than takeBaseName file. That produced a lot of totally incorrect output, woops.
  • At first I used readFile to suck in the data and putStr to print it to the console. That corrupted Unicode operators, so I switched to readFileUTF8' and writeFileUTF8.
  • After switching to writeFileUTF8 I found a rather serious bug in the extra library, which I fixed and added tests for, then made a new release.
  • After trying to search through the results, I added ( and ) around each operator to make it easier to search for the operators I cared about.

User Exercise

To calculate the stats of the most exported operator and the package with the most operators, I wrote two lines of code - how would you write such code? Hint: both my lines involved maximumBy. (One possible answer is sketched below.)
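
Spoiler alert: here is one possible answer, assuming the output has been parsed back into (operator, packages) pairs - these lines are mine, not necessarily the author's two:

import Data.List (maximumBy)
import Data.List.Extra (groupSort)
import Data.Ord (comparing)

-- the operator exported by the most packages
mostExported :: [(String, [String])] -> (String, [String])
mostExported = maximumBy (comparing (length . snd))

-- the package exporting the most operators: invert the mapping first
biggestPackage :: [(String, [String])] -> (String, [String])
biggestPackage xs = maximumBy (comparing (length . snd)) $
    groupSort [(pkg, op) | (op, pkgs) <- xs, pkg <- pkgs]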

Tuesday, November 11, 2014

Upper bounds or not?

Summary: I thought through the issue of upper bounds on Haskell package dependencies, and it turns out I don't agree with anyone :-)

There is currently a debate about whether Haskell packages should have upper bounds for their dependencies or not. Concretely, given mypackage and dependency-1.0.2, should I write dependency >= 1 (no upper bounds) or dependency >= 1 && < 1.1 (PVP/Package Versioning Policy upper bounds)? I came to the conclusion that the bounds should be dependency >= 1, but that Hackage should automatically add an upper bound of dependency <= 1.0.2.
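
Concretely, the three options look roughly like this in a hypothetical .cabal file (names and versions made up):

-- no upper bound:
build-depends: dependency >= 1

-- PVP upper bound:
build-depends: dependency >= 1 && < 1.1

-- exact upper bound, as proposed:
build-depends: dependency >= 1 && <= 1.0.2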

Rock vs Hard Place

The reason the debate has continued so long is because both choices are unpleasant:

  • Don't add upper bounds, and have packages break for your users because they are no longer compatible.
  • Add PVP upper bounds, and have reasonable install plans rejected and users needlessly downgraded to old versions of packages. If one package requires a minimum version above n, and another requires a maximum below n, they can't be combined. The PVP allows adding new functions, so even if all your dependencies follow the PVP, the code might still fail to compile - for example, a newly exported name can clash with one from another open import, exactly what happened with *> and the Prelude.

I believe there are two relevant factors in choosing which scheme to follow.

Factor 1: How long will it take to update the .cabal file

Let us assume that the .cabal file can be updated in minutes. If there are excessively restrictive bounds for a few minutes it doesn't matter - the code will be out of date, but only by a few minutes, and other packages requiring the latest version are unlikely.

As the .cabal file takes longer to update, the problems with restrictive bounds become worse. For abandoned projects, the restrictive upper bounds make them unusable. For actively maintained projects with many dependencies, bounds bumps can be required weekly, and a two week vacation can break actively maintained code.

Factor 2: How likely is the dependency upgrade to break

If upgrading a dependency breaks the package, then upper bounds are a good idea. In general it is impossible to predict whether a dependency upgrade will break a package or not, but my experience is that most packages usually work fine. For some projects, there are stated compatibility ranges, e.g. Snap declares that any API will be supported for two 0.1 releases. For other projects, some dependencies are so tightly coupled that every 0.1 increment will almost certainly fail to compile, e.g. the HLint dependency on haskell-src-exts.

The fact that these two variable factors are used to arrive at a binary decision is likely the reason the Haskell community has yet to reach a conclusion.

My Answer

My current preference is to normally omit upper bounds. I do that because:

  • For projects I use heavily, e.g. haskell-src-exts, I have fairly regular communication with the maintainers, so am not surprised by releases.
  • For most projects I depend on only a fraction of the API, e.g. wai, and most changes are irrelevant to me.
  • Michael Snoyman and the excellent Stackage alert me to broken upgrades quickly, so I can respond when things go wrong.
  • I maintain quite a few projects, and the administrative overhead of uploading new versions, testing, waiting for continuous-integration results etc would cut down on real coding time. (While the Hackage facility to edit the metadata would be quicker, I think that tweaking fundamentals of my package, but skipping the revision control and continuous integration, seems misguided.)
  • The PVP is a heuristic, but usually the upper bound is too tight, and occasionally the upper bound is too loose. Relying on the PVP to provide bounds is no silver bullet.

On the negative side, occasionally my packages no longer compile for my users (very rarely, and for short periods of time, but it has happened). Of course, I don't like that at all, so do include upper bounds for things like haskell-src-exts.

The Right Answer

I want my packages to use versions of dependencies such that:

  • All the features I require are present.
  • There are no future changes that stop my code from compiling or passing its test suite.

I can achieve the first objective by specifying a lower bound, which I do. There is no way to predict the future, so no way I can restrict the upper bound perfectly in advance. The right answer must involve:

  • On every dependency upgrade, Hackage (or some agent of Hackage) must try to compile and test my package. Many Haskell packages are already tested using Travis CI, so reusing those tests seems a good way to gauge success.
  • If the compile and tests pass, then the bounds can be increased to the version just tested.
  • If the compile or tests fail, then the bounds must be tightened to exclude the new version, and the author needs to be informed (see the sketch below).
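
A sketch of that bounds decision, using the Cabal library's version combinators (everything around it - the agent, the Result type - is hypothetical):

import Distribution.Version

-- outcome of Hackage rebuilding the package against a new dependency version
data Result = Pass | Fail

-- on success widen the bound to include the version just tested;
-- on failure tighten it to exclude the new version
updateBound :: Result -> Version -> VersionRange -> VersionRange
updateBound Pass new old = unionVersionRanges old (thisVersion new)
updateBound Fail new old = intersectVersionRanges old (earlierVersion new)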

With this infrastructure, the time for which a dependency is too tight becomes small, and since the chance of breakage is unknowable in advance, Hackage packages should have exact upper bounds - much tighter than PVP upper bounds.

Caveats: I am unsure whether such regularly changing metadata should be incorporated into the .cabal file or not. I realise the above setup requires quite a lot of Hackage infrastructure, but will buy anyone who sorts it out some beer.

Sunday, November 02, 2014

November talks

Summary: I'll be talking at CodeMesh 2014 on 5th November and FP Days on 20th November.

I am giving two talks in London this month:

CodeMesh 2014 - Gluing Things Together with Haskell, 5th Nov

I'll be talking about how you can use Haskell as the glue language in a project, instead of something like Bash. In particular, I'll cover Shake, NSIS and Bake. The abstract reads:

A large software project is more than just the code that goes into a release; in particular, you need lots of glue code to put everything together - including build systems, test harnesses, installer generators etc. While the choice of language for the project is often a carefully considered decision, more often than not the glue code consists of shell scripts and Makefiles. But just as functional programming provides a better way to write the project, it also provides a better way to write the glue code. This talk covers some of the technologies and approaches we use at Standard Chartered to glue together the quant library. In particular, we'll focus on the build system where we replaced 10,000 lines of Makefiles with 1,000 lines of Haskell which builds the project twice as fast. We'll also look at how to test programs using Haskell, how to replace ancillary shell scripts with Haskell, and how to use Haskell to generate installers.

FP Days 2014 - Building stuff with Shake, 20th Nov

I'll be giving a tutorial on building stuff with Shake. It's going to be less sales pitch, more how you structure a build system, and how you use Shake effectively. The abstract reads:

Build systems are a key part of any large software project, relied upon by both developers and release processes. It's important that the build system is understandable, reliable and fast. This talk introduces the Shake build system which is intended to help meet those goals. Users of Shake write a Haskell program which makes heavy use of the Shake library, while still allowing the full power of Haskell to be used. The Shake library provides powerful dependency features along with useful extras (profiling, debugging, command line handling). This tutorial aims to help you learn how to think about writing build systems, and how to make those thoughts concrete in Shake.