Friday, January 25, 2013

Using Compiler Bugs for Fun and Profit: Introducing Cartesian Closed Constraints

Anyway, implicit parameters are not idiomatic Haskell these days, and unlikely to make it into a standard, and once you start to involve unsafeCoerce then you're clearly off the reservation.
You can't have an implicit parameter in the context of a class or instance declaration. For example, both these declarations are illegal: class (?x::Int) => C a where ... instance (?x::a) => Foo [a] where ... Reason: exactly which implicit parameter you pick up depends on exactly where you invoke a function. But the ``invocation'' of instance declarations is done behind the scenes by the compiler, so it's hard to figure out exactly where it is done. Easiest thing is to outlaw the offending types.
--The Glorious Glasgow Haskell Compilation System User's Guide, Version 7.4.2

Modern Haskell allows you to pass typeclass instances between functions by wrapping them with a GADT. But this is very limited. Wouldn't it be nice if there were a simple way to produce a typeclass instance from a record? This would only be useful if we could have multiple instances of the same typeclass/type combination, and everyone knows that is not permitted. Right? Almost a year ago I started this blog with a post on how to generate "First Class Instances" but the techniques were hackish and used unsafeCoerce in an unsafe way (okay, only as unsafe as GeneralizedNewtypeDeriving).

What would you say if I showed you a way to build arbitrary instances of typeclasses from dictionaries locally? A way that didn't use unsafeCoerce? A way that except for violating the "single instance" assumption, appears to be totally safe?

First thing is first. Haskell posts need to begin with the requisite language pragmas.

yes, you got that right, today we will be using SafeHaskell.

For an example, we will define differing ordering relations on a type. The techniques presented in this post will not allow one to reliably "shadow" an instance declaration in scope, so we define a newtype to work with

Since we plan to be working with ordering relations, lets define a record variant of the Ord type class. Actually, a single newtype will do.

Now here is the trick. The GHC docs say we can't use ImplicitParams in an instance declaration. Turns out the docs are wrong.

Now we can test it. Sometimes I want to compare strings by looking at length. Other times I want to alphabetize them.

I have tried this code in several versions of GHC. Just now I tried it on GHC 7.4.2 and 7.6.1. In each case I ran it both in GHCi and compiled at varios optimization levels. It always compilers with out error and produces the expected result, namely "True False."

This isn't scary

I noticed this behavior a long time ago, and other than showing it to some people at the Oregon Programming Language Summer School, I have kept pretty quite about it. The truth is I'm terrified that people will want to fix it. Don't. The compiler bug here is doing the right thing.

The common assumption of the "single global instance" simply does not hold anyways. At least, it doesn't hold in a way that can be used to ensure the kind of invariants that things like Data.Set and Data.Map assume. And, this should be obvious. Imagine a module (call it M1) that exports a type. Then two other modules (M2 and M3) each import that type, and declare the same (orphan) instance for that type. A forth module (M4) imports both instances. Obviously, M4 can't use either instance directly. But, this restriction does not prevent violating module invariants. If M2 exposes a Set or our type, and M3 exposes a function to modify Sets of our type, composing these two functions will have unpredictable module boundary breaking behavior: even though all instances obeyed the typeclass laws.

Try it. It is not that hard to break Data.Set and similar modules. Of course, using things like GADTs we can push the invariant breaking much further. The simple fact is that modules that rely on the single instance property for safety are fundamentally broken.

A Categorical Use Case

I have made several bold claims in this post. Here is another one. The ImplicitParams trick is useful and should be made official behavior.

Edward Kmett has written some pretty cool category theory packages, and has a very nice package for using GHC's support for ConstraintKinds. These packages don't quite work for what I want to show here, but they come close so I will use similar ideas.

After enabling some extensions (we are no longer in SafeHaskell land as I wrote this up quickly) and hiding a bunch of the Prelude, we can define some useful typeclasses for working with categories. CCCs show up a lot, as they provide a categorical version of the simply typed lambda calculus. Thus, a good example of a CCC is the category Hask. We could even write a program that took Haskell like lambda calculus syntax (perhaps generated with a TemplateHaskell or a finally tagless interface) and produced a Haskell term polymorphic in the CCC.

Put aside the category stuff, and think about the algebra of Constraints in Haskell. With current versions of GHC we can develop a rich set of tools for talking about constraints that last type class encodes as a constraint the judgment that one constraint implies another. That is, it internalizes the entailment relation.

We would actually like some instances of this relation, and that is where the trick in this post becomes useful. some more stuff that is not very interesting and we can begin to define some other standard connectives. For example, Haskell allows you to combine Constraints using tuple notation, but this isn't really considered a constructor in GHC. So, we define our own "product" constraint

What is the point of all this? Well, Edward Kmett noted that the judgment form of type class entailment (that is, the type version) can form a category using the pairs above though, we can make this into a Symmetric Monoidal Category and for that matter a CCC

Is this useful? Possibly. But it is super cool regardless. I strongly think that this should be made the official behavior and kept in the language. Having tons of categorical structure show up around one of your languages core features (in this case, typeclasses) is usually going to be a good thing. Let's let people see if we can make this useful.

The GHC docs give the justification that the behavior of what instance would be picked would be unclear. I don't think this is true. We already have both local evidence and implicit parameters which present exactly the same problems. It is time to document this behavior, and start using it for good.