Functional D: Is Transitive Const Fundamental?
As I’ve mentioned before, a pure functional subset is forming in the D Programming Language. According to the creators of D, transitive const is a key feature to make this work.
The future of programming will be multicore, multithreaded. Languages that make it easy to program them will supplant languages that don’t. Transitive const is key to bringing D into this paradigm. […] C++ cannot be retrofitted to supporting multiprogramming in a manner that makes it accessible. D isn’t there yet, but it will be, and transitive const will be absolutely fundamental to making it work.
What is transitive const?
Just a quick explanation for those of us who doesn’t have academic terms in close memory. Transitivity is a property of some binary relations, for example equality:
if A = B and B = C, then A = C
Applied to the concept of const it means, simply put, that anything reachable from a const type is also a const. So, for a declaration const int **p, p is const, as well as *p and **p.
The same is true in the case of composite types:
class A { int f; void set_f(int a_value) { f = a_value; } } const A a = new A(); a = new A(); // error a.a = 2; // error a.set_a(2); // error
All three reassignments above result in compiler errors due to the fact that a is const, and anything reachable from it is also const.
Why does it matter?
So, in what way is transitive const fundamental to concurrent programming? Well, it isn’t. What Walter Bright and his companions refer to is the fact that pure functional programming is thread-safe by design. That is, in a pure functional language the result of a function is solely dependent on its arguments. Thus, in a code like this:
val = some_func( a(), b(), c() );
functions a(), b() and c() can be safely executed concurrently in a multi-core architecture; Nothing a(), b() or c() does can affect each others results. This is not the case for imperative languages that builds on the notion of mutable and global state. With mutable state comes hidden side-effects (a(), b() or c() could change common data and cause raise conditions) that complicates multi-thread programming so much.
What the people behind D tries to do is to create a pure functional subset within the language. I like to refer to it as Functional D. Such a subset would allow us to write code that is thread-safe by default, all you have to do is to write Functional D code. The compiler would then be able to chisel out the functional code and fully utilize the advantages of functional programming.
Immutable data and pure functions are fundamental
To make this work we need a way to make data immutable and a way to shut down access to the global state. In D you use the invariant keyword to create immutable data. The pure keyword is used to mark functions that may only take invariant arguments, no access to the global state, and that may only invoke other pure functions. (As of this writing, the semantics of the pure keyword is not yet implemented).
int g = 0; pure int pure_func(invariant int a) { a = 0; // error, a is invariant g = 1; // error, can't write to global g writefln(a); // error, writefln is not pure return g + a; // error, can't read global g }
How does transitive const fit into all of this? To use the intuitive definitions from Andrei Alexandrescu’s slides on the functional subset of D:
- const(T) x: I can’t modify x or anything reachable from it
- invariant(T) x: Nobody can modify x or anything reachable from it
Const is not strong enough to be used in the functional subset (which depends on truly immutable data), but it has one application that could be important. From Walter Bright at the D newsgroup:
Const allows you to reuse code that works on invariants to work with
mutables, too.
How usable is const to the functional subset?
Const can be used to write code that works with data from both the imperative (mutable) and the functional (immutable) subsets. For example, the print function below.
void print(const int a) { writefln(a); } const int a = 1; print(a); // ok invariant int b = 2; print(b); // ok print(3); // ok
The reason this works is that invariants and immutable data is implicitly converted to const when necessary. One can question how useful this would be in practice though, the print function above would not be invokable from the functional subset (which would require it to be pure).
My conclusion is that although it may very well be important, transitive const is not “absolutely fundamental to making it work.” Transitive invariant, on the other hand, is.
Cheers!
Calling print from a pure function doesn’t seem like a great idea anyway… the bigger problem with what you’ve described seems to me to be that would not be able to pass a mutable value to a pure function. So you’d need two versions of all of your pure functions.
Why should pure imply that all parameters are invariant? It looks like you’d want pure to mean roughly “only has access to the world through the parameters”. That way you can write pure functions with const parameters and apply them to invariant or mutable data.
Thank you for your comment mort. I agree that calling print from a pure function is not a good idea. I used that merely as an example since it’s the type that Andrei used in his slides.
Const arguments is not sufficient for pure functions. You can’t let mutable state leak through the pure function interface since then all thread-safe benefits of pure functional programming are lost.
Consider this:
If a() is invoked in one thread, and reset_obj() in another chances are it will work fine most of the time. But on rare occasions g_obj will be assigned to null between the (o != null) check and the o.do_something() call. On those rare occasions you will have an access violation. So a is not thread-safe if you allow const arguments. Invariant on the other hand ensures g_obj will never change (g_obj will have to be constructed as an invariant).
Hi Hans-Eric,
Several points:
1) How would the situation you describe differ from the situation where you didn’t declare a() to be pure? If you need to call the function on mutable data, you need to be careful about thread safety, as always.
2) If o was actually invariant, however, you get the same thread safety with just the const declaration that you would with an invariant declaration. You can use a() in either situation, get the thread-safety-for-free in the invariant case, and not have to write a() a second time to use it in the mutable case.
3) There’s nothing stopping you from declaring a’s parameters invariant if you know you’ll never need to pass in mutable ones. This would probably enable some compiler optimizations since you don’t have to worry about instruction reordering in a threaded environment.
4) You didn’t address my concern about having to write your pure fragment twice. That looks to be a big problem to me. How would you address that?
Thanks,
Mort
Hi again, excellent points. I’ll try to address them.
1) It wouldn’t differ a bit, and that’s my point. Pure functions has the wonderful property of being thread-safe, without you needing to be “careful about thread safety”. If you allow const arguments to pure functions then they are no longer pure, and you lose the thread-safety.
2) This is absolutely true. The problem is that since a then can be used in both situations the compiler can not assume it’s free of side-effects and can not make optimizations (like automatic multi-threading) based on the thread-safety property.
3) True, but the compiler will have a hard time figuring out whether a is thread-safe or not unless you help it (with the pure keyword that enforce the pure function rules). It’s not enough to look for invariant arguments, all contained function calls and access of global state must also be considered, deeply. Furthermore, the programmer gets no help from the compiler to make the code thread-safe, which is one of the big advantages of pure functional languages.
4) This is definitely a problem for complex types. I guess there has to be some way to implicitly convert mutable types to immutable ones, but I see several problems with this. I will dig into this subject in a coming post so please stay tuned.
Hi. I think I didn’t get one idea across. I’m suggesting we keep the ‘pure’ keyword, but all it does is forbids access to global state and calls to non-pure functions. So I don’t think your response to 3) applies to my proposal. Figuring out that a function is pure (in your sense) is trivial: it’s marked pure *and* it’s parameters are marked invariant.
Thus pure in my sense subsumes pure in yours, and is orthogonal to invariance of parameters (rather than implying invariance).
Regarding the optimizations that can be applied, I would observe that nothing is stopping the compiler from specializing every pure function to a faster version that’s pure in your sense if all arguments are known to be invariant.
I look forward to reading more about this topic in the future – I’ve bookmarked your page :). BTW, is there a source you can link to for additional information on the semantics of ‘pure’ in D? Thanks!
Ok, I didn’t get that (I’m a bit slow at times 🙂 ). But if one does what you suggests (pure keyword that only forbids global state access and calls to non-pure functions) there is nothing that stops a programmer from passing references to mutable data. In that case mutability “leaks” through to the pure function. Thus we end up with a function marked as pure, that can not be safely considered pure by the compiler. (Am I making any sense? Have I still not gotten your idea? Sorry if that’s the case, It’s rather late here now and my brain feels like jelly)
There’s not much documentation to be found on the subject, but here’s a couple of links you could check out:
http://www.digitalmars.com/d/2.0/function.html (Pure Functions)
http://www.digitalmars.com/d/2.0/const3.html (Const and Invariant)
And of course the links I gave in the article.
Best regards
If you have a function marked pure (in my sense) whose parameters are also marked invariant (or even just const), how can mutability leak in? Obviously you can create *local* mutable state (on the stack), but according to Andrei Alexandrescu’s slides, you’ll be able to do that even if a function is marked pure in your sense. But I don’t see how you can get past the parameters being invariant to do anything undesirable. Can you give a code example?
If the parameters are marked as invariant, mutability can not leak, but – and that was my point – if they are merely marked as const there is a possibility that someone else changes the data while the pure function is executing (in a different thread). Thus, pureness requires invariant arguments.
You have a (contrived) code example that highlights this in my first reply.
If a pure function’s parameters are marked as const, then they can change. Agreed, that’s the point – it allows you to operate on mutable data. If you want to require invariance of the data, you use ‘invariant’ to specify that. Trying to come up with a scheme for specifying things as ‘temporarily immutable’ looks like a can of worms to me.
The other benefit of allowing pure non-invariant functions is that you can write pure helper functions that do mutate their arguments. With only pure invariant functions, if you do local mutations on the stack in a pure (invariant) function, there is no way to factor those mutations out into functions. So you may find yourself cutting/pasting code. That seems like a pretty bad situation.
Now I (finally) see where you’re driving at. You want to allow for what Bruno Madeiros (in the news thread you provided) call “partially pure” functions along with pure (the compiler would have to figure out which is which, which is trivial).
Interesting concept.
The only disadvantage I can think of, and it’s a minor one, is that programmers might think they’re writing thread-safe code when they’re not. Partially pure functions are only thread-safe when invoked from within a truly pure function; at top-level on the other hand, they’re not.
I think it’s actually pretty intuitive if you define pure to mean ‘a function completely determined by the value of its parameters’, and think of the parameters as defining deep values. Then if those values are being mutated concurrently, it makes sense that you lose determinism.
Actually I think the current proposal for ‘pure’ is pretty broken for the reasons mentioned above: pure functions aren’t usable on mutable values and mutable functions can’t be used on the local mutable values inside pure code. That thread I linked to is many months old – I wonder if they’ve addressed this…
I am bound to agree with you, now that I finally understood 🙂
Ok, I just found a thread on the D forums discussing this very idea. You can find it here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=70762
I didn’t see any compelling arguments against it in skimming the thread. One thing I think you’d need is to generalize the rules if you wanted to allow nested pure functions: I think they shouldn’t be able to access any non-invariant data from their environment (including globals and variables in scope).
Is it possible to contact administration?
Hih you hear me??