A Critique of Impure Reason
April 27, 2009 by Rúnar

This post has been moved to http://blog.higher-order.com/blog/2009/04/27/a-critique-of-impure-reason/
This is why you have bugs, why programs are hard to debug, and why software is difficult to extend. It’s because programs are difficult to comprehend to the extent that they are impure.
Please read “No silver bullet”. Some ‘essential complexity’ will remain in the program, even when using pure functional programming.
Nice ideas on the difference between identity and equality though, thanks!
But the argument is that by using a purely functional paradigm, you can reduce the accidental complexity, and expose only the essential complexity of the problem at hand.
OO principles also allow programmers to selectively expose complexity, hopefully reducing complexity to a commonsense model of what is going on. (Essential complexity?) However, this amounts to volunteering to do “the right thing,” as even “pure OO” environments like Smalltalk allow one to violate encapsulation and write procedural code. I can see how a “pure functional” environment would make it difficult to write procedural code. But one can also create spaghetti OO models that respect encapsulation. I imagine that one can create spaghetti functional code as well.
I think “No Silver Bullet” is claiming the difficulty we’re experiencing is inherent complexity, but does not actually justify this claim. I firmly believe most programmers deal almost exclusively with “accidental complexity” and that there is indeed a silver bullet to be found.
Note that the difference between the two can only be explained in terms of the underlying machine. …
The difference between identity and equality can also be explained as:
X == Y means that if you change some attribute of the object X, the change will be immediately visible in the object Y as well.
X == Y means that if you change some attribute of the object X
I don’t think you can do that in a pure language :)
I think that’s rather the point.
In my opinion, while there is a time and a place for impurity–say, querying the user for input–it shouldn’t have to make everything done with the value objects ugly.
Still, you’re right, you can’t have side-effects in a pure language…
He he he…
You can have effects, but not side-effects.
There are no side-effects here.
Once you graduate and start working in the commercial world, you’ll see that things are not quite so clear cut. As Nick C suggests, this is no “Silver Bullet”.
Even such a basic principle as modularity or componentisation of software is not fully applied a lot of the time, even by very smart programmers. This sounds nonsensical, but dividing software into reusable, standalone parts that can be understood and tested in isolation is often missed under the pressure of deadlines and the mad rush to Get Things Done.
This doesn’t mean that aspiration towards writing clear and simple concise programs isn’t good – stateless/immutable patterns are as useful today as they were forty years ago. It just means unless you’re working on your own, be prepared for compromise and shades of grey.
I know that pragmatism is often applied under deadlines, and principles go out the window. I know all too well about shades of grey. But the theory that writing poor quality code saves time, and that dispensing with principles is practical, is false.
Here’s my thinking of the moment: Impure coding saves time today, maybe not tomorrow.
If the project is canceled next week, then it did save time today – to NOT write the most reusable, most understandable, most error-free, most readable code, we could come up with. But can we afford to do that, most of the time?
Now, it may be a good argument that the industry as a whole should move towards “side-effects-free” programming, for many good reasons. It’s a different question whether an individual programmer should do so in the actual coding tasks of the day.
We sometimes write code as an exercise as to how to write the best possible (maintainable, reusable, fast, …) code ever. More often we write code just to get something done. This may be the difference between “scripting” and “programming-at-large”. But the fact is that most projects are not “programming-at-large” – BECAUSE their future is uncertain! (Doors!).
The fact often unstated is that writing good-quality code costs more than writing low-quality code. Sometimes low-quality code is all we need. Naturally we would all want to learn better techniques for writing high-quality code more cheaply. But even then, learning itself is not free. I would call this the “PROGRAMMER’S DILEMMA”.
Cheers for the good, mind-inspiring article
Systems I have worked on over the last 20 years that have applied the principles apocalisp is talking about were delivered into production on time, with happy customers and incredibly low bug counts.
When so-called pragmatism rides roughshod over common sense and good principles, the results are always bad: late delivery and a system limping into production.
An excellent post as always Apocalisp.
Great post, thanks. But I was expecting \y -> (return (x + 1)) to be \y -> (return (x + y)). Am I missing something?
Some processes are simpler to express with Object Oriented techniques, and mutable state than in a pure way. These simple programs will be quicker to write, have less bugs, and be easier to maintain.
Strong connections to a simple physical machine make it easier to reason about correctness than more abstract descriptions of behavior. A close relationship to the behavior of the physical machine also makes time and space complexity easier to reason about. That is why you explain references (an abstractly defined concept in OO) in terms of the physical concept of a location in memory.
Some problems are best thought about as pushing buttons and pulling levers. An abstraction with a connection to a physical machine has an advantage, not a liability.
I enjoy the mental gymnastics of pure programming even when it is probably not simpler, and I have nothing against fun. However, I believe the hypothesis that some problems are better expressed with mutable state.
Some processes are simpler to express with Object Oriented techniques, and mutable state than in a pure way.
Easily proven by an example — got one?
Warning/Note: Haskell is a better imperative language than Java (et al.).
Your move :)
I think “side-effect-free-programming” would be a less emotionally charged term to use in this context.
Is the mutable state simulated in a “functional” language really so much “purer” than the concept of “variables”?
And what about those guys who write in Assembler or Ada, are they the mutant sinners, of impure thoughts, scribbled on their late-night punch-cards?
I do understand the metaphor, but perhaps it has too many connotations. Just like extreme ways in general (Moby!)
For clarity, please note that Haskell is not an imperative language at all. Rather, it’s a purely functional language that (among many other wonderful things) can describe (generate) imperative programs encoded in a data type (embedded language) called “IO”. In this regard, Haskell plays the same role as the C macro/preprocessor language (as interpreted by “cpp”), but in a more sophisticated way. See “The C language is purely functional”.
Denotational semantics translates imperative programs (for instance) into pure ones, so the functional/imperative distinction may be a matter of shiftable perspective. Instead I like to compare the complexity of precise denotational models. The model underlying purely functional languages is simpler than that of impurely functional ones (C, Java, ML, etc.) — even when the impurely functional languages are purely sequential. If you add concurrency, the model is exponentially more complicated (using “exponentially” in a literal sense).
The only solution for the purity problems that I see is to avoid having problems at all. Programs? “Let’s form a committee of experts and solve the problems before implementing them” sounds really Pure.
Equal Rights for Functional Objects.
As always, a really good and well written article, thanks for it !
Nonetheless, I think that there is a missing point: performance. I believe that it played a great role in the battle of purity against mutable state, in that historically it was simpler to achieve great performance with programs that behaved as near to the underlying machine as possible.
Perhaps this is coupled with the cultural problem you were talking about: a long time ago, the big common programs needed to talk to the machine in an efficient way (OSs, databases, such things), but as these programs were the most common, they were taken as examples, and so they became well known and well taught, and so the language in which they were built was an entry gate to software development, and so on.
All in all, it’s good to know that there are more and more relevant resources and courses about pure programs, and so more and more developers may become accustomed to that.
Of course, it can be argued that this is a clear case of the premature-optimisation trap, and that macro-optimisation can be regained later thanks to immutable data structures, etc. But well, it seems that that wasn’t the path chosen.
If the problem was cultural at first, now it’s worse: we are caught in a feedback loop. Fortran/C-like languages were the most popular and efficient (given how they were used and implemented). Processor vendors took notice and made processors which sped up these languages, which biased them in the process. Of course, compiler writers did the same.
Now, imagine that instead of Fortran/Pascal/C, we had Forth. I bet processors would have been stack-based instead of register-based. The idea of a local variable might even be alien to Joe Programmer.
If some day a bright entity manages to pull an entire stack (from processor to high-level language) out, we may get out of this loop. The gradual shifting of priorities may also get us out. The pervasiveness of garbage collection alone gives me hope. Until then, we will still have to implement “functional languages on stock hardware”.
No act of imagination is necessary: stack-based Forth-oriented processors have existed since the ’80s, but they are based on the usual von Neumann architecture anyway.
The alternative dataflow architecture, which is an ideal mapping of pure functional programming languages to hardware, unfortunately had an intrinsic problem of inefficiency back when it was actively researched. Quoting from Wikipedia: “.. Instructions and their data dependencies proved to be too fine-grained to be effectively distributed in a large network. That is, the time for the instructions and tagged results to travel through a large connection network was longer than the time to actually do the computations. ..”
The hardware landscape has changed greatly; I don’t know if this still holds today.
But my question is: can purity be the only criterion for evaluating software quality?
I find it hard to take this approach to software development, because in my experience there are lots of instances where there are far more important overriding factors than purity in designing software systems.
Considering that your posts on Java actor concurrency were the point where I really started taking an interest in the functional style of programming, and that I have recently managed to apply some basic principles of functional programming to solve some annoying problems I encountered at work (thanks for that), I still find it hard to swallow that that’s the only wise way of programming.
It’s entirely possible that I’m being a bit resistant to a change in paradigm and a new way of looking at things.
I have some use cases where the imperative style of programming seems to yield the best results for me. Perhaps that’s the result of my dependence on Java to get my job done, and my lack of a proper foundation in functional programming.
But here’s a use case.
I have a TCP server that accepts connections from clients that send data at intervals ranging from every 10 seconds to 5 minutes, in short blurbs.
Data sent by clients has to be persisted, and a web interface has to be provided so that communication is bi-directional.
As such, I have to keep state for various aspects of those connections, and at any given time there can be 500-2000 clients connected to the server.
The problem is, if I used immutable data structures to hold the states of those connections, that means I’ll be recreating the entire data structure at unpredictable time intervals (could be every fraction of a second, or a minute at most), and that will cost me a fortune in terms of CPU usage.
So, how do I solve this problem within Java’s OO programming domain while using immutable data structures?
Help much appreciated.
Because functional languages deal with creating “new” structures instead of modifying state, they certainly need to solve the problem you’re describing. And they do solve it, by sharing structure. As the structures are immutable, creating a “new” list by prepending some item means that you take the new item and append the old list as the tail. Thanks to immutability, no one can notice this “trick”.
See Clojure for implementation of this stuff under Java.
Heck, you can do this in regular old Java. Just write a list that is either empty or consists of an element and another list. Make both components final, and the empty list a static value on the class.
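As a minimal sketch of the list just described (illustrative only, not a production library), with final fields, a shared empty sentinel, and O(1) prepend by structural sharing:

```java
// A tiny persistent singly-linked list: either empty, or an element
// plus another list. Both components are final; prepending shares the
// entire old list as the new list's tail instead of copying it.
final class ConsList<T> {
    final T head;            // meaningless in the EMPTY sentinel
    final ConsList<T> tail;

    private static final ConsList<Object> EMPTY = new ConsList<>(null, null);

    private ConsList(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <T> ConsList<T> empty() { return (ConsList<T>) EMPTY; }

    // O(1): no copying; the old list becomes the tail of the new one.
    ConsList<T> prepend(T x) { return new ConsList<>(x, this); }

    boolean isEmpty() { return this == EMPTY; }

    int size() {
        int n = 0;
        for (ConsList<T> l = this; !l.isEmpty(); l = l.tail) n++;
        return n;
    }
}

class ConsListDemo {
    public static void main(String[] args) {
        ConsList<Integer> xs = ConsList.<Integer>empty().prepend(2).prepend(1);
        ConsList<Integer> ys = xs.prepend(0); // shares all of xs
        assert ys.tail == xs;                 // the very same object, not a copy
        assert xs.size() == 2 && ys.size() == 3;
        System.out.println("ok");
    }
}
```

Note that ys.tail == xs holds by reference identity: the "new" list literally contains the old one, which is safe only because neither can ever be mutated.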
Purity is certainly not the only criterion for a good program. But if some portion of a program is poorly written, it’s much easier to replace with a well-written one if it’s pure in the first place.
It’s funny that you mention actor concurrency, because actors are all about mutable state. However, this kind of concurrency can be understood as pure (in terms of the pi calculus). A language with good type system support for concurrency could enforce purity in concurrent programs. Remember how we wrapped all that actor state mutation in the Promise monad? Even Java can afford us a simple concurrency algebra that composes nicely.
Your TCP server sounds interesting. It would be hard to give direction via something like blog comments, but check your premise that you need to re-create the entire data structure. Immutable (a.k.a. persistent) data structures generally allow a large degree of data sharing.
If you’re wondering how such a thing can be implemented in a pure fashion, I recommend you look at HappStack for inspiration.
Thanks for the response, much appreciated.
And thanks for the pointer to HappStack. I had a brief look at it, and I’ll look at it in more detail when I get some time. I have to admit I find Haskell a bit overwhelming as of now :-)
Since reading your ‘Threadless Concurrency With Actors’ series I have been trying to put some of the lessons learned into practice, and I have managed it to a certain extent. The best outcome so far is that, out of the 10-100s of synchronized blocks in the last version of the TCP server, the new version is totally free of synchronized blocks. Which is an achievement, but it came at a price :-)
I have experimented with replacing the server’s concurrency completely with FunctionalJava’s actors. But I have some issues with it, which I’ll detail if you’d like to hear about them.
I have also experimented with replacing the underlying ExecutorService-based concurrency with Scala actors, and I had some surprisingly good results, though I’m a bit scared to put it into production yet. The reason being that though I love the pattern matching in Scala, I have way too little experience with Scala, since that was the only experimentation I ever had with it.
And I did experiment with replacing the threadpool-based concurrency in the Jetty webserver with Scala actors in my attempt to build my own webserver, and I had some interesting results on that too.
I was wondering if you could test that out with Jetty, since I have a real distaste for the Comet evangelists out there who do less work and make a lot of noise in the hope of claiming some fame in the so-called Comet field, and I have expressed my distaste about it at cometdaily.com.
I would like to build a web server based on actors (Scala or FunctionalJava) rather than having to rely on ExecutorService or a thread pool.
Would be great if you could build a proof of concept for that so that I don’t have to do the hard work :-)
And sadly I’m the only developer :-)
The problem domain was understated in my previous post, because the data rate is not quite as I stated there. At best it’s 2-5 minutes of short blurbs per connection, and at worst it’s totally unpredictable (could be even every second for some connections). Periodic data can be every 2-5 minutes depending on the state of the device connected to the server, but besides periodic data I have to account for data generated by events that occur in those devices, and events can occur at any time and at any frequency.
And I have to persist any data received at the server end for auditing and reporting, and for realtime interaction (almost like a chat application, but much more complicated, at least for me).
And I find it hard to use immutable data structures to service all my needs, as I have to be wary of the state of the connected devices and have to publish events to listeners that can be web-application users or other automated monitoring systems.
Since the communication is two-way (device to server, and server to device via the web interface) in real time, there is not much room for latency.
My biggest question is: how feasible is it to replace the thread-based concurrency with actors (FJ/Scala)? And what kind of performance gain can I expect?
I have done some experiment in that regards, but I would like to hear your ideas on that.
Actor concurrency is ultimately thread-based, since it all uses the same underlying threading model. The actor abstraction just helps you think about it more easily and keep things clear.
I suspect the problem that you’re up against with all that shared state in your app is that I/O has to be sequenced. If this were my application, I would probably be using something transactional such as a DBMS to sequence I/O. For simple things, the actor abstraction can help with concurrent I/O requests since they’re guaranteed to process messages in sequence (this is called QueueActor in FJ).
Hit up the FJ contributors on the Google Group, or on Freenode. We’d very much be interested in hearing about your issues with the FJ actor library. What kinds of problems did you run into?
This is actually a reply to “apocalisp” above, but there was no “reply” link for that post.
An alternative way to sequence IO is to make each IO resource its own actor that accepts a queue of “IO programs”. An IO program is a sequence of IO actions with values. An IO program can be computed either by abstracting the algorithm one is working with into an IO program, or by computing a “residual” program by recording the sequence of IO calls with their values.
As an example of the former, consider the Pop3Client.Connect method in my Sasa library, which takes a function accepting a Pop3Session. Pop3Session is only created within Pop3Client, and should not escape the scope of the Connect call. All resources are disposed of afterwards, so an escaping Pop3Session would be useless anyway.
Performing IO on more than one channel is doable. Such a program is parameterized by all objects that perform side-effects, so if you’re just reading all messages from an inbox and saving them to disk, your IO program would be an Action<Pop3Session, FileStream>. If you’re reading from the inbox, saving to disk, then forwarding the messages, your program would be Action<Pop3Session, FileStream, SmtpClient>. The astute among you will recognize monads at the heart of this.
I have production code which incrementally processes messages read from a Pop3Session, then forwards them via SmtpClient.
In the case of Pop3Client, the IO streams are opened and closed within the scope of Connect, but you can just as easily keep the resource open and have Connect simply queue IO programs, which are then processed sequentially from a separate thread. Witness the power of higher-order programming!
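The actor-per-resource idea above can be sketched in Java terms like this (a hedged, made-up sketch, not the Sasa library's API): each IO resource owns a single-threaded queue, and an "IO program" is just a value, a function over the resource, submitted to that queue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: one resource, one single-threaded queue.
// Programs submitted to the queue run strictly in submission order,
// which is exactly the IO sequencing the comment above describes.
class IoResource<R> {
    private final R resource;
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    IoResource(R resource) { this.resource = resource; }

    // An "IO program" is just a value: a function from the resource to effects.
    void submit(Consumer<R> program) {
        queue.execute(() -> program.accept(resource));
    }

    void shutdownAndWait() throws InterruptedException {
        queue.shutdown();
        queue.awaitTermination(5, TimeUnit.SECONDS);
    }
}

class IoQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // A StringBuilder stands in for a real IO channel (file, socket, ...).
        StringBuilder channel = new StringBuilder();
        IoResource<StringBuilder> res = new IoResource<>(channel);
        res.submit(c -> c.append("first;"));
        res.submit(c -> c.append("second;")); // never interleaves with the first
        res.shutdownAndWait();
        assert channel.toString().equals("first;second;");
        System.out.println(channel);
    }
}
```

The callers never touch the resource directly; they only hand over program values, so no synchronized blocks are needed around the IO itself.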
Thanks for the reply.
I’m very keen to hear a bit more on your idea of using a DBMS to sequence I/O :-)
Sandro Magi hit it close with his post, as I can draw a bit of a parallel between what he described and what I’m working on, besides all the jargon :-)
I haven’t had the chance to work on FJ actors for a while now due to lack of time.
I’ll work on the problem I had with it and create a reproducible test-case, hopefully in a week or two, so that I can lay out the problem more coherently at the FJ Google Group.
And my apology because I realised that what I posted has gone completely off topic in regards to your post.
Late-night posts have their own drawbacks: the typos, grammar, and unintended errors.
“best its 2-5 minutes of short blurbs per connection and at worst its totally unpredicatable…”
Please read that as “at best it’s 10 seconds to 5 minutes intervals of data….”
I think object-oriented principles gained popularity so quickly, because they are in a sense very human. If I look upon myself or in fact any physical object, it most certainly has a mutable inner state (apart from quarks (maybe)). So I can easily identify myself with that object to project how it works. Humans do this all the time. It’s the way our brain functions to make predictions about the future.
Also, for example my computer has tons of inner state. I don’t get a new one every time I press a key. How would I represent my computer in a pure programming world? Or – to look for a more practical example – how do I add another element to an immutable list that already holds more than half of my available memory?
What do you propose to do with relational databases? The whole point of these is to hold mutable state. I’d rather not imagine a SQL-Server without an UPDATE-Statement. Performance still matters!
To add an element to an immutable list that already holds more than half of your available memory, you create a new list that consists of your old list plus the new element. Since it’s immutable, the new list is not a copy of the old list. It is the same list, and stays put exactly where it is in memory. Your mind works this way too. Think of a very large number and call it N. Now think of N + 1. The original N and the appearance of N in the expression N + 1 are the same number.
It’s a very common misunderstanding that immutable data structures involve copying data on modification. In practice, however, immutability allows a large degree of memory sharing. There’s hardly ever a reason to keep more than one copy of an immutable structure.
Updates to a relational database can be understood as pure functions whose arguments and result values are databases. The sequencing of such updates is an instance of Kleisli composition.
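A toy illustration of that last point in Java (no real DBMS involved, and full Kleisli composition would thread a monad; for total updates, plain function composition is enough): an "update" is a pure function from one database value to the next, and sequencing updates is composition.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Illustrative sketch: a "database" is just an immutable-by-convention
// map value, and an UPDATE is a pure function that returns the next
// database value while leaving its argument untouched.
class PureDbDemo {
    static TreeMap<String, Integer> put(TreeMap<String, Integer> db, String k, int v) {
        TreeMap<String, Integer> next = new TreeMap<>(db); // a real implementation
        next.put(k, v);                                    // would share structure
        return next;
    }

    public static void main(String[] args) {
        UnaryOperator<TreeMap<String, Integer>> u1 = db -> put(db, "alice", 1);
        UnaryOperator<TreeMap<String, Integer>> u2 = db -> put(db, "bob", 2);

        TreeMap<String, Integer> empty = new TreeMap<>();
        TreeMap<String, Integer> result = u2.apply(u1.apply(empty)); // u1, then u2

        assert empty.isEmpty();                              // the old state survives
        assert result.equals(Map.of("alice", 1, "bob", 2));
        System.out.println(result);
    }
}
```

Because each update is a pure function of the whole database value, a sequence of updates is inherently transactional: either you have the old value or the new one, never a half-applied state.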
I think this assertion must have a caveat applied. Yes, because data is immutable, you can share it in surprising (and sometimes strange :-)) ways, but as a practical matter, you have to be rather careful about it. You basically have to be aware of the language’s particular implementation of the data structures in question.
For example, in some implementations of functional(ish) languages, your example *will* blow out your memory to preserve immutability of the left side of the list.
This doesn’t preclude tricks like just reversing the list in your algorithm, or various internal obscure bookkeeping tricks done by the language implementation to share slices. But to say “it’s all taken care of” seems a little too quick on the trigger.
Algorithms have certain inherent size profiles, to be sure, of which you need to be aware. For example, an iterative process is O(1) whereas a linearly recursive process will be O(n). If your function that appends an element to a list is linearly recursive, you will blow your memory for a list that’s more than half the memory size. You should take that as a hint that you’re using the wrong data structure (or the wrong append function), not that you need to introduce impurity.
Interesting post. I’m a Java programmer who’s trying to make the leap into the pure world you’re describing, but I’m wrestling with the fact that the world is full of side effects. This is what Gilad Bracha said in a discussion:
“[people] don’t know how to break it [problems] down into functions and their perception of it is stateful; it often is. i mean i’m looking at you right now, and i know it’s not really you any more, it’s a new you that just got reconstituted. but most people have this idea that it really is you. memory is important. so throwing out state makes life a lot simpler but very often creates a problem in modeling things. and if you really, really understand something you can usually throw away the state but it is often very hard. you need to have both, but you need to be close to functional as you can.”
Any thoughts? Purity is nice, but don’t we have to get our hands a little dirty to do things?
Purity isn’t about getting rid of state. It’s about properly modeling the manipulation of state as argument passing or as results, rather than as destructive writes.
This has some very nice consequences that actually lead to pure handling of state being superior in many ways to destructive updates. The main benefit is that pure updates of state don’t have atomicity problems, as they are inherently transactional.
False premise #1: Effects imply side-effects.
While we certainly need effects, we don’t need side-effects. In pure-functional programming, an effect is modeled as a function that yields an effect given some argument. For example, a program in Haskell is a pure function that takes no arguments and yields a value of type IO (). A value of this type is an “IO action”, e.g. instructions for the operating system. For example, if you’re programming a missile silo, you will never have a function that takes a String and yields a String, and also, on the side, launches the missiles. If you want to launch the missiles, you return a value, e.g. LaunchMissiles, which represents some machine-specific code that, when run on the silo’s hardware, launches the missiles.
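The same effects-as-values idea can be sketched outside Haskell too. Here is a hedged Java sketch (all names — Effect, LaunchMissiles, Silo — are made up for illustration): the pure part only constructs a value describing an effect, and a separate interpreter at the program's edge performs it.

```java
// An effect is data, not an action. Constructing one launches nothing.
sealed interface Effect {
    record Print(String text) implements Effect {}
    record LaunchMissiles() implements Effect {}
}

class Silo {
    // Pure: the same input always yields the same Effect value.
    static Effect decide(String command) {
        return command.equals("launch")
                ? new Effect.LaunchMissiles()
                : new Effect.Print("standing down");
    }

    // The impure boundary: only the interpreter actually performs effects.
    static String interpret(Effect e) {
        return (e instanceof Effect.LaunchMissiles)
                ? "missiles launched"
                : ((Effect.Print) e).text();
    }

    public static void main(String[] args) {
        Effect e = decide("launch");   // still pure: nothing has happened yet
        assert e.equals(new Effect.LaunchMissiles());
        System.out.println(interpret(e)); // effects happen only here
    }
}
```

decide can be tested, composed, and reasoned about without ever risking a launch; only interpret touches the outside world.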
False premise #2: Purity requires that we don’t have state.
Even if premise #1 were true, and while we need effects in order to model state transitions in the outside world (missiles not launched -> missiles launched), we don’t need them for transitions of the data structures within the program. These are simply modeled as pure state-transition functions. Such a function takes a structure in some state as its argument and returns a structure in the next state.
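A tiny example of such a pure state-transition function (Account is a made-up type for illustration): the next state is returned as a new value, and the old state is never overwritten.

```java
// The transition takes the current state and returns the next state.
record Account(String owner, long cents) {
    Account deposit(long amount) {
        return new Account(owner, cents + amount); // a brand-new state value
    }
}

class TransitionDemo {
    public static void main(String[] args) {
        Account before = new Account("pat", 100);
        Account after = before.deposit(50);
        assert before.cents() == 100; // the prior state is still intact
        assert after.cents() == 150;
        System.out.println(after);
    }
}
```

Because both states coexist as values, there is no window in which an observer can see a half-updated account: the "update" is atomic by construction.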
False premise #3: The world being full of side-effects implies that we need them in software.
In programming, we do not manipulate real-world objects. The arguments and results of functions are not existents, but abstractions. While Bracha’s example is a cute analogy, it does not apply. In programming, we are reasoning about entities of a different kind from the objects in the world around us. Take for example a ball rolling down a hill. When constructing in our minds a concept of this ball, we take all the things that are essential to it and omit the things that aren’t. For example, its particular location in space at a given time is nonessential, so as it moves through space we’re able to retain the idea that it is the same ball even as it changes in front of our eyes, because the things that are changing are not part of our idea of this ball. Now take the example of a computer simulation of a ball rolling down a hill. This is an entirely different kind of thing. In a simulation, we have a transition function that progresses the simulation from one discrete step to the next. Given this function, and the state of the simulation at time Tn, we can find the state at time Tn+m. You can make your simulation as accurate as you like, but it’s still just a transition function, and it will never be an actual ball rolling down an actual hill. An essential difference to note is that the simulation program is completely omniscient about the system under simulation, whereas when reasoning about real-world objects, we must go through a constant process of discovery and integration of new information.
False Premise #4: The world is full of side-effects.
When we talk about side-effects in programming, we’re talking about something that happens in addition to a function resulting in a value. We say that a program that takes two integers and returns their sum, and also destroys a small country on the side, has a side-effect. When applied to the real world, this is absurd. Programs and functions are cognitive entities. Evaluating functions is a process of reasoning from premises to a conclusion, i.e. it’s a process of thought. When a computer does this for us, it’s to aid or augment our own thinking. Now, imagine a cognitive process in the real world that has a side-effect. There’s no such thing. For example, imagine that you were in the field counting your cows, and counting them in your head caused every fifth one to explode. That would be a side-effect, but this will never happen.
Now, this is not to say that cognition doesn’t have effects. It wouldn’t do us very much good to think if it didn’t have an effect. For example, counting your cows will have the effect of your knowing how many cows you have.
Thanks for this article. It really helped me to understand purity a little more. But I am still not at a point where I can use it in action. I think the “mathematical” stuff, the real computing, is understandable, because functional programming is really like mathematics in this matter. So I have learned to apply this knowledge in math lectures. But how to do this stuff with something like a blog, where you handle database entries, HTML, and messages? How to do it in creating computer games? I can easily find an impure object that explains what a “post” or a “comment” is. I can easily write a function with side effects to the database. It sounds impossible in a pure world. Could you write more from that perspective? I think that would help many engineers to get it.
I’m not going to give a very long-winded answer. When we are impure, we make calls out to external system APIs to do I/O. We wait for those APIs to complete the request and then we proceed on the assumption that the I/O has been carried out. When we are pure, we instead assume that our program is being called by the external system. We construct a value which is a sequence of I/O actions, and we return that value to the external world. So you don’t perform the I/O actions, but you construct a description of what you would do if you could.
Real-time systems can be modeled in a pure way just as well. There are examples of computer games written in Haskell (a pure language) and there are frameworks for doing this kind of thing. Most employ a technique called Functional Reactive Programming (FRP). In such a framework, events and reactions are functions of time, and the system is executed by feeding it discrete time values (at, say, the screen refresh rate).
Hm. Sorry, it was not meant as a request for now. It is just the direction I would like to see in later articles you might write for people like me. Just see it as a suggestion.