Sunday, November 22, 2015

Algorithms, data structures, accidental and essential complexity, inheritance, composition and functional programming

Today we're going to go through the tools a programmer has in order to see how and when we can use them to solve programming problems.

Let's start with some of the good old slogans that we have been fed for years:

Every loop can be replaced with functional programming
Every decision can be replaced with inheritance
Every inheritance can be replaced with composition

Does it mean that all our inheritance structures should be 2 level (Object -> MyClass) and that inheritance for itself is bad? Does it mean that if we use a switch/case or an if we're committing a crime against purity? Are those just relics of an era that's come an gone? What do you think?

I strongly believe if there is a tool (even like the goto instruction) and even if the fashion for it has been mentally superseded with a new construct (or even an older one!) then it is still valid to use it in certain contexts. For example goto in the context of text parsers is still very much valid and in use for years to come.

So how about loops and conditions? When is the right place to use them?

First things first - inheritance

Let's tackle the inheritance first because it is the simplest one. You inherited genes from your parents because you're a man. If you'd be a dog you'd not inherit a thing from your human owners as they are not your parents (although your mrs. loves you very much!). An employee is most probably a human being (although there are examples where that would not be true) and a car is a vehicle (when it works otherwise it's a problem).

To put it bluntly: whenever what you inherit is a thing you inherit from then you're doing the right thing. When you're inheriting because you'd like to have that same functionality but extended or twister then you're committing a felony and you should burn in hell.


Let's take a look at the types of complexity we're working with on a daily basis. As we've been already told there is essential complexity (driven by the complexity of the domain we're working with) and accidental complexity (driven by the technical details of framework/language you're working in). A good example everyone can relate to is sorting so let's take a look at 2 quicksort implementations:

It is pretty straightforward, right? At first we're slicing the input in half, figuring out the elements that are lesser than the middle point and swapping them out with those that are to the right. Basically we're grouping elements smaller, equal and greater than the pivot and sorting those groups further until there is anything to sort. Efficiency aside the algorithm presented here is pretty clear. Can it get any clearer?

At first when I described the algorithm to you in the above paragraph I told you not how we're grouping things but that we do group them. I even gave you the recipe on assigning elements their destination group. This in turn describes the essential complexity of the algorithm. Can we remove all the accidental complexity? Is that even possible? Let's try it out with Groovy:

Try to read the above code and explain to yourself what it does and how it does it. Immediately you'll notice that it lacks something. I mean it must lack something, right? The Java implementation is 56 lines long and this is just 6 so there surely is a huge gap in those two implementations. But is there really a gap? First we define the exit criteria (list size less than 2), then we figure out the middle point, then we group the elements by their comparative ratio (if there are no elements then return an empty list), then we take all the smaller, sort them, add the equal to the pivot and then sort the bigger ones and add them to. I mean it takes more words to describe what is taking place than really reading the algorithm!

By the way... Take note of the if statement. It's there because the algorithm dictates it.

I think we should go over one more example, maybe less algorithmic but definitely very useful: running external applications from Java.

It is again, very straight forward: get the hold of the runtime, execute the command, read line-by-line the output from a buffered reader. Done. Can it be any simpler?

Wooow! Now that is really the salt itself: just execute the command I gave you and print out the text that came out of it. It's almost like writing a bash script but with the indescribable potential you get when using a full-fledged, mainstream JVM-based programming language.

Personal preference aside I think those two examples demonstrate pretty clearly that software can be created much faster and in a readable fashion. Obviously I'm far from suggesting you write your own quicksort implementation. Those types of lego pieces are already in place and you don't really need to worry about them. But you do need to worry about your algorithms.

Algorithms and data

Remember when you have been introduced to the SOLID principles? My God is that a bizarre harness for a developer when he hears about this the first time! Hands get soaking wet, head shakes in disbelief, and immediately questions like "why" and "how" start to raise but what it really is are means to implement extendable, easy to understand programs.

Single responsibility principle tells you that a piece of code shall only have one reason to be changed. That reason would be for example change in the algorithm but a you probably already know those change very infrequently.

Open/Closed principle tells you to write your software in such way that you can extend it without recompiling.

Liskov substitution principle tells you that no matter which implementation you're going to use the outcome of the algorithm shall stay the same and there should be no difference in the outcome when that algorithm ends. That don't mean it there can only be a single implementation because the sole purpose of doing a different implementation is to inflict change. But the fact is that if your algorithm is generic enough it will always work the same and the changing parts will be properly externalized. A fantastic example here is the template method pattern and composition over inheritance is JDBCTemplate in Spring JDBC. It allows you to tell "in place" what the behavior you'd like to inflict on the result of your query is and takes aside all that nagging boiler-plate away. No matter what you do the algorithm there will always do the same which is iterate over the ResultSet and call your method when appropriate.

Interface segregation principle tells you to only expose as much of functionality as the user of your code would. So for example the JDBCTemplate could theoretically give you the option to close the ResultSet but it doesn't do it because you'll never need it (and if you do you're doing something wrong)

Dependency injection principle modeled after one of the the Hollywood's most famous quotes, Don't call us - we'll call you, is also very well observed in the JDBCTemplate where the algorithm in that implementation takes complete control over the flow of your program and calls you back when your intervention is required.

In addition to this Niklaus Wirth came out with this book of his, Algorithms + Data Structures = Programs. So what's that all about? Has it not been superseded by SOLID principles?

The thruth is that the SOLID principles back up that book's messgage which is to take a point in dividing the essential and the accidental complexity. To make sure you don't include data in your algorithms. What do I mean by this?

Suppose you're writing a very simple program to tell the driver that the current environmental conditions are somehow dangerous. My car has that kind of feature and when it is too cold (below 4°C a subtle alarm will go off and the temperature display will blink a few times. What my car lacks however is that if it is too hot (and that'd be for me > 30°C) then I should be notified by this as well. Otherwise I could fall into the same danger as a boiled frog. I suspect that the program they have has some fixed boundaries, maybe something like this:

What is wrong with this type of programming is not that the temperature is not extracted to some constant or even taken from some function that would combine it with different sensor readings. The bad part of it is that if we would extend it we definitely need to change it and that is not kosher.

That's a whole different story! You can even read the settings from a database or mock the input in test! Cool, ain't it?


Every time you write a program think if what you are doing comes from the essential complexity of the domain you're working with or if it is just because the tools you have are not powerful enough to allow you to express it more elegantly. If you find yourself in the latter position extend the tools (use meta-programming, use reflections, use monkey-patching, use whatever you can) to make sure that if you read the algorithm a year from now it will still read nicely and clearly.

Happy coding!

No comments: