2010 February 09
The world is parallel. If we want to write programs that behave as other objects behave in the real world, then these programs will have a concurrent structure. Use a language that was designed for writing concurrent applications, and development becomes a lot easier. Erlang programs model how we think and interact.— Joe Armstrong
Erlang makes it easy to model the real world. An Erlang program consists of concurrent objects that communicate by message-passing, just like people and things co-exist in the real world. Erlang programs are often concise and clean as the distance between the way you think about an algorithm and how it's expressed in code is very short.
Erlang was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. The first version was developed by Joe Armstrong in 1986. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing.
If you are familiar only with imperative languages, Erlang might look queer. The first surprise will be the variables, well, that don't vary. This is a feature of all pure functional languages. They treat variables as they are treated in Algebra. If you say something like "X = 10" in Erlang, X will hold on to the value '10' forever. If the value of X cannot be changed later, then why is it still called a variable? The point is this - Erlang has variables that can be changed only once. Single-assignment variables are not a handicap. They make software robust and easy to debug. Some advantages of single-assignment variables are:
1. They make it easy to reason about programs. Code can be treated as mathematical equations that, for a given input, will return a fixed output. Look at this simple example:
MassOfHydrogenAtoms = 0.111.
SpeedOfLight = 300000000.
EnergyInOneKgOfWater = MassOfHydrogenAtoms *
SpeedOfLight * SpeedOfLight.
Here, we have a formula that calculates the energy contained in 1kg of pure water. Whenever we evaluate EnergyInOneKgOfWater it is expected to return a number that equals to '10,000,000,000,000,000 Joules'. What if one of the programmers, by accident, modify the value of MassOfHydrogenAtoms or SpeedOfLight some where down in the source code? As an Erlang variable is just a reference to a value (like in Java), this can alter the meaning of the equation and put the program in an incorrect state. Fortunately Erlang prevents such errors from happening. If the value returned by EnergyInOneKgOfWater is not what your unit tests expected, there are only two places where you need to check while debugging. In an imperative language, you will have to wade through possibly millions of lines of code, looking for places where the values of MassOfHydrogenAtoms and SpeedOfLight could have been tampered with.
2. They make it easy to parallelize programs. As variables are prevented from being assigned again, all the problems related to synchronizing shared data is simply eliminated. The compiler can emit optimized code as it know variables won't be modified later in the program. For instance, the above could be optimized by the compiler into a single assignment -
EnergyInOneKgOfWater = 10000000000000000.
- eliminating all the overhead related to storing/accessing variables and applying arithmetic functions to them.
You must be wondering by now that, how it is possible to write something useful in a language that allows only 'constant' variables? Before I answer that question we need to look at another powerful feature of Erlang.
In most languages, = denotes an assignment statement. In Erlang, it stands for a pattern matching operation. Look at the following Erlang statements:
1> X = 10.
10
2> X = 6 + 4.
10
3> X = 6 + 3.
** exception error: no match of right hand side value 9
The first statement matches the variable X with the pattern 10. It is found that X is an unbound variable. So the object that represents 10 is bound to X. The second statement succeeds as X matches the patterns 6 + 4, but the third statement fail as 6 + 3 is 9 and it does not match the value bound to X.
Arbitrarily complex objects can be bound to variables and their values extracted using pattern matching. The following example shows how to represent a Cartesian point as a Tuple and extract its values:
4> Point = {point, 10, 15}.
{point,10,15}
5> {Type, X, Y} = Point.
{point,10,15}
6> Type.
point
7> X.
10
8> Y.
15
It is an Erlang idiom to represent such simple objects using built-in data structures like tuples and lists. The first element will be an atom (here it is 'point') that identifies the type of the object (called a type tag). The above sample shows how to extract the type and value of such an arbitrary object with the help of pattern matching.
In addition to packing and unpacking objects, pattern matching is used extensively in defining and invoking functions. Erlang functions use patterns to take decisions. This enables the programmer to express algorithms cleanly, without much dependence on conditional blocks and local variables. Here is a simple algorithm implemented in Erlang:
%% Computes the factorial of N.
fact(0) -> 1;
fact(N) -> N * fact(N-1).
Note: Erlang functions should be part of a module. For brevity we have not included the module declaration here.
The fact function consists of two patterns (in the context of functions they are called clauses). The first one returns 1 if the argument is 0. The second clause will continue to multiply N with fact(N-1) until the call to factorial matches the pattern - fact(0) which terminates the recursion and returns the computed factorial.
Pattern matching is so powerful and expressive a feature that, we almost never have to worry about maintaining mutable state.
Now let us look at some abstraction mechanisms offered by Erlang to complement pattern matching.
Guards are used to perform simple tests and comparisons on variables in a pattern. Here is a max function implemented using guards:
max (X, Y) when X > Y -> X;
max (_, Y) -> Y.
If X is greater than Y, the first clause is matched and the value of X is returned. If Y is greater than or equal to X, then the second clause is matched and Y is returned. We could've implemented the max function in a single clause using the if expression:
max (X, Y) ->
if
(X > Y) -> X;
true -> Y
end.
The true atom is often the final guard which will be evaluated if all other guard tests failed.
A 'List Comprehension' is a special syntactic extension to the language that makes it easy to generate lists from existing lists, based on certain conditions. They make the code short and easy to read. The following is alist comprehension that creates a list containing all the vowels in the given input. (In Erlang a string is just a list of integers).
1> [X || X <- "hello, world", (X =:= 97) or (X =:= 101)
or (X =:= 105) or (X =:= 111)
or (X =:= 117)].
"eoo"
This binary search implementation makes use of both guards and list comprehensions:
bsearch ([], _) -> false;
bsearch ([E], N) when E =:= N -> true;
bsearch ([_], _) -> false;
bsearch ([H|T], N) ->
Median = lists:nth (length ([H|T]) div 2, [H|T]),
bsearch_help ([[F || F <- [H|T], F < Median]]
++ [Median] ++
[[S || S <- [H|T], S > Median]], N).
bsearch_help ([FirstHalf, Median, SecondHalf], N) ->
if
Median =:= N -> true;
N < Median -> bsearch (FirstHalf, N);
true -> bsearch (SecondHalf, N)
end.
Just like Lisp, Erlang treats functions as first-class objects. They can be assigned to variables and passed around as arguments to other functions. Functions can also return other functions. The fun keyword is used to create anonymous functions, used for temporary usage as arguments and return values. A Function that take other function or functions as its argument or output a function as its return value is known as a higher-order function. Erlang comes bundled with many higher-order functions, especially for dealing with lists. An example is the function lists:map(F, L) which returns a list made by applying F to all elements of L:
1> lists:map (fun(X) -> X*2 end, [1,2,3,4,5]).
[2,4,6,8,10]
Here is a simple music database that makes use of higher-order functions:
-module (musicdb).
-export ([search/1]).
musicdb () -> [{{title, "Brothers In Arms"},
{artist, "Dire Straits"},
{rating, 4}},
{{title, "Pulse"},
{artist, "Pink Floyd"},
{rating, 4}},
{{title, "Brandenburg concertos"},
{artist, "J. S. Bach"},
{rating, 5}},
{{title, "Cello Suites"},
{artist, "J. S. Bach"},
{rating, 5}}].
search (Predic) ->
MusicDb = musicdb (),
[{{title, T}, {artist, A}, {rating, R}}
||
{{title, T}, {artist, A}, {rating, R}}
<- MusicDb, Predic (T, A, R)].
The function search(Predic) takes a function which filters the list based on three values: Title, Artist and Rating. The following shows a sample interaction with the database using predicate functions defined by the user:
1> musicdb:search (fun (_,_,R) -> R =:= 4 end).
[{{title,"Brothers In Arms"},
{artist,"Dire Straits"},{rating,4}},
{{title,"Pulse"},{artist,"Pink Floyd"},{rating,4}}]
2> musicdb:search (fun (_,Artist,R) -> (R =:= 4) and
(Artist =:= "Dire
Straits") end).
[{{title,"Brothers In Arms"},{artist,"Dire Straits"},
{rating,4}}]
3> musicdb:search (fun (_,_,R) -> R > 4 end).
[{{title,"Brandenburg concertos"},{artist,"J. S. Bach"},
{rating,5}},
{{title,"Cello Suites"},{artist,"J. S. Bach"},{rating,5}}]
The bit syntax is an interesting and extremely useful extension to pattern matching. It is used for extracting and packing individual bits or sequences of bits in binary data. The bit syntax is incredibly useful for reading and writing binary data at a bit level. The bit syntax produces highly efficient code and will come in handy when dealing with highly optimized file formats and network protocols.
The following example shows how to extract information from a MIDI file using the bit syntax:
-module (midi_parser).
-export ([parse/1]).
parse (Midi_File) ->
{ok, Data} = file:read_file (Midi_File),
parse_helper (Data).
parse_helper (Midi_Data) ->
%% First 14 bytes makes the header.
{Hdr, _} = split_binary (Midi_Data, 14),
<<MThd:32, HeaderSize:32, Format:16, NumTracks:16,
DeltaTimeTicks:16>> = Hdr,
%% Pack the extracted information into Erlang data
structures.
[{mthd, MThd}, {headerSize, HeaderSize},
{format, format_to_atom(Format)},
{numTracks, NumTracks}, {deltaTimeTicks, DeltaTimeTicks}].
format_to_atom (0) -> single_track;
format_to_atom (1) -> sync@multi_track;
format_to_atom (2) -> async@multi_track.
Usage:
1> c(midi_parser).
{ok,midi_parser}
2> midi_parser:parse ("./teddybear.mid").
[{mthd,1297377380},
{headerSize,6},
{format,sync@multi_track},
{numTracks,4},
{deltaTimeTicks,192}]
Now the time has come to study the feature that really sets Erlang apart from the rest of the crowd...
Erlang is a language designed with parallelism in mind. An Erlang program easily scales from running tiny parallel processes on a single processor to multiple cores and processors distributed across a network. Erlang processes are self-contained virtual machines that can evaluate Erlang functions. These are not operating system processes, but they belong to the programming language.
In Erlang creating and destroying processes are very fast. They do not share memory. The only way for processes to interact is through message passing, which is also very fast. The natural way to design an Erlang program is by factoring out the logic into tiny processes that interact with each other. Within a single instance of Erlang, you can have tens of thousands of such processes. Erlang is a pure message passing language.
Let us re-write the music database as two components - a server process that maintains the database and responds to queries and a client process that sends a query to the server and prints the result:
%% MusicDB Server
-module (musicdb_server).
-export ([start/0]).
start () -> spawn (fun loop/0).
musicdb () -> [{{title, "Brothers In Arms"},
{artist, "Dire Straits"},
{rating, 4}},
{{title, "Pulse"},
{artist, "Pink Floyd"},
{rating, 4}},
{{title, "Brandenburg concertos"},
{artist, "J. S. Bach"},
{rating, 5}},
{{title, "Cello Suites"},
{artist, "J. S. Bach"},
{rating, 5}}].
loop () ->
receive
{From, Predic} -> From ! search (Predic),
loop ();
Any -> io:format ("Received invalid message:
~p~n", [Any]),
loop ()
end.
search (Predic) ->
MusicDb = musicdb (),
[{{title, T}, {artist, A}, {rating, R}}
||
{{title, T}, {artist, A}, {rating, R}}
<- MusicDb, Predic (T, A, R)].
%% MusicDb client
-module (musicdb_client).
-export ([start/1]).
start (ServerPid) ->
ClientPid = spawn (fun loop/0),
{ClientPid, (fun (Predic) ->
ServerPid ! {ClientPid, Predic} end)}.
loop () ->
receive
Any -> io:format ("Response: ~p~n", [Any]),
loop ()
end.
To use the code, we load both the server and client modules into the Erlang shell. Then we start a server process and a client process. The client process will use the server process' ID to send it a request. The request will contain the client process ID and a function used to filter the results. The server will generate the result by using this predicate function and return it to the client using its ID.
49> c(musicdb_server).
{ok,musicdb_server}
50> c(musicdb_client).
{ok,musicdb_client}
51> ServerPid = musicdb_server:start().
<0.179.0>
52> {ClientPid, SearchF} = musicdb_client:start(ServerPid).
{<0.181.0>,#Fun<musicdb_client.1.131757040>}
53> SearchF (fun (_,Artist,Rating) -> (Artist =:=
"J. S. Bach") and (Rating =:= 5) end).
Response: [{{title,"Brandenburg concertos"},{artist,"J. S.
Bach"},{rating,5}},
{{title,"Cello Suites"},{artist,"J. S. Bach"},{rating,5}}]
{<0.181.0>,#Fun<erl_eval.18.105910772>}
Here we ran both the client and the server in the same Erlang node. It is easy to run the server process on one node and client process on another, without any modification to the code. All that is needed is to start the Erlang virtual machine in distributed mode by passing the node name as a command line argument. Erlang also provide very high level support for error detection and recovery in distributed programs.
We have just scratched the surface of Erlang. It is an ideal language for building large-scale, fault-tolerant, distributed applications. (You can find lots of real-world Erlang projects here). Erlang comes with OTP, which is a platform and a set of libraries that abstracts the details of such applications for us. Erlang also boasts of having many frameworks that will be of interest to developers creating software with massive scalability requirements. Here are a few of them:
The best tutorial and reference for Erlang is Programming Erlang: Software for a Concurrent World by Joe Armstrong.
You will also benefit from the online Erlang/OTP documentation.
Also see: Erlang for C, C++ and Java Programmers.