Narrative and State-Based Descriptions

Narrative and state-based descriptions

Software developers need to solve the most complex problems of the modern world using a brain that evolved mostly in the paleolithic age and earlier. How can we tap into the cognitive strengths of a hunter-gatherer to better produce and maintain software?

As we know, software development is a matter of communication as well as logic. We need to communicate clearly with the machine, of course, but just as importantly we need to communicate with other software developers and our future selves. Furthermore, clear thinking is largely a matter of communicating with our present selves. Have your thoughts ever been clarified through the act of writing? This happens to me all the time; it certainly happened while writing this essay.

In computing we often need to describe sets of sequences. Some common examples include: sets of sequences of symbols (i.e., formal languages); sets of sequences of operations on a set of objects; sets of sequences of interactions with external agents.

These sets are typically infinite and may include infinite sequences, so we need some finite way to describe the sets.

There are, I think, two modes of thought that we use when creating finite descriptions of large sets of sequences. I'll call these the narrative mode —I'm inventing the name here, I haven't seen a name put to the concept before—  and the state-based mode.

Narrative descriptions tap into the deep human habit of telling stories. We generally tell stories in order: this happened and then that. If we are giving someone directions, we might say "Keep walking north on Main Street until you get to the post-office, then turn left" or "Ring the bell, then if no one answers the door, go around to the back."

State-based descriptions describe how to react in different situations. They are a collection of rules: "If you are at the library, turn north and start walking. If you are walking north on Main Street and are south of the post-office keep walking. If you are walking north on Main Street and at the post office, turn left." The order of the rules doesn't matter because the sequencing is done via states.

With narrative descriptions the states are in the background. You can still see them in the negative space, e.g., the gap between two symbols in a regular expression or the line-break between two commands in a program. They are there, but unnamed and often unmarked.

Just as there are two modes of thought, there are two kinds of description formalism, which I'll also call narrative and state-based. Narrative formalisms have a bias toward the narrative mode of thought and state-based formalisms have a bias toward the state-based mode of thought.

State-based formalisms include finite state automata, Turing machines, flow charts, Harel's and UML's state charts, machine code, and programming languages like Fortran II. The last two might be considered hybrids, as not all states need to be named. If we add explicit stack operations to finite state automata, we get pushdown automata. If we add implicit stack operations to finite-state automata, we get syntax diagrams (sometimes called railroad diagrams); recursive transition networks are essentially the same thing.

Narrative formalisms include regular expressions, structured programming languages without recursion (say a language with just sequential composition, if-then-else commands, and while loops), process algebras, dynamic logic, and use cases.

Descriptions in narrative formalisms are trees in which the leaves are primitive narratives and the nonleaf nodes are operators on narratives. Descriptions in state-based formalisms are of sets of rules. Narratives are explicitly structured. State-based descriptions are not.

(Another name for narrative formalisms might be algebraic, emphasizing the analogy of sequential composition with multiplication, choice with addition, etc. Narrative formalisms tend to be semirings. However, in this essay, I'd like to emphasize the relationship of what I've called narrative formalisms to stories. There are lots of things that are algebraic but not are narrative; number theory is an example.)

While naming states doesn't seem to be particularly useful in the narrative mode, naming stories is. A narrator may jump from what he did in Paris to what he did in Berlin, but mention that they are leaving the story of the train ride for another day, or perhaps that is a story they already told before and they just refer to it now. In either case, giving the substory a name helps keep everything straight and promotes modularity.

Allowing named stories to refer to each other or themselves recursively is another step. Adding recursion to regular expressions gives extended-context-free grammars (a.k.a. extended BNF or EBNF). Recursive stories (unlike iterative loops) are not something that seem to be supported by our hundreds-of-thousands-of-years-old habit of story telling.

This may explain why recursion is often harder to learn for students than the other elements of the narrative mode. In a sense, though, recursion is just an extension to naming, so I'm inclined to say we are still in the narrative genera of description formalisms. And we know that, while the looping in extended CFGs reduces the need for recursion, there are still many problems that are elegantly solved by recursion. By the same token, we can add recursive procedures to a structured programming language and still be narrative.

By restricting extended-context-free grammars to only use sequential composition, we get context-free grammars (a.k.a. Backus-Naur Form, a.k.a. BNF). It would seem that adding a restriction to a formalism can't change it from narrative to state-based.  After all any description we can make in the restricted formalism we could have made in the unrestricted formalism, so if the description was narrative before it must still be narrative in restricted formalism. However, in fact we've started down a slippery slope: If we restrict further that all productions have the form A --> xB, we arrive at finite state automata.

So the lesson here is that narrative and state-based formalisms aren't clearly separated. Rather, as I said before, there are narrative and state-based modes of thought and different formalisms promote one mode over another. Because EBNF is so flexible, we can use it to write descriptions in either a narrative or a  state-based mode, but, because it makes narratives easy, it tends to promote a narrative mode of thought. How does it make narrative easy? Because, by using a more narrative style, I don't have to introduce as many nonterminals.

For me EBNF is a nice compromise. The same compromise is made in many modern programming languages. We have structured constructs (sequencing, choice, repetition) to construct our narratives, but also subroutines to give our narratives names. In theory we could use the recursive subroutine mechanism to write completely unstructured code, but in practice this rarely happens.

UIs, asynchronous programming, and inversion of control

So what motivated these thoughts? A few years about I asked myself: How do we deal with a sequence of interactions with the user? Back in the days of command-line programs, we used a narrative form. We used to write code like this

     ask the user for input

exit when there is no more input

     process the input

     output the result

end loop
More recently, use cases have become a popular way of describing sets of sequence of interactions during system specification. Use cases are very much narratives.

However, with graphical user interfaces has come event-driven input. Here we are expected to set up a set of event handlers and then the main program exits. From that point on the code executes a series of reactions to the various possible input events. The reaction to an event may depend on what has happened in the past and so we have a state dependent response. This is particularly true with actions like mouse movements or mouse clicks whose interpretation depends on past inputs.
onStart :  showQuestion(); showAnswerField() ; state := 1

onInput: if state=1 then hideQuestion(); hideAnswerField() ;
                 showOutput( process(getInput()) ) ; showNextButton() ; state := 2 end if

onDone: if state=1 then hideQuestion(); hideAnswerField() ; state := 3 end if

onNext: if state=2 then hideOutput() ; hideNextButton() ;
                 showQuestion() ; showAnswerField() ; state := 1 end if
There is a name for this sort of programming: Inversion of Control.

Writing programs using inversion of control essentially means throwing out the narrative formalism provided by our programming language: We no longer use whiles to create loops, nor ifs decide choices, at least not when the choice is dictated by the user's choice of the next input event. The result is an approach to programming that is unstructured and inviting of ad hoc solutions. It is difficult to write and difficult to read. As a result it is difficult to verify. We can fight the ad hoc solutions by using principled approaches to inversion of control such as the state pattern. This improves the situation somewhat, but it underscores that we are now living in the land of state-based thinking.

User interfaces are just one area where we are forced by our technology away from narrative and towards state-based modes of thought. Others include interacting with network communications. In general we call this asynchronous programming.

Computer scientists developed narrative formalisms to cope with asynchronous programming in the 70s and 80s, starting with threads and monitors and later with rendezvous and process algebras. Process algebras are very close to --and inspired by-- context-free grammars, especially in how they deal with choices imposed by external agents; however ideas from process algebras have not yet made it into mainstream language design nor into libraries for mainstream languages.

When I started thinking about a way to design and implement user interfaces in a more narrative way, I thought about using context-free grammars and perhaps generating UI code using something similar to a parser generator. However I then realized that I could follow an approach similar to monadic parsing, i.e., writing a library rather than a tool. But backtracking is more constrained in an interactive system than when parsing; we can't backtrack over output events that the user has already seen. Thus some changes from parsing monads are required; taking a monadic leads to implements a process algebra as a library or an embedded domain specific language; using such an embedded DSL, we can hide the inversion of control under an abstraction layer. Above that layer we write applications in a narrative style. I call my library "take back control". I've written more about it in another blog post.

Elsewhere I've described the difference between inversion of control and upright control as a difference between pushing and pulling. With inversion of control, a sequence of events is being pushed at out program and it must react to each one as it arrives. Pulling events means the code is written to pull events into the program, events are dealt with only when the program is ready to deal with them. This contrast between push and pull does not really explain why I find pull code preferable to code that simply reacts to a pushy world. This post is meant to fill that gap: The pull approach fits a narrative mode of thinking, which I think is easier for us humans, with our paleolithic brains, to cope with.


How to do redo, and undo revisited


An earlier post looked at 5 techniques for implementing undo/redo.

Here I want to take another look the problem and look at a solution based on immutable data structures.

Essentially it is a check-pointing solution, but by using an immutable data structure there is essentially a zero time cost to making a checkpoint.

Nodes, paths, and trees

The running example from the earlier post used a tree as the data structure with the following interface.

public interface TreeI {
    NodeI makeNode( String text ) ;

    NodeI getRoot() ;

    void insert( NodeI parent, int posn, NodeI child ) ;

    void remove( NodeI parent, int posn ) ;

Our immutable version will be a bit different. We start with nodes. We want to have nodes that can't change. Our node class only has constructors and accessors, not mutators.

public class Node {
public Node ( String text, List<Node> children ) { ... }

    int getChildCount() { ... }

    Node getChild( int i) { ... }

    String getLabel() { ... }


We can identify any node in a tree by its path from the root. For example, [] represents the root of a tree, [0] represents the first child of the root, [0,1] represents the second child of the first child of the root, and so on. For this scheme to make sense, the tree can never be empty; there will always be a root node. Now we change the tree interface to

public interface TreeI { 

    public Node getRoot( ) ;

    public void replace( Path p, int i, int j, List<Node> newNodes );

    public void changeLabel( Path p, String newLabel) ;


The replace function replaces all children of the given node from position i up to (but not including) position j with a list of new nodes. The changeLabel method changes the label on a node. Both these functions work by creating new nodes as needed. For example replace( p, 0, 1, ns ) where p is the list [0,1] will build three new nodes, all the other nodes of the new tree (apart from possibly the ones in list ns) will be shared with the old tree.

The interface is implemented by a class

public class Tree implements TreeI { 

    protected Node root = new Node( "", new ArrayList<Node>() ) ;

    public Node getRoot( ) { return root }

    public void replace( Path p, int i, int j, List<Node> newNodes ){ ... }

    public void changeLabel( Path p, String newLabel) { ... }


Implementing undo and redo.

Once we have an immutable data structure undo and redo are very simple. The main difference compared with the checkpointing solution of the earlier post, is that we don't make copies of the model.

public class TreeWithCheckpointing extends Tree {

    Stack<Node> olderTrees() = new Stack<Node>() ;

    Stack<Node> newerTrees() = new Stack<Node>() ;

    public void checkpoint() {
        olderTrees.push( root ) ;
        newerTrees.clear() ; }

    public void undo() {
        if( olderTrees.size() > 0 ) {
            newerTrees.push( root ) ;
            root = olderTrees.pop() ; } }

    public void redo() {
        if( newerTrees.size() > 0 ) {
            olderTrees.push( root ) ;
            root = newerTrees.pop() ; } }



  • This approach is very simple and fool proof. We don't need that the model be copyable, as we only copy pointers.  
  • There may other advantages to using immutable data structures. For the system where I used this, the main reason for using immutable data structures was to enforce invariants. If there are no mutators, we only have to worry about object invariants at node construction time. Another advantage is in reusing view object; we can make a cache mapping nodes to view objects, since the nodes never change, the view objects are sure to be accurate representations..
  • Because there is no copying, it is more efficient than the checkpointing solution with mutable data.


  • The main disadvantage is the awkwardness of an immutable data structure. It works well for trees, but for an arbitrary graph we have to give up on using ordinary pointers to represent edges.


Take Back Control. GUIs and other event driven programming without inversion of control.

GUI programming and inversion of control

Writing graphical user interfaces can be a tricky business. Typically we set up a set of event handlers. Each handler waits for its event and based on the current state moves the state to some other state. This arrangement is commonly called "inversion of control".

This is fine for simple GUIs, but once the system gets past a certain level of complexity it is rife with opportunities for error.  What if an event happens in a state that we didn't anticipate. What if we forget to disable events? Once we add in asynchronous IO such as HTTP requests, new complexities arise.

One solution is to carefully draw out a state machine --perhaps using state charts-- and use the state pattern to ensure that states are entered and and exited cleanly and that transitions only happen when we are in the correct state.

However state machines are at least one level of abstraction down from the usual structured programming design notations such as sequential composition (f followed by g), if-then-else, while-do. It's a lot like programming with only conditional goto statements.  Furthermore algorithmic abstraction (subroutines) are not available with state machines, unless you want to get in to pushing states on a stack. What's worse, state machines are a level up from the code you write. You might have a lovely state machine drawn on a white board, but once you've translated it to code all you have is an unstructured pile of code. Even if you keep that code well organized, for someone else to really understand that code, they'll need to translate it back to a nice picture. The solution is to keep the code and a diagram of your state machine in sync, but we all know that code and its documentation tend to drift apart. Furthermore, state machines do not correspond well to use-cases, which makes it harder to verify that your design meets all the use cases and only the use cases.

So, if you start from use cases, there are two design steps to take: translate from use cases to a state machine; translate from the state machine to event driven code.  Each step changes the paradigm and each is a potential source of error. Furthermore, if you are maintaining code, the first thing to do is to understand what it does. To reconstruct the use-case model from event driven code requires reverse engineering the same steps: translate from event driven code to a state diagram; translate the diagram back to use cases.

So my aim was to allow a more direct translation from use cases to code: a way to avoid inversion of control.

Reverting control

When my students first design GUI programs, I have them create sequence diagrams for their software. Often the diagrams come back looking something like this -- translated back into code:
    function main() {
        displayWindow() ;
        while( true ) {
            waitForUserToClick() ;
            reactToClick() ; }
This makes a lot of sense if you consider that, in all their previous courses, they've been writing console applications where input comes only when it is asked for.  It also makes sense if you consider that a GUI is --in essence-- a set of sequences of interactions and the student already know a language for expressing sets of sequences of actions, namely the ordinary programming constructs of sequential composition, choice, iteration, and function calls.

My contention is that these student, who might seem naive to those of us used to writing event driven GUIs, are on to something worth looking into.  Is there a way to realize a GUI without inversion of control?

The goal is to be able to describe the user experience in a simple and executable notation. Here is an example of such a notation
    displayWindow >
        await( click( b ) && reactToClick )
Here b is an object representing a button, displayWindow and reactToClick are functions representing changes to some model and/or view, and loop, await, click, > and && are functions and methods from the library I'll be describing in the next section.

The tbc library

The remainder of this post describes an experimental library named tbc (for "Take Back Control"), written in Haxe to explore these ideas. The language I use below is Haxe, but everything could be translated into just about any language that has lambda expressions and closures with lexical scoping, for example JavaScript, Scala and many many others.

The library is structured as several modules, so far these are tbc.TBC, tbc.TBCTime, and tbc.TBCHTML. Unless otherwise mentioned all names are exported from either the TBC module, or are static functions of the TBC.TBC class. It can be found at https://github.com/theodore-norvell/tbc .

I've been thinking in terms of web development, which is why there is a  TBCHTML, but one could also have a TBCSwing or some such. One of the nice things about Haxe is that it can be translated into lots of useful languages such as JavaScript, Java, C++, and so on. The core TBC module is target independent -- can be used regardless of the target language; TBCTime works with some targets but not all; TBCHTML is specific to JavaScript.


Central to the library is a generic data type, Process<A>, which is the type of all processes that compute values of type A. For example an object of type Process<Int> is an object that can compute an integer value.  Executing a process may cause side effects to happen; side effects include changes to state, output, input, and delays; delays include waiting for input and waiting for time to pass.

We can execute a process using its .go method. This method causes the process to execute, but there is nothing in the semantics of .go that says that it will finish immediately, it might finish after some time, and, in particular, after some interaction with the user.  The argument of .go is a void function that takes an A. For example if p is an object of type Process<Int> then
    p.go( function( x : Int ) { trace(x) ; } ) ;
will cause p to start to execute and, when it has finished, the Int it produces is sent to the trace routine. The argument
    function( x : Int ) { trace(x) ; }
is an anonymous function or lambda expression.  The trace function is a standard Haxe function for outputting a message to a log.

One particularly useful process is unit(x), which is a process that computes value x immediately with no delay and does nothing else. So
    unit(42).go( function( x : Int ) { trace(x) ; } ) ;
is a complicated way to output 42 to the log.

As an abbreviation skip() stands for unit(null). skip() has type Process<Triv> and Triv is a type that has null as its only value. So skip() is a process that has no side effects and computes an informationless value.

If f is a function of type Void->A (i.e., a function with 0 arguments that returns a value of type A), then exec(f) is a Process<A> that executes by calling f, and the result of the process is the result of f. The exec function is useful for inserting side effects, such as changing or reading global state, into the computation.

Another kind of side effect is to spend time doing nothing. In the TBCTime module, there is process function pause so that pause(n) is a Process<Triv> that completes after about n milliseconds.

We can combine processes in a number of ways.  p.sc(q) is a process that, when executed, executes process p, throws away its result and then executes process q; the value computed by p.sc(q) is the value computed by the execution of q.  As an example
    pause(1000).sc( exec( hello ) )
is a process that executes function hello after a delay of 1 second.

You can think of sc as standing for either "sequential composition" or for "semicolon". Sequential composition is associative
    p.sc(q.sc(r))  =  p.sc(q).sc(r)
And skip() and unit(x) (regardless  of x) are left units of sc
    p  =  skip().sc(p)  =  unit(x).sc(p)
Furthermore skip() is a  right unit
    p  =  p.sc( skip() )
provided p is of type Process<Triv> .

As a convenience the > operator is overloaded to mean sequential composition. So
    p > q > r
    p.sc( q ).sc( r ) 
The laws are
    (p > q) > r  =  p > (q > r)
    p  =  skip() > p  =  unit(x) > p
    p  =  p > skip()
provided (for the last law only) that p is of type Process<Triv> .

Besides sequential composition, we can imitate other sequential programming constructs. A process ite( p, q, r), where p is of type Process<Bool> and q and r are both of type Process<A> (for any type A) is a Process<A> that mirrors if-then-else statements. And loop(p) is a Process<Triv> that is equivalent to an infinitely repeated sequential composition of p with itself. And so on.

A useful generalization of sequential composition is bind. The problem with sequential composition is that it throws away the result of the left process. Bind solves this problem. If  p is a Process<A> and f is a function from type A to type Process<B>,  then p.bind(f) is a Process<B>. To execute p.bind(f), first p is executed, then its result is fed to f, finally the resulting process is executed, delivering the result. bind is the familiar bind (or flatmap) operation from monads.  As an abbreviation p.bind(f)can be written by p >= f. Bind and unit obey the usual monad laws
    (p >= g) >= h  =  p >= function(y) { return g(y) >= h ; }
    p >= unit  =  p
    unit(x) >= f  =  f(x)
provided that y is not free in either g or h. The first law is not used much in practice exactly because often y is free in g or h. People tend to use bind as a combination of sequential composition and local variable declaration. So you might see code like this
    p >= function( x ) { return
    ...x... >= function(  y ) { return
    unit(...x...y...) ; } ; }
This is so common in Haskell that there is a special notation for it
    do x <- p
       y <- ...x...
       return (...x...y...)

If we were to define the operation f*g to mean function(x) { return f(x).bind(g); } (where x is not free in g), then the three laws concerning bind can be written in a very familiar way
    (f*g)*h  =  f*(g*h)
    f*unit  =  f
    unit*f  =  f 
In practice >= it is easier to use than *, and so we define >= but not *.

Much has been written about monads elsewhere, which means I don't need to write much here.  I'll just note that p.sc(q) is defined in terms of bind to be
    p.bind(function(x) { return q ; } )  
And so
    (p > q) >= f  =  p > (q >= f)
    (p >= f) > q  =  p >= function (y) { return f(y) > q; }
provided that y is not free in f or q. Furthermore loop(p) is defined to be
    p.bind( function( a : A ) { return loop(p) ; } ) 

In Haxe, >= and > are at the same level of precedence and associate to the left. For example
    p > q >= f > r >= g
    (((p > q) >= f) > r) >= g
which means

[Note for Haskell programmers: Just remember that Haskell's >> is TBC's > and Haskell's >>= is TBC's >=; i.e. drop a > from the Haskell notation to get the TBC notation.  There is probably some way to use Haxe's macro system to get something like Haskell's do notation, but I don't know Haxe's macro system (yet).]


If p is a Process<A> and q is a Process<B> object, par(p,q) is a Process<Pair<A,B>> that, when executed, executes both p and q, possibly interleaving their various side-effects. Whenever both are delayed, either one may be the next to proceed. Note that everything actually runs on one thread.

For example
            exec(outX) > pause(2000) > exec(outX)
            exec(outY) > pause(2000) > exec(outY)
is a process that executes functions outX and outY twice each. The first outX and outY are immediate, while then latter two are executed about 2 seconds later. In theory, any of the interleavings
    outX outY outX outY
    outX outY outY outX
    outY outX outX outY
    outY outX outY outX
are possible.

Fork-and-join parallelism is also possible, although I haven't implemented it yet.

Guards and Guarded Commands

An object of type Guard<E> represents some category of input events that might happen in the future. For example, if b is a button element on a web page, then click(b) is a guard representing clicks on the button. click is in the TBCHTML module. The type parameter E is the type of objects representing information about the events, for example the coordinates of a mouse click. The type click(b), if b is an HTML button element, is Guard<Event>, where Event is the type of DOM events in HTML.

If g is a Guard<E> and function f is of type E -> Process<A>, then g.guarding(f) is a GuardedProcess<A>. A guarded process might be thought of as a process that starts with an event.

A method await turns a set of guarded processes into a process. So if m0 = g0.guarding(f0) and m1  = g1.guarding(f1) are guarded processes of the same type, then await(m0,m1) is a process. The process await(m0,m1) is executed as follows: First time passes until one of the events associated with m0 or m1 happens. Say an event that g0 represents happens, then execution continues with the execution of f0(e0), where e0 is the information associated with the event.

When the process await(m0,m1) begins, the event categories associated with m0 and m1 are enabled. In the implementation, this may mean installing event handlers and changing the appearance of GUI widgets. When one of the events fires, the event categories are disabled before execution proceeds with the corresponding guarded process. If both m0 and m1 are triggered by the same event, only one of them proceeds; which one is up to the implementation.

Here is an example of guarding
            await( click( b0 ).guarding( out0 ) ) >
                click( b1a ).guarding( out1A )
                click( b1b ).guarding( out1B )
            ) >
           await( click( b2 ).guarding( out2 ) )
Here the b0, b1a, b2a, and b2 are button elements on an HTML page and out0, out1A, out1B, and out2 are functions returning processes that make changes to the page.  The interaction with the user proceeds as follows.
  • b0 becomes enabled.
  • When b0 is clicked, it becomes disabled and buttons b1a and b1b become enabled.
  • When one of these two buttons is clicked, both become disabled and b2 becomes enabled.
  • When b2 is clicked, it becomes disabled and the process continues from the start.
The use case scenario corresponding to this process is
  1. The user clicks on b0
  2. The system makes change out0
  3. The user clicks either b1a or b1b
  4. If the user clicked on b1a
    • The system makes change out1a
  5. Else the user clicked on b1b
    • The system makes change out1b
  6. The user clicks on b2
  7. The system responds with change out2
  8. Back to 1.
Of course a real use case would explain what out0 etc are, but none of that matters to us. The point is the close correspondence between the use case and the code.

The >> operator is overloaded to mean guarding, so the last example can be written as
        await( click( b0 ) >> out0 ) >
            click( b1a ) >> out1A
            click( b1b ) >> out1B 
        ) >
        await( click( b2 ) >> out2 )

In some case we might want to throw away the event information. This is easily done using
    g >> function( e : E) { return p ; }
which is a GuardedProcess<A> if g is a Guard<E> and p is a Process<A>.  For example we might want to simply wait for a click on button b. This Process<Triv> will do that
    await( click(b) >> function( e : Event ) return skip() ; } )
As a convenience, guards have a .andThen method to handle this situation less verbosely. We can write
    await( click(b).andThen( skip() ) )
or, using operator overloading,
    await( click(b) && skip() )
Since && has lower precedence than > and >= we can write
    g && p > q >= f
which will be parsed as
    g && ((p > q) >=f) 

The .sc and .bind methods can also be applied to guarded processes on their left. Thus
    g >> m >= f
which is parsed as
    (g >> m) >= f
means what you would expect: when g is fired, the event is sent to m and then the result, a, of executing m(e) is sent to f and the result of executing f(a) is the result of the whole thing. The following associativity law.
    (g >> m) >= f  =  g >> function(e) { return m(e) >= f ; }
holds provided e is not free in m or f. Likewise
    (g >> m) > p  =  g >> function(e) { return m(e) > p ; }
holds provided e is not free in m or p. Furthermore
    (g && p) > q  =  g && (p > q)
    (g && p) >= m  =  g && (p >= m)

[Note on the notation. The && operator was chosen most because of its low precedence.  Furthermore | could combine guards, while || could combine guarded processes. So it is nice that && has lower precedence than | and higher precedence than ||.  (But note that >> has higher precedence than |.)]

[Another notational note: It is tempting to use > for .andThen and >= for .guarding.  Advantage: fewer operators to remember. Disadvantage: this notation obscures the types, making code less readable. For example, if you read a && p, it is clear that a is a guard and the result is a guarded process. But, with the alternate notation, if you read a > p, the type of a could be guard or process or guarded process and the type of the result can only be determined after determining the type of a. Also, with the alternative notation, skip() is not always a right unit of >, nor is unit always a right unit of >=.]

Guards can also be used for time events. The timeout guard fires if it's been enabled for more than a certain amount of time. For example, if the user fails to respond in a timely manner, we might want to move on. Process nag(null), defined like this
    function nag(triv : Triv) : Process<Triv>{ return
                click( b ) && exec(thanKTheUser)
                timeout( 1000 ) && exec(nagTheUser) >= nag
        ) ; }     ,
nags the user every second until the button is clicked.

In fact pause( t ) is defined by
    pause( t )  =  await( timeout( t ) && skip() )

Summary of the core operations

Suppose that
  • p is of type Process<A>
  • q is of type Process<B>
  • f is of type A -> Process<B> 
  • h is of type A -> C
  • g is of type Guard<E>
  • m is of type E -> Process<A>
  • gp, gp0, gp1, ..., gpn are of type GuardedProcess<A>
Meaning Syntax Type Operator syntax
Do p, then do q. p.sc( q ) Process<B> p > q
Do p, then do f(x), where x is the result of p. p.bind( f ) Process<B> p >= f
Make a guarded process that does p when fired. g.andThen( p ) GuardedProcess<A> g && p
Make a guarded process that does m(e) when fired, where e represents information about the event. g.guarding( m ) GuardedProcess<A> g >> m
Make a guarded process similar to gp, but that does that does q after gp has executed. gp.sc( q ) GuardedProcess<B> gp > q
Make a guarded process similar to gp, but that does that does f(a) after gp has executed, where a is the result of executing gp. gp.bind( f ) GuardedProcess<B> gp >= f
Fire one of these guarded processes. await( gp0, gp1, ..., gpn ) Process<A>
Do p again and again loop( p ) Process<Triv>
Do both p and q. par( p, q ) Process<Pair<A,B>>
Start process p. p.go( h ) Void

Related ideas

When I first looked into this a few years ago, I found Martin Odersky and Philipp Haller's 2006 paper, "Event-based programming without inversion of control", which introduced Scala's Actors library. The paper mentions user interfaces as an area for future work.  Recently I came back to this question, and the above documents some preliminary experimentation.

In addition to Odersky and Haller's actors, some of the inspirations for TBC include process algebras, such as Tony Hoare's CSP and Robin Milner's CCS; (extended) context-free grammars; parsing combinators.; monads as used in Haskell and more recently in other languages such as JavaScript; JavaScript's promises; and the command pattern as documented by Erich Gamma and company.

Further work

I'm happy with the basic abstractions of process, guard, and guarded process. However, the above is just an early exploration. There is much more to consider including
  • Internal events for communication between parallel processes. Processes can already communicate via shared global state. However this does not allow synchronization between processes. For that and to avoid using shared state, synchronous messages, asynchronous messages and rendezvous are the classic approaches.
  • Communication with external entities other than the user. In the HTML context, this includes XML HTTP requests.
  • Failures and recovery.
    • One approach is that exception catching can be emulated by means of a second version of bind. While bind says what to do next in case of success, onFail says what to do in case of a failure. We give processes a second type parameter so that Process<A,B> is a process that succeeds with a value of type A or fails with a value of type B. If p is of type Process<A,B> then and f is of type A -> Process<C,B>, then p.bind( f ) is of type Process<C,B>. Failure can happen either within p or within the result of f.  Symmetrically,  if g is a function of type B -> Process<A,D> then p.onFail( g ) is of type Process<A,D>. Here if p fails, there is a chance of recovery, should the result of g succeed. This is just like try{ p }catch( e ){ g(e) }. Note the complete symmetry between succeeding and failing.
    • Those familiar with JavaScript's Promise type will note that there is little original here.
    • Those familiar with monads will note that Process<A,B> is monad in two ways. I.e. we can fix either A or B to get the functor part of a monad.
    • The problem is that I'm not a big fan of try-catch, so before going down this road, I'd like to think about alternative ways to deal with failure and recovery.


A simpler RMI tutorial for Java

Oracle corp has a tutorial on RMI which is quite involved. There are 4 different kinds of processes working together so that two of them can talk: A server, a client, an rmiregistry process, and a web server to serve class files.  There are also security policies for both the server and the client.

My purpose here is to strip RMI down to the simplest possible set up. In particular

  • No security policy is used.
  • All code is in copies a single jar file.
  • There is no dynamic code loading.
  • No separate rmiregistry process is needed.
This brings the number of kinds of processes down to 2, reduces the number of command-line options needed to zero, and reduces the number of security policy files needed to zero. It also simplifies distribution.

There are no doubt some advantages to using an external registry process and dynamic code loading. However for many applications they just aren't needed.

The server code

My example code is an Othello game. The main server class is called RemoteGameModel. This class records the state of the game; I used the observer pattern to alert observers of any change to the state of the game. The observers are typically running in other JVM instances, possibly on other machines.

The hookup of the observers to the RemoteGameModel is initiated by the observers; they will look up the RemoteGameModel using a registry. Thus after creating the RemoteGameModel in the server we need to register it in a registry running at a known location.  This is done with a few lines of code in a main method in class ServerMain:

First we create the observable model. The RemoteGameModel classs extends UnicastRemoteObject, which means that as part of being constructed, the object is exported. Being exported is important since it means that when the object is passed to a remote object, a stub will be made for it and passed instead.

Second, we need a registry. Rather than using an rmiregistry process, it is easier to create a registry in the same JVM as the server object resides. This means the registry and the server must be on the same host. The port number used should be unique to your application.

Third, the RemoteGameModel is registered with the registry.

The next two lines report the URI that can be used to refer to the RemoteGameModel object.

There are a few things that can go wrong, so it's important to wrap the above code in try-catch.

The client code

On the client side, we need to obtain a reference to the RemoteGameModel which is on another machine. In RMI we ask the registry and the registry sends back a proxy (stub) object representing the server. The stub implements the RemoteGameModelInterface, just as does the actual RemoteGameModel.

The following code is in the main method of class ClientMain:
First set a string to be the URI for the object.

Then the static method Naming.lookup returns the stub.

Once we have a reference to this object we can use it to send messages to the server object. In my code I use it to build an observer in the client. I pass the reference to the proxy into the constructor of the observer object. The last thing the observer does is to send addObserver(this) to the proxy. As the observer object's class extends UnicastRemoteObject, this sends a proxy for the observer over to the servers machine, where it is added to the server's list of observers.

Deploying and running the code

The code can now be tested simply by starting a JVM process running ServerMain.main and a number of processes running ClientMain.main.

To make deployment easier, it is a good idea to put all files in a .jar file. I called mine othello-2.jar. Then the server can be started from the command line with the command
java -cp othello-2.jar othello.ServerMain
And the clients can be run from the command line with the command
java -cp othello-2.jar othello.ClientMain

Further information and source code

Further information about the techniques used in the Othello project can be found at  http://www.engr.mun.ca/~theo/Courses/sd/notes.html under "Animation with Threads" and "RMI and remote, concurrent observers". The source code at http://www.engr.mun.ca/~theo/Courses/sd/links.html shows the project at various stages of development:

  • Othello-0: Thread based animation.
  • Othello-1: Adds concurrent observation and multiple view stragegies.
  • Othello-2: The project is converted to use RMI.
  • Othello-3: Adds input and game state to make a playable game.


13 totally true facts every C++ programmer should know about Java

Java has a lot of similarity with C++. However the designers of Java set out to create a language that is safer, simpler, and more portable than C++.

Here is a quote from Java's original designer, James Gosling, explaining the difference of language philosophy.
JAVA omits many rarely used, poorly understood, confusing features of C++ that in our experience bring more grief than benefit. This primarily consists of operator overloading (although it does have method overloading), multiple inheritance, and extensive automatic coercions.

Below I've tried to highlight some of the major differences.

0. In Java variables hold either values or pointers to objects

In Java data is held in variables and variables hold either values belonging to primitive types or pointers to objects that are on the heap.

Variables include

  • local variables (including parameters),
  • fields of objects,
  • items of arrays,
  • static fields --these are fields that belong to a class rather than an object--, and
  • temporary variables that hold results from method calls and results of operations such as addition or multiplication.
All these sorts of variables can only hold either a value of a primitive type or a pointer to objects that are on the heap. In particular variables can not hold objects themselves.

By the way, in Java terminology, pointers are called references. So, if you are reading a book on Java, you probably won't see pointers mentioned, but lots of mentions of references. When a Java programmer speaks of a reference what they mean is what a C++ programmer would call a pointer.

Just to make things even more confusing, C++ has something C++ programmers call references, which are significantly different from pointers. As we will see, Java has no equivalent to C++'s references.

For the rest of this note, I'm going to use the word pointer to
mean a value that identifies a particular object: i.e., what a C++
programmer calls a pointer and what a Java programmer calls a reference.

1. In Java objects exist only on the heap and so can only be accessed via pointers.

In C++, as you know, objects can be held in local variables or global variables, they can be returned from subroutines as values, they can be passed to subroutines as parameters. None of these things are possible in Java because in Java objects exist only on the heap. Since objects are on the heap, we can never refer to them directly (just as is true of C++ objects on the heap); they are always accessed via pointers.

2. In Java types are either primitive or reference types

There are exactly 8 primitive types in Java and you can't define any others. They are boolean, char, byte, short, int, long, float, and double.

All other types are reference types, which means their values are pointers to objects and a special value 'null'. You can define new reference types by defining new classes and interfaces.

'What about arrays?' you may wonder. Well arrays are objects in Java, so they are accessed through pointers too. And so arrays can only exist on the heap.

3. There is no dereference operator in Java

It's time to look at some code.

This code does the following: (a) It creates a new object of class 'Car' on the heap. (b) It creates a new local variable 'p' of type 'Vehicle' and initializes it with a pointer to the new object. (c) It sends a message 'start' to the object.

In C++ we would write

In C++ the declaration and the use of 'p' include the derefernce operator  '*'; this is an operator that turns a pointer to an object into an object. You can see that, in Java, there is no dereference operator either in the declaration of 'p' or the use of 'p'. In fact in Java, there is no way to write an expression that simply refers to an object; we always refer to objects using pointers.

The best way to look at it is that the arrow operator '->' of C++ translates to a dot '.' in Java.

Let's extend the example:

Here 'q' is a pointer and it is initialized to the value held in 'p', which is a pointer to the object. Then a message is sent to the object that 'q' points to. This is the same object that 'p' points to. So ... both messages go to the same object.

It's important to understand that when we declare a pointer with like this

or this

we are declaring variables that hold pointers to objects, not variables that hold objects.

4. In Java all methods are virtual

In C++ a method (member function) can be either virtual or nonvirtual. Nonvirtual methods are dispatched based on the static type of the expression representing the recipient object (the type that the compiler know the object to have) whereas virtual methods are dispatched based on the dynamic type, i.e. based on the class of the actual object receiving the message. In general the dynamic type can only be known at run-time. In Java, all methods are dispatched based on the type of object that receives the message.

Let's look at an example. Suppose we have the following classes

If the following code is executed

then the version of 'start' in class 'Car' will be executed.

Here are two examples that illustrate why the compiler can not know the dynamic type of an object. Consider the following (rather artificial) example:

In general the compiler can't know what sort of vehicle 'p' will point to on the last line. The decision as to whether to call the 'start' methods of 'Car' or the 'start' method of 'Bicycle' is made at execution time. The most common situation where this comes up is when polymorphic methods are involved. Consider a method

and the code

When 'p' is passed to 'initialize', the 'start' method of 'Car' is called; when 'q' is passed to 'initialize', the 'start' method of 'Bicycle' is called.

The only exception is static methods. But these aren't really methods at all; they are used to send messages to classes, not to objects.

5. There is no explicit deleting of objects in Java

Java's approach to memory management is very different from that of C++. In Java, once an object is no longer accessible to the running program, it becomes eligible for garbage collection. Garbage collection is the process of finding objects that are no longer usable and making the space they occupy available for new objects. Thus there is no need to delete objects in Java. Once an object is no longer needed, just ensure that it is no longer accessible.

Also Java has no equivalent to C++'s destructors. If you want an object to clean up when it is no longer needed, write a method called 'cleanUp' and call it explicitly.

(There is a method in Java that is a little like a destructor. It is called 'finalize' and is called if and when the object is collected by the garbage collection process. It is almost always a bad idea to override this method; don't do it.)

6. Java has one class hierarchy and it's a tree

In C++ classes can inherit from more (or less) than one base class. In Java every class extends exactly one class; that class is called its direct superclass. There is one exception which is the class called 'java.lang.Object'; this class has no direct superclass. Thus the set of classes in a program forms a tree with 'java.lang.Object' at the root of the tree.

By the way it's not always obvious that a class has a direct superclass. Consider the definitions.

The direct superclass of 'Car' is 'Vehicle', but what is the direct superclass of 'Vehicle'? The default direct superclass is 'java.lang.Object'; so that is the direct superclass of 'Vehicle'.

Since every class inherits from 'java.lang.Object', there are a few methods we can count on every object supporting. For example every object has an 'equals' method, a 'toString' method, and a few others.

Why does Java allow only single inheritance between classes? Mostly it is a matter of simplicity. Multiple inheritance has a number of issues that make it complex to implement and confusing for the programmer.

7. Java allows a limited form of multiple inheritance

As mentioned earlier, no class can have more than one direct superclass. This can be inconvenient. Suppose we want to have a type of objects that can be displayed on the screen. We'll call this type 'Displayable' and all classes that inherit from 'Displayable' have a method called 'display' that takes an argument of type 'Screen'. Thus, if 'displayList' is a list of 'Displayable' objects, we can write code like this:

So far so good. But suppose we want 'Car' objects to be displayable, but not 'Vehicle' objects in general. We need 'Car' to inherit from 'Displayable' and from 'Vehicle'.  We could solve this by having Car extend 'Displayable' and 'Displayable' extend 'Vehicle'. Alas, we also want 'Pedestrian' to inherit from 'Displayable', but not from 'Vehicle'. We need multiple inheritance.

The solution is to use interfaces. An interface is a type very much like a class. But interfaces are restricted; interfaces can not have fields, method implementations, or constructors.  A class can only inherit directly from one class, but it can inherit from any number of interfaces. We can solve our problem like this

8. In Java, parameters are always passed by value and results are always returned by value

As mentioned above, variables can only hold values of primitive type (numbers, booleans, chars) or (possibly null) pointers to objects. Since parameters are just a special kind of variable, the same applies to parameters. The same also applies to the results of methods.

There is no concept in Java that corresponds to C++'s references and so there are no reference parameters or reference results.

In C++, passing an object by value will cause the object to be cloned via its copy constructor. In Java we can't pass object by value; all we can do is pass a pointer to the object and in that case (as in C++) there is no explicit copy of the object made. Here is an example. Suppose we have a method

and we call the method like this

For the duration of the method call the parameter 'v' points to the same object as 'p' does. No copy is made.
This is no different from the following in C++

If we want to get the effect of C++'s pass by copy construction, we would have to make an explicit copy like this

For this to work, class 'Car' needs a constructor that takes a pointer to a 'Car' and constructs a copy.

9. There is no operator overloading in Java

Operators in Java (e.g. +, =, and ==) have fixed meanings. You can't overload them to apply to your own class.

There is no need to overload assignment (=), since, as we've seen, variables can't hold objects anyway. If you really want a copy of an object it should be made explicitly.

Similarly you can't use == to compare objects as the value of its operands can not be objects.  If we have

we are comparing pointers, not objects. In this example, the pointers initially point to different objects and so, unless a pointer has been reassigned, they can't possibly be equal. If we want to compare objects, we can use the 'equals' method inherited from 'java.lang.Object'.

For this to work properly, the 'equals' method should be overridden. By default it will simply compute p==q. If we want to say that two cars are identical if they have the same 'vin' field we would override equals as follows.

A special case to look out for is strings since, in Java, strings are objects. The code

is almost certainly wrong. I should have written

10. In Java, arrays are objects too

As mentioned above, arrays in Java are objects too. This means they are always allocated on the heap and always dealt with via pointers. For example, consider the the following sequence of commands:

Here we declare 'a' to be a local variable of type pointer to an array of pointers to objects of type 'Vehicle'. Read that sentence again to be sure you've got it. Next, an array object is created on the heap. Initially the items of the array all have value 'null', that is, they hold the null pointer. Next 'a' is initialized to point to this new object. On line 3, a local variable 'b' is created and initialized to the same value as 'a', that is it points to the same array as 'a' points to. Line 4 creates two 'Car' objects and assigns pointers to them to the items of the array. The last two lines, print the values of the two 'Car' objects. The values of variables 'a' and 'b' point to the same array object and so 'a[0]' and 'b[0]' refer to exactly the same array item.

Arrays really are objects in Java. They have fields and methods. Well one field anyway. For example, consider:

The second last line reads a field of the array. In the last line a message is sent to the array.

By the way, the items of arrays are initialized when the array is allocated (In C++, the items initially have undefined values.) In the previous example, since 'a' is an array of pointers its items are initialized to 'null'. For number types (byte, short, int, long, float, and double) arrays are initialized to zero. For boolean arrays, items are initialized to 'false'. And for char arrays, items are initialized to the character with unicode encoding 0.

11. In Java, array lengths are not part of the type

In C++ we can have a variable whose type is pointer to array of 10 int items.

Such a variable can only contain the pointers to arrays of size 10. In Java the closest we can get to this declaration is

This declares 'm' to be a pointer to an array of ints, but it doesn't specify the size of the array.

12. Java has no 2-dimensional arrays

In C++ we can declare an array whose items are arrays. For example
declares 'm' to point to arrays that have 10 items, each of which is an array of 5 ints.

Java has no 2-dimensional arrays. Nor does it have 3-dimensional arrays. And so on. This is a natural consequence of the fact that objects must be on the heap, they can't be items of arrays. In Java we can declare
This means that 'm' points to an array of pointers, each of which points to an array of integers.
We can see this by running the following code

This will print 'true'. Since the items of an newly created array of pointers are initialized to 'null'.


When I invented inner classes.

Over the years I've invented a number of useful things. Unfortunately, much of the time, someone else did it first. One I'm quite proud of is dynamic register renaming, which I invented in 1989, many years after engineers at IBM had.

A couple of months ago I decided to look at the report I had written as my undergraduate thesis. It was on the design and implementation of an object-oriented language. This was in 1985 and interest in object orientation was just starting to rise after years of O-O being an obscure field with small groups of enthusiasts using SmallTalk and Simula. Nineteen-eighty-five was the year C++ was released commercially by Bell Labs. Even 6 years later, the first edition of Booch's Object Oriented Design with Applications was largely about how to do O-O design and then implement in a non-O-O language such as Ada (which, at the time, was not object-oriented).

Anyway, reading my report, I was struck by the fact that I had invented inner classes. Somewhat dismaying was that I had not only invented inner classes, I had also apparently failed to notice that I had done so: at least they get no special mention in the report. Indeed when Java introduced inner classes in 1997 or 1998, I saw them as a cool new idea, completely missing the fact that I had invented the same thing myself.

Then, a few days ago, I was struck by a very worrying thought: What if I had not only invented inner classes and failed to notice, but had been the first to invent them and failed to publish! After a bit of looking about I realized that this was not the case. Simula seems to have had them since at least 1973 and the Beta language also had them.

You might wonder how I managed to invent something, but not notice. This language was essentially an O-O extension of a language very similar to a statically scoped Lisp -- but with a nicer syntax. Functions were written as lambda-expressions, which evaluated to closures, which carried with them the context in which the lambda expression was evaluated. A context is simply a mapping from identifiers to locations, together with a pointer to a parent context. Thus, within a lambda expression written within another lambda expression, the parameters and other variables of the outer lambda expression could be referred to within the inner lambda expression -- just as in Scheme or Common Lisp. For example, the following defines a function that computes the composition of its arguments

    var compose <= (fn (f, g) (fn (x) f(g(x)) fn) fn);

(The <= means permanent assignment.) Classes were essentially lambda expressions that implicitly returned a copy of their context as their value. Objects were just copies of contexts.  (The advantage of making a copy is that the parent pointer is not needed in the object. Thus, in simple cases, such as an object with no methods, the context and its parent could be potentially garbage collected, even if the object was not. The object and the original context share the same locations of course.) Methods were simply lambda expressions evaluated within a class, and thus, when a method was called, it could refer to the locations of the object which were shared with its context.

    var Counter <= (class {init}
                var k <- init ;
                var incr <= (fn() k <- k+1 fn)
                var get <= (fn () k fn) 
            class) ;
    var counterObj <= Counter{0} ; 

(the <- means assignment. I used curly braces for constructor parameter and argument lists, so that allocation expressions look different from function applications.)

Given this, a class written within a class or a class within a lambda expression just works. It would have been extra work to design and implement the language so that it didn't have inner classes. Unlike Java (but like Scala), there was no restriction that all local variables captured be constants (final in Java lingo). This is because I was using trees of contexts already. Contexts were garbage collected when no longer needed, rather than being explicitly de-allocated like the frames on the Java stack.  I was aware of the inefficiency of this scheme, but figured that a compiler could optimize by keeping variables that could not outlive their function invocation on a stack and all others on the heap.


Genericity, Recursive Data Types, and Inheritance.

A couple of times recently I've wanted to create a generic recursive data type and then inherit from it. For example suppose we wanted to define the concept of a tree as a data type, and then extend it to create trees that bear specific types of data in their nodes. A second example is a graph consisting of nodes and edges. This is a recursive data type in that, from a node, you can follow an edge and find another node. In both cases you'd expect some homogeneity; in a tree, the children of a node should be nodes of the same type. In a graph, if you start at a node and follow an edge, you should arrive at another node of the same type. This homogeneity requirement is where genericity provides a solution.

To illustrate, I'll use the tree example. We'd like to build, say trees of integers, and tree of floating point numbers. These are two types, but as they share something in common: their "treeness". We'd like them to inherit the characteristics common to all trees from a common base type.

The key to making this work is to use bounded genericity.

This abstract class captures the "treeness" of its subclasses. Any object of type Tree<T> has children of type T where T is a type that extends Tree<T>.

Now we can both specialize and concretize to create a type that represents trees of integers

Each object of type IntTree is a tree that has IntTree objects as children and a field of type int.

Equally we could have a tree of floating point variables.

That we can do this is only really interesting if we can write generic code to deal with data (trees in this case) at the generic level of abstraction. And we can. For example here is a class of depth-first tree traversers.

Then we can specialize and concretize this class to create a visitor that finds the sum of the integers in an IntTree.

Finally I'll just briefly describe the systems where these issues came up.
  •  For the hierarchical graph types of the JHigraph system we defined: first, a mutually recursive set of generic interfaces; next, a mutually recursive set of abstract classes; and, finally, a mutually recursive set of abstract classes that use a tag system similar to XML and ensure validity. These three sets of types define the model at various levels of abstraction. Based on the interfaces, we then defined a set of generic view classes and  layout managers for visualizing the nodes and edges of the graph. An application can be built by specializing and concretizing either set of generic, abstract node and edge classes to hold the appropriate kind of data and by specializing and concretizing the view classes.
  • The second application is a language processor in which the parser produces an abstract syntax tree and then the type checker produces a typed abstract syntax tree. The plain and the typed abstract syntax tree types share a common generic superclass.