2016-06-30

How to do redo, and undo revisited

Introduction

An earlier post looked at 5 techniques for implementing undo/redo.

Here I want to take another look the problem and look at a solution based on immutable data structures.

Essentially it is a check-pointing solution, but by using an immutable data structure there is essentially a zero time cost to making a checkpoint.

Nodes, paths, and trees


The running example from the earlier post used a tree as the data structure with the following interface.

public interface TreeI {
    NodeI makeNode( String text ) ;


    NodeI getRoot() ;

    void insert( NodeI parent, int posn, NodeI child ) ;

    void remove( NodeI parent, int posn ) ;
}



Our immutable version will be a bit different. We start with nodes. We want to have nodes that can't change. Our node class only has constructors and accessors, not mutators.


public class Node {
   
public Node ( String text, List<Node> children ) { ... }

    int getChildCount() { ... }


    Node getChild( int i) { ... }

    String getLabel() { ... }

}


We can identify any node in a tree by its path from the root. For example, [] represents the root of a tree, [0] represents the first child of the root, [0,1] represents the second child of the first child of the root, and so on. For this scheme to make sense, the tree can never be empty; there will always be a root node. Now we change the tree interface to

public interface TreeI { 

    public Node getRoot( ) ;

    public void replace( Path p, int i, int j, List<Node> newNodes );



    public void changeLabel( Path p, String newLabel) ;

    .
    .
    .
}

The replace function replaces all children of the given node from position i up to (but not including) position j with a list of new nodes. The changeLabel method changes the label on a node. Both these functions work by creating new nodes as needed. For example replace( p, 0, 1, ns ) where p is the list [0,1] will build three new nodes, all the other nodes of the new tree (apart from possibly the ones in list ns) will be shared with the old tree.

The interface is implemented by a class

public class Tree implements TreeI { 

    protected Node root = new Node( "", new ArrayList<Node>() ) ;

    public Node getRoot( ) { return root }

    public void replace( Path p, int i, int j, List<Node> newNodes ){ ... }



    public void changeLabel( Path p, String newLabel) { ... }

    .
    .
    .
}

Implementing undo and redo.

Once we have an immutable data structure undo and redo are very simple. The main difference compared with the checkpointing solution of the earlier post, is that we don't make copies of the model.

public class TreeWithCheckpointing extends Tree {

    Stack<Node> olderTrees() = new Stack<Node>() ;

    Stack<Node> newerTrees() = new Stack<Node>() ;

    public void checkpoint() {
        olderTrees.push( root ) ;
        newerTrees.clear() ; }

    public void undo() {
        if( olderTrees.size() > 0 ) {
            newerTrees.push( root ) ;
            root = olderTrees.pop() ; } }

    public void redo() {
        if( newerTrees.size() > 0 ) {
            olderTrees.push( root ) ;
            root = newerTrees.pop() ; } }

}
  

Merits

  • This approach is very simple and fool proof. We don't need that the model be copyable, as we only copy pointers.  
  • There may other advantages to using immutable data structures. For the system where I used this, the main reason for using immutable data structures was to enforce invariants. If there are no mutators, we only have to worry about object invariants at node construction time. Another advantage is in reusing view object; we can make a cache mapping nodes to view objects, since the nodes never change, the view objects are sure to be accurate representations..
  • Because there is no copying, it is more efficient than the checkpointing solution with mutable data.

Demerits

  • The main disadvantage is the awkwardness of an immutable data structure. It works well for trees, but for an arbitrary graph we have to give up on using ordinary pointers to represent edges.

2015-05-28

Take Back Control. GUIs and other event driven programming without inversion of control.


GUI programming and inversion of control

Writing graphical user interfaces can be a tricky business. Typically we set up a set of event handlers. Each handler waits for its event and based on the current state moves the state to some other state. This arrangement is commonly called "inversion of control".

This is fine for simple GUIs, but once the system gets past a certain level of complexity it is rife with opportunities for error.  What if an event happens in a state that we didn't anticipate. What if we forget to disable events? Once we add in asynchronous IO such as HTTP requests, new complexities arise.

One solution is to carefully draw out a state machine --perhaps using state charts-- and use the state pattern to ensure that states are entered and and exited cleanly and that transitions only happen when we are in the correct state.

However state machines are at least one level of abstraction down from the usual structured programming design notations such as sequential composition (f followed by g), if-then-else, while-do. It's a lot like programming with only conditional goto statements.  Furthermore algorithmic abstraction (subroutines) are not available with state machines, unless you want to get in to pushing states on a stack. What's worse, state machines are a level up from the code you write. You might have a lovely state machine drawn on a white board, but once you've translated it to code all you have is an unstructured pile of code. Even if you keep that code well organized, for someone else to really understand that code, they'll need to translate it back to a nice picture. The solution is to keep the code and a diagram of your state machine in sync, but we all know that code and its documentation tend to drift apart. Furthermore, state machines do not correspond well to use-cases, which makes it harder to verify that your design meets all the use cases and only the use cases.

So, if you start from use cases, there are two design steps to take: translate from use cases to a state machine; translate from the state machine to event driven code.  Each step changes the paradigm and each is a potential source of error. Furthermore, if you are maintaining code, the first thing to do is to understand what it does. To reconstruct the use-case model from event driven code requires reverse engineering the same steps: translate from event driven code to a state diagram; translate the diagram back to use cases.

So my aim was to allow a more direct translation from use cases to code: a way to avoid inversion of control.

Reverting control

When my students first design GUI programs, I have them create sequence diagrams for their software. Often the diagrams come back looking something like this -- translated back into code:
    function main() {
        displayWindow() ;
        while( true ) {
            waitForUserToClick() ;
            reactToClick() ; }
    }
This makes a lot of sense if you consider that, in all their previous courses, they've been writing console applications where input comes only when it is asked for.  It also makes sense if you consider that a GUI is --in essence-- a set of sequences of interactions and the student already know a language for expressing sets of sequences of actions, namely the ordinary programming constructs of sequential composition, choice, iteration, and function calls.

My contention is that these student, who might seem naive to those of us used to writing event driven GUIs, are on to something worth looking into.  Is there a way to realize a GUI without inversion of control?

The goal is to be able to describe the user experience in a simple and executable notation. Here is an example of such a notation
    displayWindow >
    loop(
        await( click( b ) && reactToClick )
    )
Here b is an object representing a button, displayWindow and reactToClick are functions representing changes to some model and/or view, and loop, await, click, > and && are functions and methods from the library I'll be describing in the next section.

The tbc library

The remainder of this post describes an experimental library named tbc (for "Take Back Control"), written in Haxe to explore these ideas. The language I use below is Haxe, but everything could be translated into just about any language that has lambda expressions and closures with lexical scoping, for example JavaScript, Scala and many many others.

The library is structured as several modules, so far these are tbc.TBC, tbc.TBCTime, and tbc.TBCHTML. Unless otherwise mentioned all names are exported from either the TBC module, or are static functions of the TBC.TBC class. It can be found at https://github.com/theodore-norvell/tbc .

I've been thinking in terms of web development, which is why there is a  TBCHTML, but one could also have a TBCSwing or some such. One of the nice things about Haxe is that it can be translated into lots of useful languages such as JavaScript, Java, C++, and so on. The core TBC module is target independent -- can be used regardless of the target language; TBCTime works with some targets but not all; TBCHTML is specific to JavaScript.

Processes

Central to the library is a generic data type, Process<A>, which is the type of all processes that compute values of type A. For example an object of type Process<Int> is an object that can compute an integer value.  Executing a process may cause side effects to happen; side effects include changes to state, output, input, and delays; delays include waiting for input and waiting for time to pass.

We can execute a process using its .go method. This method causes the process to execute, but there is nothing in the semantics of .go that says that it will finish immediately, it might finish after some time, and, in particular, after some interaction with the user.  The argument of .go is a void function that takes an A. For example if p is an object of type Process<Int> then
    p.go( function( x : Int ) { trace(x) ; } ) ;
will cause p to start to execute and, when it has finished, the Int it produces is sent to the trace routine. The argument
    function( x : Int ) { trace(x) ; }
is an anonymous function or lambda expression.  The trace function is a standard Haxe function for outputting a message to a log.

One particularly useful process is unit(x), which is a process that computes value x immediately with no delay and does nothing else. So
    unit(42).go( function( x : Int ) { trace(x) ; } ) ;
is a complicated way to output 42 to the log.

As an abbreviation skip() stands for unit(null). skip() has type Process<Triv> and Triv is a type that has null as its only value. So skip() is a process that has no side effects and computes an informationless value.

If f is a function of type Void->A (i.e., a function with 0 arguments that returns a value of type A), then exec(f) is a Process<A> that executes by calling f, and the result of the process is the result of f. The exec function is useful for inserting side effects, such as changing or reading global state, into the computation.

Another kind of side effect is to spend time doing nothing. In the TBCTime module, there is process function pause so that pause(n) is a Process<Triv> that completes after about n milliseconds.

We can combine processes in a number of ways.  p.sc(q) is a process that, when executed, executes process p, throws away its result and then executes process q; the value computed by p.sc(q) is the value computed by the execution of q.  As an example
    pause(1000).sc( exec( hello ) )
is a process that executes function hello after a delay of 1 second.

You can think of sc as standing for either "sequential composition" or for "semicolon". Sequential composition is associative
    p.sc(q.sc(r))  =  p.sc(q).sc(r)
And skip() and unit(x) (regardless  of x) are left units of sc
    p  =  skip().sc(p)  =  unit(x).sc(p)
Furthermore skip() is a  right unit
    p  =  p.sc( skip() )
provided p is of type Process<Triv> .

As a convenience the > operator is overloaded to mean sequential composition. So
    p > q > r
abbreviates
    p.sc( q ).sc( r ) 
The laws are
    (p > q) > r  =  p > (q > r)
    p  =  skip() > p  =  unit(x) > p
    p  =  p > skip()
provided (for the last law only) that p is of type Process<Triv> .

Besides sequential composition, we can imitate other sequential programming constructs. A process ite( p, q, r), where p is of type Process<Bool> and q and r are both of type Process<A> (for any type A) is a Process<A> that mirrors if-then-else statements. And loop(p) is a Process<Triv> that is equivalent to an infinitely repeated sequential composition of p with itself. And so on.

A useful generalization of sequential composition is bind. The problem with sequential composition is that it throws away the result of the left process. Bind solves this problem. If  p is a Process<A> and f is a function from type A to type Process<B>,  then p.bind(f) is a Process<B>. To execute p.bind(f), first p is executed, then its result is fed to f, finally the resulting process is executed, delivering the result. bind is the familiar bind (or flatmap) operation from monads.  As an abbreviation p.bind(f)can be written by p >= f. Bind and unit obey the usual monad laws
    (p >= g) >= h  =  p >= function(y) { return g(y) >= h ; }
    p >= unit  =  p
    unit(x) >= f  =  f(x)
provided that y is not free in either g or h. The first law is not used much in practice exactly because often y is free in g or h. People tend to use bind as a combination of sequential composition and local variable declaration. So you might see code like this
    p >= function( x ) { return
    ...x... >= function(  y ) { return
    unit(...x...y...) ; } ; }
This is so common in Haskell that there is a special notation for it
    do x <- p
       y <- ...x...
       return (...x...y...)

If we were to define the operation f*g to mean function(x) { return f(x).bind(g); } (where x is not free in g), then the three laws concerning bind can be written in a very familiar way
    (f*g)*h  =  f*(g*h)
    f*unit  =  f
    unit*f  =  f 
In practice >= it is easier to use than *, and so we define >= but not *.

Much has been written about monads elsewhere, which means I don't need to write much here.  I'll just note that p.sc(q) is defined in terms of bind to be
    p.bind(function(x) { return q ; } )  
And so
    (p > q) >= f  =  p > (q >= f)
    (p >= f) > q  =  p >= function (y) { return f(y) > q; }
provided that y is not free in f or q. Furthermore loop(p) is defined to be
    p.bind( function( a : A ) { return loop(p) ; } ) 

In Haxe, >= and > are at the same level of precedence and associate to the left. For example
    p > q >= f > r >= g
means
    (((p > q) >= f) > r) >= g
which means
    p.sc(q).bind(f).sc(r).bind(g)

[Note for Haskell programmers: Just remember that Haskell's >> is TBC's > and Haskell's >>= is TBC's >=; i.e. drop a > from the Haskell notation to get the TBC notation.  There is probably some way to use Haxe's macro system to get something like Haskell's do notation, but I don't know Haxe's macro system (yet).]

Parallelism

If p is a Process<A> and q is a Process<B> object, par(p,q) is a Process<Pair<A,B>> that, when executed, executes both p and q, possibly interleaving their various side-effects. Whenever both are delayed, either one may be the next to proceed. Note that everything actually runs on one thread.

For example
    par(
            exec(outX) > pause(2000) > exec(outX)
    ,
            exec(outY) > pause(2000) > exec(outY)
    )
is a process that executes functions outX and outY twice each. The first outX and outY are immediate, while then latter two are executed about 2 seconds later. In theory, any of the interleavings
    outX outY outX outY
    outX outY outY outX
    outY outX outX outY
    outY outX outY outX
are possible.

Fork-and-join parallelism is also possible, although I haven't implemented it yet.

Guards and Guarded Commands

An object of type Guard<E> represents some category of input events that might happen in the future. For example, if b is a button element on a web page, then click(b) is a guard representing clicks on the button. click is in the TBCHTML module. The type parameter E is the type of objects representing information about the events, for example the coordinates of a mouse click. The type click(b), if b is an HTML button element, is Guard<Event>, where Event is the type of DOM events in HTML.

If g is a Guard<E> and function f is of type E -> Process<A>, then g.guarding(f) is a GuardedProcess<A>. A guarded process might be thought of as a process that starts with an event.

A method await turns a set of guarded processes into a process. So if m0 = g0.guarding(f0) and m1  = g1.guarding(f1) are guarded processes of the same type, then await(m0,m1) is a process. The process await(m0,m1) is executed as follows: First time passes until one of the events associated with m0 or m1 happens. Say an event that g0 represents happens, then execution continues with the execution of f0(e0), where e0 is the information associated with the event.

When the process await(m0,m1) begins, the event categories associated with m0 and m1 are enabled. In the implementation, this may mean installing event handlers and changing the appearance of GUI widgets. When one of the events fires, the event categories are disabled before execution proceeds with the corresponding guarded process. If both m0 and m1 are triggered by the same event, only one of them proceeds; which one is up to the implementation.

Here is an example of guarding
    loop(
            await( click( b0 ).guarding( out0 ) ) >
            await(
                click( b1a ).guarding( out1A )
            ,
                click( b1b ).guarding( out1B )
            ) >
           await( click( b2 ).guarding( out2 ) )
        )
Here the b0, b1a, b2a, and b2 are button elements on an HTML page and out0, out1A, out1B, and out2 are functions returning processes that make changes to the page.  The interaction with the user proceeds as follows.
  • b0 becomes enabled.
  • When b0 is clicked, it becomes disabled and buttons b1a and b1b become enabled.
  • When one of these two buttons is clicked, both become disabled and b2 becomes enabled.
  • When b2 is clicked, it becomes disabled and the process continues from the start.
The use case scenario corresponding to this process is
  1. The user clicks on b0
  2. The system makes change out0
  3. The user clicks either b1a or b1b
  4. If the user clicked on b1a
    • The system makes change out1a
  5. Else the user clicked on b1b
    • The system makes change out1b
  6. The user clicks on b2
  7. The system responds with change out2
  8. Back to 1.
Of course a real use case would explain what out0 etc are, but none of that matters to us. The point is the close correspondence between the use case and the code.

The >> operator is overloaded to mean guarding, so the last example can be written as
    loop(
        await( click( b0 ) >> out0 ) >
        await(
            click( b1a ) >> out1A
        ,
            click( b1b ) >> out1B 
        ) >
        await( click( b2 ) >> out2 )
    )

In some case we might want to throw away the event information. This is easily done using
    g >> function( e : E) { return p ; }
which is a GuardedProcess<A> if g is a Guard<E> and p is a Process<A>.  For example we might want to simply wait for a click on button b. This Process<Triv> will do that
    await( click(b) >> function( e : Event ) return skip() ; } )
As a convenience, guards have a .andThen method to handle this situation less verbosely. We can write
    await( click(b).andThen( skip() ) )
or, using operator overloading,
    await( click(b) && skip() )
Since && has lower precedence than > and >= we can write
    g && p > q >= f
which will be parsed as
    g && ((p > q) >=f) 

The .sc and .bind methods can also be applied to guarded processes on their left. Thus
    g >> m >= f
which is parsed as
    (g >> m) >= f
means what you would expect: when g is fired, the event is sent to m and then the result, a, of executing m(e) is sent to f and the result of executing f(a) is the result of the whole thing. The following associativity law.
    (g >> m) >= f  =  g >> function(e) { return m(e) >= f ; }
holds provided e is not free in m or f. Likewise
    (g >> m) > p  =  g >> function(e) { return m(e) > p ; }
holds provided e is not free in m or p. Furthermore
    (g && p) > q  =  g && (p > q)
and
    (g && p) >= m  =  g && (p >= m)


[Note on the notation. The && operator was chosen most because of its low precedence.  Furthermore | could combine guards, while || could combine guarded processes. So it is nice that && has lower precedence than | and higher precedence than ||.  (But note that >> has higher precedence than |.)]

[Another notational note: It is tempting to use > for .andThen and >= for .guarding.  Advantage: fewer operators to remember. Disadvantage: this notation obscures the types, making code less readable. For example, if you read a && p, it is clear that a is a guard and the result is a guarded process. But, with the alternate notation, if you read a > p, the type of a could be guard or process or guarded process and the type of the result can only be determined after determining the type of a. Also, with the alternative notation, skip() is not always a right unit of >, nor is unit always a right unit of >=.]

Guards can also be used for time events. The timeout guard fires if it's been enabled for more than a certain amount of time. For example, if the user fails to respond in a timely manner, we might want to move on. Process nag(null), defined like this
    function nag(triv : Triv) : Process<Triv>{ return
        await(
                click( b ) && exec(thanKTheUser)
        ,
                timeout( 1000 ) && exec(nagTheUser) >= nag
        ) ; }     ,
nags the user every second until the button is clicked.

In fact pause( t ) is defined by
    pause( t )  =  await( timeout( t ) && skip() )

Summary of the core operations

Suppose that
  • p is of type Process<A>
  • q is of type Process<B>
  • f is of type A -> Process<B> 
  • h is of type A -> C
  • g is of type Guard<E>
  • m is of type E -> Process<A>
  • gp, gp0, gp1, ..., gpn are of type GuardedProcess<A>
Meaning Syntax Type Operator syntax
Do p, then do q. p.sc( q ) Process<B> p > q
Do p, then do f(x), where x is the result of p. p.bind( f ) Process<B> p >= f
Make a guarded process that does p when fired. g.andThen( p ) GuardedProcess<A> g && p
Make a guarded process that does m(e) when fired, where e represents information about the event. g.guarding( m ) GuardedProcess<A> g >> m
Make a guarded process similar to gp, but that does that does q after gp has executed. gp.sc( q ) GuardedProcess<B> gp > q
Make a guarded process similar to gp, but that does that does f(a) after gp has executed, where a is the result of executing gp. gp.bind( f ) GuardedProcess<B> gp >= f
Fire one of these guarded processes. await( gp0, gp1, ..., gpn ) Process<A>
Do p again and again loop( p ) Process<Triv>
Do both p and q. par( p, q ) Process<Pair<A,B>>
Start process p. p.go( h ) Void

Related ideas

When I first looked into this a few years ago, I found Martin Odersky and Philipp Haller's 2006 paper, "Event-based programming without inversion of control", which introduced Scala's Actors library. The paper mentions user interfaces as an area for future work.  Recently I came back to this question, and the above documents some preliminary experimentation.

In addition to Odersky and Haller's actors, some of the inspirations for TBC include process algebras, such as Tony Hoare's CSP and Robin Milner's CCS; (extended) context-free grammars; parsing combinators.; monads as used in Haskell and more recently in other languages such as JavaScript; JavaScript's promises; and the command pattern as documented by Erich Gamma and company.

Further work

I'm happy with the basic abstractions of process, guard, and guarded process. However, the above is just an early exploration. There is much more to consider including
  • Internal events for communication between parallel processes. Processes can already communicate via shared global state. However this does not allow synchronization between processes. For that and to avoid using shared state, synchronous messages, asynchronous messages and rendezvous are the classic approaches.
  • Communication with external entities other than the user. In the HTML context, this includes XML HTTP requests.
  • Failures and recovery.
    • One approach is that exception catching can be emulated by means of a second version of bind. While bind says what to do next in case of success, onFail says what to do in case of a failure. We give processes a second type parameter so that Process<A,B> is a process that succeeds with a value of type A or fails with a value of type B. If p is of type Process<A,B> then and f is of type A -> Process<C,B>, then p.bind( f ) is of type Process<C,B>. Failure can happen either within p or within the result of f.  Symmetrically,  if g is a function of type B -> Process<A,D> then p.onFail( g ) is of type Process<A,D>. Here if p fails, there is a chance of recovery, should the result of g succeed. This is just like try{ p }catch( e ){ g(e) }. Note the complete symmetry between succeeding and failing.
    • Those familiar with JavaScript's Promise type will note that there is little original here.
    • Those familiar with monads will note that Process<A,B> is monad in two ways. I.e. we can fix either A or B to get the functor part of a monad.
    • The problem is that I'm not a big fan of try-catch, so before going down this road, I'd like to think about alternative ways to deal with failure and recovery.

2015-03-11

A simpler RMI tutorial for Java

Oracle corp has a tutorial on RMI which is quite involved. There are 4 different kinds of processes working together so that two of them can talk: A server, a client, an rmiregistry process, and a web server to serve class files.  There are also security policies for both the server and the client.

My purpose here is to strip RMI down to the simplest possible set up. In particular

  • No security policy is used.
  • All code is in copies a single jar file.
  • There is no dynamic code loading.
  • No separate rmiregistry process is needed.
This brings the number of kinds of processes down to 2, reduces the number of command-line options needed to zero, and reduces the number of security policy files needed to zero. It also simplifies distribution.

There are no doubt some advantages to using an external registry process and dynamic code loading. However for many applications they just aren't needed.

The server code

My example code is an Othello game. The main server class is called RemoteGameModel. This class records the state of the game; I used the observer pattern to alert observers of any change to the state of the game. The observers are typically running in other JVM instances, possibly on other machines.

The hookup of the observers to the RemoteGameModel is initiated by the observers; they will look up the RemoteGameModel using a registry. Thus after creating the RemoteGameModel in the server we need to register it in a registry running at a known location.  This is done with a few lines of code in a main method in class ServerMain:

First we create the observable model. The RemoteGameModel classs extends UnicastRemoteObject, which means that as part of being constructed, the object is exported. Being exported is important since it means that when the object is passed to a remote object, a stub will be made for it and passed instead.

Second, we need a registry. Rather than using an rmiregistry process, it is easier to create a registry in the same JVM as the server object resides. This means the registry and the server must be on the same host. The port number used should be unique to your application.

Third, the RemoteGameModel is registered with the registry.

The next two lines report the URI that can be used to refer to the RemoteGameModel object.

There are a few things that can go wrong, so it's important to wrap the above code in try-catch.

The client code

On the client side, we need to obtain a reference to the RemoteGameModel which is on another machine. In RMI we ask the registry and the registry sends back a proxy (stub) object representing the server. The stub implements the RemoteGameModelInterface, just as does the actual RemoteGameModel.

The following code is in the main method of class ClientMain:
First set a string to be the URI for the object.

Then the static method Naming.lookup returns the stub.

Once we have a reference to this object we can use it to send messages to the server object. In my code I use it to build an observer in the client. I pass the reference to the proxy into the constructor of the observer object. The last thing the observer does is to send addObserver(this) to the proxy. As the observer object's class extends UnicastRemoteObject, this sends a proxy for the observer over to the servers machine, where it is added to the server's list of observers.

Deploying and running the code

The code can now be tested simply by starting a JVM process running ServerMain.main and a number of processes running ClientMain.main.

To make deployment easier, it is a good idea to put all files in a .jar file. I called mine othello-2.jar. Then the server can be started from the command line with the command
java -cp othello-2.jar othello.ServerMain
And the clients can be run from the command line with the command
java -cp othello-2.jar othello.ClientMain

Further information and source code

Further information about the techniques used in the Othello project can be found at  http://www.engr.mun.ca/~theo/Courses/sd/notes.html under "Animation with Threads" and "RMI and remote, concurrent observers". The source code at http://www.engr.mun.ca/~theo/Courses/sd/links.html shows the project at various stages of development:

  • Othello-0: Thread based animation.
  • Othello-1: Adds concurrent observation and multiple view stragegies.
  • Othello-2: The project is converted to use RMI.
  • Othello-3: Adds input and game state to make a playable game.

2014-12-29

13 totally true facts every C++ programmer should know about Java

Java has a lot of similarity with C++. However the designers of Java set out to create a language that is safer, simpler, and more portable than C++.

Here is a quote from Java's original designer, James Gosling, explaining the difference of language philosophy.
JAVA omits many rarely used, poorly understood, confusing features of C++ that in our experience bring more grief than benefit. This primarily consists of operator overloading (although it does have method overloading), multiple inheritance, and extensive automatic coercions.

Below I've tried to highlight some of the major differences.

0. In Java variables hold either values or pointers to objects

In Java data is held in variables and variables hold either values belonging to primitive types or pointers to objects that are on the heap.

Variables include

  • local variables (including parameters),
  • fields of objects,
  • items of arrays,
  • static fields --these are fields that belong to a class rather than an object--, and
  • temporary variables that hold results from method calls and results of operations such as addition or multiplication.
All these sorts of variables can only hold either a value of a primitive type or a pointer to objects that are on the heap. In particular variables can not hold objects themselves.



By the way, in Java terminology, pointers are called references. So, if you are reading a book on Java, you probably won't see pointers mentioned, but lots of mentions of references. When a Java programmer speaks of a reference what they mean is what a C++ programmer would call a pointer.



Just to make things even more confusing, C++ has something C++ programmers call references, which are significantly different from pointers. As we will see, Java has no equivalent to C++'s references.



For the rest of this note, I'm going to use the word pointer to
mean a value that identifies a particular object: i.e., what a C++
programmer calls a pointer and what a Java programmer calls a reference.

1. In Java objects exist only on the heap and so can only be accessed via pointers.

In C++, as you know, objects can be held in local variables or global variables, they can be returned from subroutines as values, they can be passed to subroutines as parameters. None of these things are possible in Java because in Java objects exist only on the heap. Since objects are on the heap, we can never refer to them directly (just as is true of C++ objects on the heap); they are always accessed via pointers.

2. In Java types are either primitive or reference types

There are exactly 8 primitive types in Java and you can't define any others. They are boolean, char, byte, short, int, long, float, and double.



All other types are reference types, which means their values are pointers to objects and a special value 'null'. You can define new reference types by defining new classes and interfaces.



'What about arrays?' you may wonder. Well arrays are objects in Java, so they are accessed through pointers too. And so arrays can only exist on the heap.

3. There is no dereference operator in Java

It's time to look at some code.

This code does the following: (a) It creates a new object of class 'Car' on the heap. (b) It creates a new local variable 'p' of type 'Vehicle' and initializes it with a pointer to the new object. (c) It sends a message 'start' to the object.

In C++ we would write

In C++ the declaration and the use of 'p' include the derefernce operator  '*'; this is an operator that turns a pointer to an object into an object. You can see that, in Java, there is no dereference operator either in the declaration of 'p' or the use of 'p'. In fact in Java, there is no way to write an expression that simply refers to an object; we always refer to objects using pointers.



The best way to look at it is that the arrow operator '->' of C++ translates to a dot '.' in Java.



Let's extend the example:

Here 'q' is a pointer and it is initialized to the value held in 'p', which is a pointer to the object. Then a message is sent to the object that 'q' points to. This is the same object that 'p' points to. So ... both messages go to the same object.

It's important to understand that when we declare a pointer with like this

or this

we are declaring variables that hold pointers to objects, not variables that hold objects.

4. In Java all methods are virtual

In C++ a method (member function) can be either virtual or nonvirtual. Nonvirtual methods are dispatched based on the static type of the expression representing the recipient object (the type that the compiler know the object to have) whereas virtual methods are dispatched based on the dynamic type, i.e. based on the class of the actual object receiving the message. In general the dynamic type can only be known at run-time. In Java, all methods are dispatched based on the type of object that receives the message.



Let's look at an example. Suppose we have the following classes

If the following code is executed

then the version of 'start' in class 'Car' will be executed.



Here are two examples that illustrate why the compiler can not know the dynamic type of an object. Consider the following (rather artificial) example:

In general the compiler can't know what sort of vehicle 'p' will point to on the last line. The decision as to whether to call the 'start' methods of 'Car' or the 'start' method of 'Bicycle' is made at execution time. The most common situation where this comes up is when polymorphic methods are involved. Consider a method

and the code

When 'p' is passed to 'initialize', the 'start' method of 'Car' is called; when 'q' is passed to 'initialize', the 'start' method of 'Bicycle' is called.




The only exception is static methods. But these aren't really methods at all; they are used to send messages to classes, not to objects.

5. There is no explicit deleting of objects in Java

Java's approach to memory management is very different from that of C++. In Java, once an object is no longer accessible to the running program, it becomes eligible for garbage collection. Garbage collection is the process of finding objects that are no longer usable and making the space they occupy available for new objects. Thus there is no need to delete objects in Java. Once an object is no longer needed, just ensure that it is no longer accessible.



Also Java has no equivalent to C++'s destructors. If you want an object to clean up when it is no longer needed, write a method called 'cleanUp' and call it explicitly.



(There is a method in Java that is a little like a destructor. It is called 'finalize' and is called if and when the object is collected by the garbage collection process. It is almost always a bad idea to override this method; don't do it.)

6. Java has one class hierarchy and it's a tree

In C++ classes can inherit from more (or less) than one base class. In Java every class extends exactly one class; that class is called its direct superclass. There is one exception which is the class called 'java.lang.Object'; this class has no direct superclass. Thus the set of classes in a program forms a tree with 'java.lang.Object' at the root of the tree.



By the way it's not always obvious that a class has a direct superclass. Consider the definitions.

The direct superclass of 'Car' is 'Vehicle', but what is the direct superclass of 'Vehicle'? The default direct superclass is 'java.lang.Object'; so that is the direct superclass of 'Vehicle'.



Since every class inherits from 'java.lang.Object', there are a few methods we can count on every object supporting. For example every object has an 'equals' method, a 'toString' method, and a few others.



Why does Java allow only single inheritance between classes? Mostly it is a matter of simplicity. Multiple inheritance has a number of issues that make it complex to implement and confusing for the programmer.

7. Java allows a limited form of multiple inheritance

As mentioned earlier, no class can have more than one direct superclass. This can be inconvenient. Suppose we want to have a type of objects that can be displayed on the screen. We'll call this type 'Displayable' and all classes that inherit from 'Displayable' have a method called 'display' that takes an argument of type 'Screen'. Thus, if 'displayList' is a list of 'Displayable' objects, we can write code like this:

So far so good. But suppose we want 'Car' objects to be displayable, but not 'Vehicle' objects in general. We need 'Car' to inherit from 'Displayable' and from 'Vehicle'.  We could solve this by having Car extend 'Displayable' and 'Displayable' extend 'Vehicle'. Alas, we also want 'Pedestrian' to inherit from 'Displayable', but not from 'Vehicle'. We need multiple inheritance.



The solution is to use interfaces. An interface is a type very much like a class. But interfaces are restricted; interfaces can not have fields, method implementations, or constructors.  A class can only inherit directly from one class, but it can inherit from any number of interfaces. We can solve our problem like this


8. In Java, parameters are always passed by value and results are always returned by value

As mentioned above, variables can only hold values of primitive type (numbers, booleans, chars) or (possibly null) pointers to objects. Since parameters are just a special kind of variable, the same applies to parameters. The same also applies to the results of methods.



There is no concept in Java that corresponds to C++'s references and so there are no reference parameters or reference results.



In C++, passing an object by value will cause the object to be cloned via its copy constructor. In Java we can't pass object by value; all we can do is pass a pointer to the object and in that case (as in C++) there is no explicit copy of the object made. Here is an example. Suppose we have a method

and we call the method like this

For the duration of the method call the parameter 'v' points to the same object as 'p' does. No copy is made.
This is no different from the following in C++



If we want to get the effect of C++'s pass by copy construction, we would have to make an explicit copy like this

For this to work, class 'Car' needs a constructor that takes a pointer to a 'Car' and constructs a copy.


9. There is no operator overloading in Java

Operators in Java (e.g. +, =, and ==) have fixed meanings. You can't overload them to apply to your own class.



There is no need to overload assignment (=), since, as we've seen, variables can't hold objects anyway. If you really want a copy of an object it should be made explicitly.



Similarly you can't use == to compare objects as the value of its operands can not be objects.  If we have

we are comparing pointers, not objects. In this example, the pointers initially point to different objects and so, unless a pointer has been reassigned, they can't possibly be equal. If we want to compare objects, we can use the 'equals' method inherited from 'java.lang.Object'.

For this to work properly, the 'equals' method should be overridden. By default it will simply compute p==q. If we want to say that two cars are identical if they have the same 'vin' field we would override equals as follows.



A special case to look out for is strings since, in Java, strings are objects. The code

is almost certainly wrong. I should have written



10. In Java, arrays are objects too

As mentioned above, arrays in Java are objects too. This means they are always allocated on the heap and always dealt with via pointers. For example, consider the the following sequence of commands:

Here we declare 'a' to be a local variable of type pointer to an array of pointers to objects of type 'Vehicle'. Read that sentence again to be sure you've got it. Next, an array object is created on the heap. Initially the items of the array all have value 'null', that is, they hold the null pointer. Next 'a' is initialized to point to this new object. On line 3, a local variable 'b' is created and initialized to the same value as 'a', that is it points to the same array as 'a' points to. Line 4 creates two 'Car' objects and assigns pointers to them to the items of the array. The last two lines, print the values of the two 'Car' objects. The values of variables 'a' and 'b' point to the same array object and so 'a[0]' and 'b[0]' refer to exactly the same array item.



Arrays really are objects in Java. They have fields and methods. Well one field anyway. For example, consider:

The second last line reads a field of the array. In the last line a message is sent to the array.



By the way, the items of arrays are initialized when the array is allocated (In C++, the items initially have undefined values.) In the previous example, since 'a' is an array of pointers its items are initialized to 'null'. For number types (byte, short, int, long, float, and double) arrays are initialized to zero. For boolean arrays, items are initialized to 'false'. And for char arrays, items are initialized to the character with unicode encoding 0.


11. In Java, array lengths are not part of the type

In C++ we can have a variable whose type is pointer to array of 10 int items.

Such a variable can only contain the pointers to arrays of size 10. In Java the closest we can get to this declaration is

This declares 'm' to be a pointer to an array of ints, but it doesn't specify the size of the array.


12. Java has no 2-dimensional arrays

In C++ we can declare an array whose items are arrays. For example
declares 'm' to point to arrays that have 10 items, each of which is an array of 5 ints.

Java has no 2-dimensional arrays. Nor does it have 3-dimensional arrays. And so on. This is a natural consequence of the fact that objects must be on the heap, they can't be items of arrays. In Java we can declare
This means that 'm' points to an array of pointers, each of which points to an array of integers.
We can see this by running the following code

This will print 'true'. Since the items of an newly created array of pointers are initialized to 'null'.

2013-03-15

When I invented inner classes.

Over the years I've invented a number of useful things. Unfortunately, much of the time, someone else did it first. One I'm quite proud of is dynamic register renaming, which I invented in 1989, many years after engineers at IBM had.

A couple of months ago I decided to look at the report I had written as my undergraduate thesis. It was on the design and implementation of an object-oriented language. This was in 1985 and interest in object orientation was just starting to rise after years of O-O being an obscure field with small groups of enthusiasts using SmallTalk and Simula. Nineteen-eighty-five was the year C++ was released commercially by Bell Labs. Even 6 years later, the first edition of Booch's Object Oriented Design with Applications was largely about how to do O-O design and then implement in a non-O-O language such as Ada (which, at the time, was not object-oriented).

Anyway, reading my report, I was struck by the fact that I had invented inner classes. Somewhat dismaying was that I had not only invented inner classes, I had also apparently failed to notice that I had done so: at least they get no special mention in the report. Indeed when Java introduced inner classes in 1997 or 1998, I saw them as a cool new idea, completely missing the fact that I had invented the same thing myself.

Then, a few days ago, I was struck by a very worrying thought: What if I had not only invented inner classes and failed to notice, but had been the first to invent them and failed to publish! After a bit of looking about I realized that this was not the case. Simula seems to have had them since at least 1973 and the Beta language also had them.

You might wonder how I managed to invent something, but not notice. This language was essentially an O-O extension of a language very similar to a statically scoped Lisp -- but with a nicer syntax. Functions were written as lambda-expressions, which evaluated to closures, which carried with them the context in which the lambda expression was evaluated. A context is simply a mapping from identifiers to locations, together with a pointer to a parent context. Thus, within a lambda expression written within another lambda expression, the parameters and other variables of the outer lambda expression could be referred to within the inner lambda expression -- just as in Scheme or Common Lisp. For example, the following defines a function that computes the composition of its arguments

    var compose <= (fn (f, g) (fn (x) f(g(x)) fn) fn);

(The <= means permanent assignment.) Classes were essentially lambda expressions that implicitly returned a copy of their context as their value. Objects were just copies of contexts.  (The advantage of making a copy is that the parent pointer is not needed in the object. Thus, in simple cases, such as an object with no methods, the context and its parent could be potentially garbage collected, even if the object was not. The object and the original context share the same locations of course.) Methods were simply lambda expressions evaluated within a class, and thus, when a method was called, it could refer to the locations of the object which were shared with its context.

    var Counter <= (class {init}
                var k <- init ;
                var incr <= (fn() k <- k+1 fn)
                var get <= (fn () k fn) 
            class) ;
    var counterObj <= Counter{0} ; 

(the <- means assignment. I used curly braces for constructor parameter and argument lists, so that allocation expressions look different from function applications.)

Given this, a class written within a class or a class within a lambda expression just works. It would have been extra work to design and implement the language so that it didn't have inner classes. Unlike Java (but like Scala), there was no restriction that all local variables captured be constants (final in Java lingo). This is because I was using trees of contexts already. Contexts were garbage collected when no longer needed, rather than being explicitly de-allocated like the frames on the Java stack.  I was aware of the inefficiency of this scheme, but figured that a compiler could optimize by keeping variables that could not outlive their function invocation on a stack and all others on the heap.

2011-10-09

Genericity, Recursive Data Types, and Inheritance.

A couple of times recently I've wanted to create a generic recursive data type and then inherit from it. For example suppose we wanted to define the concept of a tree as a data type, and then extend it to create trees that bear specific types of data in their nodes. A second example is a graph consisting of nodes and edges. This is a recursive data type in that, from a node, you can follow an edge and find another node. In both cases you'd expect some homogeneity; in a tree, the children of a node should be nodes of the same type. In a graph, if you start at a node and follow an edge, you should arrive at another node of the same type. This homogeneity requirement is where genericity provides a solution.

To illustrate, I'll use the tree example. We'd like to build, say trees of integers, and tree of floating point numbers. These are two types, but as they share something in common: their "treeness". We'd like them to inherit the characteristics common to all trees from a common base type.

The key to making this work is to use bounded genericity.

This abstract class captures the "treeness" of its subclasses. Any object of type Tree<T> has children of type T where T is a type that extends Tree<T>.

Now we can both specialize and concretize to create a type that represents trees of integers


Each object of type IntTree is a tree that has IntTree objects as children and a field of type int.

Equally we could have a tree of floating point variables.


That we can do this is only really interesting if we can write generic code to deal with data (trees in this case) at the generic level of abstraction. And we can. For example here is a class of depth-first tree traversers.

Then we can specialize and concretize this class to create a visitor that finds the sum of the integers in an IntTree.

Finally I'll just briefly describe the systems where these issues came up.
  •  For the hierarchical graph types of the JHigraph system we defined: first, a mutually recursive set of generic interfaces; next, a mutually recursive set of abstract classes; and, finally, a mutually recursive set of abstract classes that use a tag system similar to XML and ensure validity. These three sets of types define the model at various levels of abstraction. Based on the interfaces, we then defined a set of generic view classes and  layout managers for visualizing the nodes and edges of the graph. An application can be built by specializing and concretizing either set of generic, abstract node and edge classes to hold the appropriate kind of data and by specializing and concretizing the view classes.
  • The second application is a language processor in which the parser produces an abstract syntax tree and then the type checker produces a typed abstract syntax tree. The plain and the typed abstract syntax tree types share a common generic superclass.

2011-04-13

How to do undo and redo

Introduction

Almost all desktop software has some form of undo functionality. In earlier days there might have only been the ability to undo one change --- unix's vi/ex editor had this, at least when I used it in the 1980s and 1990s. You still see this in some photo editing software. Now it is common to provide either unlimited undo or at least many levels of undo. Of course where there is undo there is usually redo.
Usually undo and redo operate in a last-in-first-out manner: Undo means undo the most recent action not yet undone, while redo means redo the most recently undone action, not yet redone. Whenever a new action is done for the first time, the ability to redo any previously undone actions disappears. This is called "linear undo/redo". Nonlinear undo allows the user to pick certain past actions to undo. Nonlinear undo is useful for documents that are being edited by more than one person; a user might want to undo their last action without (if possible) affecting changes made by other users. Nonlinear undo might also allow a user to go back to an earlier point and insert a new action and then redo all the actions just undone. Nonlinear undo/redo can have problems with coherence; what does it mean for Alice to undo an insertion that Bob has just altered? For this article, I'll assume we are dealing with linear undo/redo.
To the software engineer, an obvious question is "How do they do that?" And when we have to write something, the question is "How can I do that?"
This essay discusses a few ways to do undo and, in some cases, redo.
We can assess these various approaches according to efficiency. Another important consideration is layering. If we have an existing data structure implementation, then we'd probably want to layer undo-redo functionality on top of it. On the other hand, in creating a new implementation, it can be convenient to layer it on top of a layer that supports undo and redo; this way the clients of the data type are hardly influenced by the need to support undo and redo. 

Running Example

As a running example, I'll consider that the structure that we want to do undo and redo on is some sort of tree, such as you might use to represent an HTML document, only much simpler. There is one kind of node. Each node is associated with a string and zero or more children.
The interface to the data structure is as follows
interface TreeI {
    NodeI makeNode( String text ) ;


    NodeI getRoot() ;

    void insert( NodeI parent, int posn, NodeI child ) ;

    void remove( NodeI parent, int posn ) ;
}

Objects adhering to this interface are the model for our system. I'll assume there is also some sort of GUI built on top of the model, perhaps following some sort of model-view-controller pattern. If we are using the observer pattern between the model and the GUI, then undoing changes to the model should cause the GUI to automatically revert the view back to where it was, or at least to a state that shows the correct model state. There may be some changes to the view that are not reverted. For example, if the view has scrolling, the scroll position might not change back to exactly where it was in response to an undo action.
(Note I'm using Java notation, but the ideas here should adapt to any imperative object oriented language. I haven't run the Java code presented here through a compiler, so there will no doubt be some minor errors. I also haven't worried too much about accessibility levels: public, private, protected, default.)

Checkpointing

Checkpointing is perhaps the most obvious way to deal with undo and redo. It's likely what nonprogrammers think is going on, if they think about it at all. The idea is just to copy the whole state of the model before responding to user events that might change the model.
We'll layer checkpointing on top of an underlying model like this.
public class TreeWithCheckpointing implements TreeI {
    Tree tree = new Tree()  ;
   
    Stack<Tree> olderTrees() = new Stack<Tree>() ;

    Stack<Tree> newerTrees() = new Stack<Tree>() ;

    public void checkpoint() {
        olderTrees.push( tree.clone() ) ;
        newerTrees.clear() ; }
   
    public void undo() {
        if( olderTrees.size() > 0 ) {
            newerTrees.push( tree ) ;
            tree = olderTrees.pop() ;
            notifyAllObservers() ; } }
   
    public void redo() {
        if( newerTrees.size() > 0 ) {
            olderTrees.push( tree ) ;
            tree = newerTrees.pop() ;
            notifyAllObservers() ; } }
   
    public void insert( NodeI parent, int posn, NodeI child ) {
        tree.insert( parent, posn, child ) ;
        notifyAllObservers() ; }
    .
    .
    .
}

The GUI uses this class as follows: At the start of doing anything that might affect the model, the GUI calls "checkpoint". Then the command is implemented just as you might expect. To implement "undo" and "redo", the corresponding methods are called. It is important that the clone method (or whatever mechanism is doing the duplicating —in C++ it might be a copy constructor) does a "deep copy", that is all the objects that make up the model must be copied. (In a garbage collected language, it is ok to stop at objects that are immutable.)
In summary, the GUI behaves as follows
  • In response to each user input event (e.g., key press or mouse action) that might change the model:
    Send a checkpoint message to the model
  • Modify the model
  • In response to a user input event that requests "undo":
    Send an "undo" message to the model
  • In response to a user input event that requests "redo":
Send a "redo" message to the model
I've assumed here that the observer pattern is being used (see the calls to "notifyAllObservers"). This is not necessary; what is necessary is just that at the end of each user action, the view resyncs with the model.

Merits

  • This approach is very simple. It demands of the model only that it be copyable. It demands of the view and controller only that they access to model through a particular interface.

Demerits

  • The main disadvantage is the space required to keep all the old versions. It also requires time to make the copies. For a large model, these disadvantages may prove to serious.

Command Pattern

Probably the best known approach is the use of the Command Pattern described in Gamma et al.s book Design Patterns: Elements of Reusable Object-Oriented Software. In fact I've seen discussions of the Command Pattern where the writer just assumed that undo/redo was the only application of the pattern, which it certainly isn't.
The idea is simple and appealing. In response to input user input, the system creates a "change" object that can do the required changes and that can then undo them. We might have a change generator that translates user actions (e.g. keypresses and mouse gestures) into changes.
class ChangeGenerator {

        ChangeI makeChangeForKeyPress( Key key ) { ... }
        ...
}

However changes are generated, they all obey a simple interface consisting of methods execute, undo, and redo. (I would have called "execute" "do", but "do" is a keyword in Java.)
interface ChangeI {
    enum State{ READY, DONE, UNDONE, STUCK } ;

    State getState() ;

    void execute() ;

    void undo() ;

    void redo() ;
}

Each change object starts in the READY state. A call to "execute" moves the change from READY to DONE. While changing the state it records whatever information is needed to move the state of the model back to its original state. A call to "undo" moves the state back from DONE to UNDONE, while setting the state of the model back to where it was. Of courses there is an assumption that when "undo" is invoked, the model is in exactly the same state that it was in at the end of "execute". Finally the "redo" method moves from the UNDONE state to the DONE state, changing the model as it goes.
Often the "redo" and "execute" methods are exactly the same, but there are reasons to sometimes make them different. For example, the "execute" method may prompt for some input from the user. We probably wouldn't want to re-prompt the user again.
One thing that should not be overlooked is the possibility of errors. For this design, I've created a special state STUCK, which the change gets into if an error occurs as part of "execute", "undo", or "redo". It's important that when a change gets stuck, the model is left in a consistent state. Once a change reaches the STUCK state, none of execute, undo, or redo is allowed.
We can capture the model independent aspects of a change with an "abstract change" as follows
abstract class AbstractChange implements ChangeI {
    State state = State.READY ;

    public State getState() { return state ; }

    public void execute() {
        assert state == State.READY ;
        try { doHook() ; state = State.DONE ; }
        catch( Failure e ) { state = State.STUCK ; }
        catch( Throwable e ) { assert false ; }
    }

    public void undo() {
        assert state == State.DONE ; }
        try { undoHook() ; state = State.UNDONE ; }
        catch( Failure e ) { state = State.STUCK ; }
        catch( Throwable e ) { assert false ; }
    }

    public void redo() {
        assert state == State.UNDONE ;
        try { redoHook() ; state = State.DONE ; }
        catch( Failure e ) { state = State.STUCK ; }
        catch( Throwable e ) { assert false ; }
    }

    protected abstract void doHook() throws Failure ;

    protected abstract void undoHook() throws Failure ;

    protected void redoHook() throws Failure { doHook() ;} ;
}

We can fill in the three hook methods for specific change objects. For example, a change to remove a node might look as follows.
class RemoveChange extends AbstractChange {
   
    TreeI tree ;
    NodeI parent ;
    int posn ;
    NodeI child ;

    RemoveChange( TreeI tree, NodeI parent, int posn ) {
        this.tree = tree ;
        this.parent = parent ;
        this.posn = posn ; }

    protected void doHook() {
        child = tree.getChild( parent, posn ) ;
        redoHook() ; }
   
    protected void undoHook() {
        tree.insert( parent, posn, child ) ; }

    protected void redoHook() {
        tree.remove( parent, posn ) ; }
}

Once we have built a change, it can be passed off to an undo manager. The manager, "executes" the change and then stores it on an undo stack. When a change is undone, it goes onto a redo stack. When it is redone, it goes onto a redo stack. The manager looks like this.
class ChangeManager {
   
    Stack<ChangeI> undoStack = new Stack<ChangeI>() ;
    Stack<ChangeI> redoStack = new Stack<ChangeI>() ;
   
    public void execute( ChangeI change ) {
        change.execute() ;
        if( change.getState == ChangeI.States.DONE ) {
            undoStack.push( change ) ;
            redoStack.clear() ; }
        else { // Presumably STUCK
            undoStack.clear() ;
            redoStack.clear() ; }
    }

    public void undo() {
        if( undoStack.size() > 0 ) {
            ChangeI change = undoStack.pop() ;
            change.undo() ;
            if( change.getState() == ChangeI.States.UNDONE ) {
                redoStack.push( change ) ; }
            else { // Presumably stuck
                undoStack.clear() ;
                redoStack.clear() ; } }
    }

    public void redo() {
        if( redoStack.size() > 0 ) {
            Change change = redoStack.pop() ;
            change.undo() ;
            if( change.getState() == DONE ) {
                undoStack.push( change ) ; }
            else { // Presumably STUCK
                undoStack.clear() ;
                redoStack.clear() ; }
        }
    }
}

In summary, the GUI behaves as follows
  • In response to each user input event (i.e. key press or mouse action) that might change the model:
    Build a change.
  • Send the change to the ChangeManager for execution.
  • In response to a user input event that requests "undo":
    Send an "undo" message to the ChangeManager
  • In response to a user input event that requests "redo":
Send a"redo" message to the ChangeManager

Notes

Each change needs to be deterministic, i.e., produce the same changes to the model, each time it is redone, as it did when first executed. An example of a nondeterministic change is "paste". If the contents of the system clip-board have changed, then "paste" has a different result. Another example is a change that initiates some sort of dialog with the user, for example asking the user to confirm that they really want to do something. First you wouldn't want to pester the user with the same question again during the replay; secondly, even if you did, if the user gives a different answer the second time, then the next change to be redone with the model in a different state than expected. The same goes for any information that comes from an unpredictable source: asking the operating system for the current time, reading a disk file, making an http request. One way to make a nondeterministic change deterministic is to record all input that does not come from the model in the change object somehow. E.g. within "doHook" for the paste change we might have:
    ifthis.toInsert == null ) {
        this.toInsert = clipboard.contents() ; }
    document.insert( this.toInsert ) ;

Here is a case of nondeterminism I learned about the hard way. If the "execute" method creates objects, the "redo" method should not create objects; at least not objects that future changes might refer to. Rather, "redo" should reuse objects that were created in the earlier call to "execute". (And it follows that the "undo" method should not delete objects that the "execute" method created.) To explain why, here is an example. Suppose we execute a change u that makes a new node and inserts it into the tree; call this node x. Next we change the text in node x; the change v that does this presumably keeps a pointer to x, so that it can be undone and redone. Now suppose we undo both changes and then redo both. If redoing the u creates a new node x', then the redo of the second change will change x, which is no longer in the tree, rather than x'. The correct way to do it is that the undo of u should remove x from the tree, but keep a pointer it; then the redo of u will simply put x back in the tree. As mentioned, this example is really just another case of nondeterminism. The execute and redo methods must add exactly the same object to the tree. It is not good enough for "redo" to add an object x' that is equal to, but not the same as, the object x that "execute" had put in the tree.
So far I've assumed that each user interaction results in one change. A better system is to allow several changes bind together to make one compound change.

Merits

  • One of the big merits of using changes is that it separates the concerns of undo/redo from the model. There is nothing very special about the model we use, only the way we use it. This is handy if you are using someone else's model, but building your own GUI above it. For example, in one editor I built, I was using Mozilla's implementation of the W3C's DOM interface. I went with the command pattern.
  • The approach also makes it easy to incorporate changes to multiple models. For example we might have a selection model built above our data model. Changes to the selection are easily dealt with in their own set of changes.

Demerits

  • The big disadvantage is that the entire change needs to be worked out in advance before any of it can be executed. For many sorts of editors, this is fine, we can work out a simple and reversible sequence of modifications to an underlying data structure. For more complex systems, it might be hard to preplan all the changes, there might be choices to be made part way through. This can always be handled, but every choice made has to be recorded so it can be undone. Essentially one writes the code for the change twice, once going forward and once going backward. The consistency of these two bits of code requires careful coding and careful testing.
    • I ran into exactly this problem in the editor I mentioned above. What I did was to implement big changes via a series of fairly simple change objects. As it executes them, the undo manager joins these change objects together to make a compound change object. The only change to the interface needed is a call to the undo manager to say when a new interaction with the user has started.
  • This approach does affect the way you write the controller part of the GUI. Every thing that might affect the model has to be accomplished by building a change and sending that change to the change manager.

Log and Replay

The first piece of software I remember using that supported undo was vi. As mentioned above, the early versions of vi supported only one level of undo: You'd press "u" to undo the last change and then pressing the "u" again would undo the undo, i.e. redo. My first encounter with unlimited undo happened in 1985. I was hired to do some programming on a computer running the VMS operating system. The editor was called EDT; it was a text based "visual" editor, similar to Emacs or vi. Output was on a 25 by 80 character display --- a VT-100, if I remember rightly. Input was a sequence of keystrokes. Crucially there was no mouse input. (Indeed, there was no mouse -- this was 1985.) Anyway this editor had a peculiar, but very useful feature: Every keystroke entered was immediately saved to a log file. The main use of this feature was that, if the editor crashed before you had saved the file, you could start the editor in a special mode where it read its input from the log file rather than the keyboard ---at least until the log file ran out, after which it would take keyboard input again. This let you recover from the occasional crash. It also had the side effect of providing unlimited undo. If you wanted to undo the last 100 keystrokes, you'd simply quit without saving, edit the log file, delete the last 100 characters, save the log file, and then replay the log file on the original file. This technique also held some entertainment value: as the changes replayed, the changes to the document were displayed on the screen at high speed.
The application of this idea to undo/redo is straightforward. As with the command pattern approach above, we use change objects, but this time the only methods needed are "execute" and "redo", there is no "undo" method for changes.
As with the command pattern approach, we require that each change be deterministic. Normally the "redo" method will just call "execute", but there are special cases where the two need to be different. These include
  • User interaction. The "execute" method may pop-up dialogs to inform the user of things or to obtain more input from them. The "redo" method should not.
  • Obtaining information from outside the model. The execute method records any information it obtains from the user, the operating system, or other agents. The redo method should use this recorded information.
  • Creating objects that future changes might reference. The execute method sets a pointer to the object. The redo method uses the recorded object.
We can accomplish these differences in behaviour between execute and redo, either by writing them separately, or by writing "execute" so that, the first time it runs, it records any information it obtains and then, in subsequent executions, it uses the recorded information and skips all user interactions. (See the "paste" example, above in the command pattern approach.)
I'll ignore any possibility of changes failing, partly to keep things simpler, but also because a failure to execute is not really a problem, as long as it is repeatable. With the command pattern approach, a change that doesn't finish is problematic because it can't be undone. With the Log and Replay approach, we can undo even a change that stops part way through, so long as the model is left in a consistent state.
public abstract class LogAndReplayManager {
   
    Stack<ChangeI> undoStack = new Stack<ChangeI>() ;
    Stack<ChangeI> redoStack = new Stack<ChangeI>() ;
   
    public void execute( ChangeI change ) {
        change.execute() ;
        undoStack.push( change ) ;
        redoStack.clear( ) ;
    }

    public void undo() {
        if( undoStack.size() > 0 ) {
            ChangeI change = undoStack.pop() ;
            redoStack.push( change ) ;
            reset() ;
            for( ChangeI c : undoStack ) c.redo() ; }
    }

    public void redo() {
        if( redoStack.size() > 0 ) {
            ChangeI change = redoStack.pop() ;
            change.redo() ;
            undoStack.push( change ) ; }
    }

    // Reset the model
    protected abstract void reset() ;
}

We need some way to reset the model back to its original state. I'll leave "reset" as abstract so that the LogAndReplayManager class can be reused for various models.

Merits

  • This approach is very space efficient. All that needs to be saved is the changes, which mostly consist of user input and pointers to newly allocated objects.
  • It is very simple to implement.
  • As with the command pattern approach it demands nothing of the model. Unlike the change approach it does not require the execution of the change to be simple. Thus the change objects can correspond very closely to user interface actions. There is no need to carefully plan out the construction of changes that are easy to undo.

Demerits

  • Clearly this approach could have time problems: undos can be expected to take quite a bit longer than they would with say the command pattern approach. If the model can be copied, we can combine Log and Replay with Checkpointing. For example if we take a checkpoint every 50 changes, then the combined approach is almost 50 times more space efficient than checkpointing alone, but an undo takes no more time than executing one copy and 49 changes.
  • The other down side is determinism. Changes that obtain information from the user, or some other unpredictable source, need to somehow record the user's responses to the dialog somewhere accessible to the change.
  • It does not fit well with the Observer pattern.

Note

This idea came to me from Guido Rößling who used something like it in his Animal algorithm animation system. He calls it "reverse by fast forward".

Timestamping

In building the Teaching Machine, one thing I wanted was undo. The command pattern approach wasn't very appealing as each change could involve a great deal branching and looping. The command pattern approach requires that you write everything twice, once going forward and once going backwards. Undoing an if-then-else requires recording which direction was taken. Loops are worse. Recursion is nasty. The Teaching Machine is a program animation system, which means that it has within it an interpreter for a programming language (either Java or C++ in the TM). A single user action might execute several lines of C++ or Java code. Checkpointing was not appealing, because of the large and complex model state. Had I thought of it, the time required by the Log and Replay method would have worried me. The Transactional Memory approach below would have been suitable, but I didn't think of it at the time.
What I came up with is based on maintaining a global clock which advances by one tick with each user interaction. The following description addresses only undo, not redo, which was not a requirement for the Teaching Machine.
The model is made up of objects that have fields that never change; for mutable fields we use special Var objects. Each Var object has "get" and "set" methods that let us get or set a value.
class Var<T> {
   
    T get() {...}

    void set( T val ) { ... }

    ...
}

Each Var maintains a stack of all the values ever assigned to it and, along with each value, the times at which that value was valid. For example we might have a Var that is null until time 4, when it is set to 99, then it is set to, 100 at time 10. If the last time the variable was accessed was time 12, it might be represented by a stack of records as follows
Value First valid time Last valid time
100 10 12
99 4 9
We only update this stack and the records in it when the variable is accessed (read or written).
Undo is implemented simply by subtracting 1 from the global time variable. In our example, suppose time roles back to say time 7; then the get method of the Var is called. It should notice that the top record is no longer of use. That record is discarded and the next record is adjusted to reflect that there is no time 8 or 9. The new stack is
Value First valid time Last valid time
99 4 7
The alert reader will have noticed a flaw in this scheme. Suppose that we didn't access the Var at time 7, the stack remains as
Value First valid time Last valid time
100 10 12
99 4 9
Now if 5 more user actions happen with no accesses to this Var, it will be time 12 again, only in this time line the assignment of 100 to our variable never happened. If an access happens now we get a result of 100 instead of the correct 99.
The solution is simple. We don't use integers for times, but rather objects that point back to the previous time. (The first time points to null). Thus we can have a branching tree of times. It turns out to be useful to keep the integer as well. Thus our times are represented as objects of this class.
class Time {
   
    private final Time previous ;
   
    private final int count ;

    public Time() { previous = null ; count = 0 ; }

    private Time( Time previous ) {
        this.previous = previous ; count = previous.count + 1 ; }

    public Time makeNextTime() { return new Time( this ) ; }
   
    ...
}

Keeping track of the current time is a TimeManager.
public class TimeManager {
   
    private Time currentTime = new Time() ;

    public Time getCurrentTime() { return currentTime ; }

    public void checkpoint() {
        currentTime = currentTime.makeNextTime() ; }

    public void undo() {
        currentTIme = currentTime.getPrevious() ; }
}

The current time together with its previous time, and the time before that, and so on, constitute a "time line" -- the current time line. When a Var is accessed (whether via "get" or "set"), the variable is first put onto the current time line. I.e. we discard records from its stack, and/or adjust final times, until all times mentioned by its stack are on the current time line. Then we proceed with the get or set.
What I like about this scheme is that you only pay for what you use; this in the sense that undo costs essentially nothing at first. It is only as variables are accessed that there is a cost to be paid.
So far, this scheme does not allow for redo. Redo can be accommodated at some extra complication. Each Var needs to be equipped with two stacks. One for "past" values" and one for "future" values. I put "past" and "future" in quotes, as these are relative to the last time the variable was accessed, not to the current time. Upon access, these stacks are first adjusted so they make sense relative to the current time.
Building the whole model just out of Var objects can be a bit too low level. For the Teaching Machine, I created a suite of undoable data structures that the client could use to build their model: arrays, vectors, maps. This also allowed different implementation approaches to be used. For example needed a big array of bytes to represent memory. I could have used a big array of Var<Byte> objects, but though this would use up too much memory -- several words to represent each byte. So, while Vars were implemented much as shown above, arrays used a method more like the Transactional Memory approach discussed below.

Merits

  • No change objects are needed. From the GUI's point of view, it needs to call "checkpoint" and "undo" on the TimeManager at appropriate times, but is otherwise not complicated by the needs of undo and redo.
  • The undo is incremental. Only the parts of the state that are actually accessed need to be undone.
  • In contrast to checkpointing, the space overheads are modest. Only data that has changed takes extra space.

Demerits

  • In contrast to all the schemes above, there is a major effect on how you code the model. You must be sure to use Var objects throughout. All these and the Var class form a layer below the model, on which the model is built. It would be impossible to use some third party model.
  • Bugs occur when someone forgets to use a Var instead of a regular field.
  • The system is admittedly a bit complicated. Once working, it simply works and you can forget about how it works, but until then it is tricky coding.
  • Only unlimited undo is supported. There isn't a good way to support limited undo (e.g. only the last 100 changes can be undone). For long running, memory intensive applications, this could be an issue.
  • Since each Var requires a link to the time manager, we end up passing the time manager to every routine that might need to build a Var. One could use the singleton pattern for the time manager, but the singleton pattern can be troublesome. For example, consider an editor with a "multiple document interface"; we want two models with independent current time lines, thus two time managers.

Transactional Memory

As we've seen, the drawback of the command pattern approach is having to write code for each change in two or three places. We need to write "execute", "undo", and possibly "redo". This usually constrains what we do in "execute" to be fairly simple. The more decisions that are made during "execute", the more complex will be the recorded state and the undo operation. Timestamping is one way around this. Another is a scheme inspired by "software transactional memory", which is a method of dealing with concurrent updates to data structures.
This Transactional Memory scheme, like the Timestamping scheme, assumes that the GUI periodically calls a checkpoint method, typically at the end of each user interaction that might change the data structures. A period of time between checkpoints (or before the first) is called an epoch. (Actually this is a bit of a lie, but it's a good first approximation. I'll define "epoch" precisely later on.)
The idea of transactional memory is to make, for each epoch, a record of all variables changed during the epoch, together with their values as of the start of the epoch. Such a record is called a transaction. For this to work, the data structure must be built using special "transactional variables", for which I will use objects of class TVar. Each TVar holds a current value. Similar to the Timestamping approach, each field that makes up the model will either be a constant or will be a TVar.
class TVar<T> {
    private TransactionManager manager ;

    private T currentValue ;

    public T get() {return currentValue;}

    public void set( T val ) { manager.notifyOfSet(this) ; currentValue = val ; }

    ...
}

Each TVar knows a transaction manager, which holds two stacks of transactions. As you've probably guessed, these are a stack of past transactions (the undo stack) and a stack of future transactions (a redo stack). Each transaction is, abstractly, a partial function that maps TVars to values. Transactions on the undo stack map TVars changed during an epoch to the values of those TVars at the start of that epoch. Conversely, transactions on the redo stack map TVars to their values at the end of an epoch.
Applying a transaction t means swapping the values held by the transaction with the current values in the TVars
for v in domain(t) do (t(v), v.currentValue) := (v.currentValue, t(v))
Applying the top transaction on the undo stack will change the values of the TVars back to the way they were before the transaction started; at the same time the transaction becomes suitable for placement on the redo stack. Conversely, applying the top transaction of the redo stack will change the values of the TVars to the way they were at the end of an epoch; at the same time the transaction becomes suitable to be placed back on the undo stack.
The transaction manager has two states, DOING and NOTDOING. An epoch is defined to be the time from a transition from NOTDOING to DOING until a transition from DOING to NOTDOING. When the transaction manager is in the DOING state, it additionally has a current transaction, which is a transaction that is being built to represent the current epoch. The current transaction records the original values of all TVars that have changed during since the start of the current epoch, i.e. since the last time the transaction manager was in the NOTDOING state. When the transaction manager is in the DOING state, its redo stack is empty.
The interface of the transaction manager is similar to that of the TimeManager in the Timestamping approach. The transaction manager works as follows.
  • Initially: both stacks are empty; the transaction manager is in the NOTDOING state, and thus there is no current transaction.
  • On receiving a notifyOfSet message from a TVar v:
    • First, if in the NOTDOING state, the transaction manager creates a new transaction t, with an empty domain, and makes t the current transaction, then it clears the redo stack and switches to the DOING state. This marks the start of an epoch.
    • Now the manager is in the DOING state.
    • Second, if v is not in the domain of the current transaction, v is added to the current transaction t and is mapped, in that transaction, to its current value
      t(v) := v.currentValue
      This records the value of v as of the start of the current epoch.
  • When a checkPoint message is received
    • If the manager is in the DOING state, the current transaction is pushed onto the undo stack. The manager switches to the NOTDOING state. This ends the epoch and the building of that transaction.
    • If the manager is in the NOTDOING state, there is nothing to be done.
  • When an undo message is received
    • In case we are in an epoch, i.e., the manager is in the DOING state, the current epoch is ended exactly as if a checkPoint message was received. I.e., the current transaction is pushed onto the undo stack. The manager switches to the NOTDOING state.
    • Now the manager is in the NOTDOING state.
    • If the undo stack is now empty, there is nothing to undo.
    • Otherwise, the top transaction t of the undo stack is applied and then moved to the redo stack.
  • When a redo message is received
    • If the redo stack is empty, there is nothing to do.
    • If the redo stack is not empty, the transaction manager must be in the NOTDOING state. The transaction manager applies the top transaction of the redo stack and then moves it to the undo stack. It stays in the NOTDOING state.

Discussion

Transactional memory can be thought of as a optimized form of checkpointing. Rather than saving the whole model at each checkpoint, we save only the parts changed between checkpoints.
Like the Timestamping approach, it requires that we use special variables to build the model. For the GUI coder, it is as easy to use as the Timestamping approach. Indeed to its clients an implementation of transactional memory and an implementation of Timestamping have essentially the same interface.
The space efficiency is probably about the same as for Timestamping. In both cases we pay only for changes.
For time efficiency, the crucial operations are getting values from the model and setting values. Getting values is very quick. Each TVar holds its current value. This is in contrast to the Timestamping approach where each Var must ensure it is on the current timeline at the start of each get. The main cost of a set operation is checking whether the TVar is already a part of the the current transaction. This can be optimized by caching, in each TVar, a pointer to the last transaction it was recorded in.
public class TVar<T> {
    ...
    private Transaction lastTrans ;
    ...
    public void set( T val ) {
        if( lastTrans != manager.currentTrans || !manager.isDOING ) {
            manager.notifyOfSet(this) ;
            lastTrans = manager.currentTrans ; }
        currentValue = val ; }

    ...
}

Merits

  • Space efficiency is probably acceptable. There is an overhead of about 4 times.
  • Time efficiency is probably better than Timestamping.
  • It is easy to write the GUI code. The only requirement is to call checkpoint at the end of each user interaction cycle.
  • As with Timestamping, we can create optimized data structures, such as arrays, that use more space efficient techniques.
  • Unlike Timestamping, it is clear how to support redo.
  • Unlike Timestamping, it is easy to implement limited undo.

Demerits

  • You have to layer the model above the transactional memory system. You can't just use an existing model.
  • Bugs occur when someone uses a regular field instead of a TVar.
  • Unlike TimeStamping, undo is not incremental.
  • Since each TVar requires a link to the transaction manager, we end up passing the transaction manager to every routine that might need to build a TVar. Alternatively, one could use the singleton pattern for the transaction manager, but the singleton pattern can be troublesome. For example, consider an editor with a "multiple document interface"; we want two models with independent undo and redo stacks and thus two transaction managers.

Note

Transactional memory is well known as an approach to interference control in concurrent programming. I haven't seen the idea applied to the undo/redo problem before.

Summary

Dependence

As always in software design the separation of concerns is crucial. In designing a complete interactive system that supports undo/redo we need to consider (at least) four major components and how they are coupled.
  • The model. This is the data structure.
  • The view. Displays the data structure.
  • The controller. Interprets user input and causes the data structure to change. For the command pattern approaches, this includes the concrete change classes and the code that constructs the change objects.
  • The undo/redo infrastructure.
In all the approaches we've looked at the undo/redo infrastructure does not depend on the view or controller; we can reuse the undo/redo infrastructure with other views and controllers. What about the model?
  • We can layer the undo/redo on top of the model, as I did in my example of the checkpointing approach, where the TreeWithCheckpointing implements the same interface as the underlying model. If we have an existing model/view/controller system, the checkpointed model is a drop in replacement for the model. The only place where the view and controller depend on on the model being checkpointable is in implementing undo and redo buttons and calling checkpoint at the start of each user interaction that leads to a change. The downside is that the undo/redo infrastructure is tied to one model interface and is not reusable with another. Since the undo/redo infrastructure is fairly trivial, this is not a big price.
  • In the command pattern approach (and log and replay) the change manager, together with the change interface and the abstract change class, is reusable, but the individual changes are not. Furthermore the controller is strongly coupled to the undo system, as it must construct the change objects. On the other hand, the view and the model are independent of the undo/redo system.
  • The last two approaches --I'll call them "magic memory approaches" make the model strongly coupled to the undo/redo infrastructure, as the model must be built out of special variables. This rules out reusing an existing model short of completely rewriting it. However the view and controller are then essentially written as if there was no undo/redo; we just have to add enough calls to checkpoint and the undo and redo buttons.
In summary: If you want to reuse an existing model and view, the command based approach is probably best. On the other hand, if you want to write the controller without packaging each change into a change object, then the magic memory based approaches are simplest, as long as you are writing new model code or don't mind rewriting existing model code. The checkpointing method seems to combine the best of both worlds, but is obviously costly.
I've implemented two complex systems supporting undo
  • In the SIMPPLE project, I'm building a WYSIWYG XML editor. The most complex part is the view, as it needs to support CSS, which is a fairly complex standard. The best software is the software you don't have to write, so I decided to use the Gecko layout engine as a view component. This meant using Mozilla's DOM implementation for a model. Since I couldn't have the model depend on the undo/redo system, a command pattern based solution seemed the best.
  • In the Teaching Machine, it did not seem that commands could be packaged to be easily undone. Checkpointing would be too space consuming. I could have tried a log and replay approach but didn't think of it. I opted for Timestamping. While I think that Timestamping can be extended to support redo as well as undo, I'll probably switch to Transactional Memory when I want to add redo; this is for the sake of simplicity and efficiency.

Efficiency

The checkpointing system is conceptually simple, but costs a lot in space and time.
The magic memory approaches store fine grained changes and so you only pay for the modifications made. This saves space and time.
The command based approach is potentially very memory efficient.