How I use Git

Below are some thoughts on how I use Git and SourceTree.

First some terminology

A database containing a bunch of commits, branches and other things.
An immutable object. The value of a commit object is a state the whole file tree + pointers to its parents + a message. The address of a commit is a secure hash of its value. Thus two commit objects in different repositories with the same address will (with near 100% probability) have the same value. Commits usually have one parent, but may have 0 or 2; commits form a rooted directed acyclic graph. When we talk about a commit we might be talking about an object (which exists in one repository) or a value (which might be represented by objects in different repositories). It often doesn't matter which we mean.
A variable whose value is the address of a commit, i.e., a hash of its value. Branches are local to repositories. (Even "remote tracking branches" are local to a repository.) Each repository has two sets of branches: Local branches are intended for local work. Remote tracking branches represent local branches of other (remote) repositories; however the value of a remote tracking branch could be out of date with respect to the branch that it is tracking. (Note that I used the word "local" in two sense in this paragraph.)
Currently checked out branch:
The branch that was most recently checked out. Making a new commit typically updates this branch.
Working copy:
A state of the file tree represented in a computer's file system. Each time you check out a branch, the working copy gets overwritten.
Index (also called the Staging Area).
A place on your machine where Git keeps changes that will become part of a commit in the future.
A merge combines two commits to create another commit. As I understand it, if we merge two commits x and y that have a least common ancestor z, then the result commit w=merge(x,y) will contain all changes from z to x and also all the changes from z to y. Here is an example where we consider a file tree that contains only one file, so the state of the file tree is simply a sequence of characters. Suppose z is abcdef and x is acdef and y is abcdefg. The changes from z to x is {delete the b between a and c}. The changes from y to z are {add a g after the f}. The union of the changes is {delete the b between a and c, add a g after the f}. So w is acdefg. Sometimes it's not clear how to merge files, and in that case there is a "merge conflict". When x is the least common ancestor of y, then there is no need to create a new commit, so merge(x,y)=merge(y,x)=x. This is called merge by fast-forward.
Line of development:
A sequence of commits that may get added to over time. "Line of development" isn't really a Git concept, but I find them useful to think about. Often people use the term "branch" for this, but that's confusing because in Git a branch is a variable whose value is the address of a single commit; not a sequence of addresses of commits. Also, while each Git branch is associated with one repository, a line of development spans multiple repositories. I found Git much easier to use once I finally realized that branches and lines of development are different (but closely related) concepts. So next I'll try to explain with an example what I mean by a line of development.

More on "lines of development"

Consider this evolution of a system where commits are ordered in the order they are created (from left to right)
There are three lines of development here: the shared line, the x line and the y line. The x and y lines represent two different features and might be done by two different programmers. The shared line represents the amalgam of all completed features. Once a feature is completed, the last commit on its line is merged into the shared line. Once we have finished with a feature, we can delete the branches associated with it, but the lines of development remain. Commit x4 is particularly important. This represents the programmer catching up with all features completed since they started work on their feature -- in this case, just y. It's a good idea to make these catch-up merges each time we notice the shared line has been added to. (In the example the developer on the x line probably should have caught up earlier.) Running unit tests after these merges is important, since it can alert us to any conflicts that aren't flagged by Git. It's particularly important to make these catch-up merges (if needed) before merging back into the shared line. This ensures, that untested combinations of features never make it onto the shared line. (And, as we will see below, it also prevents merge conflicts from happening on GitHub.)

A point not captured by the diagram above is that, if we allow fast-forward merges, not all the commits shown in the picture are different. We will have y1=shared1 and x4=shared2. SourceTree  might display the graph above like this
which is simpler, in that it has fewer nodes, but doesn't clearly show the lines of development. Like I said above, lines of development do not correspond to anything in Git. They are just a product how we think about software development.

The five branches

Usually you only have to worry about two lines of development at a time: a shared line (typically called master) and a line that only you are working on. For illustration I'll call the shared line "shared" and the other line "feature". In implementation the lines of development are represented (sort of) by branches.

But thanks to Git being distributed, each conceptual branch x is replicated in a number of places:
  • There is GitHub's x, i.e. a copy of the branch that is on the server. [I'm assuming here that the central repository is GitHub, but it could just as well by Git Lab or Bit Bucket or a private server.]
  • There is a tracking branch in your repository; this is called origin/x.
  • And there is your local copy of the branch, which is called x.
That's 3 copies of each branch. I'll call them "GitHub's x", "my origin/x", and "my x". Plus everyone else may have one or two copies in their own repositories. So if 10 people are working on 1 feature each, that's 11 lines (10 feature lines + the shared line) and there could be up to 21 branches for each line of development (1 on GitHub and then each local repository can have a local and a tracking branch). So there are up to 231 branches in total. Luckily you usually only have to worry 2 lines at a time and you only have to worry about the copies on GitHub and the copies on your own machine. And, of these, I don't ever use my origin/feature, so that's only 5 branches you have to worry about:
  • GitHub's shared,
  • my origin/shared,
  • my shared,
  • my feature,
  • GitHub's feature.
plus the working copy and the index.
We try to maintain the following relationships at all times between the commits that are the values of these 5 branches. (Here ≤ means "is descended from".)
     my shared ≤ my origin/shared ≤ GitHub's shared

     GitHub's feature ≤ my feature

It's also a good idea to try to fold any changes made to the master into our feature as soon as they show on GitHub's shared branch. So we try to keep
     my feature ≤ my shared = my origin/shared = GitHub's shared
true as much as practical. (I.e., that my feature is descended from my shared, which is the same as the tracking branch which is up to date.) We do this with catch-up merges. This way, when we read, edit, and test our code, we are reading, editing, and testing it in the context of all completed features. Furthermore, when a pull-request is made we want
     GitHub's feature = my feature ≤ my shared = my origin/shared = GitHub's shared
That way merge-conflicts won't happen on GitHub's server.

Information flow

The flow of information that I use is shown in the figure. I'll explain each part below.

Basic operations

For the rest of the article I'll assume you are using SourceTree. Of course everything SourceTree does can also be done from the command line.

Some of the basic operations of SourceTree work like this (somewhat simplified)

"Fetch" updates all your tracking branches. So Fetch means my origin/x := GitHub's x, for every x branch in GitHub's repository. Typically we use Fetch to bring changes made to GitHub's shared to my origin/shared.

"Pull" means update my current branch from GitHub's repository. So Pull means my origin/x := GitHub's x ; my x := merge( my x, my origin/x), where x is the currently checked-out branch. Typically this is a fast-forward merge. (Usually I do a Fetch first and then a Pull if x is behind origin/x. When x is behind origin/x the merge is done by "fast forward", i.e., we have my x := my origin/x). Typically we use Pull to bring changes made to GitHub's shared to my shared.

"Branch" means create a new branch.  It means y := x  where y is an existing branch and x is the currently checked-out branch. y becomes the currently checked-out branch. Typically we use Branch when we start working on a new feature.

"Merge" means my x := merge(my x, my y) where x is the currently checked-out branch and y is another branch. Usually we either merge my shared into my feature. In the flow I use, merges are always merging my shared with my feature to make a new value for my feature branch.

"Check out" updates the working copy to the value of a particular commit. In the flow this is used to check out my feature branch. Some operation in SourceTree only apply to the currently checked out branch, so there are times you will check out a branch just so you can do something else with it, such as a pull.

"Stage" Staging means moving changes that are in the working copy to the index.

"Commit" Commit makes a new commit based on the changes in the index.  Of course it updates the currently checked-out branch.

"Push" means update GitHub's copy of the branch; it also updates the tracking branch. So Push means GitHub's x := my x; my origin/x := GitHub's x, where x is the currently checked-out branch. In the work flow, Push is used to push commits on my feature branch to GitHub's feature branch.

"Make and merge a pull request". A pull request is a request for someone else to review the changes on a branch and to merge one branch into another.  (Pull requests would be better named "merge requests" in my opinion.)  Pull requests are not a feature of Git, but rather of hosting services such as GitHub. SourceTree can help you create merge requests. The actual merging of the pull request is done using GitHub's web interface.

Recipes for common tasks

Here are some recipes for doing some common tasks with SourceTree.

Catch up the shared branch

  1. In source tree click on Fetch
  2. If shared and origin/shared are the same, stop
  3. Check out the shared branch by double clicking on "shared" under "Branches" on the left sidebar.
    Click on Pull to get the local shared branch up to date with origin/shared

Make a feature branch

  1. Catch up the shared branch (see above)
  2. Check out shared (if not already there).
  3. Click on Branch.
  4. Type "feature" as the New Branch. Click ok.

Catch up the feature branch.

(Do this fairly frequently)
  1. Catch up the shared branch (see above).
  2. If shared is an ancestor of feature you are caught up. Stop.
  3. Check out feature (if not already the checked out branch).
  4. Click on merge.
  5. Select shared.
  6. Click OK.
  7. Check for any merge conflicts. If there are merge conflicts they need to be resolved. That's a whole other story. (Maybe another blog post.)
  8. Even absent merge conflicts, there may be silent problems that prevent compilation or introduce bugs. So carefully inspect all differences between the merged version and the previous version of feature. Also recompile and run unit tests.
  9. Click on Push.
The final push is optional, but it saves your work.  Also you need to do it if you are going to make a pull request -- more on that below.

Make your own changes

  1. Check out feature (if not already the checked out branch).
  2. Make changes to the files. Run tests.  Etc.
  3. Back in source tree, Cmd-R (Mac) or Cntl-R (Windows) or View >> Refresh
  4. Select "Uncommitted changes"
  5. Review all unstaged changes.
  6. Stage all changes you want as part of the commit.
  7. Click Commit. (This doesn't actually do the commit.)
  8. Enter commit message
  9. Click on "Commit" button at lower right. (This does the commit.)
  10. Push the new commit to the origin, by clicking Push and OK.
  11. If you've never pushed the branch before you may need to check a box in the previous step before clicking OK.
Pushing the new commit to the origin is optional, but it is good to do for a couple of reasons. One is that it saves your work remotely. The other is that it lets other people on your team see what you are doing.

Merge your feature back to the shared branch.

(Do this when you think it's complete and ready for review.)

  1. Catch up the feature branch. (See above.) Be sure to push the feature branch to the server.
  2. If there are any problems, such as merge conflicts or failed tests, make sure they are all resolved before going on.
  3. On GitHub, make a new "Pull Request", being careful that it is a request to pull feature into shared.
  4. At this point, you might want to request someone else to review the pull request.
  5. Wait for comments or for someone else to merge the pull request.
  6. Or if no one else merges the pull request, merge it your self.
When there are comments that need to be addressed, you can modify your feature branch and push it again.  Pull requests are based on branches, not on commits. So when you push new commits on your branch they become part of the pull request.   If there are changes to the shared branch between the pull request being made and the the feature being merged, it's important to redo the process above so, for example all tests can be run on a caught up version of the commit.


Direct manipulation of abstract syntax trees part 1: Representing and manipulating trees

Direct manipulation of abstract syntax trees part 1: Representing and manipulating trees

This is the second part of a two part post. Part 0 is here.


The PLAAY language is a visual language. The editor for PLAAY allows users to modify abstract syntax trees of PLAAY programs by direct manipulation.
An earlier post focussed on the user interface from the users' point of view. Here we look at the software design behind the GUI.

Valid trees

Code in PLAAY is a tree made up of nodes of various kinds. Each kind of node has restrictions on what its children can be. For example many kinds of node aren't allowed to have any children at all.  A node representing a function call must only have children that are expressions. A node representing a while expression must have exactly 2 children and the first must be some kind of expression and the second must be an expression sequence. Expression sequence nodes can have any number of children, but each must be either some kind of expression or a variable declaration.  A valid tree obeys all these restrictions and the PLAAY editor is only capable of producing valid trees.
This idea of valid trees is essentially the same as XML's idea of valid documents.
The validity rules are PLAAY equivalent to the syntax rules of text based languages.  Since the editor can only produce valid trees, it is impossible, in PLAAY, to make a syntax error.
There is one loophole. Where the program is not finished, we allow a "placeholder" node to be placed. For example, as I mentioned, a while expression must have an expression as its first label. By default, this is an expression placeholder node, although as we will see later, we can create the expression first and then create the while node later, in which case we never need the placeholder.

Program nodes and tree nodes

Each tree is represented by a program node (PNode) which has a label and a sequence of 0 or more children. PNodes are immutable and may be shared, so they form a directed acyclic graph. It's a precondition of the PNode constructor that it is valid and the precondition is checked at run time so the error of making an invalid tree is quickly found during testing.
Also we generally make trees through a method tryMake
tryMake(label:Label, children:Array<PNode>):Option<PNode>
The type Option<PNode> has values that are either None or Some(p) where p is a PNode. So the code that tries to make a PNode with this method has to deal with cases where the node would have been invalid. Option is a monad so we can use monad operations like bind and map to deal with the results of tryMake and similar functions.
Here is a tree.

The blue ellipses are expression nodes. The yellow square is an expression sequence node.
On the screen this tree is displayed as a set of nested boxes like this:

You can see that each expression node is displayed as a box. Expression sequences are not displayed, though their children are.
Since PNodes and labels are immutable, we can use the same PNode to represent multiple nodes in a tree.  The illustration below shows two equivalent representations of the same tree in terms of PNode objects (triangles), Label objects (squares), and arrays (rectangles). Other than the amount of space they take, these representation are treated as equivalent.

A node in the tree represented by a PNode p can be represented by p and a path from the p to the tree node. For example if p is the node labelled by * in either of the structures above, the 7 tree nodes in the tree represented by p are (p,[]), (p, [0]), (p, [0,0]), (p, [0,1]), (p,[1]), (p, [1,0]), and (p, [1,1]).
If a tree node has n children, then there are n+1 positions under it. Here is a tree and all the positions in it. The dotted lines simply show which tree node is associated with each position.

So a position consists of a tree node t --called the parent-- and an integer k such that 0 ≤ k ≤ n, where n is the number of children the tree node has. For example the second position under the node labelled + on the right is ((p, [1]), 1) where p is the PNode at the root.
There are no objects in the data structure representing positions. However when we render the tree into HTML, some positions are represented by HTML elements called drop zones. Drop zones are usually invisible, but light up when the mouse hovers over them. Also, as explained below, when the current selection corresponds to a position, the drop zone --if any-- for that position is shown in grey.

Why immutability?

At this point you might wonder why I chose to use immutable objects to represent trees.
There are several reasons, but two stand out: correctness and simplicity.
Correctness: As mentioned above it is an invariant of PNodes that they are valid. In order to enforce this invariant we check that it is true on construction. Since nodes and labels are immutable the invariant can never become false. With a mutable structure we could enforce the invariant by checking that it is respected after every mutation. However what can we do when the assertion fails. At that point it is too late to recover. So for each mutator we need to write code that will predict whether the change will succeed or fail.  For example if I have a mutator that replaces children i to j of a node with a new set of children. I might write methods in PNode like this
    public canReplace( tn : TreeNode, i : int, j : int, nodes : Array<PNode>) : boolean {
        ... }
    private replace( tn : TreeNode, i : int, j : int, nodes : Array<PNodes> ): void {

        ... }
    public tryReplace( tn : TreeNode, i : int, j : int, nodes : Array<PNode>) : boolean {
        if ( canReplace( tn, i, j, nodes ) ) {
            replace( tn, i, j, nodes ) ;
            return true ; }
        else return false ; }
Now this style of coding can be made to work, but I have three problems with it. First, I know that when other people use this interface, some of them are going to ignore the result of tryReplace. The use of Option objects in the current design makes it more difficult to ignore the possibility of failure.  Second, the whole scheme rests on getting the implementation of the mutator and the predicate to be completely coherent, i.e. the can function should be true exactly if the mutator will succeed. This is easy simple cases but rapidly gets complex as the complexity of the mutations increases. Third these changes are not composable. Suppose I have an edit that consists of two changes in sequence. How can I know whether the second will succeed before doing the first?  And if I do the first, how can I get back to the original state if the second will not succeed. To answer the last two questions, we could use a command pattern based undo/facility with composable commands. However that is another level of complexity
A bonus from using immutable structures is that undo/redo is very easy and efficient. We simply keep two stacks of PNodes. Actually, we keep two stacks of Selections objects; Selection is another class of immutable objects and is discussed next.


A selection is a pair of positions. We restrict these positions to having the same parent, so it might be better to say that a selection is a tree node t and two integers a and f such that 0 ≤ a,f < n where n is the number of children t has. The numbers a and f are called the anchor and focus. Most of the time it doesn't matter which is the smaller and which is larger. The minimum of a and f is called the first and the maximum the last. When a=f, a selection just indicates a position. When a ≠ f, there is a nonempty set of tree nodes that are called the selected nodes. The number of selected nodes is the last minus the first.
Selections can be represented by triples consisting of a tree node, the anchor, and the focus.
For example, here is a selection with 0 selected nodes ((p, [0]), 2, 2)

A selection with one selected node ((p,[]), 0, 1)

And a selection with two selected nodes ((p, []), 2, 0)

When we display selections to the user, the selected nodes are shown with a grey background. If there are no selected nodes, we display the position with a small grey rectangle called a drop zone. Here are the three selections above as displayed in the editor.

The editor does not create a drop zone for every position in the tree, only those where it makes sense to insert a tree. There is a drop zone in the example above because addition nodes can have three children. While they are technically valid, we avoid creating selections corresponding to positions that don't have drop zones; such selections aren't needed and are difficult to convey to the user.
In the code, we represent selections with a class Selection, which is immutable and such that all the restrictions are checked at construction time.  Selections are guaranteed to be valid.

Edits on selections

Another key concept is that of an edit. An object of type Edit<Selection> is an object that represents a function from Selection to Option<Selection>. (In general, for any type A, an Edit<A> object is an object that represents a function from A to Option<A>. We're only interested in the case where A is Selection.) There are two operations on edits, applyEdit takes a selection and returns an Option<Selection>, and canApply is a function that takes a selection and returns true if applyEdit would succeed. I.e.
!e.canApply(s) iff e.applyEdit(s).isEmpty()
The only advantage of canApply is that it might be quicker to call canApply than to call applyEdit and see if it succeeded.
The nice thing about edits is that we can compose them in various interesting ways. For example
  • Applying compose(e, f) applies e and then applies f to the result. Of course, if either application fails, the application of the whole edit fails.
  • Applying alt( [e, f, g] ) applies the first edit in the list that succeeds.
Edits are a special case of what functional programmers call "arrows". I won't go into that now, since you don't need to know the theory of arrows to understand edits any more than you need to know group theory in order to understand addition.

Some example edits

Replace-children edits

A replace-children edit attempts to replace the selected nodes with a fixed list of trees. If there are no selected nodes, it simply attempts to insert the list of nodes at the given position. For example if the list of trees is

and we apply the resulting replace edit to the selection

the resulting selection is

The replace-children edit fails (i.e. returns None) if the resulting tree would be invalid. For example if the same edit were applied to the selection

it would fail, because assignment nodes with 3 children are invalid.

Engulf edits

Engulf edits are built out of selections called templates. Suppose we start with a template

and we built an engulf edit out of it; the engulf edit applied to a selection

results in a selection

In general engulfing is a three step process:
  • First replace the selected nodes of the template selection with the selected nodes of the target selection to make a temporary selection.
  • Second replace the selected nodes of the target selection with the root of the temporary selection.
  • Third, the path, anchor and focus are adjusted so that the selection corresponds to the position after the last node inserted in to the temporary tree.

Engulf or replace

A number of GUI actions are associated with edits that choose between an engulf or a replace based on a particular template. For example the ':' key is associated with an edit that either engulfs with the template

or replaces the current selection with the tree from the same template.  In either case, there is an optional movement edit --movement edit are discussed below-- after the replace or engulf.
 Similarly the +, *, -, =, <, and > keys are associated edits that engulf or replace using calls to the appropriate function. The ?, @, and \ keys engulf or replace with templates for, respectively, an if expression, a looping expression, and a lambda expression.
The choice between replace and engulf is made as follows. If every selected node is a place holder (e.g., if there are no selected nodes), then replace is preferred. Otherwise engulf is preferred. Of course, if the preferred edit does not succeed, the other edit is tried.
In fact a choice of templates can be tried. The first template to succeed is applied. For example, there are two alternative templates for while loop expressions.

If one expression x (which is not a placeholder) is selected, pressing @ will engulf that x with the first (left) template so that x becomes the guard. However, if two more more expressions are selected, the tree from engulfing with the first template would be invalid (it would result in a tree with 3 or more children under a while loop node which is invalid). Instead the two expressions are engulfed with the second template so that they become the body of the while loop.  This sort of thing is easy to do using the alt function.  For example the code that creates an edit to try engulfing with each of an array of templates looks like this
    const editList : Array<Edit<Selection>>
      = templates.map( (template) =>
                       new EngulfEdit( template ) ) ;
    return alt( editList )  ;

Movement edits

There are several classes of edits that only affect the path, focus, and anchor. These are used in the play editor to implement, among other things, the response to pressing the arrow keys. For example, if the selection is a shown on at the left, pressing the right arrow key four times will move through the sequence shown here. The right arrow edit only stops when the position corresponds to a drop zone. In the example, that explains why it skips the positions under the 3 and the x node.

The space bar is used to select the parent. For example the edit associated with the space bar will take the selection on the left below to the selection on the right:

The tab key is associated with an edit that attempts to move the selection to the right until a suitable place for insertion is identified. Suitable places for insertion are either positions where there is a drop zone or selections where there is one selected node and it is some sort of placeholder node. An exception is that drop zones that occur just before or after placeholders are skipped. Some tab edits are shown below

Open and closed labels

Several classes of labels are associated with a string and these labels are capable of being in two states, open and closed. Open nodes are rendered by the GIU as text fields. Typing a letter, say 'a', attempts to insert an open identifier node with a string value of "a".  Subsequent characters typed go to the text field and, when the text field is closed, the label is updated and closed.  Numbers are entered in a similar way. Function calls are entered in almost the same way; typing a '`' creates a new function call, but the name of the function is empty.

The text field can be closed by typing an enter or a tab key. If an enter key closes an open node, then that node (now closed) will the selected node. If a tab key closes an open node, a tab edit is added on, so there will be a search for the next suitable place for an insertion. The figure below shows the distinction between an open node (green) being closed with an enter versus a tab.

This distinction turns out to be very useful for entering expressions in either infix or prefix order.

Other useful edits

Some other useful edits include moves and swaps.  Both of these make substitutions at two points in the same tree. We need moves and swaps since breaking these edits down into constituents may cause the tree to become invalid temporally and that is not allowed. For example we can swap second and third children of an if-node. (Equivalently we can move the second child to the fourth position. These are both legal edits. But achieving the same result by a sequence of simpler edits will not work. For example duplicating the second child to make a fourth child and then deleting the third child would mean that the if has 4 children at one point in time and this is an invalid tree.
Another category of useful edits expands the selection in some way. In the editor, these are attached to pressing arrows while the shift key is held down.


Direct manipulation of abstract syntax trees part 0. Infix and prefix tree entry

Direct manipulation of abstract syntax trees part 0. Infix and prefix tree entry

This is the first of a two part post. Part 1 is here.

Mental friction

Programmers are constantly going back and forth between multiple levels of abstraction. Of course this is true in terms of data structures and libraries, but here I'm just thinking about the text of the program itself. There is a basic, read, think, change cycle.  We start with reading. At the lowest level we have an array of coloured dots on a screen. The programmers brain can quickly turn this into a two dimensional array of characters. Next we group the characters into tokens. Then we mentally parse the arrangement of tokens to form a tree or --screen size being limited-- a portion of a tree. At this point we have read the code. Then we have to think about the meaning of the tree, decide what change needs to be made to the tree in order to perform the next task, be it fixing a bug or adding functionality or whatever. Once we know how the tree should look like we need to think about what the arrangement of tokens should be, then the arrangement of characters, finally we need to find a sequence of mouse motions, mouse clicks, and key presses that will change the current arrangement of characters to the one we want. Sometimes, I then go back and read the text again to make sure that the change I made matches the change I intended. Essentially we are constantly doing much of the work of a compiler -- lexing and parsing -- and then unparsing and unlexing.  Not every programmer experienced programmer is going to think of it this way, but this is my way of thinking about it. And of course people lex and parse the same way as compilers, for example we use heuristics based on indentation, which is something compilers generally ignore.
All this lexing, parsing, unparsing, and unlexing creates friction in the programming process. One of the goals of the PLAAY visual programming language is to eliminate much of this mental busy work by presenting the abstract syntax tree in a more direct form and to get away from the distractions of lexing and parsing.
However eliminating friction on the reading and comprehension end is not helpful if we create the same or more friction on the production and manipulation end. Furthermore, for experienced programmers, there is the additional hurdle that they are used to producing and editing code using text and some text editor or IDE be it Vim, Sublime, Eclipse, ... .
So we need a low friction way to enter and manipulate abstract syntax trees.
The rest of this post shows that entering syntax trees and in the PLAAY editor entering trees is easy and flexible.
It's flexible in that it allows trees to be created either top-down (prefix entry) or left-to-right (infix entry).
A later post will look at how the required transformations on trees are implemented.

Entering a simple expression in PLAAY

Prefix entry

I'll start with entering expression in using prefix order, i.e. entering operations before their operands.
Let's start with a simple expression

We can enter it with the following keystrokes
  • '+' creates an addition operation with two placeholders as operands, giving
  • '*' replaces the selected place holder with an multiplication operation, giving
  • 'a' creates an identifier and displays a text field, giving
  • tab closes the text field and selects the next place holder, giving
  • 'x' creates an identifier and displays a text field, giving
  • tab closes the text field and selects the next place an expression can be added, giving(Note that the multiplication operator can have more than 2 operands.)
  • another tab selects the next place holder, giving
  • '*', 'b', tab, 'y', tab give
That's 12 keystrokes versus 7 to type a*x+b*y.

Infix entry

We can also enter the same expression in using more traditional infix notation. We have
  • 'a' creates an identifier and displays a text field, giving
  • enter closes the text field and selects the identifier, giving
  • '*' engulfs the identifier with a multiplication operation; it then selects a placeholder, giving
  • 'x' gives 
  • enter replaces the place holder with an identifier and displays a text field, giving
  • space expands the selection, giving
  • '+' engulfs the identifier with an addition operation, giving
  • 'b', enter, '*', 'y', tab give
This took 12 keystrokes, the same as prefix entry.
One reason that it takes more keystrokes in PLAAY than in Java, C++ etc. is that in PLAAY identifiers can include characters like *,+ and spaces. If we imagine modifying the language so that this is not the case, we could arrange that typing an operator into a text field is equivalent to typing enter followed by the operator or space.  That accounts for 4 characters. A second reason is that PLAAY has no operator precedence. This means that a space is needed to expand the selection. It should be noted that the expression corresponding to (a+x)*(b+y) also takes 12 keystrokes to enter in PLAAY, no increase; whereas in C there are 4 additional keystrokes.

No keyboard

If a keyboard is not available (e.g., on an iPad or similar tabler) or the user prefers not to use it, there are alternative user actions that can construct the trees.

A larger example

Prefix entry a larger example

Although we've seen prefix and infix entry of trees before, it might be good to look at another example and think about the edit applied in response to each keystroke.
Suppose we want the tree

We can enter this in prefix form with following key strokes
  • ? < a tab b tab tab
  • : x tab * a tab ` s i n tab t h e t a tab down down
  • : x tab * b tab ` c o s tab t h e t a tab
That's 43 key strokes. (The down-arrow moves the selection right until a drop zone is found under an expression sequence node some other node that is formatted vertically.) Comparing to C or Java we have
    if(a < b) 
        x = b*sin(theta);
        x = b*cos(theta);
which takes 52 key strokes

Infix entry of a larger example

The same tree can be entered in infix
  • a enter < b enter space ?
  • x enter : a enter * ` s i n tab t h e t a tab down down
  • x enter : b enter * ` c o s tab t h e t a tab
That's 43 keystrokes, same as prefix entry.

Other manipulations

Entering code in the first place is probably where a visual language such as PLAAY has the biggest disadvantages. Once a tree is entered, it is much easier to manipulate in PLAAY's editor than with a text editor that is mostly oblivious to the structure of the program. The PLAAY editor makes it easy to select, move, delete, or duplicate entire subtrees representing units of meaning such as variables, expressions, commands, declarations, and types. The price is that some manipulations are harder. In particular any edit that would produce a syntactically incorrect program is forbidden.

PART 1 explores the coding behind the editor.


Exceptional measures

 The story so far

An earlier blog post introduced the process monad and the TBC (Take Back Control) library that is based on it. Read that first.

This entry outlines a few features added since that post was written.

Exceptions and the new .go method

I prefer not to program with exceptions. However, it was clear that the design of the TBC library shouldn't ignore that they do exist. First off, even code that is not intended to throw exceptions might do so if it's not coded correctly. Secondly, we need to be able to interact with libraries written by other people that perhaps are intended to throw exceptions. With the original design of TBC, any exceptions were ignored and potentially lost. For example, if f is a function that throws an exception then


will throw an exception. However

(pause(1000) > exec(f)).go(k) 

will not throw an immediate exception; one second later, though, an exception will be thrown that our code can not catch. What happens to it depends on how the event dispatch loop deals with uncaught exceptions. On a web browser, for example, uncaught exceptions are simply ignored.

To make it possible to deal with all exceptions, the .go method now has a second argument. This one indicates what to do with an exception that is thrown while executing the process. If p is a Process<A> and k is a function from A to Void, and h is a function from Dynamic to Void, [Footnote: If you don't know what Dynamic is, don't worry, it will soon be explained.] then we can make the call

p.go(k, h)

If executing p completes normally, k is called with the result as argument. If executing process p throws an exception, h is called with that exception as argument.

In Haxe, exceptions can have any type at all. The type Dynamic in Haxe represents a value that could be of any type whatsoever. This means that exception handling code needs to be prepared for any exception that the process might throw.  If you don't expect any exceptions, a good policy is to at least print each exception to a log. In a browser we might pop up an alert, at least during testing.

Throwing and catching

The second argument to .go allows one to set up a global exception handler. But it doesn't provide a mechanism for a process to recover from an exception and carry on. In many languages (Java, JavaScript, C#, C++, etc.) one can recover from exceptions using a try-catch statement.
In TBC a process can recover from an exception and keep going. Exceptions are caught using the static attempt method. [Unless otherwise mentioned static methods are members of the TCB class.]. If  p is a Process<A> and  c is a function from Dynamic to Process<A>, then attempt(p,c) is a  Process<A>. Running attempt(p,c) is the same as running p, except that, if an exception is thrown while running p, c is called with the exception and the resulting process is run.

We can throw an exception using the toss function.  toss(e) is a process that throws exception e. Here e can be an expression of any type at all.

TBC also has the equivalent of Java's finally clauses. An optional third argument to attempt, which should be a Process<Triv>, will be executed regardless of whether there is an exception and whether or not it is handled or re-tossed. The call looks like this attempt(p,c,q). The process q will be run after p has completed normally. If p throws an exception, then q runs after the result of calling c has completed running. If the result of calling c tosses an exception, q will still be run. Assuming the execution of q does not toss an exception, the result (normal or exceptional) of running attempt(p,c,q) will  be the same as the result of running attempt(p,c) would have been.

There is also a function ultimately(p,q) that runs q as a finalizer regardless of whether or not p succeeds or fails. It is equivalent to attempt(p,c,q) where c is the function
function(ex:Dynamic) : Process<A> { return toss(ex) ; }
The behaviours of attempt and ultimately follow the example set by Java's (and JavaScript's) try-catch-finally with one difference: In Java, an exception thrown by the finally part will replace any exception thrown from the catch part or the try part. In TBC, the two exceptions are combined in a Pair, and the Pair is thrown.  In any case it's a bad idea to use a process that might throw an exception as a finalizer.

It's not recommended to use exception handling as part of normal control flow. I included attempt, toss, and ultimately, so that there is a way to cope with exceptions that might be thrown from library code or because of programmer error, not so that programmers are encouraged to throw their own exceptions.

Future posts will discuss loops, ajax calls, and communication on channels.


Narrative and State-Based Descriptions

Narrative and state-based descriptions

Software developers need to solve the most complex problems of the modern world using a brain that evolved mostly in the paleolithic age and earlier. How can we tap into the cognitive strengths of a hunter-gatherer to better produce and maintain software?

As we know, software development is a matter of communication as well as logic. We need to communicate clearly with the machine, of course, but just as importantly we need to communicate with other software developers and our future selves. Furthermore, clear thinking is largely a matter of communicating with our present selves. Have your thoughts ever been clarified through the act of writing? This happens to me all the time; it certainly happened while writing this essay.

In computing we often need to describe sets of sequences. Some common examples include: sets of sequences of symbols (i.e., formal languages); sets of sequences of operations on a set of objects; sets of sequences of interactions with external agents.

These sets are typically infinite and may include infinite sequences, so we need some finite way to describe the sets.

There are, I think, two modes of thought that we use when creating finite descriptions of large sets of sequences. I'll call these the narrative mode —I'm inventing the name here, I haven't seen a name put to the concept before—  and the state-based mode.

Narrative descriptions tap into the deep human habit of telling stories. We generally tell stories in order: this happened and then that. If we are giving someone directions, we might say "Keep walking north on Main Street until you get to the post-office, then turn left" or "Ring the bell, then if no one answers the door, go around to the back."

State-based descriptions describe how to react in different situations. They are a collection of rules: "If you are at the library, turn north and start walking. If you are walking north on Main Street and are south of the post-office keep walking. If you are walking north on Main Street and at the post office, turn left." The order of the rules doesn't matter because the sequencing is done via states.

With narrative descriptions the states are in the background. You can still see them in the negative space, e.g., the gap between two symbols in a regular expression or the line-break between two commands in a program. They are there, but unnamed and often unmarked.

Just as there are two modes of thought, there are two kinds of description formalism, which I'll also call narrative and state-based. Narrative formalisms have a bias toward the narrative mode of thought and state-based formalisms have a bias toward the state-based mode of thought.

State-based formalisms include finite state automata, Turing machines, flow charts, Harel's and UML's state charts, machine code, and programming languages like Fortran II. The last two might be considered hybrids, as not all states need to be named. If we add explicit stack operations to finite state automata, we get pushdown automata. If we add implicit stack operations to finite-state automata, we get syntax diagrams (sometimes called railroad diagrams); recursive transition networks are essentially the same thing.

Narrative formalisms include regular expressions, structured programming languages without recursion (say a language with just sequential composition, if-then-else commands, and while loops), process algebras, dynamic logic, and use cases.

Descriptions in narrative formalisms are trees in which the leaves are primitive narratives and the nonleaf nodes are operators on narratives. Descriptions in state-based formalisms are of sets of rules. Narratives are explicitly structured. State-based descriptions are not.

(Another name for narrative formalisms might be algebraic, emphasizing the analogy of sequential composition with multiplication, choice with addition, etc. Narrative formalisms tend to be semirings. However, in this essay, I'd like to emphasize the relationship of what I've called narrative formalisms to stories. There are lots of things that are algebraic but not are narrative; number theory is an example.)

While naming states doesn't seem to be particularly useful in the narrative mode, naming stories is. A narrator may jump from what he did in Paris to what he did in Berlin, but mention that they are leaving the story of the train ride for another day, or perhaps that is a story they already told before and they just refer to it now. In either case, giving the substory a name helps keep everything straight and promotes modularity.

Allowing named stories to refer to each other or themselves recursively is another step. Adding recursion to regular expressions gives extended-context-free grammars (a.k.a. extended BNF or EBNF). Recursive stories (unlike iterative loops) are not something that seem to be supported by our hundreds-of-thousands-of-years-old habit of story telling.

This may explain why recursion is often harder to learn for students than the other elements of the narrative mode. In a sense, though, recursion is just an extension to naming, so I'm inclined to say we are still in the narrative genera of description formalisms. And we know that, while the looping in extended CFGs reduces the need for recursion, there are still many problems that are elegantly solved by recursion. By the same token, we can add recursive procedures to a structured programming language and still be narrative.

By restricting extended-context-free grammars to only use sequential composition, we get context-free grammars (a.k.a. Backus-Naur Form, a.k.a. BNF). It would seem that adding a restriction to a formalism can't change it from narrative to state-based.  After all any description we can make in the restricted formalism we could have made in the unrestricted formalism, so if the description was narrative before it must still be narrative in restricted formalism. However, in fact we've started down a slippery slope: If we restrict further that all productions have the form A --> xB, we arrive at finite state automata.

So the lesson here is that narrative and state-based formalisms aren't clearly separated. Rather, as I said before, there are narrative and state-based modes of thought and different formalisms promote one mode over another. Because EBNF is so flexible, we can use it to write descriptions in either a narrative or a  state-based mode, but, because it makes narratives easy, it tends to promote a narrative mode of thought. How does it make narrative easy? Because, by using a more narrative style, I don't have to introduce as many nonterminals.

For me EBNF is a nice compromise. The same compromise is made in many modern programming languages. We have structured constructs (sequencing, choice, repetition) to construct our narratives, but also subroutines to give our narratives names. In theory we could use the recursive subroutine mechanism to write completely unstructured code, but in practice this rarely happens.

UIs, asynchronous programming, and inversion of control

So what motivated these thoughts? A few years about I asked myself: How do we deal with a sequence of interactions with the user? Back in the days of command-line programs, we used a narrative form. We used to write code like this

     ask the user for input

exit when there is no more input

     process the input

     output the result

end loop
More recently, use cases have become a popular way of describing sets of sequence of interactions during system specification. Use cases are very much narratives.

However, with graphical user interfaces has come event-driven input. Here we are expected to set up a set of event handlers and then the main program exits. From that point on the code executes a series of reactions to the various possible input events. The reaction to an event may depend on what has happened in the past and so we have a state dependent response. This is particularly true with actions like mouse movements or mouse clicks whose interpretation depends on past inputs.
onStart :  showQuestion(); showAnswerField() ; state := 1

onInput: if state=1 then hideQuestion(); hideAnswerField() ;
                 showOutput( process(getInput()) ) ; showNextButton() ; state := 2 end if

onDone: if state=1 then hideQuestion(); hideAnswerField() ; state := 3 end if

onNext: if state=2 then hideOutput() ; hideNextButton() ;
                 showQuestion() ; showAnswerField() ; state := 1 end if
There is a name for this sort of programming: Inversion of Control.

Writing programs using inversion of control essentially means throwing out the narrative formalism provided by our programming language: We no longer use whiles to create loops, nor ifs decide choices, at least not when the choice is dictated by the user's choice of the next input event. The result is an approach to programming that is unstructured and inviting of ad hoc solutions. It is difficult to write and difficult to read. As a result it is difficult to verify. We can fight the ad hoc solutions by using principled approaches to inversion of control such as the state pattern. This improves the situation somewhat, but it underscores that we are now living in the land of state-based thinking.

User interfaces are just one area where we are forced by our technology away from narrative and towards state-based modes of thought. Others include interacting with network communications. In general we call this asynchronous programming.

Computer scientists developed narrative formalisms to cope with asynchronous programming in the 70s and 80s, starting with threads and monitors and later with rendezvous and process algebras. Process algebras are very close to --and inspired by-- context-free grammars, especially in how they deal with choices imposed by external agents; however ideas from process algebras have not yet made it into mainstream language design nor into libraries for mainstream languages.

When I started thinking about a way to design and implement user interfaces in a more narrative way, I thought about using context-free grammars and perhaps generating UI code using something similar to a parser generator. However I then realized that I could follow an approach similar to monadic parsing, i.e., writing a library rather than a tool. But backtracking is more constrained in an interactive system than when parsing; we can't backtrack over output events that the user has already seen. Thus some changes from parsing monads are required; taking a monadic leads to implements a process algebra as a library or an embedded domain specific language; using such an embedded DSL, we can hide the inversion of control under an abstraction layer. Above that layer we write applications in a narrative style. I call my library "take back control". I've written more about it in another blog post.

Elsewhere I've described the difference between inversion of control and upright control as a difference between pushing and pulling. With inversion of control, a sequence of events is being pushed at out program and it must react to each one as it arrives. Pulling events means the code is written to pull events into the program, events are dealt with only when the program is ready to deal with them. This contrast between push and pull does not really explain why I find pull code preferable to code that simply reacts to a pushy world. This post is meant to fill that gap: The pull approach fits a narrative mode of thinking, which I think is easier for us humans, with our paleolithic brains, to cope with.


How to do redo, and undo revisited


An earlier post looked at 5 techniques for implementing undo/redo.

Here I want to take another look the problem and look at a solution based on immutable data structures.

Essentially it is a check-pointing solution, but by using an immutable data structure there is essentially a zero time cost to making a checkpoint.

Nodes, paths, and trees

The running example from the earlier post used a tree as the data structure with the following interface.

public interface TreeI {
    NodeI makeNode( String text ) ;

    NodeI getRoot() ;

    void insert( NodeI parent, int posn, NodeI child ) ;

    void remove( NodeI parent, int posn ) ;

Our immutable version will be a bit different. We start with nodes. We want to have nodes that can't change. Our node class only has constructors and accessors, not mutators.

public class Node {
public Node ( String text, List<Node> children ) { ... }

    int getChildCount() { ... }

    Node getChild( int i) { ... }

    String getLabel() { ... }


We can identify any node in a tree by its path from the root. For example, [] represents the root of a tree, [0] represents the first child of the root, [0,1] represents the second child of the first child of the root, and so on. For this scheme to make sense, the tree can never be empty; there will always be a root node. Now we change the tree interface to

public interface TreeI { 

    public Node getRoot( ) ;

    public void replace( Path p, int i, int j, List<Node> newNodes );

    public void changeLabel( Path p, String newLabel) ;


The replace function replaces all children of the given node from position i up to (but not including) position j with a list of new nodes. The changeLabel method changes the label on a node. Both these functions work by creating new nodes as needed. For example replace( p, 0, 1, ns ) where p is the list [0,1] will build three new nodes, all the other nodes of the new tree (apart from possibly the ones in list ns) will be shared with the old tree.

The interface is implemented by a class

public class Tree implements TreeI { 

    protected Node root = new Node( "", new ArrayList<Node>() ) ;

    public Node getRoot( ) { return root }

    public void replace( Path p, int i, int j, List<Node> newNodes ){ ... }

    public void changeLabel( Path p, String newLabel) { ... }


Implementing undo and redo.

Once we have an immutable data structure undo and redo are very simple. The main difference compared with the checkpointing solution of the earlier post, is that we don't make copies of the model.

public class TreeWithCheckpointing extends Tree {

    Stack<Node> olderTrees() = new Stack<Node>() ;

    Stack<Node> newerTrees() = new Stack<Node>() ;

    public void checkpoint() {
        olderTrees.push( root ) ;
        newerTrees.clear() ; }

    public void undo() {
        if( olderTrees.size() > 0 ) {
            newerTrees.push( root ) ;
            root = olderTrees.pop() ; } }

    public void redo() {
        if( newerTrees.size() > 0 ) {
            olderTrees.push( root ) ;
            root = newerTrees.pop() ; } }



  • This approach is very simple and fool proof. We don't need that the model be copyable, as we only copy pointers.  
  • There may other advantages to using immutable data structures. For the system where I used this, the main reason for using immutable data structures was to enforce invariants. If there are no mutators, we only have to worry about object invariants at node construction time. Another advantage is in reusing view object; we can make a cache mapping nodes to view objects, since the nodes never change, the view objects are sure to be accurate representations..
  • Because there is no copying, it is more efficient than the checkpointing solution with mutable data.


  • The main disadvantage is the awkwardness of an immutable data structure. It works well for trees, but for an arbitrary graph we have to give up on using ordinary pointers to represent edges.


Take Back Control. GUIs and other event driven programming without inversion of control.

[More information:

GUI programming and inversion of control

Writing graphical user interfaces can be a tricky business. Typically we set up a set of event handlers. Each handler waits for its event and based on the current state moves the state to some other state. This arrangement is commonly called "inversion of control".

This is fine for simple GUIs, but once the system gets past a certain level of complexity it is rife with opportunities for error.  What if an event happens in a state that we didn't anticipate. What if we forget to disable events? Once we add in asynchronous IO such as HTTP requests, new complexities arise.

One solution is to carefully draw out a state machine --perhaps using state charts-- and use the state pattern to ensure that states are entered and and exited cleanly and that transitions only happen when we are in the correct state.

However state machines are at least one level of abstraction down from the usual structured programming design notations such as sequential composition (f followed by g), if-then-else, while-do. It's a lot like programming with only conditional goto statements.  Furthermore algorithmic abstraction (subroutines) are not available with state machines, unless you want to get in to pushing states on a stack. What's worse, state machines are a level up from the code you write. You might have a lovely state machine drawn on a white board, but once you've translated it to code all you have is an unstructured pile of code. Even if you keep that code well organized, for someone else to really understand that code, they'll need to translate it back to a nice picture. The solution is to keep the code and a diagram of your state machine in sync, but we all know that code and its documentation tend to drift apart. Furthermore, state machines do not correspond well to use-cases, which makes it harder to verify that your design meets all the use cases and only the use cases.

So, if you start from use cases, there are two design steps to take: translate from use cases to a state machine; translate from the state machine to event driven code.  Each step changes the paradigm and each is a potential source of error. Furthermore, if you are maintaining code, the first thing to do is to understand what it does. To reconstruct the use-case model from event driven code requires reverse engineering the same steps: translate from event driven code to a state diagram; translate the diagram back to use cases.

So my aim was to allow a more direct translation from use cases to code: a way to avoid inversion of control.

Reverting control

When my students first design GUI programs, I have them create sequence diagrams for their software. Often the diagrams come back looking something like this -- translated back into code:
    function main() {
        displayWindow() ;
        while( true ) {
            waitForUserToClick() ;
            reactToClick() ; }
This makes a lot of sense if you consider that, in all their previous courses, they've been writing console applications where input comes only when it is asked for.  It also makes sense if you consider that a GUI is --in essence-- a set of sequences of interactions and the students already know a language for expressing sets of sequences of actions, namely the ordinary programming constructs of sequential composition, choice, iteration, and function calls.

My contention is that these students, who might seem naive to those of us used to writing event driven GUIs, are on to something worth looking into.  Is there a way to realize a GUI without inversion of control?

The goal is to be able to describe the user experience in a simple and executable notation. Here is an example of such a notation
    displayWindow >
        await( click( b ) && reactToClick )
Here b is an object representing a button, displayWindow and reactToClick are functions representing changes to some model and/or view, and loop, await, click, > and && are functions and methods from the library I'll be describing in the next section.

The tbc library

The remainder of this post describes an experimental library named tbc (for "Take Back Control"), written in Haxe to explore these ideas. The language I use below is Haxe, but everything could be translated into just about any language that has lambda expressions and closures with lexical scoping, for example JavaScript, Scala and many many others.

The library is structured as several modules, so far these are tbc.TBC, tbc.TBCTime, and tbc.TBCHTML. Unless otherwise mentioned all names are exported from either the TBC module, or are static functions of the TBC.TBC class. It can be found at https://github.com/theodore-norvell/tbc .

I've been thinking in terms of web development, which is why there is a  TBCHTML, but one could also have a TBCSwing or some such. One of the nice things about Haxe is that it can be translated into lots of useful languages such as JavaScript, Java, C++, and so on. The core TBC module is target independent -- can be used regardless of the target language; TBCTime works with some targets but not all; TBCHTML is specific to JavaScript.


Central to the library is a generic data type, Process<A>, which is the type of all processes that compute values of type A. For example an object of type Process<Int> is an object that can compute an integer value.  Executing a process may cause side effects to happen; side effects include changes to state, output, input, and delays; delays include waiting for input and waiting for time to pass.

We can execute a process using its .go method. This method causes the process to execute, but there is nothing in the semantics of .go that says that it will finish immediately, it might finish after some time, and, in particular, after some interaction with the user.  The argument of .go is a void function that takes an A. For example if p is an object of type Process<Int> then
    p.go( function( x : Int ) { trace(x) ; } ) ;
will cause p to start to execute and, when it has finished, the Int it produces is sent to the trace routine. The argument
    function( x : Int ) { trace(x) ; }
is an anonymous function or lambda expression.  The trace function is a standard Haxe function for outputting a message to a log.

One particularly useful process is unit(x), which is a process that computes value x immediately with no delay and does nothing else. So
    unit(42).go( function( x : Int ) { trace(x) ; } ) ;
is a complicated way to output 42 to the log.

[Update 2017 November: The .go method now takes two arguments instead of one. The second is a function that deals with exceptions. If the execution of a process results in an exception, that exception is passed to the second argument. Read more about it here. However for the rest of this post, we'll just pretend that .go takes only one argument.]

As an abbreviation skip() stands for unit(null). skip() has type Process<Triv> and Triv is a type that has null as its only value. So skip() is a process that has no side effects and computes an information free value.

If f is a function of type Void->A (i.e., a function with 0 arguments that returns a value of type A), then exec(f) is a Process<A> that executes by calling f, and the result of the process is the result of f. The exec function is useful for inserting side effects, such as changing or reading global state, into the computation.

Another kind of side effect is to spend time doing nothing. In the TBCTime module, there is process function pause so that pause(n) is a Process<Triv> that completes after about n milliseconds.

We can combine processes in a number of ways.  p.sc(q) is a process that, when executed, executes process p, throws away its result and then executes process q; the value computed by p.sc(q) is the value computed by the execution of q.  As an example
    pause(1000).sc( exec( hello ) )
is a process that executes function hello after a delay of 1 second.

You can think of sc as standing for either "sequential composition" or for "semicolon". Sequential composition is associative
    p.sc(q.sc(r))  =  p.sc(q).sc(r)
And skip() and unit(x) (regardless  of x) are left units of sc
    p  =  skip().sc(p)  =  unit(x).sc(p)
Furthermore skip() is a  right unit
    p  =  p.sc( skip() )
provided p is of type Process<Triv> .

As a convenience the > operator is overloaded to mean sequential composition. So
    p > q > r
    p.sc( q ).sc( r ) 
The laws are
    (p > q) > r  =  p > (q > r)
    p  =  skip() > p  =  unit(x) > p
    p  =  p > skip()
provided (for the last law only) that p is of type Process<Triv> .

Besides sequential composition, we can imitate other sequential programming constructs. A process alt( p, q, r), where p is of type Process<Bool> and q and r are both of type Process<A> (for any type A) is a Process<A> that mirrors if-then-else statements. And loop(p) is a Process<Triv> that is equivalent to an infinitely repeated sequential composition of p with itself. And so on.

A useful generalization of sequential composition is bind. The problem with sequential composition is that it throws away the result of the left process. Bind solves this problem. If  p is a Process<A> and f is a function from type A to type Process<B>,  then p.bind(f) is a Process<B>. To execute p.bind(f), first p is executed, then its result is fed to f, finally the resulting process is executed, delivering the result. bind is the familiar bind (or flatmap) operation from monads.  As an abbreviation, p.bind(f)can be written as p >= f. Bind and unit obey the usual monad laws
    (p >= g) >= h  =  p >= function(y) { return g(y) >= h ; }
    p >= unit  =  p
    unit(x) >= f  =  f(x)
provided that y is not free in either g or h. The first law is not used much in practice exactly because often y is free in g or h. People tend to use bind as a combination of sequential composition and local variable declaration. So you might see code like this
    p >= function( x ) { return
    ...x... >= function(  y ) { return
    unit(...x...y...) ; } ; }
This is so common in Haskell that there is a special notation for it
    do x <- p
       y <- ...x...
       return (...x...y...)
[Update 2018: I'm working on a macro that will bring the same level of brevity to tbc.]

The Process type is also called the Process monad.

If we were to define the operation f*g to mean function(x) { return f(x).bind(g); } (where x is not free in f or g), then the three laws concerning bind can be written in a very familiar way
    (f*g)*h  =  f*(g*h)
    f*unit  =  f
    unit*f  =  f 
In practice >= is easier to use than *, and so we define >= but not *.

Much has been written about monads elsewhere, which means I don't need to write much here.  I'll just note that p.sc(q) is defined in terms of bind to be
    p.bind(function(x) { return q ; } )  
And so
    (p > q) >= f  =  p > (q >= f)
    (p >= f) > q  =  p >= function (y) { return f(y) > q; }
provided that y is not free in f or q. Furthermore loop(p) is defined to be
    p.bind( function( a : A ) { return loop(p) ; } ) 

In Haxe, >= and > are at the same level of precedence and associate to the left. For example
    p > q >= f > r >= g
    (((p > q) >= f) > r) >= g
which means

[Note for Haskell programmers: Just remember that Haskell's >> is TBC's > and Haskell's >>= is TBC's >=; i.e. drop a > from the Haskell notation to get the TBC notation. ]


If p is a Process<A> and q is a Process<B> object, par(p,q) is a Process<Pair<A,B>> that, when executed, executes both p and q, possibly interleaving their various side-effects. Whenever both are delayed, either one may be the next to proceed. Note that everything actually runs on one thread.

For example
            exec(outX) > pause(2000) > exec(outX)
            exec(outY) > pause(2000) > exec(outY)
is a process that executes functions outX and outY twice each. The first outX and outY are immediate, while then latter two are executed about 2 seconds later. In theory, any of the interleavings
    outX outY outX outY
    outX outY outY outX
    outY outX outX outY
    outY outX outY outX
are possible.

Fork-and-join parallelism is also possible, although I haven't implemented it yet.

Guards and Guarded Commands

An object of type Guard<E> represents some category of input events that might happen in the future. For example, if b is a button element on a web page, then click(b) is a guard representing clicks on the button. click is in the TBCHTML module. The type parameter E is the type of objects representing information about the events, for example the coordinates of a mouse click. The type click(b), if b is an HTML button element, is Guard<Event>, where Event is the type of DOM events in HTML.

If g is a Guard<E> and function f is of type E -> Process<A>, then g.guarding(f) is a GuardedProcess<A>. A guarded process might be thought of as a process that starts with an event.

A method await turns a set of guarded processes into a process. So if m0 = g0.guarding(f0) and m1  = g1.guarding(f1) are guarded processes of the same type, then await(m0,m1) is a process. The process await(m0,m1) is executed as follows: First time passes until one of the events associated with m0 or m1 happens. Say an event that g0 represents happens, then execution continues with the execution of f0(e0), where e0 is the information associated with the event.

When the process await(m0,m1) begins, the event categories associated with m0 and m1 are enabled. In the implementation, this may mean installing event handlers and changing the appearance of GUI widgets. When one of the events fires, the event categories are disabled before execution proceeds with the corresponding guarded process. If both m0 and m1 are triggered by the same event, only one of them proceeds; which one is up to the implementation.

Here is an example of guarding
            await( click( b0 ).guarding( out0 ) ) >
                click( b1a ).guarding( out1A )
                click( b1b ).guarding( out1B )
            ) >
           await( click( b2 ).guarding( out2 ) )
Here the b0, b1a, b2a, and b2 are button elements on an HTML page and out0, out1A, out1B, and out2 are functions returning processes that make changes to the page.  The interaction with the user proceeds as follows.
  • b0 becomes enabled.
  • When b0 is clicked, it becomes disabled and buttons b1a and b1b become enabled.
  • When one of these two buttons is clicked, both become disabled and b2 becomes enabled.
  • When b2 is clicked, it becomes disabled and the process continues from the start.
The use case scenario corresponding to this process is
  1. The user clicks on b0
  2. The system makes change out0
  3. The user clicks either b1a or b1b
  4. If the user clicked on b1a
    • The system makes change out1a
  5. Else the user clicked on b1b
    • The system makes change out1b
  6. The user clicks on b2
  7. The system responds with change out2
  8. Back to 1.
Of course a real use case would explain what out0 etc are, but none of that matters to us. The point is the close correspondence between the use case and the code.

The >> operator is overloaded to mean guarding, so the last example can be written as
        await( click( b0 ) >> out0 ) >
            click( b1a ) >> out1A
            click( b1b ) >> out1B 
        ) >
        await( click( b2 ) >> out2 )

In some case we might want to throw away the event information. This is easily done using
    g >> function( e : E) { return p ; }
which is a GuardedProcess<A> if g is a Guard<E> and p is a Process<A>.  For example we might want to simply wait for a click on button b. This Process<Triv> will do that
    await( click(b) >> function( e : Event ) return skip() ; } )
As a convenience, guards have a .andThen method to handle this situation less verbosely. We can write
    await( click(b).andThen( skip() ) )
or, using operator overloading,
    await( click(b) && skip() )
Since && has lower precedence than > and >= we can write
    g && p > q >= f
which will be parsed as
    g && ((p > q) >=f) 

The .sc and .bind methods can also be applied to guarded processes on their left. Thus
    g >> m >= f
which is parsed as
    (g >> m) >= f
means what you would expect: when g is fired, the event is sent to m and then the result, a, of executing m(e) is sent to f and the result of executing f(a) is the result of the whole thing. The following associativity law.
    (g >> m) >= f  =  g >> function(e) { return m(e) >= f ; }
holds provided e is not free in m or f. Likewise
    (g >> m) > p  =  g >> function(e) { return m(e) > p ; }
holds provided e is not free in m or p. Furthermore
    (g && p) > q  =  g && (p > q)
    (g && p) >= m  =  g && (p >= m)

[Note on the notation. The && operator was chosen most because of its low precedence.  Furthermore | could combine guards, while || could combine guarded processes. So it is nice that && has lower precedence than | and higher precedence than ||.  (But note that >> has higher precedence than |.)]

[Another notational note: It is tempting to use > for .andThen and >= for .guarding.  Advantage: fewer operators to remember. Disadvantage: this notation obscures the types, making code less readable. For example, if you read a && p, it is clear that a is a guard and the result is a guarded process. But, with the alternate notation, if you read a > p, the type of a could be guard or process or guarded process and the type of the result can only be determined after determining the type of a. Also, with the alternative notation, skip() is not always a right unit of >, nor is unit always a right unit of >=.]

Guards can also be used for time events. The timeout guard fires if it's been enabled for more than a certain amount of time. For example, if the user fails to respond in a timely manner, we might want to move on. Process nag(null), defined like this
    function nag(triv : Triv) : Process<Triv>{ return
                click( b ) && exec(thanKTheUser)
                timeout( 1000 ) && exec(nagTheUser) >= nag
        ) ; }     ,
nags the user every second until the button is clicked.

In fact pause( t ) is defined by
    pause( t )  =  await( timeout( t ) && skip() )

Summary of the core operations

Suppose that
  • p is of type Process<A>
  • q is of type Process<B>
  • f is of type A -> Process<B> 
  • h is of type A -> C
  • g is of type Guard<E>
  • m is of type E -> Process<A>
  • gp, gp0, gp1, ..., gpn are of type GuardedProcess<A>
Meaning Syntax Type Operator syntax
Do p, then do q. p.sc( q ) Process<B> p > q
Do p, then do f(x), where x is the result of p. p.bind( f ) Process<B> p >= f
Make a guarded process that does p when fired. g.andThen( p ) GuardedProcess<A> g && p
Make a guarded process that does m(e) when fired, where e represents information about the event. g.guarding( m ) GuardedProcess<A> g >> m
Make a guarded process similar to gp, but that does that does q after gp has executed. gp.sc( q ) GuardedProcess<B> gp > q
Make a guarded process similar to gp, but that does that does f(a) after gp has executed, where a is the result of executing gp. gp.bind( f ) GuardedProcess<B> gp >= f
Fire one of these guarded processes. await( gp0, gp1, ..., gpn ) Process<A>
Do p again and again loop( p ) Process<Triv>
Do both p and q. par( p, q ) Process<Pair<A,B>>
Start process p. p.go( h ) Void

Related ideas

When I first looked into this a few years ago, I found Martin Odersky and Philipp Haller's 2006 paper, "Event-based programming without inversion of control", which introduced Scala's Actors library. The paper mentions user interfaces as an area for future work.  Recently I came back to this question, and the above documents some preliminary experimentation.

In addition to Odersky and Haller's actors, some of the inspirations for TBC include process algebras, such as Tony Hoare's CSP and Robin Milner's CCS; (extended) context-free grammars; parsing combinators.; monads as used in Haskell and more recently in other languages such as JavaScript; JavaScript's promises; and the command pattern as documented by Erich Gamma and company.

Further work

I'm happy with the basic abstractions of process, guard, and guarded process. However, the above is just an early exploration. There is much more to consider including
  • Internal events for communication between parallel processes. Processes can already communicate via shared global state. However this does not allow synchronization between processes. For that and to avoid using shared state, synchronous messages, asynchronous messages and rendezvous are the classic approaches.
  • Communication with external entities other than the user. In the HTML context, this includes XML HTTP requests.
  • Failures and recovery.
    • One approach is that exception catching can be emulated by means of a second version of bind. While bind says what to do next in case of success, onFail says what to do in case of a failure. We give processes a second type parameter so that Process<A,B> is a process that succeeds with a value of type A or fails with a value of type B. If p is of type Process<A,B> then and f is of type A -> Process<C,B>, then p.bind( f ) is of type Process<C,B>. Failure can happen either within p or within the result of f.  Symmetrically,  if g is a function of type B -> Process<A,D> then p.onFail( g ) is of type Process<A,D>. Here if p fails, there is a chance of recovery, should the result of g succeed. This is just like try{ p }catch( e ){ g(e) }. Note the complete symmetry between succeeding and failing.
    • Those familiar with JavaScript's Promise type will note that there is little original here.
    • Those familiar with monads will note that Process<A,B> is monad in two ways. I.e. we can fix either A or B to get the functor part of a monad.
    • The problem is that I'm not a big fan of try-catch, so before going down this road, I'd like to think about alternative ways to deal with failure and recovery.
    • [Update 2018: See my blog post on Exceptional Measures for information on the solution I eventually adopted.]