2018-05-16

How I use Git


Below are some thoughts on how I use Git and SourceTree.

First some terminology

Repository:
A database containing a bunch of commits, branches and other things.
Commit:
An immutable object. The value of a commit object is a state of the whole file tree + pointers to its parents + a message. The address of a commit is a secure hash of its value. Thus two commit objects in different repositories with the same address will (with near 100% probability) have the same value. Commits usually have one parent, but may have 0 or 2; commits form a rooted directed acyclic graph. When we talk about a commit we might be talking about an object (which exists in one repository) or a value (which might be represented by objects in different repositories). It often doesn't matter which we mean.
Branch:
A variable whose value is the address of a commit, i.e., a hash of its value. Branches are local to repositories. (Even "remote tracking branches" are local to a repository.) Each repository has two sets of branches: Local branches are intended for local work. Remote tracking branches represent local branches of other (remote) repositories; however, the value of a remote tracking branch could be out of date with respect to the branch that it is tracking. (Note that I used the word "local" in two senses in this paragraph.)
Currently checked out branch:
The branch that was most recently checked out. Making a new commit typically updates this branch.
Working copy:
A state of the file tree represented in a computer's file system. Each time you check out a branch, the working copy gets overwritten.
Index (also called the Staging Area):
A place on your machine where Git keeps changes that will become part of a commit in the future.
Merge:
A merge combines two commits to create another commit. As I understand it, if we merge two commits x and y that have a least common ancestor z, then the result commit w=merge(x,y) will contain all the changes from z to x and also all the changes from z to y. Here is an example where we consider a file tree that contains only one file, so the state of the file tree is simply a sequence of characters. Suppose z is abcdef and x is acdef and y is abcdefg. The changes from z to x are {delete the b between a and c}. The changes from z to y are {add a g after the f}. The union of the changes is {delete the b between a and c, add a g after the f}. So w is acdefg. Sometimes it's not clear how to combine the changes, and in that case there is a "merge conflict". When x is an ancestor of y (so that the least common ancestor of x and y is x itself), there is no need to create a new commit: the set of changes from z to x is empty, so merge(x,y)=merge(y,x)=y. This is called merge by fast-forward.
Line of development:
A sequence of commits that may get added to over time. "Line of development" isn't really a Git concept, but I find them useful to think about. Often people use the term "branch" for this, but that's confusing because in Git a branch is a variable whose value is the address of a single commit; not a sequence of addresses of commits. Also, while each Git branch is associated with one repository, a line of development spans multiple repositories. I found Git much easier to use once I finally realized that branches and lines of development are different (but closely related) concepts. So next I'll try to explain with an example what I mean by a line of development.
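
Before moving on, the single-file merge example from the terminology above can be replayed in a few lines of Python. This is only a toy sketch of the "union of the changes" idea: the function names changes and merge are my own, the comparison is done character by character with the standard difflib module, and real Git merges work line by line with considerably more care.

```python
import difflib


def changes(base, other):
    """Edits that turn base into other, as (start, end, replacement) on base."""
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return [
        (i1, i2, other[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge(z, x, y):
    """Apply the union of the changes z->x and z->y to the common ancestor z."""
    edits = sorted(set(changes(z, x)) | set(changes(z, y)))
    result, position, previous_start = [], 0, None
    for start, end, replacement in edits:
        # Two different edits touching the same part of z cannot be combined.
        if start < position or start == previous_start:
            raise ValueError("merge conflict")
        result.append(z[position:start])
        result.append(replacement)
        position, previous_start = end, start
    result.append(z[position:])
    return "".join(result)


print(merge("abcdef", "acdef", "abcdefg"))  # acdefg
```

If both sides change the same character differently (say abXdef and abYdef), this sketch raises ValueError, which is its stand-in for a merge conflict.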

More on "lines of development"


Consider this evolution of a system where commits are ordered in the order they are created (from left to right)
There are three lines of development here: the shared line, the x line and the y line. The x and y lines represent two different features and might be done by two different programmers. The shared line represents the amalgam of all completed features. Once a feature is completed, the last commit on its line is merged into the shared line. Once we have finished with a feature, we can delete the branches associated with it, but the lines of development remain. Commit x4 is particularly important. This represents the programmer catching up with all features completed since they started work on their feature -- in this case, just y. It's a good idea to make these catch-up merges each time we notice the shared line has been added to. (In the example the developer on the x line probably should have caught up earlier.) Running unit tests after these merges is important, since it can alert us to any conflicts that aren't flagged by Git. It's particularly important to make these catch-up merges (if needed) before merging back into the shared line. This ensures that untested combinations of features never make it onto the shared line. (And, as we will see below, it also prevents merge conflicts from happening on GitHub.)

A point not captured by the diagram above is that, if we allow fast-forward merges, not all the commits shown in the picture are different. We will have y1=shared1 and x4=shared2. SourceTree might display the graph above like this
which is simpler, in that it has fewer nodes, but doesn't clearly show the lines of development. Like I said above, lines of development do not correspond to anything in Git. They are just a product of how we think about software development.

The five branches


Usually you only have to worry about two lines of development at a time: a shared line (typically called master) and a line that only you are working on. For illustration I'll call the shared line "shared" and the other line "feature". In implementation the lines of development are represented (sort of) by branches.

But thanks to Git being distributed, each conceptual branch x is replicated in a number of places:
  • There is GitHub's x, i.e. a copy of the branch that is on the server. [I'm assuming here that the central repository is GitHub, but it could just as well be GitLab or Bitbucket or a private server.]
  • There is a tracking branch in your repository; this is called origin/x.
  • And there is your local copy of the branch, which is called x.
That's 3 copies of each branch. I'll call them "GitHub's x", "my origin/x", and "my x". Plus everyone else may have one or two copies in their own repositories. So if 10 people are working on 1 feature each, that's 11 lines (10 feature lines + the shared line) and there could be up to 21 branches for each line of development (1 on GitHub and then each of the 10 local repositories can have a local and a tracking branch). So there are up to 11 × 21 = 231 branches in total. Luckily you usually only have to worry about 2 lines at a time and you only have to worry about the copies on GitHub and the copies on your own machine. And, of these, I don't ever use my origin/feature, so that's only 5 branches you have to worry about:
  • GitHub's shared,
  • my origin/shared,
  • my shared,
  • my feature,
  • GitHub's feature.
plus the working copy and the index.
We try to maintain the following relationships at all times between the commits that are the values of these 5 branches. (Here a ≤ b means "a is descended from b, or is the same commit as b".)
     GitHub's shared ≤ my origin/shared ≤ my shared

     my feature ≤ GitHub's feature


It's also a good idea to try to fold any changes made to the shared line into our feature as soon as they show up on GitHub's shared branch. So we try to keep
     my feature ≤ my shared = my origin/shared = GitHub's shared
true as much as practical. (I.e., that my feature is descended from my shared, which is the same as the tracking branch which is up to date.) We do this with catch-up merges. This way, when we read, edit, and test our code, we are reading, editing, and testing it in the context of all completed features. Furthermore, when a pull-request is made we want
     GitHub's feature = my feature ≤ my shared = my origin/shared = GitHub's shared
That way merge-conflicts won't happen on GitHub's server.
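
These relationships are statements about ancestry in the commit graph, so they can be checked mechanically. Below is a minimal sketch, assuming the graph is given as a dictionary from each commit to its parents. The commit names (s0, s1, f1, f2) and the function names are made up for illustration and are not anything Git provides.

```python
def is_descended_from(parents, a, b):
    """True if commit a is commit b, or has b among its ancestors."""
    stack, seen = [a], set()
    while stack:
        commit = stack.pop()
        if commit == b:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def ready_for_pull_request(parents, branches):
    """Check the condition we want to hold when a pull request is made."""
    shared_in_sync = (
        branches["my shared"]
        == branches["my origin/shared"]
        == branches["GitHub's shared"]
    )
    feature_pushed = branches["GitHub's feature"] == branches["my feature"]
    caught_up = is_descended_from(
        parents, branches["my feature"], branches["my shared"]
    )
    return shared_in_sync and feature_pushed and caught_up


# Hypothetical history: s1 follows s0 on the shared line, f1 was made from s0,
# and f2 is the catch-up merge of f1 with s1.
parents = {"s0": (), "s1": ("s0",), "f1": ("s0",), "f2": ("f1", "s1")}
branches = {
    "GitHub's shared": "s1",
    "my origin/shared": "s1",
    "my shared": "s1",
    "my feature": "f2",
    "GitHub's feature": "f2",
}
print(ready_for_pull_request(parents, branches))  # True
```

With my feature still at f1 (before the catch-up merge), the same check returns False, because f1 is not descended from s1.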

Information flow

The flow of information that I use is shown in the figure. I'll explain each part below.


Basic operations

For the rest of the article I'll assume you are using SourceTree. Of course everything SourceTree does can also be done from the command line.

Some of the basic operations of SourceTree work like this (somewhat simplified):

"Fetch" updates all your tracking branches. So Fetch means my origin/x := GitHub's x, for every branch x in GitHub's repository. Typically we use Fetch to bring changes made to GitHub's shared to my origin/shared.

"Pull" means update my current branch from GitHub's repository. So Pull means my origin/x := GitHub's x ; my x := merge( my x, my origin/x), where x is the currently checked-out branch. Typically this is a fast-forward merge. (Usually I do a Fetch first and then a Pull if x is behind origin/x. When x is behind origin/x the merge is done by "fast forward", i.e., we have my x := my origin/x). Typically we use Pull to bring changes made to GitHub's shared to my shared.

"Branch" means create a new branch. It means my y := my x, where y is the new branch and x is the currently checked-out branch. y becomes the currently checked-out branch. Typically we use Branch when we start working on a new feature.

"Merge" means my x := merge(my x, my y) where x is the currently checked-out branch and y is another branch. In the flow I use, merges always merge my shared into my feature to make a new value for my feature branch.

"Check out" updates the working copy to the value of a particular commit. In the flow this is used to check out my feature branch. Some operations in SourceTree only apply to the currently checked-out branch, so there are times you will check out a branch just so you can do something else with it, such as a pull.

"Stage" means moving changes that are in the working copy to the index.

"Commit" makes a new commit based on the changes in the index. Of course it updates the currently checked-out branch.

"Push" means update GitHub's copy of the branch; it also updates the tracking branch. So Push means GitHub's x := my x; my origin/x := GitHub's x, where x is the currently checked-out branch. In the workflow, Push is used to push commits on my feature branch to GitHub's feature branch.

"Make and merge a pull request". A pull request is a request for someone else to review the changes on a branch and to merge one branch into another. (Pull requests would be better named "merge requests" in my opinion.) Pull requests are not a feature of Git, but rather of hosting services such as GitHub. SourceTree can help you create pull requests. The actual merging of the pull request is done using GitHub's web interface.

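The assignments above can be played out in a toy model. This is a sketch under heavy simplifying assumptions: ToyRepos and its method names are invented here, commits are just names like c0, c1, and a single parents dictionary stands in for the commit objects of all repositories. It has no working copy or index, and a non-fast-forward merge simply creates a new two-parent commit without looking at file contents.

```python
import itertools


class ToyRepos:
    """GitHub's branches plus my tracking and local branches, as dictionaries."""

    def __init__(self):
        self.parents = {"c0": ()}           # commit -> parent commits
        self.github = {"shared": "c0"}      # GitHub's x
        self.tracking = {"shared": "c0"}    # my origin/x
        self.local = {"shared": "c0"}       # my x
        self.current = "shared"             # currently checked-out branch
        self._ids = itertools.count(1)

    def new_commit(self, *parents):
        name = f"c{next(self._ids)}"
        self.parents[name] = tuple(parents)
        return name

    def is_ancestor(self, a, b):
        """True if a is b or an ancestor of b."""
        stack, seen = [b], set()
        while stack:
            commit = stack.pop()
            if commit == a:
                return True
            if commit not in seen:
                seen.add(commit)
                stack.extend(self.parents[commit])
        return False

    def merge_commits(self, a, b):
        if self.is_ancestor(a, b):
            return b                        # fast-forward
        if self.is_ancestor(b, a):
            return a                        # nothing new to merge
        return self.new_commit(a, b)

    def fetch(self):                        # my origin/x := GitHub's x, for every x
        self.tracking.update(self.github)

    def checkout(self, x):
        self.current = x

    def pull(self):
        x = self.current
        self.tracking[x] = self.github[x]
        self.local[x] = self.merge_commits(self.local[x], self.tracking[x])

    def branch(self, y):                    # my y := my x; y becomes current
        self.local[y] = self.local[self.current]
        self.current = y

    def commit(self):
        x = self.current
        self.local[x] = self.new_commit(self.local[x])

    def merge(self, y):                     # my x := merge(my x, my y)
        x = self.current
        self.local[x] = self.merge_commits(self.local[x], self.local[y])

    def push(self):                         # GitHub's x := my x; my origin/x := GitHub's x
        x = self.current
        self.github[x] = self.local[x]
        self.tracking[x] = self.github[x]


r = ToyRepos()
r.branch("feature")                         # start a feature from shared
r.commit()                                  # c1 on my feature
r.github["shared"] = r.new_commit("c0")     # a teammate's work (c2) lands on GitHub's shared
r.fetch()
r.checkout("shared")
r.pull()                                    # fast-forward my shared to c2
r.checkout("feature")
r.merge("shared")                           # catch-up merge: c3 with parents c1 and c2
r.push()
print(r.local, r.github)
```
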
Recipes for common tasks


Here are some recipes for doing some common tasks with SourceTree.

Catch up the shared branch

  1. In SourceTree, click on Fetch.
  2. If shared and origin/shared are the same, stop.
  3. Check out the shared branch by double clicking on "shared" under "Branches" on the left sidebar.
  4. Click on Pull to get the local shared branch up to date with origin/shared.

Make a feature branch

  1. Catch up the shared branch (see above)
  2. Check out shared (if not already there).
  3. Click on Branch.
  4. Type "feature" as the New Branch. Click ok.

Catch up the feature branch.

(Do this fairly frequently)
 
  1. Catch up the shared branch (see above).
  2. If shared is an ancestor of feature you are caught up. Stop.
  3. Check out feature (if not already the checked out branch).
  4. Click on Merge.
  5. Select shared.
  6. Click OK.
  7. Check for any merge conflicts. If there are merge conflicts they need to be resolved. That's a whole other story. (Maybe another blog post.)
  8. Even absent merge conflicts, there may be silent problems that prevent compilation or introduce bugs. So carefully inspect all differences between the merged version and the previous version of feature. Also recompile and run unit tests.
  9. Click on Push.
The final push is optional, but it saves your work.  Also you need to do it if you are going to make a pull request -- more on that below.
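
For readers who prefer the command line, here is a rough sketch of the same recipe as a list of git commands built in Python. The function names are mine, the branch names shared and feature and the remote name origin follow this article's conventions, and using --ff-only on the pull is my own choice (it makes the pull fail instead of creating a merge commit on shared). The plan stops before the push on purpose, since the inspection and testing in steps 7 and 8 should happen first.

```python
import subprocess


def catch_up_feature_plan(shared="shared", feature="feature", remote="origin"):
    """Git commands roughly matching steps 1 to 6 of the recipe above."""
    return [
        ["git", "fetch", remote],                      # update tracking branches
        ["git", "checkout", shared],
        ["git", "pull", "--ff-only", remote, shared],  # catch up my shared
        ["git", "checkout", feature],
        ["git", "merge", shared],                      # catch-up merge into feature
    ]


def run_plan(plan, cwd="."):
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, cwd=cwd, check=True)


for argv in catch_up_feature_plan():
    print(" ".join(argv))
```

If feature is already caught up, the final git merge has nothing to do, so running the plan in that situation is harmless.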

Make your own changes


  1. Check out feature (if not already the checked out branch).
  2. Make changes to the files. Run tests.  Etc.
  3. Back in SourceTree, Cmd-R (Mac) or Ctrl-R (Windows) or View >> Refresh
  4. Select "Uncommitted changes"
  5. Review all unstaged changes.
  6. Stage all changes you want as part of the commit.
  7. Click Commit. (This doesn't actually do the commit.)
  8. Enter commit message
  9. Click on "Commit" button at lower right. (This does the commit.)
  10. Push the new commit to the origin, by clicking Push and OK.
  11. If you've never pushed the branch before you may need to check a box in the previous step before clicking OK.
Pushing the new commit to the origin is optional, but it is good to do for a couple of reasons. One is that it saves your work remotely. The other is that it lets other people on your team see what you are doing.
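
As a hedged command-line counterpart to steps 3 to 10, the sketch below builds the corresponding git commands. The function name, the example path notes.txt and the commit message are placeholders of mine. I am assuming that the check box mentioned in step 11 corresponds to git's --set-upstream option, which records which remote branch the local branch tracks.

```python
import shlex


def commit_and_push_commands(paths, message, feature="feature", remote="origin"):
    """Git commands roughly matching steps 3 to 10 of the recipe above."""
    if not paths:
        raise ValueError("stage at least one path")
    return [
        ["git", "status", "--short"],                         # list uncommitted changes
        ["git", "diff"],                                      # review unstaged changes
        ["git", "add", "--", *paths],                         # stage
        ["git", "commit", "-m", message],                     # commit what is staged
        ["git", "push", "--set-upstream", remote, feature],   # push (optional)
    ]


for argv in commit_and_push_commands(["notes.txt"], "Describe the change"):
    print(shlex.join(argv))
```

The commands are only printed here, so the review and staging choices still have to be made by a person before anything is run.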

Merge your feature back to the shared branch.

(Do this when you think it's complete and ready for review.)


  1. Catch up the feature branch. (See above.) Be sure to push the feature branch to the server.
  2. If there are any problems, such as merge conflicts or failed tests, make sure they are all resolved before going on.
  3. On GitHub, make a new "Pull Request", being careful that it is a request to pull feature into shared.
  4. At this point, you might want to request someone else to review the pull request.
  5. Wait for comments or for someone else to merge the pull request.
  6. Or if no one else merges the pull request, merge it yourself.
When there are comments that need to be addressed, you can modify your feature branch and push it again. Pull requests are based on branches, not on commits, so when you push new commits on your branch they become part of the pull request. If there are changes to the shared branch between the pull request being made and the feature being merged, it's important to redo the process above so that, for example, all tests can be run on a caught-up version of the feature branch.
