Dodging Merge Conflicts in git

A small bunch of orange bonnet mushrooms grow from a crack in the side of a rotten log
Mushrooms don't care about navigating merge conflicts in distributed version control systems, and you know what? It shows.

Introduction

Of all the source control management systems that have ever been created, git is certainly one of them. You've probably used it, and been burned by a particularly complicated merge conflict. Resolving merge conflicts can be such a harrowing experience that I've seen the anticipatory fear of it drive teams to introduce large swathes of process to avoid conflicts in the first place, usually to the detriment of throughput.

With this post I'm going to step through a scenario that has resulted in particularly nasty merge conflicts for me in the past, and introduce a simple way to sidestep the whole crunchy affair. Additionally I will opine on what causes merge conflicts in general, and how to reframe your thinking about git to mitigate complicated merge conflicts moving forward.

The following scenario describes a complication introduced by using a squash-merge strategy for resolving pull requests. Even if you eschew that particular strategy, I believe the scenario imparts perspective and understanding that apply to similarly complicated merges arising through other machinations.

A scenario

Your team uses feature branches, and it's time for you to create a new feature. Diligently, you create a feature branch and make your first commit. Success! When pushing, you notice that someone has pushed a new commit to your repository's main branch:

---
config:
  themeVariables:
    gitBranchLabel0: "#ffffff"
    gitBranchLabel1: "#ffffff"
    gitBranchLabel2: "#ffffff"
    gitBranchLabel3: "#ffffff"
    gitBranchLabel4: "#ffffff"
    git0: "#FFCB5C"
    git1: "#CB5CFF"
    git2: "#75A1FF"
    git3: "#FF7A5C"
    git4: "#B1D45E"
---
gitGraph
  commit id: "A"
  commit id: "B"
  commit id: "C"
  branch my_feature
  checkout my_feature
  commit id: "X"
  checkout main
  commit id: "D"

As you continue developing the feature and committing regularly, as one should, you notice more and more work piling up on the main branch.

---
config:
  themeVariables:
    gitBranchLabel0: "#ffffff"
    gitBranchLabel1: "#ffffff"
    gitBranchLabel2: "#ffffff"
    gitBranchLabel3: "#ffffff"
    gitBranchLabel4: "#ffffff"
    git0: "#FFCB5C"
    git1: "#CB5CFF"
    git2: "#75A1FF"
    git3: "#FF7A5C"
    git4: "#B1D45E"
---
gitGraph
  commit id: "A"
  commit id: "B"
  commit id: "C"
  branch my_feature
  checkout my_feature
  commit id: "X"
  checkout main
  commit id: "D"
  checkout my_feature
  commit id: "Y"
  commit id: "Z"
  checkout main
  commit id: "E"

Now, some people might say that one should regularly merge main into a feature branch to avoid a future merge conflict. Personally I avoid this practice unless there's a code change in main that specifically impacts how my feature should be developed. For the contexts I've recently worked in, I've found that merging continuously into my branch ends up muddying the commit graph and making code reviews a mess of "who wrote what". To you, these consequences might seem a reasonable price to pay. Your mileage may vary.

Okay, now it's time to merge the feature code back into the main branch, probably through a pull request. You create a PR, it is reviewed, the "squash merge" button is pressed and voilà:

---
config:
  gitGraph:
    mainBranchName: "origin/main"
  themeVariables:
    gitBranchLabel0: "#ffffff"
    gitBranchLabel1: "#ffffff"
    gitBranchLabel2: "#ffffff"
    gitBranchLabel3: "#ffffff"
    gitBranchLabel4: "#ffffff"
    git0: "#FFCB5C"
    git1: "#B1D45E"
---
gitGraph
  commit id: "A"
  commit id: "B"
  commit id: "C"
  branch origin/my_feature
  commit id: "Squash XYZ"
  checkout origin/main
  commit id: "D"
  commit id: "E"
  merge origin/my_feature id: "F"

There are two things of note here:

  1. This diagram's branch names include origin/, to reflect that these are the branches on the remote repository and not the branches on your personal dev machine
  2. The "squash merge" operation effectively rewrites history: the changes from X, Y, and Z land on the main branch as a single new commit, and the original commits are not part of main's ancestry. If you're not familiar with history rewriting, it means that the resulting files and directories will be identical to the ones before the squash operation, but the commit path that git walks is different
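To make that second point concrete, here is a minimal sketch you can run in a throwaway directory. It is a simplification and not what any particular hosting service does internally: I'm assuming a local git merge --squash is a fair stand-in for the "squash merge" button, main doesn't move (no D or E), and the file names and the save helper are made up for the demo.

```shell
#!/bin/sh
# Sketch: a squash merge reproduces the feature's files under a brand-new commit.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# save <file> <message>: overwrite <file> with <message> and commit it.
# (Hypothetical helper, only here to keep the demo short.)
save() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" regardless of defaults

save base.txt C
git checkout -q -b my_feature
for id in X Y Z; do save feature.txt "$id"; done

# Local stand-in for pressing the "squash merge" button.
git checkout -q main
git merge -q --squash my_feature
git commit -q -m "Squash XYZ"

squash_tree=$(git rev-parse 'main^{tree}')
feature_tree=$(git rev-parse 'my_feature^{tree}')
if git merge-base --is-ancestor my_feature main; then
  feature_in_history=yes
else
  feature_in_history=no
fi

echo "same files on both branches: $([ "$squash_tree" = "$feature_tree" ] && echo yes || echo no)"
echo "Z is an ancestor of main:    $feature_in_history"
```

The files match, but as far as git's commit graph is concerned, main has never heard of X, Y, or Z.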

Oh, but what if code reviews take a while, and/or the project maintainer doesn't reliably press the merge button within a reasonable amount of time? The next feature up for implementation is ready to be worked on, and it depends on the code you've just finished! Well, I'm sure it won't be a huge deal to just create another branch, off the first feature branch...

---
config:
  themeVariables:
    gitBranchLabel0: "#ffffff"
    gitBranchLabel1: "#ffffff"
    gitBranchLabel2: "#ffffff"
    gitBranchLabel3: "#ffffff"
    gitBranchLabel4: "#ffffff"
    git0: "#FFCB5C"
    git1: "#CB5CFF"
    git2: "#75A1FF"
    git3: "#FF7A5C"
    git4: "#B1D45E"
---
gitGraph
  commit id: "A"
  commit id: "B"
  commit id: "C"
  branch my_feature
  checkout my_feature
  commit id: "X"
  checkout main
  commit id: "D"
  checkout my_feature
  commit id: "Y"
  commit id: "Z"
  checkout main
  commit id: "E"
  checkout my_feature
  branch my_second_feature
  commit id: "α"

And after some time, you have developed a large chunk of the second feature and the maintainer has merged the original code into the main branch. You look at the commit graph and your Spidey senses start tingling:

---
config:
  themeVariables:
    gitBranchLabel0: "#ffffff"
    gitBranchLabel1: "#ffffff"
    gitBranchLabel2: "#ffffff"
    gitBranchLabel3: "#ffffff"
    gitBranchLabel4: "#ffffff"
    git0: "#FFCB5C"
    git1: "#B1D45E"
    git2: "#CB5CFF"
    git3: "#75A1FF"
    git4: "#B1D45E"
---
gitGraph
  commit id: "A"
  commit id: "B"
  commit id: "C"
  branch origin/my_feature
  checkout main
  branch my_feature
  commit id: "X"
  checkout main
  commit id: "D"
  checkout my_feature
  commit id: "Y"
  commit id: "Z"
  checkout main
  commit id: "E"
  checkout my_feature
  branch my_second_feature
  commit id: "α"
  checkout origin/my_feature
  commit id: "Squash XYZ"
  checkout main
  merge origin/my_feature id: "F"
  checkout my_second_feature
  commit id: "β"

Gross. Commit Z on my_feature represents the same file system state as what is now on main's F commit (ignoring the hopefully non-conflicting changes introduced by D and E), but the internals of git treat them as unique. The history rewriting of the origin/my_feature branch as part of the squash merge means that if you run git rebase main from your my_second_feature branch, git will go through the following:

  1. Find the nearest common ancestor between main and my_second_feature, which is C
  2. Check out branch main at commit F
  3. Try to replay the First Commit on my_second_feature, which is X[1]
  4. Merge conflict! Commit X modifies lines already modified between C and F thanks to the squashed merge
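If you'd like to watch those four steps happen, the sketch below rebuilds the scenario in a temporary repository and runs the plain rebase. Everything in it is an assumption made for illustration: the file names, the save helper, the ASCII commit names alpha and beta standing in for α and β, and a local git merge --squash standing in for the remote squash merge.

```shell
#!/bin/sh
# Sketch: reproduce the scenario and let `git rebase main` hit the conflict.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# save <file> <message>: overwrite <file> with <message> and commit it.
# (Hypothetical helper, only here to keep the demo short.)
save() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

for id in A B C; do save base.txt "$id"; done
c_hash=$(git rev-parse HEAD)

git checkout -q -b my_feature
for id in X Y Z; do save feature.txt "$id"; done

git checkout -q -b my_second_feature
for id in alpha beta; do save second.txt "$id"; done

# Meanwhile on main: D, E, and then the squashed feature as F.
git checkout -q main
save d.txt D
save e.txt E
git merge -q --squash my_feature
git commit -q -m "F (Squash XYZ)"

git checkout -q my_second_feature

# Step 1: the nearest common ancestor.
ancestor=$(git merge-base main my_second_feature)
echo "common ancestor is C: $([ "$ancestor" = "$c_hash" ] && echo yes || echo no)"

# Step 3: the commits a plain rebase will try to replay, oldest first.
to_replay=$(git log --reverse --format=%s main..my_second_feature | tr '\n' ' ')
echo "commits to replay: $to_replay"

# Steps 2 to 4: attempt the rebase, and back out if it conflicts.
if git rebase main >/dev/null 2>&1; then
  rebase_result=clean
else
  rebase_result=conflict
  git rebase --abort
fi
echo "plain rebase: $rebase_result"
```

In this setup the replay list starts with X rather than alpha, and the rebase stops on the very first commit because F already contains a different version of the file X creates.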

At this point you have three options:

  • Put on your muck boots, and wade into the merge conflict
  • Merge main into your second feature branch and continue on your merry way
  • Something a lot less automated, yet more time-consuming and prone to error (I'm looking at you, git cherry-pick, even though you don't necessarily make sense in this particular contrived case)

Actually, there's a fourth option -- I used deception as a narrative device! What an immensely clever writer I am.

A better way to rebase

Maybe you already know about git rebase --onto, maybe you don't. It took me about three years of using git before I was confident I wouldn't accidentally nuke any given repository's history for everyone, and rebase --onto is a very recent discovery for me.

Essentially git rebase --onto is a way to tell the git client: "Listen. I know you want to find the common ancestor between two branches All On Your Own, but I need you to trust me that the filesystem at Point One on my branch is identical to the filesystem at Point Two on this other branch, so just pretend it's the common ancestor and perform the rebase."[2]

For our scenario, we want git rebase to pretend that F (on main) and Z (on my_feature) are functionally identical. To do that, issue the command:


git rebase --onto main my_feature my_second_feature

This operation will rewrite the history for my_second_feature, and it won't mitigate any merge conflicts with my_second_feature that might have been introduced by D and E on main, but the rebase will otherwise be unremarkable, which is what we want:

---
config:
  themeVariables:
    gitBranchLabel0: "#ffffff"
    gitBranchLabel1: "#ffffff"
    gitBranchLabel2: "#ffffff"
    gitBranchLabel3: "#ffffff"
    gitBranchLabel4: "#ffffff"
    git0: "#FFCB5C"
    git1: "#B1D45E"
    git2: "#75A1FF"
    git3: "#75A1FF"
    git4: "#B1D45E"
---
gitGraph
  commit id: "A"
  commit id: "B"
  commit id: "C"
  branch origin/my_feature
  checkout main
  commit id: "D"
  commit id: "E"
  checkout origin/my_feature
  commit id: "Squash XYZ"
  checkout main
  merge origin/my_feature id: "F"
  branch my_second_feature
  commit id: "α'"
  commit id: "β'"

Now the graph is cleaner, and any code reviews or PRs issued for my_second_feature won't have any weird extraneous merge nonsense, or any errors introduced while navigating a merge conflict resolution that was more complicated than necessary.
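For the skeptical, here is a self-contained sketch of the --onto rebase against a throwaway copy of the scenario. As before, the file names, the save helper, and the ASCII names alpha and beta (for α and β) are my own stand-ins, and the squash merge is simulated locally with git merge --squash.

```shell
#!/bin/sh
# Sketch: the same scenario, resolved with `git rebase --onto`.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# save <file> <message>: overwrite <file> with <message> and commit it.
# (Hypothetical helper, only here to keep the demo short.)
save() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

for id in A B C; do save base.txt "$id"; done

git checkout -q -b my_feature
for id in X Y Z; do save feature.txt "$id"; done

git checkout -q -b my_second_feature
for id in alpha beta; do save second.txt "$id"; done

git checkout -q main
save d.txt D
save e.txt E
git merge -q --squash my_feature
git commit -q -m "F (Squash XYZ)"

# "Treat my_feature (Z) as the starting point, and replay what follows onto main (F)."
git rebase -q --onto main my_feature my_second_feature

replayed=$(git log --reverse --format=%s main..my_second_feature | tr '\n' ' ')
new_base=$(git rev-parse 'my_second_feature~2')
echo "commits now sitting on top of main: $replayed"
echo "they start directly from F: $([ "$new_base" = "$(git rev-parse main)" ] && echo yes || echo no)"
```

Only alpha and beta get replayed; X, Y, and Z are left behind because everything up to and including my_feature is excluded from the rebase.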

Understanding conflicts (and avoiding them)

There are a few different ways that merge conflicts rise to the level of dysfunction, and I'm going to ignore the ones that can be ascribed to "multiple contributors working in the same file at the same time", as that's more a process issue than a tooling issue.

The most heinous merge conflicts I've encountered over the last decade have been the direct result of replaying a commit (during a merge or rebase operation) that has already been implicitly (or explicitly) applied. In my case, this has happened for myriad reasons, including:

  • Lack of understanding of the subtleties of git's commit objects versus "how the files look at a given point in time"
  • Rebase-phobia (or reluctance to rewrite history, or concern about irreparably borking one or more branches)
  • Plain ol' lack of tools in my toolbox to deal with hairier merges

Understanding git rebase --onto goes a long way toward addressing the first point. It helps you develop the correct intuition: just because the state of the file system is identical (or nearly so) for two different branches/commits doesn't mean that git itself is smart/flexible enough to intelligently navigate a merge/rebase for those two commits. It is very powerful, then, to be able to tell the git client that, for the purposes of a merge/rebase operation, two commits are functionally equivalent, imparting your own intelligence and flexibility.

If you combine git rebase --onto with the absolute juggernaut that is git reflog, you can hack, slash, mend, and recombine branches fearlessly. If you've never used git reflog, you should look at it. It's just an historical list of the commits you have interacted with, displaying hashes, labels, and commit messages, but simply having that information at your fingertips can alleviate a lot of unnecessary handwringing. If you botch a rebase or merge operation and are worried you've done something irreparable, open up the reflog. Simply git checkout the branch that got messed up, issue git reset --hard <commit hash you aptly retrieved from the reflog>, and you're back to where you started.
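Here is a minimal sketch of that recovery, assuming a branch called work (a name I made up) and a rebase you immediately regret. Instead of copying a hash out of the reflog by eye, it uses the work@{1} revision syntax, which means "where work pointed one reflog entry ago"; reading the hash from the git reflog output and pasting it into git reset --hard gets you to the same place.

```shell
#!/bin/sh
# Sketch: undo a regretted rebase using the branch's reflog.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# save <file> <message>: overwrite <file> with <message> and commit it.
# (Hypothetical helper, only here to keep the demo short.)
save() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

save base.txt A
git checkout -q -b work
save work.txt W1
save work.txt W2
before=$(git rev-parse work)    # remembered only so the demo can check itself

git checkout -q main
save other.txt B

git checkout -q work
git rebase -q main              # the operation we wish we hadn't run

git reflog show work            # every position the work branch has held, newest first

previous=$(git rev-parse 'work@{1}')   # the tip just before the rebase finished
git reset -q --hard "$previous"

echo "work is back where it started: $([ "$(git rev-parse work)" = "$before" ] && echo yes || echo no)"
```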

Also, branches are free! If you need to create additional branches, even just to provide a temporary, semantically valuable token for complex merge operations (e.g. omg/teammate_first_commit_that_broke_my_stuff -- the omg/ can help search/cleanup efforts), you can make as many as you need and then delete them after everything is copacetic.
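A sketch of the bookmark habit, under the same caveats as before: the branch names (work, omg/work_before_rebase) and the save helper are invented for the demo. The idea is to drop a named marker before doing anything scary, use it if things go sideways, and sweep up every omg/ branch afterward.

```shell
#!/bin/sh
# Sketch: temporary "bookmark" branches around a risky operation.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# save <file> <message>: overwrite <file> with <message> and commit it.
# (Hypothetical helper, only here to keep the demo short.)
save() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

save base.txt A
git checkout -q -b work
save work.txt W1
before=$(git rev-parse work)

git checkout -q main
save other.txt B
git checkout -q work

git branch omg/work_before_rebase            # bookmark the current tip
git rebase -q main                           # the risky operation
git reset -q --hard omg/work_before_rebase   # changed my mind: jump back to the bookmark

# Cleanup: delete every branch under the omg/ prefix.
git for-each-ref --format='%(refname:short)' refs/heads/omg/ |
while read -r name; do
  git branch -q -D "$name"
done

leftover=$(git for-each-ref refs/heads/omg/)
echo "work restored: $([ "$(git rev-parse work)" = "$before" ] && echo yes || echo no)"
```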

Go git 'em, tiger

Hopefully these tools and concepts will be useful in dealing with future merge issues. Look at the commit graph, take a deep breath, and start by breaking down the problem into subproblems you already know the solutions to. Lean on your understanding of commits, merges, rebases, and -- if things get too out of control -- the reflog to reset to a better state and try again.


  1. If you looked at the words "the first commit is X" and thought "wait, the first commit on my_second_feature is α!", here's the explanation: branches aren't real -- a commit doesn't know what branch it's a part of. The word "branch" just means "a tag, with special rules, that has been applied to a commit", and our conception of "branchiness" (i.e. the history of commits underneath this special "tag" idea) is a useful abstraction we apply to help ourselves understand the git graph. Even if commit X was made under the banner of the my_feature branch, it no longer remembers; it's nothing more than a commit (the first commit, in fact) that lies between the tag/branch my_second_feature and the nearest common ancestor between my_second_feature and main. ↩︎

  2. The more proficient git users might read this description and say "that... is an interesting way to explain it, but doesn't necessarily represent my mental model", which is a fair criticism. Another way to explain --onto that more directly reflects its syntax (git rebase --onto <New Home> <Starting Commit> <Target Branch>): "starting at New Home, replay commits starting after Starting Commit until you get to Target Branch's HEAD". ↩︎