Dodging Merge Conflicts in git
Introduction
Of all the source control management systems that have ever been created, git is certainly one of them. You've probably used it, and been burned by a particularly complicated merge conflict. Resolving merge conflicts can be such a harrowing experience that I've seen the anticipatory fear of it drive teams to introduce large swathes of process to avoid conflicts in the first place, usually to the detriment of throughput.
With this post I'm going to step through a scenario that has resulted in particularly nasty merge conflicts for me in the past, and introduce a simple way to sidestep the whole crunchy affair. Additionally, I'll opine on what causes merge conflicts in general, and on how to reframe your thinking about git to mitigate complicated merge conflicts moving forward.
The following scenario describes a complication introduced by using a squash-merge strategy for resolving pull requests, but even if you eschew that particular strategy, I believe the scenario offers perspective and understanding that apply to similarly complicated merges arising through other machinations.
A scenario
Your team uses feature branches, and it's time for you to create a new feature. Diligently, you create a feature branch and make your first commit. Success! When pushing, you notice that someone has pushed a new commit to your repository's main branch:
As you continue developing the feature and committing regularly, as one should, you notice more and more work piling up on the main branch.
Now, some people might say that you should regularly merge main into a feature branch to avoid a future merge conflict. Personally, I avoid this practice unless there's a code change in main that specifically impacts how my feature should be developed. In the contexts I've recently worked in, I've found that continuously merging into my branch muddies the commit graph and turns code reviews into a mess of "who wrote what". To you, these consequences might seem a reasonable price to pay. Your mileage may vary.
Okay, now it's time to merge the feature code back into the main branch, probably through a pull request. You create a PR, it is reviewed, the "squash merge" button is pressed, and voilà:
There are two things of note here:
- This diagram's branch names include origin/ to reflect that these are the branches on the remote repository, not the branches on your personal dev machine
- The "squash merge" operation effectively rewrites the history of the feature branch as it lands on main. If you're not familiar with history rewriting, it means that the resulting files and directories will be identical to the ones before the squash operation, but the commit path that git walks is different: all of the feature branch's commits are condensed into a single new commit on main
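To make "same files, different commits" concrete, here is a minimal sketch that approximates the squash merge in a throwaway local repository. It assumes bash and a reasonably modern git; the file names, file contents, and committer identity are hypothetical, and the commit messages simply reuse the letters from the post. Treating the "squash merge" button as git merge --squash followed by a commit is an assumption about what a hosting service does, not a description of any particular one.

```shell
#!/usr/bin/env bash
# Sketch: approximate a "squash merge" locally and compare the result
# with the feature branch it came from. Files/contents are hypothetical.
set -euo pipefail

cd "$(mktemp -d)"                      # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main  # name the first branch "main"
git config user.name "Example"
git config user.email "example@example.com"

# commit <file> <content> <message>: overwrite a file and commit it
commit() {
  printf '%s\n' "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit app.txt "line 1" "C"

git checkout -q -b my_feature
commit app.txt $'line 1\nfeature work' "X"
commit app.txt $'line 1\nfeature work, revised' "Z"

git checkout -q main
commit other.txt "teammate work" "D"   # a teammate's unrelated commit

# Roughly what the "squash merge" button does: stage the net changes of
# my_feature on top of main and record them as ONE brand-new commit.
git merge -q --squash my_feature
git commit -q -m "F: squash of my_feature"

# app.txt is identical on both branches...
git diff --quiet my_feature main -- app.txt && echo "app.txt: same contents"
# ...but Z is not part of main's history, so git sees unrelated commits.
if git merge-base --is-ancestor my_feature main; then
  echo "Z is in main's history"
else
  echo "Z is NOT in main's history"
fi
```

Running it should report that app.txt matches on both branches while Z is absent from main's history, which is exactly the mismatch that trips up a later rebase.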
Oh, but what if code reviews take a while, and/or the project maintainer doesn't reliably press the merge button within a reasonable amount of time? The next feature up for implementation is ready to be worked on, and it depends on the code you've just finished! Well, I'm sure it won't be a huge deal to just create another branch, off the first feature branch...
And after some time, you have developed a large chunk of the second feature, and the maintainer has merged the original code into the main branch. You look at the commit graph and your Spidey senses start tingling:
Gross. Commit Z on my_feature represents the same file system state as what is now on main's F commit (ignoring the hopefully non-conflicting changes introduced by D and E), but git treats them as two unrelated commits. Because the squash merge rewrote the history of the origin/my_feature branch, if you run git rebase main from your my_second_feature branch, git will go through the following:
- Find the nearest common ancestor between main and my_second_feature, which is C
- Check out branch main at commit F
- Try to replay the first commit on my_second_feature, which is X[1]
- Merge conflict! Commit X modifies lines already modified between C and F, thanks to the squash merge
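The steps above can be reproduced with a short sketch, again assuming bash, hypothetical file contents, and commit messages named after the post's letters (alpha stands in for α). It rebuilds the graph, prints the merge base and the commits a plain rebase would replay, shows which branches contain X, then attempts git rebase main and aborts when it stops on the conflict, leaving the repository as it was.

```shell
#!/usr/bin/env bash
# Sketch: after a squash merge, a plain `git rebase main` replays X and conflicts.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # commit <file> <content> <message>
  printf '%s\n' "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit app.txt "line 1" "C"
git checkout -q -b my_feature
commit app.txt $'line 1\nfeature work' "X"
commit app.txt $'line 1\nfeature work, revised' "Z"

# The second feature builds on top of the first one.
git checkout -q -b my_second_feature
commit app.txt $'line 1\nfeature work, revised\nsecond feature' "alpha"

# Meanwhile main gets D and E, then the squash-merged F.
git checkout -q main
commit other.txt "teammate work" "D"
commit other.txt $'teammate work\nmore teammate work' "E"
git merge -q --squash my_feature
git commit -q -m "F: squash of my_feature"

git checkout -q my_second_feature
# Step 1: the nearest common ancestor is still C.
git log -1 --format='merge base: %s' "$(git merge-base main my_second_feature)"
# Commits a plain rebase replays, oldest first: X, Z, alpha. X is first because
# it is the oldest commit reachable from my_second_feature but not from main.
git log --reverse --format=%s main..my_second_feature
# X is reachable from both feature branches; it doesn't record where it was made.
git branch --contains my_feature~1

# Steps 2-4: replay X, Z, alpha on top of F. X edits the same lines F did.
if git rebase main; then
  echo "rebase went through"
else
  echo "conflict while replaying X; aborting"
  git rebase --abort
fi
```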
At this point you have three options:
- Put on your muck boots and wade into the merge conflict
- Merge main into your second feature branch and continue on your merry way
- Something far less automated, more time-consuming, and more error-prone (I'm looking at you, git cherry-pick, even though you don't necessarily make sense in this particular contrived case)
Actually, there's a fourth option -- I used deception as a narrative device! What an immensely clever writer I am.
A better way to rebase
Maybe you already know about git rebase --onto, maybe you don't. It took me about three years of using git before I was confident I wouldn't accidentally nuke any given repository's history for everyone, and rebase --onto is a very recent discovery for me.
Essentially, git rebase --onto is a way to tell the git client: "Listen. I know you want to find the common ancestor between two branches All On Your Own, but I need you to trust me that the filesystem at Point One on my branch is identical to the filesystem at Point Two on this other branch, so just pretend it's the common ancestor and perform the rebase."[2]
For our scenario, we want git rebase to pretend that F (on main) and Z (on my_feature) are functionally identical. To do that, issue the command:
git rebase --onto main my_feature my_second_feature
This operation will rewrite the history of my_second_feature. It won't avoid any merge conflicts with my_second_feature that might have been introduced by D and E on main, but the rebase will otherwise be unremarkable, which is what we want:
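Here is the same throwaway setup as a sketch (bash assumed; file contents hypothetical, alpha standing in for α) followed by the command above. Before rebasing, it lists the commits after my_feature, which are the only ones --onto replays; here that is just alpha, so X never gets a chance to conflict with F.

```shell
#!/usr/bin/env bash
# Sketch: `git rebase --onto` treats my_feature (Z) as the stand-in for F.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # commit <file> <content> <message>
  printf '%s\n' "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit app.txt "line 1" "C"
git checkout -q -b my_feature
commit app.txt $'line 1\nfeature work' "X"
commit app.txt $'line 1\nfeature work, revised' "Z"
git checkout -q -b my_second_feature
commit app.txt $'line 1\nfeature work, revised\nsecond feature' "alpha"
git checkout -q main
commit other.txt "teammate work" "D"
commit other.txt $'teammate work\nmore teammate work' "E"
git merge -q --squash my_feature
git commit -q -m "F: squash of my_feature"

# Commits after my_feature on my_second_feature: just alpha.
git log --reverse --format=%s my_feature..my_second_feature

# Replay only those commits onto main (switches to my_second_feature first).
git rebase --onto main my_feature my_second_feature
git log --oneline --graph main my_second_feature
```

Afterwards, my_second_feature should sit directly on top of F, carrying D and E's changes plus its own.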
Now the graph is cleaner, and any code reviews or PRs issued for my_second_feature won't have any weird extraneous merge nonsense, or any errors introduced while navigating a merge conflict resolution that was more complicated than necessary.
Understanding conflicts (and avoiding them)
There are a few different ways that merge conflicts rise to the level of dysfunction, and I'm going to ignore the ones that can be ascribed to "multiple contributors working in the same file at the same time", as that's more a process issue than a tooling issue.
The most heinous merge conflicts I've encountered over the last decade have been the direct result of replaying a commit (during a merge or rebase operation) that has already been implicitly (or explicitly) applied. In my case, this has happened for myriad reasons, including:
- Lack of understanding of the subtleties of git's commit objects versus "how the files look at a given point in time"
- Rebase-phobia (or reluctance to rewrite history, or concern about irreparably borking one or more branches)
- Plain ol' lack of tools in my toolbox to deal with hairier merges
Understanding git rebase --onto goes a long way toward addressing the first point. Once you develop the correct intuition (identical, or nearly identical, file system states on two branches or commits don't mean git itself is smart or flexible enough to navigate a merge or rebase between them), it becomes very powerful to be able to tell the git client that two commits are functionally equivalent for the purposes of a merge or rebase, imparting your own intelligence and flexibility.
If you combine git rebase --onto with the absolute juggernaut that is git reflog, you can hack, slash, mend, and recombine branches fearlessly. If you've never used git reflog, you should look at it. It's just a local history of the commits your HEAD and branches have pointed to, displaying hashes, labels, and commit messages, but simply having that information at your fingertips can alleviate a lot of unnecessary handwringing. If you botch a rebase or merge operation and are worried you've done something irreparable, open up the reflog. Simply git checkout the branch that got messed up, issue git reset --hard <commit hash you aptly retrieved from the reflog>, and you're back to where you started.
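A minimal sketch of that recovery, assuming bash. The "botched" operation is a stand-in (a git reset --hard that throws away two commits), and the branch and commit names are hypothetical. Instead of copying a hash by hand, it uses my_second_feature@{1}, reflog syntax for "where this branch pointed one update ago"; pasting the hash from the reflog output works just as well.

```shell
#!/usr/bin/env bash
# Sketch: undo a botched operation with the reflog. Names are hypothetical.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "C"
git checkout -q -b my_second_feature
git commit -q --allow-empty -m "alpha"
git commit -q --allow-empty -m "beta"

# Stand-in for a botched rebase/merge: the branch loses two commits.
git reset -q --hard HEAD~2

# The branch's reflog still lists every position it has had, newest first.
git reflog show my_second_feature

# Point the branch back where it was one update ago (or paste a hash instead).
# Careful: --hard also throws away uncommitted changes in the working tree.
git checkout -q my_second_feature
git reset -q --hard 'my_second_feature@{1}'
git log --oneline
```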
Also, branches are free! If you need to create additional branches, even just to provide a temporary, semantically valuable token for complex merge operations (e.g. omg/teammate_first_commit_that_broke_my_stuff -- the omg/ prefix can help search and cleanup efforts), you can make as many as you need and then delete them once everything is copacetic.
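A sketch of the throwaway-marker idea, assuming bash; the repository contents and the choice of main~1 as the culprit are hypothetical. The marker is an ordinary branch, so it works anywhere git expects a commit (here, in a git log range), and the shared omg/ prefix lets git for-each-ref find every marker for cleanup in one pass.

```shell
#!/usr/bin/env bash
# Sketch: use a temporary, descriptively named branch as a bookmark.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "C"
git commit -q --allow-empty -m "D: teammate change that broke my stuff"
git commit -q --allow-empty -m "E"

# Bookmark the culprit (main~1 is D) without switching branches.
git branch omg/teammate_first_commit_that_broke_my_stuff main~1

# The marker works anywhere a commit is expected: commits after D on main.
git log --oneline omg/teammate_first_commit_that_broke_my_stuff..main

# Later: find and delete every marker under the omg/ prefix.
for marker in $(git for-each-ref --format='%(refname:short)' refs/heads/omg); do
  git branch -D "$marker"
done
```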
Go git 'em, tiger
Hopefully these tools and concepts will be useful in dealing with future merge issues. Look at the commit graph, take a deep breath, and start by breaking down the problem into subproblems you already know the solutions to. Lean on your understanding of commits, merges, rebases, and -- if things get too out of control -- the reflog to reset to a better state and try again.
[1] If you looked at the words "the first commit is X" and thought "wait, the first commit on my_second_feature is α!", here's the explanation: branches aren't real -- a commit doesn't know what branch it's a part of. The word "branch" just means "a tag, with special rules, that has been applied to a commit", and our conception of "branchiness" (i.e. the history of commits underneath this special "tag" idea) is a useful abstraction we apply to help ourselves understand the git graph. Even if commit X was made under the banner of the my_feature branch, it doesn't remember that; it's nothing more than a commit (the first commit, in fact) that lies between the tag/branch my_second_feature and the nearest common ancestor of my_second_feature and main.
[2] The more proficient git users might read this description and say "that... is an interesting way to explain it, but it doesn't necessarily represent my mental model", which is a fair criticism. Another way to explain --onto that more directly reflects its syntax (git rebase --onto <New Home> <Starting Commit> <Target Branch>): "starting at New Home, replay the commits after Starting Commit until you get to Target Branch's HEAD".