Resolving merge conflicts

What is a merge conflict?

A merge conflict happens when Git says:

“There are two versions of this file, and I can’t combine them automatically. You need to tell me what the correct version is.”

We want to choose the version that best represents what we need for the project.

For code, this often means keeping parts of both versions. For more complex formats (Jupyter notebooks, PDFs, other binaries), we may only be able to keep one.

Merge conflicts are A Good Thing, Actually

Merge conflicts don’t mean you’ve done something wrong! They are a normal consequence of collaborative work.

If Git didn’t raise merge conflicts, it would be trivially easy to overwrite or delete someone else’s work by accident.

Why do merge conflicts occur?

Conflicts happen when two branches change the same part of the same file differently.

They are a problem specific to collaborative work.

Imagine we have our main branch with a file A0.
This file is worked on by branches B1 and B2.
B1 gets theirs merged in first…
…so we now have a new version of the file on main (A1).
But when it is B2’s turn, the file on main may look very different from the one B2 started from: they were working from A0, not A1!


branch B1:       - B1 -
               /        \ 
main:      A0 ----------- A1 --- ?
               \                /
branch B2:       - B2 - - - - -

If both branches changed different parts of the file, Git will resolve it automatically.

If both changed the same part, Git asks you to choose.
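To make the "different parts" versus "same part" distinction concrete, here is a rough sketch using Python's standard `difflib`. It is only an approximation of the idea and not Git's actual merge algorithm; the function names and the example file contents are made up for illustration.

```python
from difflib import SequenceMatcher


def changed_ranges(base, new):
    """Return the (start, end) line ranges of `base` that `new` modifies."""
    opcodes = SequenceMatcher(None, base, new, autojunk=False).get_opcodes()
    return [(i1, i2) for tag, i1, i2, _, _ in opcodes if tag != "equal"]


def would_conflict(base, ours, theirs):
    """True if both versions change overlapping (or touching) lines of `base`."""
    return any(
        a1 <= b2 and b1 <= a2
        for a1, a2 in changed_ranges(base, ours)
        for b1, b2 in changed_ranges(base, theirs)
    )


# A0: the file as it was on main when both branches started (hypothetical content)
a0 = ["def greet():", "    print('Hello')", "", "def farewell():", "    print('Bye')"]

# B1 edits greet(); one version of B2 edits farewell(), another also edits greet()
b1 = [a0[0], "    print('Hello from B1')", *a0[2:]]
b2_other_part = [*a0[:4], "    print('Bye from B2')"]
b2_same_part = [a0[0], "    print('Hello from B2')", *a0[2:]]

print(would_conflict(a0, b1, b2_other_part))  # False: changes can be combined
print(would_conflict(a0, b1, b2_same_part))   # True: someone has to choose
```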

This problem is worse if we commit to main directly.

In this example, branches B1 and B2 could each contain many commits before they are merged in.

If all of those commits had been made directly to main, conflicts could have been far more frequent: instead of arising once per PR/merge, they could arise on every commit!

Resolving merge conflicts

The best way to resolve conflicts is to update your branch with main (merge main into your branch), then fix the conflicts locally.

In the updated diagram below, we merge main into B2, resolve any conflicts, and end up with an updated file BA2. We can then merge the PR into main, which gives us our final file (A2).


branch B1:       - B1 -
               /        \ 
main:      A0 ----------- A1 ------ A2
               \                \  /
branch B2:       - B2 - - - - - - BA2

How exactly any conflicts in this merge are resolved depends on the type of file - they can sometimes be painless, or sometimes very hard to deal with.

The nice version

Conflicts in text files are fairly simple to fix, and this can be done within GitHub’s interface. When merging the main branch into your branch, you will see something like:

<<<<<<< HEAD
def greet():
    print("Hello from my branch!")
=======
def greet():
    print("Hello from main!")
>>>>>>> main

The “incoming” changes in this case come from the main branch, which you are merging into the “current” branch (your feature branch, B2 in the example above). The ======= line separates the two versions. You fix the conflict by deleting the conflict markers (the <<<<<<<, ======= and >>>>>>> lines) along with the code from the version you don’t want to keep.
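As an illustration of what "removing the markers and one version" means, here is a minimal Python sketch that keeps one side of each conflict block. The function name `resolve_conflicts` is hypothetical, it assumes the default two-sided marker style shown above, and it always takes one whole side. Real conflicts often need a hand-written mix of both.

```python
def resolve_conflicts(text, keep="current"):
    """Keep one side of every conflict block and drop the marker lines.

    keep="current"  -> lines between <<<<<<< and ======= (your branch)
    keep="incoming" -> lines between ======= and >>>>>>> (the branch merged in)
    """
    if keep not in ("current", "incoming"):
        raise ValueError("keep must be 'current' or 'incoming'")
    resolved = []
    side = None  # None means we are outside any conflict block
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            side = "current"
        elif line == "=======" and side == "current":
            side = "incoming"
        elif line.startswith(">>>>>>> ") and side == "incoming":
            side = None
        elif side is None or side == keep:
            resolved.append(line)
    return "\n".join(resolved)


# The conflict from the example above
conflicted = "\n".join([
    "<<<<<<< HEAD",
    "def greet():",
    '    print("Hello from my branch!")',
    "=======",
    "def greet():",
    '    print("Hello from main!")',
    ">>>>>>> main",
])

print(resolve_conflicts(conflicted, keep="incoming"))
```

Passing `keep="current"` instead would keep the version from your own branch.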

Tip

On GitHub or in VSCode/GitHub Desktop, there are buttons you can click to select which set of changes you want to keep - this will fix the formatting automatically.

The not-so-nice version

Jupyter notebooks are another major source of merge conflicts. They are especially volatile because it is very easy to make unintended changes to cells, e.g. by simply re-running them. As a result they trigger merge conflicts often, particularly with a “commit-to-main” workflow.

In principle, changes in these files are easy enough to track, since notebooks are stored as text (JSON), just with much more of it to edit. In practice, details such as cell execution counts and outputs change on every run, which makes conflicts very hard to untangle.

Prevention is better than cure

The best way to deal with this is to avoid overuse of Jupyter notebooks, and follow practices that minimise problems for ones you do use:

Move core logic to scripts (R/Python)
Strip outputs from notebooks before committing. If outputs are cleared, there are no execution counts or other output metadata to worry about.
- There are ways to automate this process - get in touch if useful
Don’t work collaboratively on the same parts of notebooks where possible!
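For reference, stripping outputs can be sketched with nothing but Python's standard library, since a notebook file is JSON. This is a minimal sketch that assumes the usual notebook layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the function names are hypothetical, and dedicated tools such as nbstripout do the same job more thoroughly.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with outputs and execution counts cleared."""
    cleaned = json.loads(json.dumps(notebook))  # simple deep copy of JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path):
    """Clear the outputs of the notebook at `path`, rewriting it in place."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(notebook), f, indent=1, ensure_ascii=False)
        f.write("\n")


# A tiny made-up notebook: one executed code cell and one markdown cell
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["42\n"]}],
            "source": ["print(6 * 7)"],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_outputs(example)
```

After this, the code cell in `cleaned` keeps its source but has no outputs or execution count, so re-running the notebook no longer changes what gets committed.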

But I want to see outputs…

Unfortunately, the outputs are the cause of all of the issues here. The best bet in this instance is to save whatever assets you want from the notebook(s) and upload separately. Or better yet, provide a link to where someone can obtain the necessary data/dependencies so that they can run locally to reproduce.

If they really are that important - then you will have to just face resolving the merge conflict each time.

… and I need to work collaboratively on them!

Sometimes it is necessary to work on Jupyter notebooks collaboratively e.g. when developing training material. In this case, try to break up the content as much as possible into separate notebooks, add new cells rather than edit existing ones, and communicate within your team who is working on what to avoid as much overlap as possible. Again, the same advice on clearing outputs applies.

In general, Jupyter notebooks are a pain to work with for team projects. They are fantastic for exploratory work, teaching and demos, but for this (and other!) reasons, they aren’t ideal for work you want to actually put into production.

The truly nasty

Binaries and generated outputs are the worst merge conflicts to deal with. Git cannot merge their contents, so you must pick one version of the file to keep. These conflicts can’t be resolved in GitHub’s web interface: you have to use the Git command-line interface or a graphical application like VSCode.
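On the command line, "picking one version" comes down to two Git commands. The sketch below only builds them as argument lists so they are easy to read or to pass to `subprocess.run`. The helper name and the file path are hypothetical. It assumes you are in the middle of merging main into your branch, where "ours" is your branch's copy and "theirs" is the copy from main.

```python
def keep_one_side_commands(path, side):
    """Build the Git commands that resolve a conflicted file by taking one side.

    side="ours"   -> keep the version from the branch you are on
    side="theirs" -> keep the version from the branch being merged in
    """
    if side not in ("ours", "theirs"):
        raise ValueError("side must be 'ours' or 'theirs'")
    return [
        ["git", "checkout", f"--{side}", "--", path],  # take that side's file
        ["git", "add", "--", path],                    # mark the conflict as resolved
    ]


# Hypothetical example: keep main's copy of a conflicted image
for command in keep_one_side_commands("figures/plot.png", "theirs"):
    print(" ".join(command))
```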

However - in most cases, we don’t actually want binaries written into our repositories! We are far more interested in the means of generating these (code, Markdown) than the actual files themselves.

An example - Quarto

Quarto generates .html and .pdf files when running e.g. quarto render. However, these files are not what is important for us to keep track of - we can entirely reproduce them just from the .qmd files (i.e. the bits we actually write!).

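It is likely that we will encounter issues if we commit them, particularly as people may have different compilers/extensions installed locally. The solution is to have them built in one place, in a consistent environment, rather than committed from each person’s machine.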

We can use GitHub Actions to do this automatically whenever we merge a pull request. The GitHub Actions runner can install all of the required dependencies and extensions, build the .html or .pdf and commit it to a branch - these can then be downloaded freely, or pushed automatically to GitHub Pages.

This means that the only files that need to live in the directory are the .qmd files and any assets (e.g. images) embedded in them. Everything else is loaded in and set up in the GitHub Actions workflow.

Feel free to reach out to me if you want any help setting this up for your repository.

`.gitignore` is your friend!

We’ll often want to compile locally to test that our changes have worked - so we will end up with these files in our local workspaces.

We do, however, want to ensure that they don’t end up in our team repository! To do this, we can add them to .gitignore. Then git add (or equivalent) will skip these files, so they are never uploaded. Note that .gitignore only affects files that Git is not already tracking. For example:

# Quarto outputs
/_site/
/.quarto/
/_book/
/_freeze/
/docs/

You should also use this for other files that your IDE/OS generates, e.g. .DS_Store on macOS, .idea for PyCharm, .vscode for VSCode etc.
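If you want to add entries like these without creating duplicates, a small helper can do it. This is only a sketch: the function name is hypothetical, and it compares lines as plain text rather than understanding Git's pattern-matching rules.

```python
def add_ignore_patterns(existing, patterns):
    """Return .gitignore text with any patterns not already listed appended."""
    present = {line.strip() for line in existing.splitlines()}
    missing = [p for p in patterns if p not in present]
    if not missing:
        return existing
    # Make sure the existing text ends with a newline before appending
    prefix = existing if existing.endswith("\n") or not existing else existing + "\n"
    return prefix + "\n".join(missing) + "\n"


current = "# Quarto outputs\n/_site/\n/.quarto/\n"
updated = add_ignore_patterns(current, ["/_site/", ".DS_Store", ".idea/", ".vscode/"])
print(updated)
```

To apply it, read your repository's .gitignore into a string, pass it through the function, and write the result back.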

Demo - resolving different types of merge conflict

Live demo time! We will navigate to our demo repository now. It can be found at https://github.com/JonElsey/github_merge_conflicts.

If you are interested in actually doing the commands yourselves, please feel free to do so in your fork.

We will be using GitHub Desktop for this - but the principles work exactly the same if using the CLI or VSCode/PyCharm/etc. extensions.

Committing the merged changes

Once you have resolved the conflict, you should commit-and-push your branch - it should now be ready for merging (subject to review!).

How do we avoid them in the first place?

Everything from Session 1 helps: good documentation, small PRs, and short-lived branches.

The longer a branch lives, the more it will diverge from main
So keeping PRs concise is useful: the more frequently you merge your work in, the less your branch diverges
- Split big pieces of work into smaller ones!
Use Issues to track what is being worked on! This way, someone can see that you are working on a file, and see any changes coming.
Keep repositories clean of unnecessary files - like non-critical data, plots, VSCode environment data, etc. Use .gitignore for these.
Communicate with your team - if your work doesn’t overlap, then merge conflicts don’t happen.

Take-homes on merge conflicts

To solve merge conflicts:

Pull latest main
Merge main into your branch
Pick which changes you want to keep using your tool of choice
Commit and push
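For anyone using the command line, the four steps above map onto Git commands roughly as follows. This is a sketch under some assumptions: the branch name and file path are placeholders, the helper names are hypothetical, and `git switch` needs a reasonably recent Git (`git checkout` does the same job on older versions).

```python
import subprocess


def update_branch_commands(branch, main="main"):
    """Steps 1-2: pull the latest main, then merge it into your branch."""
    return [
        ["git", "switch", main],
        ["git", "pull"],
        ["git", "switch", branch],
        ["git", "merge", main],  # exits with an error code if there are conflicts
    ]


def finish_merge_commands(resolved_paths):
    """Step 4: stage the files you fixed in step 3, then commit and push."""
    return [
        ["git", "add", "--", *resolved_paths],
        ["git", "commit", "--no-edit"],  # keep Git's default merge message
        ["git", "push"],
    ]


def run_all(commands, cwd="."):
    """Run each command in turn, stopping at the first one that fails."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


# Placeholder names: substitute your own branch and files
before = update_branch_commands("my-feature")
after = finish_merge_commands(["analysis.py"])
for command in before + after:
    print(" ".join(command))
```

Because `git merge` reports conflicts as a failure, `run_all(before)` would stop at the merge. That is the point where you resolve the conflicts with your tool of choice (step 3) before running the remaining commands.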

Remember:

Merge conflicts don’t mean something is wrong - just that Git needs you to make some choices on what is right.
Repeated and painful merge conflicts usually result from process issues, not a lack of Git skills! For example:
- branches are too long‑lived
- PRs are too large
- generated files are being committed
- intent isn’t captured in Issues
- Jupyter notebooks are being edited by multiple people
GitHub is meant to make teamwork easier, not harder — if it is making things harder, stop and get help.
