Drupal Patching, Committing, and Squashing with Git

Back in the bad old days (like 2 weeks ago) there was exactly one way to create patches and exactly one way to apply them. Now we have so many formats and so many ways to work it can be mind boggling. As a community, we're going to have to come to a consensus on which of these many options we prefer. I'm going to take a quick look at the options and how to work among them when necessary.

Ways to create patches

My preference when making changes is always to commit my changes whether or not the commits will be exposed later. That lets me keep track of what I'm doing, and make commit comments about my work. So in each if these cases we'll start with a feature branch based on origin/7.x-1.x with:

# Create a feature branch by branching from 7.x-1.x
git checkout -b my_feature_99999 origin/7.x-1.x
[edit A.txt]
git add .
git commit -m "Added A.txt"
[edit B.txt]
git add .
git commit -m "Added B.txt"

First, I may want to know what commits and changes I'm going to be including in this patch.

  • git log origin/7.x-1.x..HEAD shows me the commits that I've added.
  • git diff origin/7.x-1.x shows me the actual code changes I've done.

Now we can create a patch representing these changes in at least a couple of ways:

  • git diff origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch will create a patchfile which can be uploaded to Drupal.org.
  • git format-patch --stdout origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch will create a patchfile that includes sophisticated commit information allowing a maintainer to just apply the patch and fly - the commits you've created will automatically be applied (more later). Note that this type of patch has the email you used with git config user.email embedded in the patch, so if you post it on Drupal.org, it will be indexed by Google.

What do we do with the feature branch? Whatever. It's sitting there and can be used any number of ways in the future, or you can delete it. I tend to clean these up periodically, but not right away. git branch -D my_feature_99999 would delete this branch.

Ways to apply patches

I usually create a feature branch to apply and work with patches. This lets me make edits after the fact and have complete freedom. I don't have to keep track of what I've committed until I'm done. So in this case let's assume that I'm the maintainer and have received the patch created above.

Before we start, please
git config --global push.default tracking
which will allow a tracking branch to automatically push to its remote.

git checkout -b received_patch_99999/03 origin/7.x-1.x

This creates a named local branch which can push to the 7.x-1.x branch on the server.

There are at least three ways to apply patches:

  • patch -p1 < /path/to/patchfile.patch or patch -p1 -i /path/to/patchfile.patch will apply the differences. I typically commit the patch immediately, like:
      git status
      git add *.txt
      git commit -m "Patch from user99 to fix 99999 comment 3"
     

    I can then continue to work on this, but now I can differentiate my own edits from the original patch that was provided. I might made additional commits. As a maintainer, I can then rebase/squash and commit a nicely-formatted single commit at the end. (See below.)

  • git apply /path/to/patchfile.patch does exactly the same thing as patch -p1 so you can use the exact same workflow.
  • git am /path/to/patchfile.patch only works with patches created using git format-patch that contain commit information. But when you use it, it actually makes the exact commits (with the associated commit messages) that the original patcher made. You can then continue to make changes yourself, make additional commits, and then rebase/squash and commit a nicely formatted version of the patch (see below).

I could now push the commits with

git push  # If you have push.default set
OR
git push origin HEAD:7.x-1.x

Squashing, rebasing, and editing the commit message

Let's say that this patch works, we've committed whatever edits we want, and it's time to go ahead and fix it up and push it. Now we can rebase/squash it, fix up the commit message, and push the result.

Some things to know here: Although you may have heard that "rebasing is bad" it's not true. Rebasing commits that have already been made public and that might affect someone else is in fact bad. But here we have not done that. We are just using Git's greatest features to prepare a clean, single commit. It will not break anything, it will not rewrite anybody else's history.

Let's squash our all of our work into a single commit that has a good message: "Issue #99999 by user999: Added A.txt and B.txt"

# Make sure our local branch is up-to-date
git pull --rebase
# Now rebase against the branch we are working against.
git rebase -i origin/7.x-1.x

We'll have an editor session pop up with a rebase script specified, something like this:

    pick a5c1399 added A
    pick d3f45f7 Added B
  
    # Rebase c98c91a..d3f45f7 onto c98c91a
    #
    # Commands:
    #  p, pick = use commit
    #  r, reword = use commit, but edit the commit message
    #  e, edit = use commit, but stop for amending
    #  s, squash = use commit, but meld into previous commit
    #  f, fixup = like "squash", but discard this commit's log message
    #
    # If you remove a line here THAT COMMIT WILL BE LOST.
    # However, if you remove everything, the rebase will be aborted.
    #

To squash, all we have to do here is change the second and following commands of this little script from "pick" to "s" or "squash". Leave the top one as 'pick' and everything will be squashed into it. I'll change mine like this:

pick a5c1399 added A
s d3f45f7 Added B

and save the file and exit.

Then you get the chance to edit the commit message for the entire squashed commit. An editor session pops up with:

  # This is a combination of 2 commits.
  # The first commit's message is:
 
  added A
 
  # This is the 2nd commit message:
 
  Added B
 
  # Please enter the commit message for your changes. Lines starting
  # with '#' will be ignored, and an empty message aborts the commit.
  # Not currently on any branch.
  # Changes to be committed:
  #   (use "git reset HEAD <file>..." to unstage)
  #
  # new file:   A.txt
  # new file:   B.txt

At this point, everything in the commit message that starts with '#' will be a comment. You can just edit this file to have the commit that you want. I'll delete all the lines and put

Issue #99999 by user999: Added A.txt and B.txt

Now I can just

git push  # If push.default is set to tracking
OR
git push origin HEAD:7.x-1.x

Note: A maintainer who never gets lost in what they're doing and always finishes things sequentially doesn't absolutely need to create a new branch for all this. Instead of creating a branch with

git checkout -b received_patch_99999/03 origin/7.x-1.x

we could have done all this on a 7.x-1.x tracking branch. The only complexity is that we have to figure out what commit to rebase to, which we can figure out with git log

git checkout 7.x-1.x
git pull
git am /path/to/patchfile.patch
git rebase -i origin/7.x-1.x
git push

Rebasing to prepare a nicer single-commit patch

OK, so let's assume that the maintainers of this project ask for better-prepared patches because they don't care to rebase and never edit a commit, but rather just apply them. The patch contributor can do the exact same squashing process before creating the patch.

Above, when creating doing work to create a new patch, I did this:

  # Create a feature branch by branching from 7.x-1.x
  git checkout -b my_feature_99999 origin/7.x-1.x
  [edit A.txt]
  git add .
  git commit -m "Added A.txt"
  [edit B.txt]
  git add .
  git commit -m "Added B.txt"

There were two commits on my feature branch, and they have my typical, lousy, work-in-progress commit messages that I wouldn't want any maintainer to have to deal with. So I'll rebase/squash them into a single commit before creating my patch.

git rebase -i origin/7.x-1.x

Then I get the same exact options that the maintainer had in the section above, and follow the exact same procedure to consolidate them, and give a good commit message. Then,

git format-patch --stdout origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch

will make a very nice single-commit patch with a good message that the maintainers can use out of the box if they'd like to.

Summary

We have lots of options in creating and applying patches, but the git format-patch + git am + git rebase -i toolset is remarkably powerful, and we may be able to build a community consensus around this toolset.

Rebasing sounds hard and odd, and squashing really awful, but they're essentially the same thing as patching. In reality, they bring the patch workflow into the 21st gitury. One patch == one commit. Sure you can do lots of obscure things with them, but here we're just combining and cleaning up commits.

And yes, you've been warned not to rewrite publicly exposed history using rebase, but we're not doing that. We're preparing something to be made public so it's completely harmless.

8 Comments

encouraging multiple logical commits over one "nicer" single one

Hi Randy,

your series of articles about git is very good, but sometimes you _seem_ to be encouraging to stay with the old way of thinking to push a "single commit per issue".

An issue may need several preparatory steps to be solved so there could be multiple logically independent commits (from different people interested in the issue) to address it, even if the actual solution will be in one of these steps.I think there is nothing wrong with these steps showing up in the project history.

So I think we should strongly prefer "git format-patch" over "git diff" to generate patches and symmetrically "git am" over "git apply" so to bring in the commit messages and the authorship as the original patch developer(s) meant. And if a patch/commit message is suboptimal we should tell the original author to adjust and resend rather than import and fix up by ourselves; this process can look slower at first, but it's the slowness which _education_ requires, so it's not bad IMHO.

We could continue the discussion about patches and review practices in the comments at http://drupal.org/node/1054616 if you like.

Regards,
Antonio

Looking forward to exploring this

I agree with you about git format-patch and git am. But it takes more than tools to get to where we have a good approach to showing the actual history of an issue. I look forward to experimenting with ways to implement that (until we have per-issue repos, at which point we'll have to rethink again.

I'm actually just experimenting and conversing on the possibilities. We have to figure out what's do-able and appropriate in the community. I'll hope to follow up with a post that explains how somebody could implement

The Drupal.org Patch Contributor Guide you link to is intended to be be simple, basic, and relatively inflexible. It doesn't really deal with the issues of how to give credit to people in a long-running issue, etc. I look forward to exploring that workflow.

The cool thing about all this is we're being challenged to figure out what the new possibilities are. So wonderful.

Workflow discussions.

I agree that the Drupal.org Patch Contributor Guide should be simple and to the point, that's exactly why it should tell just the preferred way of dealing with patches... once we found out what the preferred way is, of course :)

Another point of discussion could be the use of "git commit -m" to supply a short commit message, I don't like its use very much as this also discourages writing more detailed and informative commit messages.

The reality is that the tools we use influence the workflow massively, for example I am very (maybe too much) careful about having a clean project history because when a problem arises I want to be able to use "git bisect" to spot the cause and if commits do several unrelated changes this gets harder. If it is certainly true that a good workflow is independent from the tools, it also true that good tools can help improving the workflow.

Randy, maybe we should move the discussion over to the Git Migration issue queue?

I'm sure you're right - I get lost over there :-)

It's an oddity of the Drupal world that (IMO) people are more likely to read a workflow discussion on my blog than they are to read it in a Drupal-sponsored venue. I don't like that, but I just think it is what it is, and that's why I have been posting my speculative and exploratory posts here. The Git Migration Issue Queue is really just for insiders, and I want to explore this for everybody.

That said, I completely agree with you and of course will participate there.

let's stay here

Well, if you think there is going to be more attention here, let's stay here. I have no problem with that.

Looking forward to your next articles about git and Drupal.

Not sure about that yet

I am not convinced yet about using multiple commits. While I obviously see advantages, I'm not yet sure how to address the following issues:

- We have currently have standards for commit messages, they look like this "Issue #123 by Berdir, rfay: Added something.". (We might want to think about changing that now that we have git, but that's another story) Others might or might not follow that, so I have to mess around with their commit messages with git rebase -i

- It is easy to create a simple changelog based on the commit messages, there was a script for CVS and there is now drush integration to do it with Git. That becomes useless when there are multiple commits per issue.

I know that these are things which are going to be discussed during DrupalCon and it also depends on what Core will be handling this.

About 'tracking'

Randy, can you explain what 'tracking' branches does?
What does git config --global push.default tracking really do and why should we set that option?

You have mentioned that a few times, and I cannot find any clear explanations about it.

Thanks, for all the great articles and videos!

push.default

If you set push.default = tracking, then a "git push" will by default push to the branch named as the branch you're tracking. Otherwise you have to use fancy syntax to name that.

So for example, if you set up a local branch whose name is not the same as the remote: git checkout origin/7.x-1.x --branch mybranch

With push.default = tracking, that will work for push and pull.

Wihout push.default = tracking, it will work only for pull, and you'll have to use fancier syntax (as shown in the article) to push it correctly.