Using Git subtrees for repository separation

Yesterday I needed to break out some shared modules into their own repositories to avoid the monolithic repository anti-pattern I’ve been so fond of in the past.

I faced a few options:

Git submodules: Yuck! I’ve used them before and we didn’t get on well. They’re too intrusive, requiring submodules to be initialised and updated. And switching between branches, which is where Git really shines, suddenly becomes painful because submodules don’t do what you expect. Not convinced? Read more about the issues with submodules.
Gitslave and Repo look interesting but I was easily dissuaded after a brief read here. They don’t seem to fit my requirements.
Git subtree: This promised to give me what I wanted from git submodules without the administrative downsides. The supposed disadvantage is that with subtrees you end up with a copy of the upstream module’s source code in your downstream repository. However, to me that’s a small price to pay and it may even have its advantages in that you have more control, e.g. you could make temporary project-specific tweaks to the code.

So I decided to try out the git subtree approach. I’ve only used it for a day but it seems to be working nicely.

Splitting code into its own repository

The code I wanted to have as a shared, upstream repository was already living in my downstream repository so I needed to first split it out. If you are starting a new project you don’t need to do this and can skip to the next part.

Let’s assume that in your main repository you have a directory at path/to/code (relative to the repository root) that you want to split off into its own repository called “shared”.

Create a new bare local repository, e.g. ~/shared/

      mkdir ~/shared
      cd ~/shared
      git init --bare

Create a new remote repository for your shared code, e.g. via the GitHub or Bitbucket web interface.
Back in the main repository you are splitting from, split the shared code into a branch called “split”

      git subtree split --prefix=path/to/code -b split

The new branch “split” will only contain the code from that path. Note: It will have all the commit history for that path, which, unless you’ve been fastidious with your commits, will probably contain messages that pertain to your main repository. Be aware that the --squash switch does not help here: git subtree applies it only when adding, merging or pulling, and newer versions of Git may reject it on a plain split. If you want a single commit you would have to squash the “split” branch yourself before pushing it.

Now push the new branch to your new local shared repository

      git push ~/shared/ split:master

From the new local shared repository (~/shared/), push the commit to the new remote shared repository

      git remote add origin ssh://git@bitbucket.org/xyz/shared.git
      git push origin master

Done! You now have a shiny new repository containing your shared code, and you’re ready to share it with the world.
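The split steps above can be tried end to end in a throwaway sandbox. This is only a minimal sketch: the temporary directory, the file names and the commit messages are all made up for illustration, and a local bare repository stands in for the remote on Bitbucket. It assumes the git subtree command is installed alongside Git.

```shell
set -eu

# Throwaway sandbox: every path, file name and commit message is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in main repository with three commits; only the second one
# touches path/to/code.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo 'application' > app.txt
git add app.txt
git commit -q -m "Main: add application"
mkdir -p path/to/code
echo 'shared helper' > path/to/code/util.txt
git add path/to/code
git commit -q -m "Add shared helper"
echo 'application v2' > app.txt
git commit -q -am "Main: update application"

# Bare repository that will receive the split-out history.
git init -q --bare "$work/shared.git"

# Extract the history of path/to/code into a branch, then publish it.
git subtree split --prefix=path/to/code -b split
git push -q "$work/shared.git" split:master

# The shared repository holds the files at its root...
git --git-dir="$work/shared.git" ls-tree -r --name-only master
# ...and only the commits that touched path/to/code.
git --git-dir="$work/shared.git" log --format=%s master
```

In this sandbox the two commits that only changed app.txt should not show up in the shared repository's log, which is the behaviour the note about commit history describes.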

The next step is to make the shared repository a subtree of your main repository.

Adding the repository as a subtree of your main repository

In your main repository, you need to get rid of the original files that you split, and then add the remote repository as a subtree instead.

Delete the entire directory you split from, and then commit.

      git rm -r path/to/code
      git commit -m "Remove split code."

Add the new shared repository as a remote

      git remote add shared ssh://git@bitbucket.org/xyz/shared.git

Now add the remote repository as a subtree

      git subtree add --prefix=path/to/code --squash shared master

Note: we use the --squash switch because we probably just want a single snapshot commit representing version X of the shared module, rather than complicating our own commit history with spurious upstream bugfix commits. Of course if you want the entire history then feel free to leave off that switch.

You now have a subtree based on an upstream repository. Nice.

In the image you can see the bottom commit is the squashed commit containing all the upstream code and this is merged with your code.

Important note: Do not be tempted to rebase this. Push it as is. If you rebase, git subtree won’t be able to reconcile the commits when you do your next subtree pull.
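To see what the squashed add looks like in the history, here is a small sandbox sketch. The repositories, file names and commit messages are hypothetical, and a local directory stands in for the Bitbucket remote; it is an illustration under those assumptions rather than a record of my actual setup.

```shell
set -eu

# Throwaway sandbox: every path, file name and commit message is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream "shared" repository with a single file.
git init -q "$work/shared"
cd "$work/shared"
git symbolic-ref HEAD refs/heads/master
echo 'shared helper' > util.txt
git add util.txt
git commit -q -m "Upstream: add helper"

# Stand-in main repository that will consume the shared code.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo 'application' > app.txt
git add app.txt
git commit -q -m "Main: initial commit"

# Register the upstream as a remote and add it as a squashed subtree.
git remote add shared "$work/shared"
git subtree add --prefix=path/to/code --squash shared master

# The graph shows the main commit, one squashed snapshot commit,
# and the merge commit that joins them.
git log --graph --oneline
```

The upstream file ends up at path/to/code/util.txt, and the main history gains two commits: the squashed snapshot and the merge that brings it in.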

So far so good. But this isn’t much use if you can’t receive changes from the upstream repository. Luckily that’s easy.

Pulling upstream changes

To pull changes from the upstream repository, just use the following command:

      git subtree pull --prefix=path/to/code --squash shared master

This squashes all newer upstream commits into a single commit, which is then merged into your repository. Important: as mentioned above, do not rebase these commits.
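A sandbox sketch of a pull, for anyone who wants to try it before touching a real project. All names and paths are hypothetical and a local directory plays the part of the upstream remote. GIT_MERGE_AUTOEDIT=no is set only so that the merge commit does not open an editor when the script is run in a terminal.

```shell
set -eu

# Throwaway sandbox: every path, file name and commit message is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # accept the default merge message

# Stand-in upstream "shared" repository.
git init -q "$work/shared"
cd "$work/shared"
git symbolic-ref HEAD refs/heads/master
echo 'shared helper' > util.txt
git add util.txt
git commit -q -m "Upstream: add helper"

# Stand-in main repository with the shared code added as a subtree.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo 'application' > app.txt
git add app.txt
git commit -q -m "Main: initial commit"
git remote add shared "$work/shared"
git subtree add --prefix=path/to/code --squash shared master

# New work lands upstream after the subtree was added.
cd "$work/shared"
echo 'shared helper v2' > util.txt
git commit -q -am "Upstream: improve helper"

# Pull it into the main repository as one squashed commit plus a merge.
cd "$work/main"
git subtree pull --prefix=path/to/code --squash shared master
cat path/to/code/util.txt
```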

Pushing changes to the upstream repository

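Contributing changes to the upstream repository is as simple as the command below. Note that --squash is left out: it only applies to add, merge and pull, and newer versions of git subtree may reject it on a plain push.

      git subtree push --prefix=path/to/code shared master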
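Here is a sandbox sketch of the push direction. Everything in it is hypothetical (paths, file names, commit messages), and a local bare repository stands in for the upstream, since pushing into a non-bare repository's checked-out branch is normally refused.

```shell
set -eu

# Throwaway sandbox: every path, file name and commit message is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare stand-in upstream, seeded with one commit from a scratch repository.
git init -q --bare "$work/shared.git"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo 'shared helper' > util.txt
git add util.txt
git commit -q -m "Upstream: add helper"
git push -q "$work/shared.git" master

# Stand-in main repository with the shared code added as a subtree.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo 'application' > app.txt
git add app.txt
git commit -q -m "Main: initial commit"
git remote add shared "$work/shared.git"
git subtree add --prefix=path/to/code --squash shared master

# Change the shared code from inside the main repository.
echo 'shared helper (fixed in main)' > path/to/code/util.txt
git commit -q -am "Fix shared helper from main"

# Send only the path/to/code history back upstream.
git subtree push --prefix=path/to/code shared master

# The upstream now contains the fix, with the file at its root.
git --git-dir="$work/shared.git" show master:util.txt
```

Under the hood the push performs a split of path/to/code and pushes the result, which is why only changes under that prefix reach the shared repository.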

That’s really the extent of my knowledge at this point. I’ve used this approach to reorganise module sharing between several applications and so far so good. But it’s early days still; if I come across any issues I’ll blog them.

tl;dr

Git subtree seems to be a neat way of sharing modules between Git repositories. Watch this space for gotchas.

28 thoughts on “Using Git subtrees for repository separation”

el.servas says:

June 14, 2013 at 4:28 pm

Great post!

Luke says:

August 27, 2013 at 6:42 am

this was soooo useful. I had a project with many shared modules that weren’t always used in every solution. By allowing this to be our new pattern we could keep the history of our changes to each module and still pull and push to each module as necessary.

NormanNoble says:

March 19, 2014 at 12:41 am

Hi there,

I’ve been working on splitting up a large repository but I’ve come across a problem using the git subtree split.

If the folder name has dots in it then it looks like it is working but there is no branch at the end…

- Stu says:
  
  June 5, 2014 at 4:06 pm
  
  I’ve not come across this problem: lots of my directories have dots in them. Not sure what to suggest.
  
Abhishek says:

May 28, 2014 at 9:41 pm

“Note: It will have all the commit history, which, unless you’ve been fastidious with your commits, will probably contain messages that pertain to your main repository.”

Will the new shared repository also contain the data corresponding to the old contents of files of the old main repo? In other words, can someone checkout (from the history info) old versions of the old files from the main repo that are not included in the shared repository?

- Stu says:
  
  June 5, 2014 at 3:48 pm
  
  No, I should have worded that better. The shared repo will contain the commit message history for any commit that changed a shared file. So a `git log` on your shared repo might contain messages pertaining to the commits of your old main repo.
  
  - Eric says:
    
    July 18, 2014 at 8:28 pm
    
    I believe it matters whether you do
    
    git remote add shared ssh://git@bitbucket.org/xyz/shared.git
    git subtree add --prefix=path/to/code --squash shared master
    
    as you described vs. doing this single command with the remote repo expressed explicitly and directly.
    
    git subtree add --prefix=path/to/code --squash ssh://git@bitbucket.org/xyz/shared.git master
    
    If you look carefully at the output from the git fetch that is done as part of the git subtree add, the former method adds a “[new branch]” that is not added by doing the latter method. (Try it each way in two test repos and use git branch -a to also see the difference.)
    
    This “[new branch]” is a remote-tracking branch and it holds the history of the remote that one can see mixed in, for example, when you are looking graphically at all branches of your repo.
    
    If you use the latter method and add the subtree without referencing a local remote name, I believe you should find that the --squash prevents the history of the subtree from cluttering up your local repository history and log.
    
    (My next question is what others would recommend for the proper way to clean out the remote’s added stuff when a repo has already mixed remote-tracking branch history into a repo. Some steps I thought should work do not always work completely.)
  - Stu says:
    
    July 19, 2014 at 8:47 am
    
    Thanks, that’s interesting Eric. In my case I’m not concerned about the history from the shared repo being mixed in with local – after all I’m using the shared code so the commit history is relevant. But I can imagine scenarios where your approach would be useful.
  - Eric says:
    
    October 27, 2014 at 5:34 pm
    
    BTW, for those who use --squash with subtree add, pull, etc. (because they want to avoid mixing subtree repo history and parent project history), if they want to use a defined remote for convenience, they should also use “git remote add --no-tags …”. The --no-tags will exclude bringing over any tags into the remote tracking branch. That is what causes trouble for bringing in unwanted subtree history into the parent repo.
Jochen says:

June 5, 2014 at 2:38 pm

Thanks for your explanation. It seems very useful to me.
One question: are the changes pushed to local repositories of other users or do they have to add the remote repositories by themselves?
They will pull the deletion but will they get the subtree information?

- Stu says:
  
  June 5, 2014 at 4:04 pm
  
  If you make changes to a “shared file” in your main repo and push it, other users of your main repo will obviously get those changes like any other change.
  If other users of your main repo want to be able to push changes to the shared repo, they only need a `git remote add …` before they can `git subtree push …`; the subtree directory already exists in their clone, so a second `git subtree add …` is not needed (and would fail because the prefix already exists).
  If users of another repo (say a different project) want to pull your changes from the shared repo then they will also need to `git remote add …` and a `git subtree add …` before they can `git subtree pull …`.
  
  - Jochen says:
    
    June 6, 2014 at 6:47 am
    
    Many thanks.
Jochen says:

June 10, 2014 at 10:27 am

Hi again.
Now I did some tests and I’m always ending up in a merge conflict if I try to pull the changes from a “subtree”-repository into my main repository.
I cannot explain where the conflict comes from, there are no changes on the file in the main repository.
Do you have any ideas where the conflicts come from and how to prevent them?

- Eric says:
  
  October 3, 2014 at 1:11 pm
  
  @Jochen, multiple posters on StackOverflow have had this problem as well. Have you been using --squash for all your subtree pushing and pulling? What version of Git are you using and on what platform?
  
  - Eric says:
    
    October 3, 2014 at 1:20 pm
    
    p.s. To be more clear, I have not encountered the problem as yet, but I do use --squash for all subtree pushing and pulling.
- Igor says:
  
  February 3, 2015 at 12:44 am
  
  Hi Jochen, have you been able to solve the issue? I’m running into the exact same problem: Pull > merge conflict, Push > rejected. I’m stuck.
  Thanks, Igor
  
Aaron Seet (@icelava) says:

November 5, 2014 at 8:50 am

Each time I clone the super-repo, or checkout a new branch, it seems to me that it is necessary to perform the entire sequence again

– remove sub-directory of the sub-repo; commit
– add/fetch remote branch of the sub-repo
– configure subtree linkage to local remote branch

What we have are both super-repo and sub-repos having branches targeting a deployment environment (e.g. Test, Staging). So say the Test branch of the super-repo should be referencing the Test branch of the sub-repos. Correspondingly, the Staging branch uses sub-repos’ Staging branch code.

Would have been nice if these configurations somehow are preserved in the super-repo’s git config. Then anybody cloning the repo wouldn’t have to go through these hassles again.

Nobita says:

December 16, 2014 at 12:15 pm

Amazing stuff. I would love to be able to sort of cherry pick only the relevant files that I need instead of pulling in the whole repo, but still this seems to be a much better approach than submodules, at least in my case. Also, the ability to push changes to the upstream repo is great.

Thanks for the post, I think this feature should be better documented as I can’t find that much information about it.

Horst says:

January 30, 2015 at 3:47 pm

Hi there. Can you tell me what program you used in the screenshot to visualize your git repository?

- Stu says:
  
  January 30, 2015 at 6:15 pm
  
  I was using the History view in Git Source Control Provider for Visual Studio.
  
  It’s okay for simple branches but can get quite confusing when things get more complex.
  
shashank sharma says:

October 14, 2015 at 6:20 am

Thank you for this wonderfully written article. I was looking to get started with git subtree and this was bang on target!

Mauricio Navarro Miranda says:

October 28, 2015 at 1:55 am

Thank you very much for this post! I’ll try to use subtrees to setup a new structure in some of my projects. 🙂

The best!

Adam Moore (@aj8) says:

January 8, 2017 at 5:07 pm

Thank you! This is the best article I’ve read on subtrees – and I’ve read them all! :p

Finally feel like I’m ready to give them a try.

Ruslan Novikov says:

February 20, 2017 at 6:02 pm

Thank you very much for the very useful article.
I’m having difficulty though with force-pushing changes to a subtree.
Maybe you know how to do that?

Thank you.

Michael Freidgeim says:

April 2, 2017 at 1:00 pm

I tried git subtree to copy only a folder from a repository. Unfortunately I didn’t find a way to copy a subdirectory from another repo that supports later pushing the changes back to the original repo. The answer
http://stackoverflow.com/questions/23937436/add-subdirectory-of-remote-repo-with-git-subtree has a few options, but all of them seem to discuss only one-way (pull) synchronization. Can you suggest how to copy a folder from a library repository into a subfolder of my repository in a way that allows later two-way (pull/push) synchronization?

- Eric says:
  
  May 10, 2017 at 3:26 pm
  
  I think you would need to make that subdirectory into its own --bare repository. Then make its original repository (and any other repository that needs it) use git subtree to pull in (or push out to) that separated content.
  
  In other words, let go of the idea of making the original repository the “home” of that subdirectory. Give it its own --bare repository home and make the original repository just another one of the clients that access that content using git subtree.
  
  Git supports extracting that subdirectory content to its own --bare repository while retaining its past history. Search for other posts about exactly how to do that.
  