Yesterday I needed to break out some shared modules into their own repositories to avoid the monolithic repository anti-pattern I’ve been so fond of in the past.
I faced a few options:
- Git submodules: Yuck! I’ve used them before and we didn’t get on well. They’re too intrusive, requiring submodules to be initialised and updated. And switching between branches, which is where Git really shines, suddenly becomes painful because submodules don’t do what you expect. Not convinced? Read more about the issues with submodules.
- Gitslave and Repo look interesting but I was easily dissuaded after a brief read here. They don’t seem to fit my requirements.
- Git subtree: This promised to give me what I wanted from git submodules without the administrative downsides. The supposed disadvantage is that with subtrees you end up with a copy of the upstream module’s source code in your downstream repository. However, to me that’s a small price to pay and it may even have its advantages in that you have more control, e.g. you could make temporary project-specific tweaks to the code.
So I decided to try out the git subtree approach. I’ve only used it for a day but it seems to be working nicely.
Splitting code into its own repository
The code I wanted to have as a shared, upstream repository was already living in my downstream repository so I needed to first split it out. If you are starting a new project you don’t need to do this and can skip to the next part.
Let’s assume that in your main repository you have a directory at path/to/code (relative to the repository root) that you want to split off into its own repository called “shared”.
- Create a new BARE local repository, e.g. ~/shared/
mkdir ~/shared
cd ~/shared
git init --bare
- Create a new remote repository for your shared code, e.g. via the GitHub or Bitbucket web interface.
- Back in the main repository you are splitting from, split the shared code into a branch called “split”
git subtree split --prefix=path/to/code -b split
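The new branch “split” will only contain the code from that path. Note: It will have all the commit history, which, unless you’ve been fastidious with your commits, will probably contain messages that pertain to your main repository. If you would prefer a single commit, be aware that the current git subtree documentation describes the --squash switch for add, merge and pull (and for split only together with --rejoin), so it will not flatten the split history for you; you would have to squash the “split” branch yourself before pushing it.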
- Now push the new branch to your new local shared repository
git push ~/shared/ split:master
- From the new local shared repository, push the commit to the new remote shared repository
git remote add origin ssh://firstname.lastname@example.org/xyz/shared.git
git push origin master
Done! You now have a shiny new repository containing your shared code, and you’re ready to share it with the world.
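The split steps above can be replayed end to end in a throwaway sandbox, which is a handy way to see what ends up in the shared repository before touching a real project. This is only a sketch under assumptions: the `sandbox` directory, the file names `helper.txt` and `app.txt`, and the commit identity are all made up for illustration, the bare repository lives inside the sandbox rather than at ~/shared/, and the remote push step is left out because it needs a real server.

```shell
#!/bin/sh
# Sandbox sketch of the split workflow. All paths and file names are hypothetical.
set -e

sandbox=$(mktemp -d)

# Throwaway identity so the commits do not depend on global Git configuration.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# A stand-in "main" repository with some code under path/to/code.
git init -q "$sandbox/main"
cd "$sandbox/main"
mkdir -p path/to/code
echo "shared helper" > path/to/code/helper.txt
echo "app" > app.txt
git add .
git commit -q -m "Initial commit"

# A bare repository standing in for ~/shared/.
git init -q --bare "$sandbox/shared"

# Split the directory's history onto a branch called "split",
# then push that branch to the bare repository as master.
git subtree split --prefix=path/to/code -b split
git push -q "$sandbox/shared" split:master

# List what the shared repository now holds: the directory's
# contents appear at the root, without the path/to/code prefix.
shared_files=$(git -C "$sandbox/shared" ls-tree -r --name-only master)
echo "$shared_files"
```

Note that `app.txt` does not show up in the listing, because only the history of path/to/code is carried over to the split branch.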
The next step is to make the shared repository a subtree of your main repository.
Adding the repository as a subtree of your main repository
In your main repository, you need to get rid of the original files that you split, and then add the remote repository as a subtree instead.
- Delete the entire directory you split from, and then commit.
git rm -r path/to/code
git commit -am "Remove split code."
- Add the new shared repository as a remote
git remote add shared ssh://email@example.com/xyz/shared.git
- Now add the remote repository as a subtree
git subtree add --prefix=path/to/code --squash shared master
Note: we use the --squash switch because we probably just want a single snapshot commit representing version X of the shared module, rather than complicating our own commit history with spurious upstream bugfix commits. Of course if you want the entire history then feel free to leave off that switch.
You now have a subtree based on an upstream repository. Nice.
In the image you can see the bottom commit is the squashed commit containing all the upstream code and this is merged with your code.
Important note: Do not be tempted to rebase this. Push it as is. If you rebase, git subtree won’t be able to reconcile the commits when you do your next subtree pull.
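To see what the squashed add leaves behind, the steps can be tried in a sandbox and the resulting commit inspected. The sketch below assumes a local stand-in for the upstream repository; the `sandbox` directory, the file names and the commit identity are hypothetical. With --squash, git subtree writes `git-subtree-dir:` and `git-subtree-split:` lines into the squash commit's message, and it appears to look these up again on the next pull, which would explain why rewriting those commits with a rebase causes trouble.

```shell
#!/bin/sh
# Sandbox sketch: add a subtree with --squash and inspect the squash commit.
# All paths and file names are hypothetical.
set -e

sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# Local stand-in for the upstream "shared" repository, with a master branch.
git -c init.defaultBranch=master init -q "$sandbox/shared"
cd "$sandbox/shared"
echo "v1" > helper.txt
git add helper.txt
git commit -q -m "Add helper"
upstream_head=$(git rev-parse master)

# Stand-in for the main repository.
git init -q "$sandbox/main"
cd "$sandbox/main"
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Same two commands as in the steps above, pointed at the local upstream.
git remote add shared "$sandbox/shared"
git subtree add --prefix=path/to/code --squash shared master

# Find the squash commit by the metadata line git subtree wrote into it,
# then print its full message.
squash_commit=$(git log --grep='^git-subtree-dir: path/to/code' --format=%H -1)
git show -s --format=%B "$squash_commit"
```

The `git-subtree-split:` line in the printed message records which upstream commit the snapshot was taken from.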
So far so good. But this isn’t much use if you can’t receive changes from the upstream repository. Luckily that’s easy.
Pulling upstream changes
To pull changes from the upstream repository, just use the following command:
git subtree pull --prefix=path/to/code --squash shared master
(You are squashing all newer upstream commits into a single one that will then be merged into your repository.) Important: as mentioned above, do not rebase these commits.
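A pull can be rehearsed the same way: set up a local upstream, add it as a subtree, change the upstream, then pull. This is a sketch under assumptions; the `sandbox` directory, file names and commit identity are hypothetical, and `GIT_MERGE_AUTOEDIT=no` is set only so the merge that git subtree performs does not open an editor when run from a terminal.

```shell
#!/bin/sh
# Sandbox sketch: pull a new upstream change into an existing squashed subtree.
# All paths and file names are hypothetical.
set -e

sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"
export GIT_MERGE_AUTOEDIT=no   # accept the default merge message

# Local stand-in for the upstream "shared" repository.
git -c init.defaultBranch=master init -q "$sandbox/shared"
cd "$sandbox/shared"
echo "v1" > helper.txt
git add helper.txt
git commit -q -m "Add helper"

# Main repository with the upstream added as a squashed subtree.
git init -q "$sandbox/main"
cd "$sandbox/main"
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git remote add shared "$sandbox/shared"
git subtree add --prefix=path/to/code --squash shared master

# Upstream moves on...
cd "$sandbox/shared"
echo "v2" > helper.txt
git commit -q -am "Update helper"

# ...and the main repository pulls the change into its subtree.
cd "$sandbox/main"
git subtree pull --prefix=path/to/code --squash shared master

pulled_content=$(cat path/to/code/helper.txt)
echo "$pulled_content"
```

After the pull, `git log --oneline --graph` in the main repository should show a second squash commit merged in next to the first.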
Pushing changes to the upstream repository
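Contributing changes to the upstream repository is as simple as:
git subtree push --prefix=path/to/code shared master
(There is no --squash here: the current git subtree documentation describes that switch for push only in combination with --rejoin, and newer Git versions may reject it otherwise.)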
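Pushing can be rehearsed against a local bare repository, since Git refuses by default to push into the checked-out branch of a non-bare one. The sketch below is an illustration under assumptions: the `sandbox` directory, the `seed` clone used to populate the upstream, the file names and the commit identity are all hypothetical. It runs the push without --squash, because the current git subtree documentation lists that switch for push only together with --rejoin.

```shell
#!/bin/sh
# Sandbox sketch: change code inside the subtree and push it back upstream.
# All paths and file names are hypothetical.
set -e

sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# Bare stand-in for the remote shared repository, seeded from a scratch repository.
git init -q --bare "$sandbox/shared.git"
git init -q "$sandbox/seed"
cd "$sandbox/seed"
echo "v1" > helper.txt
git add helper.txt
git commit -q -m "Add helper"
git push -q "$sandbox/shared.git" HEAD:refs/heads/master

# Main repository with the upstream added as a squashed subtree.
git init -q "$sandbox/main"
cd "$sandbox/main"
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git remote add shared "$sandbox/shared.git"
git subtree add --prefix=path/to/code --squash shared master

# A change made inside the subtree directory of the main repository.
echo "tweaked" > path/to/code/helper.txt
git commit -q -am "Tweak the shared helper"

# Split out the subtree's history and push it to upstream master.
git subtree push --prefix=path/to/code shared master

upstream_content=$(git -C "$sandbox/shared.git" show master:helper.txt)
echo "$upstream_content"
```

Only commits that touch path/to/code reach the upstream repository, so keeping subtree changes in their own commits makes for a cleaner upstream history.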
That’s really the extent of my knowledge at this point. I’ve used this approach to reorganise module sharing between several applications and so far so good. But it’s early days still; if I come across any issues I’ll blog them.
Git subtree seems to be a neat way of sharing modules between Git repositories. Watch this space for gotchas.