Migrating from SVN to Git with Git-Externals
Why Git?
In recent years Git, a distributed version control system (VCS), created by Linus Torvalds in 2005, has spread like wildfire.
Unlike centralized VCSs such as SVN, Git lets you do most of your work on your own machine, without constantly contacting the server.
This has a number of advantages: the server needn’t be continuously reachable, and Git is almost always faster than SVN because it tracks content rather than whole files, so there is no need to download an entire file when part of it is already in the local copy.
The only “problem” is that fully understanding Git takes quite a long time, as its workflow differs from what you are used to with SVN.
Fifteen minutes, however, is all it takes to become productive.
Why not use Git-SVN?
Git-SVN is a tool that clones an SVN repo so that Git can be used on the client side for everyday work. While it does its job well, Git-SVN has certain limitations, the most important of which is that it does not support svn-externals.
These are a convenient way to import other repos, or parts of them, into an SVN project.
Furthermore, Git-SVN doesn’t allow the use of some Git extensions such as git-lfs, the purpose of which is to store large files in a Git repo without overly degrading performance.
This limitation is due to the fact that the server is still pure SVN.
Migration
A little trawling around the web turns up numerous tutorials on how to migrate a set of SVN repos to Git.
The typical workflow is:
- Collecting all the contributors to the repo and creating a “map” between SVN users and Git users, both because there might be different usernames and because SVN does not keep track of the emails that are essential in Git
- Cloning the SVN repository with Git-SVN in a local temporary Git repo; this can take up to several hours to complete
- “Cleaning up” the git repo by converting any SVN tags that Git sees as branches (as in SVN they are to all effects branches) and other small aspects such as renaming trunk as master and converting the attribute svn:ignore into the file .gitignore
- Pushing the repo onto an external provider such as GitHub or perhaps onto a self-hosted version of GitLab
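The first step of this workflow can be sketched in Python. The snippet below is only an illustration: it extracts the usernames from the output of `svn log --quiet` and renders them in the `svn_user = Full Name <email>` format that Git-SVN reads as an authors file. The sample log and the identities are made up, and the e-mail addresses always have to be supplied by hand, since SVN does not record them.

```python
def svn_usernames(log_text):
    """Collect the distinct usernames found in `svn log --quiet` output."""
    users = set()
    for line in log_text.splitlines():
        fields = line.split(" | ")
        # Revision lines look like: r42 | jdoe | 2016-01-15 10:02:11 +0100 (...)
        if len(fields) >= 3 and fields[0].startswith("r"):
            users.add(fields[1].strip())
    return sorted(users)


def format_authors_file(users, identities):
    """Render one `svn_user = Full Name <email>` line per SVN user."""
    lines = []
    for user in users:
        name, email = identities[user]  # a KeyError flags an unmapped user
        lines.append(f"{user} = {name} <{email}>")
    return "\n".join(lines) + "\n"


# Made-up log excerpt and identities, for illustration only.
SAMPLE_LOG = """\
------------------------------------------------------------------------
r2 | jdoe | 2016-01-15 10:02:11 +0100 (Fri, 15 Jan 2016)
------------------------------------------------------------------------
r1 | bsmith | 2016-01-14 09:30:00 +0100 (Thu, 14 Jan 2016)
------------------------------------------------------------------------
"""
IDENTITIES = {
    "jdoe": ("Jane Doe", "jane.doe@example.com"),
    "bsmith": ("Bob Smith", "bob.smith@example.com"),
}

users = svn_usernames(SAMPLE_LOG)
authors_text = format_authors_file(users, IDENTITIES)
print(authors_text, end="")
```

Failing loudly on an unmapped user is a deliberate choice here: it is better to find a missing contributor before a clone that may run for hours than after it.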
While being easily automatable, this procedure does present a number of problems:
- It is not very fast for medium to large repositories
- It does not support externals
These problems led us to create an ad hoc tool to manage migration; thus Git-Externals was conceived.
Git-Externals
Git-Externals was initially only dedicated to the management of externals, but then it became a useful tool for migrations that may not involve externals at all.
The name actually covers a set of Python scripts that handle the migration in a modular way:
- gittify, a tool used to perform the actual migration
- git-externals, a utility that facilitates the management of externals in a Git repo
- svn-externals-info, a generic script that reports useful information on the externals of a given SVN repo
- gittify-gitlab, a script that pushes Git repos automatically onto a GitLab server
Git-Externals was designed from the outset to work in 3 steps:
- to clone the various repos (using Git-SVN under the hood) with:
$ gittify clone --authors-file authors-file.txt file:///var/lib/svn foo
- to keep them in sync with SVN until the time is right for the final migration; work may well have continued on SVN in the meantime, so the new commits need to be brought into the Git repo with:
$ gittify fetch --authors-file authors-file.txt file:///var/lib/svn foo
- to complete the migration to Git by cleaning up with:
$ gittify cleanup foo
$ gittify finalize foo
and then pushing the repo onto a server such as GitLab with:
$ cd foo.git
$ git remote add origin https://gitlab.com/bar/foo.git
$ git push origin --all
$ git push origin --tags
In this way the slow initial clone is done ahead of time, and the wait at the moment of the final switch is minimised.
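For a migration covering many repos, the steps above can be wrapped in a small driver. The following is a minimal sketch that uses only the commands shown above; the function names `migration_plan` and `run_phase` are hypothetical and are not part of Git-Externals.

```python
import subprocess


def migration_plan(svn_url, name, authors_file, remote_url):
    """Return the three phases as lists of (argv, working_directory) pairs."""
    repo_dir = f"{name}.git"
    return {
        "clone": [
            (["gittify", "clone", "--authors-file", authors_file, svn_url, name], None),
        ],
        "sync": [
            (["gittify", "fetch", "--authors-file", authors_file, svn_url, name], None),
        ],
        "finalize": [
            (["gittify", "cleanup", name], None),
            (["gittify", "finalize", name], None),
            (["git", "remote", "add", "origin", remote_url], repo_dir),
            (["git", "push", "origin", "--all"], repo_dir),
            (["git", "push", "origin", "--tags"], repo_dir),
        ],
    }


def run_phase(plan, phase):
    """Run one phase, stopping at the first command that fails."""
    for argv, cwd in plan[phase]:
        subprocess.run(argv, cwd=cwd, check=True)


plan = migration_plan(
    "file:///var/lib/svn", "foo", "authors-file.txt",
    "https://gitlab.com/bar/foo.git",
)
# Only print the final phase here; nothing is executed.
for argv, cwd in plan["finalize"]:
    print(cwd or ".", "$", " ".join(argv))
```

With such a driver, `run_phase(plan, "clone")` would be run once, `run_phase(plan, "sync")` as often as needed while work continues on SVN, and `run_phase(plan, "finalize")` only at the moment of the switch.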
As for the externals, we decided not to use any of the solutions already integrated with Git, such as submodules, as they all had limitations in mapping externals. The specific problem with submodules is that a submodule can only be an entire Git repo, whereas one of the few convenient aspects of SVN is that it lets you check out (clone) only one part of a repo. We therefore decided it was time to create something similar to submodules but more flexible: git-externals.
It is interesting to note that git-externals can also be used in a Git repo that was never migrated from SVN, because it has no notion of SVN in itself. For example, it may be a more convenient alternative to submodules when greater flexibility is needed in the management of externals. Submodules work well when the dependency is between truly independent projects, for example ones with separate development cycles. When, instead, the main project and the submodule are developed simultaneously, submodules become “onerous” to use: the main repository must indicate which commit of the submodule to use and, therefore, every time the submodule is updated, that reference must be updated as well.
All that is required is a JSON file, git_externals.json, in .git/externals. This file is created automatically by the migration scripts, but it can also be changed with the git-externals script itself.
For example, the following commands add externals from 2 repos to the current repo:
- from the gitlab-ce repo, the shared directory as foo and as bar, and README.md as baz/README.md
- from the Linux repo, the Makefile as Makefile
$ git externals add --branch=master https://gitlab.com/gitlab-org/gitlab-ce.git shared/ foo
$ git externals add --branch=master https://gitlab.com/gitlab-org/gitlab-ce.git shared/ bar
$ git externals add --branch=master https://gitlab.com/gitlab-org/gitlab-ce.git README.md baz/README.md
$ git externals add --tag=v4.4 https://github.com/torvalds/linux.git Makefile Makefile
$ git add git_externals.json
$ git commit -m "DO NOT FORGET TO COMMIT git_externals.json!!!"
Note that, for the addition of these externals to take effect, the configuration file git_externals.json must be committed; this makes updates to the externals’ versions traceable.
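To give an idea of what such a configuration has to record, the sketch below groups the four additions above by repository: one entry per repo, with the branch or tag to follow and a map from each source path to the positions where it should appear. The key names (`branch`, `tag`, `targets`) and the helper `build_externals` are assumptions made for illustration and may not match the file that git-externals actually writes; the generated git_externals.json is the authoritative reference.

```python
import json


def build_externals(additions):
    """Group (repo, ref_kind, ref, source, destination) tuples by repo."""
    config = {}
    for repo, ref_kind, ref, source, destination in additions:
        # Hypothetical layout: one entry per repo, holding its ref and targets.
        entry = config.setdefault(repo, {ref_kind: ref, "targets": {}})
        entry["targets"].setdefault(source, []).append(destination)
    return config


GITLAB = "https://gitlab.com/gitlab-org/gitlab-ce.git"
LINUX = "https://github.com/torvalds/linux.git"

config = build_externals([
    (GITLAB, "branch", "master", "shared/", "foo"),
    (GITLAB, "branch", "master", "shared/", "bar"),
    (GITLAB, "branch", "master", "README.md", "baz/README.md"),
    (LINUX, "tag", "v4.4", "Makefile", "Makefile"),
])
print(json.dumps(config, indent=4, sort_keys=True))
```

Whatever the exact schema, the point is that one source path can be mapped to several destinations, which is what the two `shared/` additions rely on.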
Advantages
Git-Externals tries to map some of the SVN features onto Git as closely as possible. For example, SVN’s ability to download only parts of a repo is mapped onto Git’s sparse-checkout.
However, sparse-checkout is somewhat tedious to set up by hand, especially on a submodule. Git-Externals hides all this from the end user, because the sparse-checkout configuration is integrated directly into Git-Externals.
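To see what is being hidden, here is roughly what a manual sparse checkout involves, written as a Python sketch around plain Git commands. This is the classic recipe (enable `core.sparseCheckout`, list the wanted paths in `.git/info/sparse-checkout`, then pull), not necessarily the exact sequence that Git-Externals runs; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def sparse_patterns(paths):
    """Content of .git/info/sparse-checkout: one pattern per line."""
    return "".join(f"{path}\n" for path in paths)


def sparse_clone(repo_url, branch, paths, dest):
    """Check out only `paths` of `branch` from `repo_url` into `dest`."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    def git(*args):
        subprocess.run(["git", *args], cwd=dest, check=True)

    git("init")
    git("remote", "add", "origin", repo_url)
    git("config", "core.sparseCheckout", "true")
    info_dir = dest / ".git" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    (info_dir / "sparse-checkout").write_text(sparse_patterns(paths))
    # The history is still fetched in full; only the working tree is limited.
    git("pull", "origin", branch)


patterns = sparse_patterns(["shared/", "README.md"])
print(patterns, end="")

# Example call (needs git and network access, so it is left commented out):
# sparse_clone("https://gitlab.com/gitlab-org/gitlab-ce.git", "master",
#              ["shared/", "README.md"], "gitlab-ce-partial")
```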
Furthermore, git-externals has a good number of accessory commands to manage the externals. For example, you can view the diff or status of all externals (or a subset of them) with simple commands.
$ git externals status
$ git externals status ext1 ext2
$ git externals diff
It is also possible to update all the externals to the versions specified in git_externals.json with git externals update.
In any case, git externals --help is your friend.
Limitations
Under the hood git-externals uses symlinks to map the actual position of an external to the desired position in the main repo. This is because, if a file were copied from the external into the project and then changed, the version used by the main repo would drift out of alignment with the one in the external.
The problem is that Windows only allows symlinks to be used by users with Admin privileges. We have decided to live with this, because we believe that it is not a major obstacle for most Windows developers.
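The reasoning behind the symlinks can be shown with a small, self-contained Python sketch. The directory layout (an external checked out under .git/externals and linked as foo) and the helper `link_external` are assumptions made for illustration, not the actual Git-Externals code.

```python
import os
import tempfile
from pathlib import Path


def link_external(external_path, target_path):
    """Expose `external_path` at `target_path` through a symlink."""
    target_path = Path(target_path)
    target_path.parent.mkdir(parents=True, exist_ok=True)
    # target_is_directory only matters on Windows, where creating the link
    # may fail with OSError if the user lacks the required privileges.
    os.symlink(external_path, target_path,
               target_is_directory=Path(external_path).is_dir())


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    # Hypothetical location of the external's checkout.
    shared = project / ".git" / "externals" / "gitlab-ce" / "shared"
    shared.mkdir(parents=True)
    (shared / "notes.txt").write_text("v1\n")

    link_external(shared, project / "foo")

    # Editing through the link edits the external itself: no second copy.
    (project / "foo" / "notes.txt").write_text("v2\n")
    is_link = (project / "foo").is_symlink()
    seen_in_external = (shared / "notes.txt").read_text()

print(is_link, repr(seen_in_external))
```

Because both paths refer to the same file, a change made from the main project shows up directly in the external’s working copy, where it can be reviewed with the diff and status commands.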
Also, because a subdirectory of a repository can be used as an external, certain relative paths used within the subdirectory may simply not be present. For example, if an external assumes it will find a foo.baz file in its parent directory, that file will most likely not be there, because the external’s parent directory does not match that of the project. However, we believe that these cases should not exist: semantically, this would mean that a dependency makes assumptions about the configuration of the project, which makes the dependency hard to reuse in other projects.
Happy gittifying!