Resolving Application Dependencies with Git Submodules
Last updated May 09, 2024
Most modern applications rely heavily on third-party libraries and must specify these dependencies within the application repository. Tools like Ruby’s RubyGems, Java’s Maven, and Python’s pip are dependency managers that translate a list of stated application dependencies into the code or binaries that the application uses during execution.
Sometimes the dependency manager can’t resolve the required third-party libraries. Examples are private libraries that aren’t publicly accessible or libraries whose maintainers haven’t packaged them for distribution via the dependency manager. In these cases, you can use Git submodules to manually manage external dependencies.
This guide discusses the pros and cons of dependency management with Git submodules and some alternative approaches to consider instead.
Git Submodules
Git submodules are a feature of the Git SCM that you can use to include the contents of one repository within another by specifying the referenced repository location. It’s a mechanism for including an external library’s source in an application’s source tree.
For example, to include the FooBar source into the heroku-rails project, use the git submodule add command.
$ cd ~/Code/heroku-rails
$ git submodule add https://github.com/myusername/FooBar lib/FooBar
Cloning into 'lib/FooBar'...
remote: Counting objects: 26, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 26 (delta 8), reused 19 (delta 5)
Unpacking objects: 100% (26/26), done.
This command creates a submodule called FooBar and places a FooBar directory with the library’s full source tree into the lib application directory.
After a Git submodule is added locally, commit the new submodule reference to your application repository.
$ git commit -am "adding a submodule for FooBar"
[main 314ef62] adding a submodule for FooBar
2 files changed, 4 insertions(+)
create mode 160000 FooBar
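To see what adding and committing a submodule actually records, the following sketch builds a throwaway library and application in a temporary directory. The directory layout, the demo identity, and the local-path URL are assumptions made for illustration. A real project would use the hosted repository URL shown earlier.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; nothing here touches a real project.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in FooBar library with a single commit.
git init -q "$work/FooBar"
echo "module FooBar; end" > "$work/FooBar/foobar.rb"
git -C "$work/FooBar" add foobar.rb
git -C "$work/FooBar" commit -q -m "initial library commit"

# A stand-in application repository.
git init -q "$work/app"
cd "$work/app"
git commit -q --allow-empty -m "initial application commit"

# Recent Git versions refuse local-path submodule clones by default,
# so this demo allows the file transport for this one command only.
git -c protocol.file.allow=always submodule add "$work/FooBar" lib/FooBar
git commit -q -m "adding a submodule for FooBar"

# .gitmodules maps the submodule path to its URL.
cat .gitmodules

# The tree stores lib/FooBar as a gitlink (mode 160000) pinned to one commit.
git ls-tree HEAD lib/FooBar
```

The application repository stores only the URL and a pinned commit, not the library’s files, which is why a deployment has to fetch the submodule separately.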
Heroku resolves and fetches submodules as part of deployment.
$ git push heroku
Counting objects: 13, done.
...
-----> Heroku receiving push
-----> Git submodules detected, installing
Submodule 'FooBar' (https://github.com/myusername/FooBar.git) registered for path 'FooBar'
Initialized empty Git repository in /tmp/build_2qfce3fkvrug9/FooBar/.git/
Submodule path 'FooBar': checked out '667e0b5717631a8cca657a0aa306c045f06cfda4'
-----> Ruby/Rails app detected
...
Failures to fetch the submodules cause the build to fail.
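Because an unreachable submodule fails the build, it can help to check the submodule URLs before pushing. The following is a minimal sketch, not a Heroku or Git feature: check_submodule_urls is a hypothetical helper name, and it only confirms that each URL answers git ls-remote from your machine, which doesn’t guarantee that the build environment has the same access.

```shell
#!/bin/sh
# check_submodule_urls: hypothetical helper, not a Heroku or Git command.
# Reads .gitmodules in the current directory, reports each submodule URL,
# and returns non-zero if any URL can't be reached.
check_submodule_urls() {
  # Each matching line looks like: submodule.<name>.url <url>
  git config --file .gitmodules --get-regexp '^submodule\..*\.url$' | {
    status=0
    while read -r key url; do
      # GIT_TERMINAL_PROMPT=0 stops Git from waiting for a password prompt.
      if GIT_TERMINAL_PROMPT=0 git ls-remote --exit-code "$url" HEAD >/dev/null 2>&1; then
        echo "ok: $url"
      else
        echo "unreachable: $url"
        status=1
      fi
    done
    [ "$status" -eq 0 ]
  }
}
```

Run check_submodule_urls from the repository root. A non-zero exit status indicates that at least one submodule URL couldn’t be reached.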
If possible, use your language’s preferred dependency resolution mechanisms. Submodules can be confusing and error-prone.
Using submodules for builds on Heroku is only supported for builds triggered with Git pushes. Builds created with the API don’t resolve submodules. The same is true for GitHub sync.
Protected Git Submodules
If the referenced Git repository is protected via a username and password, you can still reference it with a submodule. You must embed the username and password into the repository URL because remote environments like Heroku don’t have access to locally available credentials.
For example, to add the FooBar submodule using HTTP basic authentication, embed username:password in the URL in this format.
$ git submodule add https://username:password@github.com/myusername/FooBar
This command adds a private submodule dependency to the application while still allowing it to resolve in non-local environments.
Because submodule URLs, including any embedded credentials, are stored in plaintext in the .gitmodules file that is committed to your repository, make sure that their use aligns with your particular security requirements.
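To see what a committed .gitmodules file exposes, you can list any submodule URLs that carry embedded credentials. This is a minimal sketch: find_embedded_credentials is a hypothetical helper name, and the pattern only recognizes the scheme://user:password@host form used in the example above.

```shell
#!/bin/sh
# find_embedded_credentials: hypothetical helper name used for illustration.
# Prints submodule URL entries that contain user:password@ exactly as stored.
# Exits 0 if at least one such entry exists, non-zero otherwise.
find_embedded_credentials() {
  file=${1:-.gitmodules}
  git config --file "$file" --get-regexp '^submodule\..*\.url$' |
    grep -E '[a-z+]+://[^/@:]+:[^/@]+@'
}
```

Anyone with read access to the repository can run the same check, so treat a match as a credential that is shared with every collaborator.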
Vendoring
While Git submodules are one way to quickly reference external library sources, users often run into issues with their nuanced update lifecycle. If you find the usability of submodules to be counterproductive, you can vendor the code into the project.
Many frameworks allow the use of vendored code. Vendoring simply copies the source of the referenced library into the application’s source tree.
$ git clone <remote repo> /path/to/some/directory
$ rm -rf /path/to/some/directory/.git
$ cp -R /path/to/some/directory app/vendor/directory
$ git add app/vendor/directory
A downside of this approach is that it requires a manual download and copy process when the external library is updated. However, for an external resource that changes slowly, or one whose updates you don’t want to adopt, this approach is an option.
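One way to make the manual update process repeatable is to wrap it in a small script that also records which upstream commit was copied. The following is a sketch under stated assumptions: vendor_library and the UPSTREAM_COMMIT marker file are hypothetical names, and the function replaces the destination directory entirely on each run.

```shell
#!/bin/sh
# vendor_library: hypothetical helper; the caller supplies both arguments.
#   $1  URL or path of the library repository
#   $2  destination directory inside the application, e.g. app/vendor/directory
vendor_library() {
  repo_url=$1
  dest=$2
  tmp=$(mktemp -d) || return 1

  git clone --quiet "$repo_url" "$tmp/src" || return 1
  commit=$(git -C "$tmp/src" rev-parse HEAD) || return 1

  # Drop the library's own Git metadata so the copy is plain source files,
  # and remove any previously vendored copy (destructive by design).
  rm -rf "$tmp/src/.git" "$dest"
  mkdir -p "$(dirname "$dest")"
  cp -R "$tmp/src" "$dest"

  # Record the copied upstream commit for future manual updates.
  echo "$commit" > "$dest/UPSTREAM_COMMIT"
  rm -rf "$tmp"
}
```

After running the function, review the result with git status and commit the destination directory as usual.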
Private Dependency Repositories
A robust and scalable approach to dependency management is using a private package repository. For Ruby, Python, and Node.js, such a repository is available on Heroku with the Gemfury add-on. For JVM-based languages, you can use a private S3 bucket with the s3-wagon-private tool. Another possibility is hosting your dependencies on Heroku using custom buildpack functionality.
With private package repositories, you can use your language’s dependency management tools while limiting access to only your application or organization. This approach incurs the overhead of properly packaging your referenced libraries for broader distribution. However, it’s a much more scalable approach that takes advantage of your language’s well-supported and vetted dependency toolset.