If you’re a developer and you’re not using git for version control, you’re wrong. Moreover, git can, and should, be used when developing statistical or machine learning models, for example a “customized” Bayesian model in Stan or your own neural network architecture in TF/PyTorch (the latter I’m not as familiar with). Incrementalism.
My Set-up
I used Ubuntu and Fedora for years, but after working in a corporate environment and destroying my computer a few times dual-booting Linux and Windows, I’m just using Windows. After trying a few different Linux terminal emulators, I’m using WSL, which is kinda like an Ubuntu virtual machine. I sometimes use Windows’ Visual Studio, but just as a package manager for installing toolchains. So if you can’t find something on apt in WSL, try installing it via Visual Studio, which is more Windows-compatible. I found that, for some reason, I have to start WSL from the Windows Command Prompt, or else I get path errors.
Anyway, for version control, you want to have your code saved in multiple places, or “distributed,” so that if, for example, there’s a leak in your roof and it rains on your computer (which has actually happened to me), you don’t have to redo a bunch of work.
Getting Started
The first version of Git was written by Linus Torvalds in about 10 days, which should make you feel bad, but hopefully in a way that motivates you to improve your programming skills. Here’s a quote from Linus:
“Software is like sex, it’s better when it’s free.”
Anyway, in a terminal:
$ git init
This sets up a new git repository in the current directory (it creates a hidden .git folder there), so git can start tracking your changes. It doesn’t install git itself; git has to be installed already. The dollar sign isn’t part of the command, it just indicates we’re at the command line or terminal.
The basic commands you’ll use, if you’re only working locally, are $ git add and $ git commit -m "[YOUR MESSAGE HERE]". After $ git add, you should specify a file path, or to add the entire directory, you can use $ git add . or $ git add *. The first stages every change you made in the current directory; the second relies on the shell expanding *, so it skips hidden files (names starting with a dot). Sometimes it’s good to add and commit one file at a time, in case you want to revert a certain commit. Suppose you want to keep the changes you made in one file but undo the changes in another: if they’re in separate commits, you can revert just the one.
After a commit, you can check out the history of your commits using $ git log. I use this all the time. I like numbers, C++ and probability, so I like to annoy the Stan Development Team. I was actually named a “Stan Developer” like 8 years ago and stopped contributing, but I started again because it’s fun, and we need to practice.
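Here’s a minimal sketch of that local workflow in a throwaway directory. The file names, commit messages, and the name/email are placeholders made up for illustration, not anything from a real project.

```shell
# Scratch repository in a temp directory (hypothetical; nothing here touches real code).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"

echo "first draft of the model" > model.stan
echo "notes to self" > notes.txt

# Stage and commit one file at a time, so each commit can be undone on its own.
git add model.stan
git commit -q -m "add model draft"
git add notes.txt
git commit -q -m "add notes"

# Undo only the most recent commit; model.stan is left alone.
git revert --no-edit HEAD

git log --oneline    # three commits: the two adds plus the revert
```

Note that $ git revert doesn’t erase history; it adds a new commit that undoes the old one, which is why the log grows by one.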
“Most good programmers do programming not because they expect to get paid or get adulation by the public, but because it is fun to program.” – Linus
Here’s what I see when I run $ git log on a branch of the Stan math library, the C++ back-end to Stan, where all of the autodiff magic happens.
andre@compy:~/stan-dev/math$ git log
commit 7a955d9a528e1941dd9fd97b81d664412ffd0c4a (HEAD -> feature/issue-3311-test-thread-tbb-exp, origin/feature/issue-3311-test-thread-tbb-exp)
Author: Andre Zapico <andrezapico@gmail.com>
Date:   Thu Apr 23 17:58:45 2026 -0400

    0;10;1c# This is a combination of 14 commits.
    parallel_for, blocked range compiles for stan::math::exp
    compiling blocked_range works fine
    some progress, now a type deduction issue?
    ok something closer...
    implement struct version for parallel_for... uncompiled
    begin new class to use parallel for
    almost compiles...
    getting close, have template deduction failed which we can figure out
    almost compiles
    hold on
    compiles
    remove dead code
    compiled parallel_for, blocked_range for stan::math::exp
    compiled parallel_for, blocked_range for stan::math::exp

commit 45d1de233ef3685e4e1ad3cd0c2d0392aa1bfd71 (stan-dev/develop, stan-dev/HEAD, origin/develop, develop)
Merge: ac8c21a5e9 a587b5c169
Author: Brian Ward <bward@flatironinstitute.org>
Date:   Tue Apr 21 09:06:22 2026 -0400

    Merge pull request #3307 from stan-dev/ci/distribution-tests-tweaks

    Try to improve error handling of parallel distribution tests
What I’ve done here, which is a little fancy, is squash my commits into one commit with an interactive rebase, so that when I push to a remote branch, other developers don’t need to see all of my personal notes. The command I used to “rebase” was
$ git rebase -i HEAD~14
This took my last 14 commits and squashed them into one, and I reworded the commit message into something that’s readable by other developers. In reality, we’re adding onto an existing code base, which can vary in size, so we need to be articulate about what we’d like to add.
I can also check out what my history looked like prior to squashing the commits, using $ git reflog. Notice, below, that we have all of the unique commit hashes, so we can restore the codebase to any commit in case we really mess things up.
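To see the squash end to end without an editor popping up, here’s a small sketch on a throwaway repo. Normally $ git rebase -i opens a to-do list with one “pick” line per commit and you change all but the first to “squash” by hand; here sed does that edit instead. The repo, file name, and messages are made up, and the sed -i form assumes GNU sed (what you get on Ubuntu/WSL).

```shell
# Scratch repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

# One base commit plus three messy work-in-progress commits.
for msg in "base" "some progress" "almost compiles" "compiles"; do
  echo "$msg" >> work.txt
  git add work.txt
  git commit -q -m "$msg"
done

# GIT_SEQUENCE_EDITOR edits the rebase to-do list: keep the first "pick",
# turn every later "pick" into "squash".
# GIT_EDITOR=true accepts the combined commit message as-is.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -i HEAD~3

git log --oneline    # two commits remain: "base" and the squashed one
```

In real use you’d leave GIT_EDITOR alone so you get the chance to write a proper message for the squashed commit.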
andre@compy:~/stan-dev/math$ git reflog
7a955d9a52 (HEAD -> feature/issue-3311-test-thread-tbb-exp, origin/feature/issue-3311-test-thread-tbb-exp) HEAD@{0}: rebase (finish): returning to refs/heads/feature/issue-3311-test-thread-tbb-exp
7a955d9a52 (HEAD -> feature/issue-3311-test-thread-tbb-exp, origin/feature/issue-3311-test-thread-tbb-exp) HEAD@{1}: rebase (squash): 0;10;1c# This is a combination of 14 commits.
10d142ca4c HEAD@{2}: rebase (squash): # This is a combination of 13 commits.
763d48f415 HEAD@{3}: rebase (squash): # This is a combination of 12 commits.
7af8d5bae7 HEAD@{4}: rebase (squash): # This is a combination of 11 commits.
561631a42e HEAD@{5}: rebase (squash): # This is a combination of 10 commits.
cae32c810c HEAD@{6}: rebase (squash): # This is a combination of 9 commits.
c7a6e5daaf HEAD@{7}: rebase (squash): # This is a combination of 8 commits.
d2b57cd034 HEAD@{8}: rebase (squash): # This is a combination of 7 commits.
3c3492c4c4 HEAD@{9}: rebase (squash): # This is a combination of 6 commits.
d084728108 HEAD@{10}: rebase (squash): # This is a combination of 5 commits.
8760edf095 HEAD@{11}: rebase (squash): # This is a combination of 4 commits.
7c810c19c9 HEAD@{12}: rebase (squash): # This is a combination of 3 commits.
5fdefaaf63 HEAD@{13}: rebase (squash): # This is a combination of 2 commits.
27c6b4c9f4 HEAD@{14}: rebase (reword): parallel_for, blocked range compiles for stan::math::exp
14b4cd1680 HEAD@{15}: rebase: fast-forward
45d1de233e (stan-dev/develop, stan-dev/HEAD, origin/develop, develop) HEAD@{16}: rebase (start): checkout HEAD~14
9309f4f245 HEAD@{17}: rebase (abort): returning to refs/heads/feature/issue-3311-test-thread-tbb-exp
45d1de233e (stan-dev/develop, stan-dev/HEAD, origin/develop, develop) HEAD@{18}: rebase (start): checkout HEAD~14
9309f4f245 HEAD@{19}: rebase (finish): returning to refs/heads/feature/issue-3311-test-thread-tbb-exp
9309f4f245 HEAD@{20}: rebase (start): checkout HEAD~12
9309f4f245 HEAD@{21}: commit: compiled parallel_for, blocked_range for stan::math::exp
d934f2ed28 HEAD@{22}: commit: compiled parallel_for, blocked_range for stan::math::exp
68f4eea5a7 HEAD@{23}: commit: remove dead code
bc0a3c4772 HEAD@{24}: commit: compiles
9507e01f67 HEAD@{25}: commit: hold on
0299015480 HEAD@{26}: commit: almost compiles
62cca17d7a HEAD@{27}: commit: getting close, have template deduction failed which we can figure out
ae7c448f56 HEAD@{28}: commit: almost compiles...
227feb825d HEAD@{29}: commit: begin new class to use parallel for
7802b4151a HEAD@{30}: commit: implement struct version for parallel_for... uncompiled
0743a4c317 HEAD@{31}: commit: ok something closer...
9723de36a7 HEAD@{32}: commit: some progress, now a type deduction issue?
f888195280 HEAD@{33}: commit: compiling blocked_range works fine
14b4cd1680 HEAD@{34}: commit: some progress
45d1de233e (stan-dev/develop, stan-dev/HEAD, origin/develop, develop) HEAD@{35}: checkout: moving from develop to feature/issue-3311-test-thread-tbb-exp
Squashing is mostly so I’m not wasting other developers’ time by having them read my comments to myself. I’m experimenting here.
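As a rough sketch of the “restore the codebase to any commit” idea, here’s a throwaway repo where history gets rewritten and then put back using a reflog entry. Everything named here is hypothetical, and the rewrite uses $ git reset --soft as a quick stand-in for the interactive rebase.

```shell
# Scratch repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

for msg in "base" "wip 1" "wip 2"; do
  echo "$msg" >> work.txt
  git add work.txt
  git commit -q -m "$msg"
done
before=$(git rev-parse HEAD)              # where the branch pointed before the rewrite

# Rewrite history: fold the two wip commits into one new commit.
git reset -q --soft HEAD~2
git commit -q -m "squashed work"

git reflog                                # the old commits are still listed with their hashes

# HEAD@{0} is the squashed commit, HEAD@{1} the reset, HEAD@{2} the old "wip 2" commit.
git reset -q --hard "HEAD@{2}"
git log --oneline                         # back to base, wip 1, wip 2
```

Be careful with $ git reset --hard on real work: it throws away uncommitted changes. A gentler option is $ git branch [BACKUP-NAME] [HASH], which just puts a new branch on the old commit.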
Branching
Different groups have different styles. SAS did something different, but I prefer the way the Stan development team taught me. And at one hedge fund, with only 4 people, they just said, “make sure we’re all working on something different so we don’t have merge conflicts.” Which in retrospect is hilarious, and not good practice when you’re working on a team with a bunch of people. I think they used Tortoise or something for version control.
In the Stan project, develop is the branch for code that’s already been reviewed and tested by developers and is ready to be released in the next version. The main branch is for production code that’s available to users; it’s tested, should not be experimental, and should be bug-free. Sometimes bugs slip through.
In general, you want to branch from develop onto a feature branch. Different projects have different conventions for branch names and commit messages. On the Stan project, you create an issue on GitHub first, then work on a feature branch, branched from develop, and the commit message must include the issue number.
When I run $ git branch, here’s what I get:
andre@compy:~/stan-dev/math$ git branch
  develop
  feature/issue-3308-testing-tbb-gp_exp_quad_cov
* feature/issue-3311-test-thread-tbb-exp
The star indicates the branch I have checked out and am working on, which I branched from develop and which includes all changes from develop. The second branch I abandoned, to do something simpler. I can delete it. Or, if you’re working with another developer on an experiment, you can push to an experimental branch and they can pull and review or make additions. This is also useful for statistical or machine learning modeling, if you’re developing a model together.
This is what helped me get started with C++: I would push to a feature branch, Daniel Lee would tell me it sucked, and then he’d pass the code back to me. This is to be expected.
To create a new branch, that includes code from the current branch, you can use $ git checkout -b [NEW-BRANCH-NAME].
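A small sketch of creating, listing, and deleting branches, on a throwaway repo. The branch names copy the feature/issue-NNNN-description pattern from the listing above, but the issue number 0000 and the descriptions are made up.

```shell
# Scratch repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git branch -m develop                     # name the starting branch develop for this demo

# Branch off develop, abandon that idea, then branch off develop again.
git checkout -q -b feature/issue-0000-first-attempt
git checkout -q develop
git checkout -q -b feature/issue-0000-simpler-idea

git branch                                # the star marks the branch that is checked out

# Delete the abandoned branch. -d refuses if the branch has unmerged commits;
# -D forces the delete.
git branch -d feature/issue-0000-first-attempt
```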
Working on a remote branch with a team
Suppose we’re using GitHub. After you’ve thoroughly tested your code, push to your clone of the target repository on GitHub (GitHub calls this a fork). I think I have merge privileges for stan-dev/math, but I don’t really touch it, because I don’t want to mess anything up. This is safer. Let’s see what my remote setup looks like. I have both my clone and the development version of Stan as two different remotes. I can use $ git remote -v to show all of the remotes I set up.
andre@compy:~/stan-dev/math$ git remote -v
origin git@github.com:drezap/math.git (fetch)
origin git@github.com:drezap/math.git (push)
stan-dev git@github.com:stan-dev/math.git (fetch)
stan-dev git@github.com:stan-dev/math.git (push)
So, if I want to push to my clone on a specific feature branch, I can use $ git push origin feature/issue-3311-test-thread-tbb-exp.
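For reference, a two-remote setup like the one listed above could be created roughly like this. The URLs are the ones from the listing; the scratch directory is made up, and adding a remote only records the URL, so nothing is contacted over the network.

```shell
# Scratch repository; only the remote configuration matters here.
repo=$(mktemp -d)
cd "$repo"
git init -q

git remote add origin git@github.com:drezap/math.git       # personal clone (fork)
git remote add stan-dev git@github.com:stan-dev/math.git   # the main project repository

git remote -v    # lists a fetch and a push URL for each remote
```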
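Here’s a sketch of that push that runs entirely on your own machine: a local bare repository stands in for the clone on GitHub, so no account or network is needed. The paths, branch name, and issue number 0000 are all made up.

```shell
work=$(mktemp -d)
git init -q --bare "$work/origin.git"     # stand-in for the clone hosted on GitHub

git init -q "$work/local"
cd "$work/local"
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
git remote add origin "$work/origin.git"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# Do the work on a feature branch, then push that branch to origin.
git checkout -q -b feature/issue-0000-example
echo "change" > change.txt
git add change.txt
git commit -q -m "example change"
git push -q origin feature/issue-0000-example

git ls-remote --heads origin              # the feature branch now exists on the remote
```

Adding -u to the push ($ git push -u origin [BRANCH]) also records origin as the branch’s upstream, so a plain $ git push works next time.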
Note 1: You have to make sure your branch is up to date with the branch you’ve branched from, so prior to working on a code base, you need to $ git pull origin develop into your branch. That way, when you push or need to merge, you have fewer merge conflicts, and your current branch is up to date with changes that have already been made.
Note 2: Sometimes, instead of using $ git pull, some developers like to use $ git fetch after committing local changes. Fetch downloads the remote changes without merging them, so you can look at what’s coming before you merge. I’m reckless, so I’ll just $ git pull and manage merge conflicts. It’s good to pull from develop prior to adding changes on your local branch for the day. I do this prior to developing on a branch that’s consistently being updated, so I have fewer merge conflicts to manage later.
Note on SSH keys:
Notice how we’re using git@github.com:drezap/math.git. That’s the SSH form of the remote URL. GitHub no longer accepts account passwords for git operations, so we use SSH keys instead, for additional security. To use this kind of URL, we must set up SSH keys first. Ciphers, cryptography, blah blah, go read about it.
Go to your home directory. For WSL, I’m using the Ubuntu “virtual machine” home directory, which in Windows acts like a mounted directory.
$ ssh-keygen -t ed25519 -C "your_email@example.com"
Use the email you used for GitHub. Just press Enter at the passphrase prompt to leave it blank. Then add the public key (the .pub file it generates) to your SSH keys on GitHub.com.
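To make the fetch-then-merge route concrete, here’s a sketch with a local bare repository standing in for the remote and a second clone playing the teammate who pushes to develop. All paths, names, and the issue number 0000 are made up.

```shell
work=$(mktemp -d)
git init -q --bare "$work/origin.git"     # stand-in for the shared remote

# My clone: seed develop, then start a feature branch from it.
git init -q "$work/me"
cd "$work/me"
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
git remote add origin "$work/origin.git"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git branch -m develop
git push -q origin develop
git checkout -q -b feature/issue-0000-example

# Meanwhile, a teammate pushes a new commit to develop.
git clone -q --branch develop "$work/origin.git" "$work/teammate"
cd "$work/teammate"
git config user.name "Teammate"           # placeholder identity
git config user.email "teammate@example.com"
echo "new" > teammate.txt
git add teammate.txt
git commit -q -m "teammate change"
git push -q origin develop

# Back on my feature branch: download first, look, then merge.
cd "$work/me"
git fetch -q origin
git log --oneline HEAD..origin/develop    # commits on develop that my branch lacks
git merge -q origin/develop               # fast-forwards, since I have no new commits
```

$ git pull origin develop does the fetch and the merge in one step; splitting them just gives you a chance to look first.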
We should then be able to push to a feature branch, with a new name, on your clone of the target repository. Sometimes I mess things up, so please let me know in the comments below if this doesn’t work.
What’s left?
There are a lot of useful tools in git that save time, and I encourage exploring and exploiting them. I should do something on managing merge conflicts, but I’m about done writing for today. I’d rather be programming. I’m forgetting stuff, I’m sure.
Citations
Whatever, you’ll figure it out.
“In real open-source, you have the right to control your own destiny.” – Linus