Skip to content

GitHub Repository Cleanup

Go to | #1 | #2 | #3 | #4 | #5 | #6

In this fourth organization story, I will talk about a coder's paradise, or hell: Github! Github is the web interface for a program called Git, which is a version control software designed for code projects, but can be used for other things as well. I have a link to my GitHub user nicfv at the footer of each page on this website.

Repository Limit

One code project on GitHub is called a repository, or repo. GitHub used to limit free users to something like 5 public repositories, and no private repositories. Since Microsoft has purchased GitHub, the rules have relaxed to unlimited public and private repositories for free users, which is awesome! I started using GitHub to track code and various things including lecture notes I took on my tablet, homework assignments, and other academic things, as well as other non-code projects.

Eventually, my account (and local files on my computer) were starting to get littered with junk projects I would touch once and never again. This isn't really that much of an issue on GitHub, because of the fact you can have as many repositories as you want (with a limit of 5GB, I'm not sure if this is per repo or a grand total). It just depends on how you want to present your account - having too many repositories can subjectively look "spammy", or old repositories may have bad code, dead projects, or just general clutter.

However, there is a bit of an issue with local repositories. Running the git init command will initialize a git repository in the working directory, and a side effect of that is it creates a hidden .git/ folder. With a completely empty repository, the disk size of .git/ is 116 kB. That's not that big, relatively speaking, but again that is for a completely empty repository.

Just for testing, I repeated this command several times to get the size increase of each small commit.

Bash
echo 'hello' >> readme.md && git add . && git commit -m'something' && du -hs .git/

This is the results after 10 commits:

# of commits .git/ size Delta
0 116 kB +116 kB
1 172 kB +56 kB
2 196 kB +24 kB
3 220 kB +24 kB
4 244 kB +24 kB
5 268 kB +24 kB
6 292 kB +24 kB
7 316 kB +24 kB
8 336 kB +20 kB
9 356 kB +20 kB
10 376 kB +20 kB

Small commits like this only increase the size by about 20 kB per commit, but the total file size only increases. Even if I delete readme.md and then commit again, the size of .git/ is now 392 kB! This is because it stores the entire history of that repo and lets you check out any revision of any tracked file. If you start tracking binary files, like images or executables, the size increases much faster. The size of .git/ for this website is currently 92 MB!

Cleanup Process

For some of my academic repositories, the size of .git/ was massive, some of them up to 1 GB. First I needed to find where my repos are locally. This simple command lists out the directories of each git repository on my computer.

Bash
find . -name .git

For many of them, I completely deleted the .git/ folder if it was no longer necessary. This means that I lose access to the history of each tracked file, but in most cases, it didn't matter anymore. And for others, I had a clever idea: combining histories.

Combining Histories

This idea would be the reason for my one new GitHub and Git repository. I called it simply code, for code projects. I decided to only track code-related projects with Git, as anything else really bogs down the disk usage. Maybe I'll use Git if I have something really important to track the history for.

The following set of commands merges a repository's history with that of code. If you, the reader, are following along, make sure to change the remote url to your own GitHub repository.

Bash
1
2
3
4
5
6
7
8
9
cd ~/path/to/repository
mkdir repository-name
mv * repository-name
git add .
git commit -m 'Move into project subdirectory'
git remote add code git@github.com:nicfv/code.git
git config pull.rebase false
git pull code main --allow-unrelated-histories
git push code main

After this, I can safely delete the original repository's entire folder since the history now belongs to code and I can check out any files I want there. This massively cleaned up my PC, freeing several gigabytes of storage. The .git/ file in code increased but not as much as the savings I got from deleting all those extra directories. Also, I went from having almost 30 repositories on GitHub down to only 12, and still planning to merge more of them into code.

Conclusion

This project definitely wasn't necessary, but it has been a major file cleanup both locally and on GitHub. It's also made it easier to switch from project to project in code since everything belongs to the same repository. I would not recommend following this method for larger projects, and definitely would not recommend it either for public projects. However, it's a nice way to store all your private, small projects in just one place.