Taking notes on a new codebase

Readers –

I'm Nat Bennett, and I read a lot of code.

In my day job I’m a consultant — I do a few different things and occasionally some greenfield work, but usually there’s a lot of code that someone else wrote, and I’m trying to make sense of it and extract some Business Insights.

I’ve been trying for a few weeks to write about how I do this, since it involves some note-taking techniques and I know that some of you are very much here for that. The piece keeps expanding into something bigger and woolier on “getting context on new codebases” generally, and I keep running out of steam before I get to specific techniques.

So in today's Simpler Machines, let’s just get tactical right away, and talk about note-taking for dependency mapping.


Here’s the scenario: I’m looking at a big system with several microservices, several data stores, a pub-sub queue, and integrations with several APIs. I have some questions I need to answer, ranging from “how much work would it take to replace one of the external APIs with a similar service from a different vendor?” to “which parts of this system are actually necessary to do some subset of its operations?” to “what does this subset of services even do?” I don’t have access to anyone who designed or worked on the system. I do have some architecture documentation, but it’s for an older version of the system, so I don’t actually know how accurate it is.

In order to answer these questions, I want to make a map of the major components and how they call each other. My weapon of choice here is Obsidian, because of the way its visualization tools work. (It's also convenient that it works on Markdown files – Obsidian Vaults are easy to separate from each other, and share via GitHub with a client.) I’ll start with a list of the components I want to map – in this case, every independently deployed service, every external API, and every data store.

I turn that list into a set of notes, one for each component. Creating the list of notes first is important because Obsidian will suggest notes that have already been created when I type in the [[ double open bracket, which helps me avoid accidentally creating duplicate notes with different names.
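If the component list is long, creating the notes by hand gets tedious. Here is a minimal sketch of doing it with a script, assuming the vault is just a folder of Markdown files with one file per note. The function name and the component names in the usage note are made up for illustration; nothing here is an Obsidian API.

```python
from pathlib import Path


def create_component_notes(vault_dir, components):
    """Create one empty Markdown note per component.

    Existing notes are left untouched, so the function is safe to re-run
    as the component list grows. Returns the file names it created.
    """
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    created = []
    for name in components:
        note = vault / f"{name}.md"
        if not note.exists():
            note.touch()  # empty file; the file name is the note title
            created.append(note.name)
    return created
```

Calling `create_component_notes("vault", ["Cloud Controller", "storage", "identity service", "routing_api"])` would leave four empty notes in a `vault` folder, ready to be filled in with links.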

Once I’ve got that starting list, I’ll go through, read the code, and add a list of links to each note, which ends up looking something like this:

Cloud Controller

apps handler has clients for
[[storage]]
[[identity service]]

routes handler has clients for
[[routing_api]]
[[identity service]]
[[storage]]

...

(I’m using Cloud Foundry as the domain example because it’s an open source service-oriented codebase, but the style here is more like a Kotlin codebase I worked with recently.)

“Read the code” is doing a lot of work. I don’t read all the code in every service at this point. If I’m lucky I can get everything I need from one or two files per service. If dependency management is more complicated, I usually start by looking for a service that I know the one I’m reading talks to, and then check for things that look like its client. Cloud Controller, for instance, has a thing called a DependencyLocator that lists most (all?) of its dependencies.

Even in the simplest case I often grep a bit for things that look like HTTP or RPC requests, or for libraries that make HTTP and RPC requests, just in case there’s some one-off call-maker lurking.
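That grep pass can be sketched in a few lines of Python. The patterns and file suffixes below are assumptions picked for illustration (a few common JVM HTTP client class names plus bare URLs), not a complete list; swap in whatever libraries the codebase in front of you actually uses.

```python
import re
from pathlib import Path

# Hypothetical patterns: adjust to the HTTP/RPC libraries in the codebase.
CALL_PATTERNS = [
    re.compile(r"https?://"),
    re.compile(r"\b(HttpClient|RestTemplate|OkHttpClient|grpc)\b"),
]


def find_call_sites(root, suffixes=(".kt", ".java", ".rb", ".go", ".py")):
    """Return (relative path, line number, line) for every line that
    matches one of CALL_PATTERNS in source files under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in CALL_PATTERNS):
                rel = path.relative_to(root).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits
```

Each hit is only a lead to go read, not proof of a dependency: a URL in a comment will match just as happily as a real client call.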

(This can get really tricky if the relationship between subcomponents is very abstract and modular, but I luckily have not had to deal with too much of that.)

Once I’ve made that set of interlinked notes, I can switch over to graph view and see what’s basically a directed graph of the system. I don’t have a great example that I can share, but this is usually a lot messier than the official architecture diagram, and might include entire major paths that the diagram doesn’t cover. This is often useful for communicating about the complexity, but I don’t usually use this graph directly in any client-facing artifact – I’ll use it to help me draw something more comprehensible in a vector graphics tool.
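Because the notes are plain Markdown, the same directed graph can be pulled out of the vault with a short script, which is handy as a starting point for a cleaner diagram. This is a rough sketch, assuming every dependency is written as a `[[wikilink]]` and that each note's file name is its title. The function names are made up, and the output is Graphviz DOT text rather than anything Obsidian produces itself.

```python
import re
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def build_graph(vault_dir):
    """Map each note title to the sorted, de-duplicated titles it links to."""
    graph = {}
    for note in sorted(Path(vault_dir).rglob("*.md")):
        text = note.read_text(encoding="utf-8")
        targets = {match.strip() for match in WIKILINK.findall(text)}
        graph[note.stem] = sorted(targets)
    return graph


def to_dot(graph):
    """Render the graph as Graphviz DOT text, one edge per link."""
    lines = ["digraph system {"]
    for source, targets in graph.items():
        for target in targets:
            lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

Repeated links collapse into a single edge here, so a service that shows up under several handlers in one note still produces one arrow.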

Further notes usually go into notes named for the function they describe, linked from the note for the service that owns that function, or from the note for a function further up the call chain. This makes the main graph less tidy, but it’s easy to retain the “just the services” graph by keeping all those notes in their own folder and then filtering based on path.


Do you have code note-taking techniques yourself? Do you know anyone who writes about this well? Most of what I know about understanding software I learned early in my career from exploratory testers – it doesn't seem to be something that software developers write about much, except maybe in the context of changing legacy code.

Oh and let me know if you try this yourself, especially on something that you can share – I'd love to get some better examples.

- Nat