Comments are a topic of vibrant discussion. Ever since programmers could leave text in a program that the machine ignores, the debate has been running: “what’s a good comment, what’s a bad comment, why comment?”
There are endless instructions to programmers that say many of the following things:
1) Describe your function!
2) Don’t write in the comment what you wrote in the code.
3) Tell people why you are doing what you are doing.
What I think has been missing from this discourse is the audience for comments, and with each audience comes a different intent. Code is read, skimmed or analyzed by people and tools. So what are the audiences and reading modes?
1) Maintaining and enhancing the code
2) Skimming through the entire module or file to figure out what the overall structure is
3) Reviewing the test files to check out the test coverage and edge cases
4) Seeing the docstrings of functions while being in a separate file altogether
5) Reviewing the files for copyright and intellectual property
I believe the biggest problems in commenting recommendations happen when those intents are mixed up. For instance, when someone is fixing the code they really should read the code and not the comments. Comments that just duplicate what the code is doing can only mislead once the function evolves and gets refactored.
Here is what I believe are the best kind of comments for those different modes.
1) Maintaining the code
When people are in maintenance mode, they need to read the code. A lot of bugs happen because the function has diverged from its initial implementation and side effects have been introduced along the way. Comments that duplicate the functionality only increase the chance of misleading someone about what the code is currently doing. They also increase the time it takes to read and update that functionality.
If there are hard or unintuitive things done in the code, it’s good to leave some commentary behind. This is where the “Why?” comments are so useful.
# It turns out this API doesn’t support more than 30 chunks at a time.
# I am splitting up the stream into max number of chunks here to work around that.
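As a minimal sketch of how such a “Why?” comment might sit next to the code it explains: the 30-chunk limit comes from the comment above, while `MAX_CHUNKS_PER_CALL` and `batched_chunks` are hypothetical names made up for illustration, since the actual API is not named here.

```python
from typing import Iterable, Iterator, List

# Assumed limit, taken from the comment above; the real API is not named.
MAX_CHUNKS_PER_CALL = 30


def batched_chunks(
    chunks: Iterable[bytes], limit: int = MAX_CHUNKS_PER_CALL
) -> Iterator[List[bytes]]:
    """Yield lists of at most `limit` chunks, preserving order."""
    # Why: the downstream API rejects calls with more than 30 chunks,
    # so the stream is grouped into batches it will accept.
    batch: List[bytes] = []
    for chunk in chunks:
        batch.append(chunk)
        if len(batch) == limit:
            yield batch
            batch = []
    if batch:
        # The last batch may be smaller than the limit.
        yield batch
```

The comment says nothing about how the loop works, because the loop already says that. It only records the constraint that a maintainer could not guess from the code alone.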
2) Skimming through the entire module or a long file
This is where the approach of not repeating yourself doesn’t quite work. If someone has to read many lines of code to understand the intent of a function, class or big code block, it slows them down. I find it very appropriate to write comments in files and blocks that give the reader a high-level map of what these files are for and, most importantly, what else should go in them. What we are optimizing for here is someone scanning through several thousand lines to find the right place to dig in. Penalizing that person by making them read all the code is impractical.
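Here is a rough sketch of what such a map could look like at the top of a file. The billing module, its sections and both functions are invented purely for illustration; the point is the header and the section markers, not the code under them.

```python
"""Billing helpers (hypothetical module, for illustration only).

Map of this file:
  1. Price calculation: pure functions, no I/O.
  2. Invoice formatting: turns computed totals into text.

What belongs here: anything that computes or formats an amount.
What does not: network calls and database access.
"""


# --- 1. Price calculation ---------------------------------------------

def total_cents(unit_price_cents: int, quantity: int) -> int:
    return unit_price_cents * quantity


# --- 2. Invoice formatting --------------------------------------------

def format_total(cents: int) -> str:
    # Assumes a non-negative amount.
    return f"${cents // 100}.{cents % 100:02d}"
```

Someone skimming can read the header, jump to the right section and skip the rest. The “what does not belong here” line is the part that keeps the file from slowly turning into a dumping ground.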
3) Test case documentation
When I look through a test file or a group of test files, I am often looking to see whether a particular customer-facing test case is covered. When we ask people to file a bug we ask them for steps, so why not turn around and record those high-level steps in the automated test? I also skim through a set of test cases to get a general understanding of what is covered by both negative and positive scenarios. A lot of people say that the test case name (which is generally 20-50 characters) should tell you what the test does. In my experience that is not enough text for anything beyond simple unit tests. Here is an example of a test case outline that can save someone from reading 30-40 lines of code:
# Set up a new user with $100 in his wallet
# User goes to the store
# User buys a bottle of water
# User checks out
# Verify that the user has $98 left
# Verify that the user has a bottle of water in the bag
The counterargument is that you can and should create test functions and helper functions that will eventually make the test case read exactly like the set of steps above, but sometimes that is impractical. In this case, setting up a user with a wallet might take 10-15 lines of code, and the other steps might too.
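To show how the outline above might sit inside an automated test, here is a small self-contained sketch. The `User` class, the `PRICES` table and the `add_to_cart` and `check_out` helpers are all hypothetical stand-ins; the $2 price is simply what takes $100 down to $98.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical price list, in whole dollars.
PRICES = {"water": 2}


@dataclass
class User:
    """Hypothetical user with a wallet balance, a cart and a bag."""
    wallet: int
    cart: List[str] = field(default_factory=list)
    bag: List[str] = field(default_factory=list)


def add_to_cart(user: User, item: str) -> None:
    user.cart.append(item)


def check_out(user: User) -> None:
    total = sum(PRICES[item] for item in user.cart)
    if total > user.wallet:
        raise ValueError("insufficient funds")
    user.wallet -= total
    user.bag.extend(user.cart)
    user.cart.clear()


def test_user_buys_water() -> None:
    # Set up a new user with $100 in the wallet
    user = User(wallet=100)
    # User goes to the store and buys a bottle of water
    add_to_cart(user, "water")
    # User checks out
    check_out(user)
    # Verify that the user has $98 left
    assert user.wallet == 98
    # Verify that the user has a bottle of water in the bag
    assert "water" in user.bag
```

In a real code base each of those steps could be many lines long, and the step comments are what let a reader skim the scenario without reading the setup.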
4) Function docstrings
A lot of modern editors can help the programmer with autocomplete and show a quick snippet of documentation for the function being called. If the documentation string (docstring) for the function is in the right place and in the right format, it is very powerful and can be a big time saver. You are not making the person navigate to your file and read what the function is about. There are also tools that take docstrings and turn them into full-blown documentation pages.
I think docstrings for public functions start paying dividends almost immediately, and unless it’s a toy project you shouldn’t skip them. A good function documentation culture, even on a medium-sized code base (50,000+ lines of code), is going to help people reuse functions instead of hunting around for them.
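As a small illustration, here is a sketch of a docstring placed where Python tooling looks for it, plus one way a tool can read it back. The `split_bill` function is made up for the example; `inspect.getdoc` is the standard-library call that returns a cleaned-up docstring, and it is used here only as a stand-in for what an editor or documentation generator might do.

```python
import inspect


def split_bill(total_cents: int, people: int) -> int:
    """Return each person's share of a bill, in cents, rounded up.

    Args:
        total_cents: Bill total in cents; must be non-negative.
        people: Number of people splitting the bill; must be at least 1.

    Raises:
        ValueError: If `people` is less than 1 or `total_cents` is negative.
    """
    if people < 1 or total_cents < 0:
        raise ValueError("people must be >= 1 and total_cents must be >= 0")
    # Ceiling division using integers only.
    return -(-total_cents // people)


# The first line of the docstring is what many tools show as a summary.
summary = inspect.getdoc(split_bill).splitlines()[0]
```

The caller learns the units, the rounding rule and the failure mode without ever opening the file where `split_bill` lives.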
5) Copyright and intellectual property
This is easy: if you are publishing your code for other people to reuse, it’s important to tell them what strings (if any) are attached to that reuse. If you have ever gone through M&A, you know that people want to see that you own your IP, and just having a copyright message up top is pretty helpful. This doesn’t take much effort, and there are automated tools that will insert that text for you.
In summary, the real source of truth for the code is the code, but skillful comments can help people understand the bigger picture, see the reasons behind things that are not obvious, and skim through large bodies of code. You can do someone a huge favor by putting the right comments into the code and speeding up their understanding.