
Intuitive Programming - Comments

Comments are a topic of vibrant discussion. Ever since programmers could leave text in a program that the machine ignored, the debate has been running: “What’s a good comment, what’s a bad comment, why comment at all?”

There are endless instructions to programmers that say things like the following:
1) Describe your function!
2) Don’t write in the comment what you wrote in the code.
3) Tell people why you are doing what you are doing.

What I think has been missing from this discourse is the audience for comments, and the intent that comes with each audience. Code is read, skimmed, or analyzed by people and tools. So what are the audiences and reading modes?

1) Maintaining and enhancing the code
2) Skimming through the entire module or file to figure out what the overall structure is
3) Reviewing the test files to check out the test coverage and edge cases
4) Seeing the docstrings of functions while working in a separate file altogether
5) Reviewing the files for copyright and intellectual property

I believe the biggest problems in commenting recommendations happen when those intents are mixed up. For instance, someone fixing the code really should read the code and not the comments. Comments that just duplicate what the code does can only mislead once the function evolves and gets refactored.

Here are what I believe to be the best kinds of comments for each of those modes.

1) Maintaining the code

When people are in maintenance mode, they need to read the code. A lot of bugs happen because a function has diverged from its initial implementation and side effects have been introduced. Comments that duplicate the functionality only increase the chance of misleading someone about what the code currently does. They also increase the time it takes to read and update the code.

If the code does something hard or unintuitive, it’s good to leave some commentary behind. This is where the “Why?” comments are so useful.

# It turns out this API doesn’t support more than 30 chunks at a time.
# I am splitting up the stream into groups of at most 30 chunks here to work around that.
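
As a minimal sketch of how such a “Why?” comment might sit next to the code it explains: the function name `batch_chunks` and the constant `MAX_CHUNKS_PER_CALL` are hypothetical, and only the limit of 30 comes from the comment above.

```python
from typing import Iterator, Sequence

# Assumption for illustration: the limit of 30 is taken from the comment above.
MAX_CHUNKS_PER_CALL = 30


def batch_chunks(
    chunks: Sequence[bytes], limit: int = MAX_CHUNKS_PER_CALL
) -> Iterator[Sequence[bytes]]:
    # Why: the API rejects requests with more than `limit` chunks,
    # so the stream is sent as several smaller requests instead of one.
    for start in range(0, len(chunks), limit):
        yield chunks[start:start + limit]
```

The comment says nothing about how the slicing works, because the code already shows that. It only records the reason the batching exists, which the code cannot express.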

2) Skimming through the entire module or a long file

This is where the approach of not repeating yourself doesn’t quite work. If someone has to read many lines of code to understand the intent of a function, a class, or a big code block, it slows them down. I find it very appropriate to write comments at the top of files and blocks that give the reader a high-level map of what they are for and, most importantly, what else should go in them. What we are optimizing for here is someone scanning through several thousand lines to find the right place to dig in. Making that person read all the code is impractical.
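
A minimal sketch of what such a map might look like, assuming a small hypothetical module called `wallet.py`. The file name, section names, and functions are all made up for illustration.

```python
# ---------------------------------------------------------------
# wallet.py (hypothetical module)
#
# What lives here: money arithmetic for a user's wallet.
# What belongs here: new ways to add, spend, or inspect a balance.
# What does not: store inventory, checkout flow, persistence.
# ---------------------------------------------------------------


# --- Balance changes -------------------------------------------

def deposit(balance: int, amount: int) -> int:
    return balance + amount


def spend(balance: int, amount: int) -> int:
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


# --- Queries ---------------------------------------------------

def can_afford(balance: int, amount: int) -> bool:
    return amount <= balance
```

Someone skimming the file can stop at the header and the two section markers and already know where a new function should go.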

3) Test case documentation

When I look through a test file or a group of test files, I am often looking to see whether a particular customer-facing test case is covered. When we ask people to file a bug, we ask them for steps, so why not record those same high-level steps in the automated test? I also skim through a set of test cases to get a general understanding of what is covered, with both negative and positive scenarios. A lot of people say that the test case name (which is generally 20-50 characters) should tell you what the test does. In my experience that is not enough text for anything beyond simple unit tests. Here is an example of a test case outline that can save someone from reading 30-40 lines of code:

# Set up a new user with $100 in their wallet
# User goes to the store
# User buys a bottle of water
# User checks out
# Verify that the user has $98 left
# Verify that the user has a bottle of water in the bag

The counterargument is that you can and should create test functions and helper functions that will eventually make the test case read exactly like the set of steps above, but sometimes that’s impractical. In this case, setting up a user with a wallet might take 10-15 lines of code, and the other steps might as well.
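
The outline above might be embedded in a test like the following sketch. The `User` class, the `PRICES` table, and the helper functions are hypothetical stand-ins for real setup code, and the $2 price is inferred from the $100 and $98 figures in the outline.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    wallet: int                                  # balance in whole dollars
    bag: list = field(default_factory=list)      # items the user owns
    cart: list = field(default_factory=list)     # items not yet paid for


# Hypothetical price list; $2 is implied by the $100 -> $98 outline.
PRICES = {"bottle of water": 2}


def add_to_cart(user: User, item: str) -> None:
    user.cart.append(item)


def check_out(user: User) -> None:
    total = sum(PRICES[item] for item in user.cart)
    if total > user.wallet:
        raise ValueError("insufficient funds")
    user.wallet -= total
    user.bag.extend(user.cart)
    user.cart.clear()


def test_user_buys_water():
    # Set up a new user with $100 in their wallet
    user = User(wallet=100)
    # User goes to the store and buys a bottle of water
    add_to_cart(user, "bottle of water")
    # User checks out
    check_out(user)
    # Verify that the user has $98 left
    assert user.wallet == 98
    # Verify that the user has a bottle of water in the bag
    assert user.bag == ["bottle of water"]
```

In a real code base each step would likely expand to many more lines, which is exactly when the outline comments start to earn their keep.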

4) Function docstrings

A lot of modern editors can help the programmer with autocomplete and by showing a quick snippet of documentation for the function being called. If the documentation string (docstring) for the function is in the right place and in the right format, it’s very powerful and can be a big time saver. You are not making the person navigate to your file and read what the function is about. There are also tools that take docstrings and turn them into full-blown documentation pages.

I think docstrings for public functions start paying dividends almost immediately, and unless it’s a toy project you shouldn’t skip them. A good function documentation culture, even on a medium-size code base (50,000+ lines of code), is going to help people reuse functions instead of hunting around for them.
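
As a minimal sketch, here is a Python function with a docstring. The function `split_into_batches` is hypothetical, and the Args/Returns/Raises layout is just one common convention, not the only valid one.

```python
def split_into_batches(items, limit):
    """Split `items` into consecutive lists of at most `limit` elements.

    Args:
        items: A list of values to split.
        limit: Maximum size of each batch; must be a positive integer.

    Returns:
        A list of lists. The last batch may be shorter than `limit`.

    Raises:
        ValueError: If `limit` is less than 1.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [items[i:i + limit] for i in range(0, len(items), limit)]
```

Calling `help(split_into_batches)` in a Python session prints this text, and it is also available as `split_into_batches.__doc__`, so a caller in another file never has to open the source.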

5) Copyright and intellectual property

This is easy: if you are publishing your code for other people to reuse, it’s important to tell them what strings (if any) are attached to that reuse. If you have ever gone through M&A, you know that people want to see that you own your IP, and just having a copyright message up top is pretty helpful. This doesn’t take much effort, and there are tools that insert that text for you automatically.
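
A rough sketch of what such a tool does at its core, assuming Python source files. The header text, the holder name, the year, and the function `ensure_header` are all placeholders for illustration. A real tool would also need to handle things like shebang lines and existing headers in a different format.

```python
# Hypothetical header text; replace the holder, year, and license with your own.
HEADER = (
    "# Copyright (c) 2024 Example Corp.\n"
    "# SPDX-License-Identifier: MIT\n"
)


def ensure_header(source: str, header: str = HEADER) -> str:
    """Return `source` with `header` prepended unless it is already there."""
    if source.startswith(header):
        return source
    return header + "\n" + source
```

Running the function twice on the same text gives the same result as running it once, which makes it safe to apply across a whole repository.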


In summary, the real source of truth for the code is the code, but skillful comments can help people understand the bigger picture, learn the reasons behind non-obvious choices, and skim through large bodies of code. You can do someone a huge favor by putting such comments into the code and speeding up their understanding.

