Thursday, March 05, 2009

Information hiding for flyweight code units

To make code more readable we divide it into units like functions, objects and modules. Following the principles of information hiding we give these units meaningful names that express what their function is, and supplement that with comments providing additional details.

Once a person reading the code is satisfied that the statements in a unit do what its name or description says, they can skim the code by paying attention to the names and comments and skipping the details of how the named functionality is implemented. This helps conserve precious mental resources.

There are units of code smaller than a function (a number of lines within a function that together perform a fairly coherent task, like reading a file) to which you want to be able to apply the principles of information hiding.

I’m talking about cases where separating the lines of code out into a separate function would be too heavyweight a solution – doing so wouldn’t be worth the effort and would actually decrease the code’s readability.

One difference between these and the other units of functionality is that they don’t have some sort of explicit delimiter to indicate their start and end, and this makes the task more difficult. If you put the comment at the start of the lines, does it apply only to the first line or to all of them? If there’s uncertainty you have to look at the code to resolve it, thus destroying the good information hiding properties.

If you have a convention (that the comment always applies to all the lines of code until the next blank line, for example) then you have to learn and remember the convention, and other programmers can always break it by accident, through forgetfulness, or from simply not knowing it. And if you want to be able to have blank lines within that set of lines, which is often useful, then you need an even more complex convention.

Here’s a code formatting idiom that provides a solution to the problem. Write a comment at the start of those lines of code that names/describes the functionality they implement, and then indent all of the lines implementing the functionality:

function X()

    statement before task

    // comment describing task

        statement implementing task
        statement implementing task

        statement implementing task
        statement implementing task

    statement following task
    statement following task
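
As a concrete illustration of the idiom, here is a minimal sketch in TypeScript, chosen only because it accepts extra indentation without complaint. The function name `summarize` and the "name,score" input format are made up for the example. Note that an automatic formatter would most likely flatten the extra indentation, so this assumes the code is formatted by hand.

```typescript
// Hypothetical example: summarize lines of the form "name,score".
function summarize(text: string): string {
    const label = "scores";

    // parse the lines into numeric scores

        const scores: number[] = [];
        for (const line of text.split("\n")) {
            const trimmed = line.trim();
            if (trimmed === "") continue; // skip blank lines in the input

            const parts = trimmed.split(",");
            scores.push(Number(parts[1]));
        }

    // Back at the normal indentation level: the parsing task is over.
    const total = scores.reduce((sum, s) => sum + s, 0);
    return `${label}: ${scores.length} entries, total ${total}`;
}
```

The blank line inside the indented section causes no ambiguity, because the indentation rather than the spacing marks where the task ends.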




A language could go a step further and have a way to use an explicit label instead of the comment, as in:

function X()
    statement before task

    readFile:

        statement implementing task
        statement implementing task

        statement implementing task
        statement implementing task

    statement following task
    statement following task

This could:
  • encourage a concise description of the function being performed
  • be used for documentation purposes
  • make it easier for an editor to collapse/expand sections, i.e. collapse the code 'contained' in the label so that only the label shows
  • support refactoring, by having operations that turn one of those sections into a full-blown function

I don't know whether such labels would be worth having.
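
For comparison, JavaScript and TypeScript already allow a label in front of a brace-delimited block, which comes fairly close to the labelled form described above. The sketch below is only an approximation of the idea: the names `summarizeLabeled` and `parseScores` are hypothetical, and the braces add an explicit delimiter and a new scope that the pure indentation idiom does not have.

```typescript
// Hypothetical example: the label `parseScores` names the task and the braces delimit it.
function summarizeLabeled(text: string): string {
    // Declared outside the block so it stays visible after the block ends.
    const scores: number[] = [];

    parseScores: {
        // Nothing to parse: leave the labelled block early.
        if (text.trim() === "") break parseScores;

        for (const line of text.split("\n")) {
            const trimmed = line.trim();
            if (trimmed === "") continue; // skip blank lines in the input
            scores.push(Number(trimmed.split(",")[1]));
        }
    }

    const total = scores.reduce((sum, s) => sum + s, 0);
    return `${scores.length} entries, total ${total}`;
}
```

Because `const` and `let` are block scoped, anything the labelled section needs to hand on to the following statements has to be declared before the block. That is one practical difference from simply indenting the lines.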

1 comment:

  1. i talk about the problem with conventions for which lines the comment applies to... and then i go and suggest something which is really just another sort of convention.

    what I should've said is that the sort of conventions that explicitly deal with blank lines are going to be fairly complex and easily ambiguous, whereas what i'm suggesting is a simple and unambiguous convention.
