This paper covers the history and use of comments in programming languages, from the beginning of programming to the present day. Comments in many programming languages are discussed, including modern languages such as C, Java, and scripting languages, and older languages such as Ada, COBOL, and FORTRAN. Design issues, types of comments, and problems with comments are illustrated.

COBOL Comments
Position of Comment Indicator
End-of-Line Comments
Block Comments
Syntax of Comments
Placement of Comments
XHTML Comments
Nested Comments
Comments for Backward Compatibility
Comments for Hiding Code
Mega-Comments
Questions
Answers

Comments

Please send suggestions and comments to dvantassel@gavilan.edu

Comments are used in a programming language to document the program and remind programmers of what tricky things they just did with the code, or to warn later generations of programmers stuck with maintaining some spaghetti code. While comments may seem to be a minor issue in a language, an awkward comment format is a nuisance and can be a source of nasty errors. The compiler handles the content of a comment as if it were not there. Examples of modern-day comments are:

max = 100; // using default size.

/* check input for valid values and

print error message for accounting if problems. */

We have two types of comments here, the end-of-line comment and the block comment. An end-of-line comment terminates at the end of the line. A block comment has a terminator and can continue for several lines, or be less than one line.

Comments were called REMarks in BASIC. COBOL used a NOTE among other types of comments. ALGOL 60 used the reserved word comment to start a comment and the semicolon to terminate the comment.

Comment Design Issues

There are a few comment design issues for us to consider. Some are:

Where do comments start? Do they start any place or at a particular column[1]? Early COBOL, BASIC, and FORTRAN started comments at a particular position.
How are comments ended? Obvious choices are at the end of the line, or with a comment terminator like the */ of Java.
Can comments nest? If so, exactly how does the syntax work?
How can we comment out a hundred lines of code that has comments when we want to do testing or debugging?

Some of these issues have answers in modern languages, but others are still unresolved.

Full-Line Comments

In FORTRAN, BASIC, and COBOL, comments are full lines, and each comment begins with a specific comment mark in a fixed position on the line. In BASIC, REMark lines start with REM.

010 REM FIND PRIME NUMBERS LESS THAN 100

020 REM BY DENNIE VAN TASSEL

030 REM JULY 4, 1965

040 LET A = 1

The same thing would be done in FORTRAN as follows:

C FIND PRIME NUMBERS LESS THAN 100

C BY DENNIE VAN TASSEL

C JULY 4, 1957

A = 1

A FORTRAN comment is indicated by a C in position 1, and only works if the C is in position 1. The comment takes the entire line. In these early languages, programming was done with cards, so there was an obsession with lines (cards) and the beginning and end of cards (lines) that present-generation programmers cannot understand. Multiple-line statements or two statements on the same line were not imagined. Since both BASIC and FORTRAN used single lines for their statements, it is not surprising they used the same convention for comments. These full-line comments go on separate lines before or after the code that needs to be commented.

COBOL Comments

COBOL has a similar style of comments. An asterisk has to be put in position 7, and then the rest of the line is a comment. COBOL labels have to start in position 8 or later, and COBOL statements have to start in position 12 or later. Here is how comments would look in COBOL:

010010* FIND PRIME NUMBERS LESS THAN 100

010020* BY DENNIE VAN TASSEL

010030* JULY 4, 1959

010035 START-LOOP.

010040 MOVE 1 TO A.

In the above code, positions 1-6 hold a sequence number: the page number (010) in positions 1-3 and the card number (040 in the last line) in positions 4-6. Positions 73-80 were often used to indicate the name of the program, so most comments would end by position 72. A good 1960 COBOL (or FORTRAN) compiler could indicate if cards were out of sequence. In FORTRAN this same numbering scheme was used, but the numbers were in positions 73-80.

Position of Comment Indicator

BASIC, FORTRAN, and COBOL have two common characteristics for their comments. First, comments terminate at the end of the line. Second, the comment indicator was in a particular position. The 80-column cards made a particular column meaningful. All of these languages were very column oriented. We can call this type of comment a positional comment, since it must start in a particular position. Table x.1 describes this type of comment.

Language	Comment Syntax
FORTRAN	C in position 1
BASIC	REM at beginning of the line
COBOL	* in position 7

Full-Line Comments

Table x.1

Notice that all three of these languages are very old. When these languages started, computers had memory of 4K or 8K, which is probably less than your toaster. Knowing where comments had to start made it easy for early compilers to find the comment and dispose of it easily. The compilers needed all the help they could get. So if a compiler knew that all FORTRAN comments had to have a C in position 1, then it was easy to find the comments. Then the compiler could ignore that line. If you look at modern languages where the comment can start at any place on the line and end at any place on the line, a good portion of that available 4K would have been necessary just for processing comments!
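
The positional test described above is cheap enough to sketch in a few lines. The following Python sketch is only an illustration, not how any historical compiler was written. The function names are made up for this example, and the sketch assumes each card image arrives as one string whose first character is position 1.

```python
def is_fortran_comment(line: str) -> bool:
    """Fixed-form FORTRAN: a C in position 1 makes the whole line a comment."""
    return line[:1] == "C"


def is_cobol_comment(line: str) -> bool:
    """Fixed-form COBOL: an asterisk in position 7 makes the line a comment."""
    return len(line) >= 7 and line[6] == "*"


def is_basic_remark(line: str) -> bool:
    """BASIC: the statement after the line number is the keyword REM."""
    parts = line.split(maxsplit=2)
    return len(parts) >= 2 and parts[0].isdigit() and parts[1] == "REM"


if __name__ == "__main__":
    # One look at a fixed position decides the fate of the whole card.
    for card in ["C FIND PRIME NUMBERS LESS THAN 100", "      A = 1"]:
        kind = "comment" if is_fortran_comment(card) else "statement"
        print(kind, "|", card)
```

Each check looks at a fixed place on the line and never has to scan the rest of the card, which is the saving the paragraph above describes.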

End-of-Line Comments

With assembly language we have two improvements in comments. First, the comment does not have to start in position 1; the comment can start in a later position. Second, the line can have useful commands or instructions to the left of the comment. Assembly language starts a comment with a semicolon any place on the line. Here is how comments can look in assembly language:

; FIND PRIME NUMBERS LESS THAN 100

; BY DENNIE VAN TASSEL

; JULY 4, 1954

MOV C, 1 ; SET COUNT TO 1 FOR THE STARTING VALUE.

Now we do not have to start the comment in a particular position. The MOVe command has a comment on the same line as the move command. These comments still terminate at the end of the line and are called end-of-line comments. We have expanded our comment capability quite a bit, especially since we can have useful commands on the same line to the left of the comments. Table x.2 illustrates end-of-line comments in several languages.

Language	Comment Syntax
ALGOL 60	; (semicolon)
Assembly Languages	; (semicolon)
Ada, MySQL	-- (two dashes)
C++/Java	// (two slashes)
FORTRAN 90	! (exclamation mark)
Perl, TCL, UNIX Shell, MySQL	# (hash sign)
Visual Basic .NET	' (apostrophe)

End-of-Line Comment

Table x.2

All these end-of-line comments can start any place on the line, and can be placed after commands. These end-of-line comments are safer than block comments because end-of-line comments are terminated automatically at the end of the line.
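
As a rough illustration of why end-of-line comments are easy to handle, here is a minimal Python sketch that splits one line at its comment marker. The function name split_eol_comment is hypothetical. The sketch also assumes the marker never appears inside a string literal, which a real scanner would have to check.

```python
def split_eol_comment(line, marker):
    """Split a line into (code, comment) at the first occurrence of marker.

    Simplifying assumption: the marker is never part of a string literal.
    """
    pos = line.find(marker)
    if pos == -1:
        # No marker on this line, so the whole line is code.
        return line.rstrip(), ""
    # The comment runs from the marker to the end of the line, nothing more.
    return line[:pos].rstrip(), line[pos + len(marker):].strip()


if __name__ == "__main__":
    print(split_eol_comment("MOV C, 1 ; SET COUNT TO 1", ";"))
    print(split_eol_comment("max = 100; // using default size.", "//"))
```

Because the end of the line is the terminator, the scanner never has to search further for a closing symbol, so there is nothing to forget.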

Block Comments

When we get into languages with multiple-line programming statements we find comments that can be multi-line or in-line comments. These comments are not concerned with line boundaries. There are two needs not addressed by the previous two types of comments. We may want a short comment in the middle of some code (an in-line comment) or we may want comments that are several lines long. Wanting or needing short in-line comments in the middle of a line requires a comment with delimiters. Multiple-line comments can be done with several full-line comments.

Here is what some of the languages use for block comments:

Language	Comment Syntax
ALGOL	comment . . . ;
Pascal	(* . . . *) or { . . . }
Many languages	/* . . . */
Forth	( . . . )
HTML	<!-- . . . -->
Haskell	{- . . . -}

Block Comments

Table x.3

ALGOL starts a comment with the word comment and ends the comment with the first semicolon it finds. Early Pascal used (* and *) for comments since they only had round parentheses on keyboards back then. After braces were added to keyboards, braces were also allowed for comments.

It is obvious that the C-style comments have won; C inherited them from its predecessor, B. “Multiple-line comments” is not quite correct terminology since these comments can be on only one line with commands on either side as follows:

sum = 0; /* initialize variables */ max = 100;

But if you do something like this, you need to be punished in some way. There does not seem to be a good name for this type of comment. The comment can be before a command, in the middle of a command, after a command, or be several lines long. The best terminology seems to be to call it a block comment, and that is what it is called in some textbooks.

When Ada was designed, both block and end-of-line comments were in common usage. But Ada has only one type of comment. Ada uses two dashes (--) to start a comment that ends at the end of the line. My guess is Ada designers did not feel the benefit of block comments was greater than the problem of run-away comments (not closing a block comment).

Syntax of Comments

Notice that some languages (assembly, FORTRAN 90, and Perl) start comments with only one character, but other languages (Ada, C++) use two characters to start a comment. Using two characters to start a comment, such as // or /*, helps prevent starting a comment by accident, which is easier to do with a single character such as the semicolon (;) in assembly language or the exclamation mark (!) in FORTRAN 90. One other observation is that we need to use two characters that will not otherwise have a meaning together in the language. The double slash // does pretty well in this context, but the /* does not do quite as well. For example, in C we use a lot of pointers. Suppose we have a pointer ptr, and want to use *ptr to get the contents of the address being pointed at. Then you (not me!) might type the following line:

a =1/*ptr + 4.3;

Do you see any problem here? The “/*ptr + 4.3” looks a lot like the start of a comment. This is one of the few places in C where a space is significant. So, we need to change the line to:

a =1/ *ptr + 4.3; or

a =1/(*ptr) + 4.3;

The last version using parentheses is probably better, but I needed a place to show that wonderful example of where a space is important in that first line.
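
To see what a C-style scanner makes of those three lines, here is a minimal Python sketch of a block-comment remover. It is an illustration under stated assumptions, not the algorithm of any particular compiler: the name strip_block_comments is hypothetical, comments do not nest, and string literals are ignored.

```python
def strip_block_comments(src):
    """Remove /* ... */ comments; the first */ closes, with no nesting.

    Returns (remaining_code, unterminated), where unterminated is True
    when a comment was opened and never closed.
    """
    out = []
    i = 0
    while i < len(src):
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:
                # The rest of the text is swallowed by the open comment.
                return "".join(out), True
            i = end + 2
        else:
            out.append(src[i])
            i += 1
    return "".join(out), False


if __name__ == "__main__":
    for line in ["a =1/*ptr + 4.3;", "a =1/ *ptr + 4.3;", "a =1/(*ptr) + 4.3;"]:
        print(repr(line), "->", strip_block_comments(line))
```

Under these assumptions the first line loses everything after the 1, while the versions with the space or the parentheses come through untouched.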

A second lesser problem with a 2-character comment delimiter is that an extra space between the two-character comment delimiter will cause the comment to be missed. For example:

x_ptr = x / *ptr

So is the above trying to do division or was an accidental space put after the slash and before the asterisk of a comment? This problem will probably be caught by the compiler, or at least I have not been able to come up with an example where the compiler would not find it.

Placement of Comments

Where can comments be placed? Can comments go before or after the program? In most modern languages comments can go before or after the program. But in XML, comments are not allowed before the XML declaration, which must be the very first thing in the document. In most languages a comment can go any place a space would occur except within a character string or within another comment.

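While the previous rule is a common description of where comments can go, it does not say what happens when the two comment styles meet. The comment that starts first wins, and an opener or closer of the other style inside it is just more comment text. For example:

/* Dennie Van Tassel

wrote this nice program

// with great skill and few smarts. */

So does the end-of-line comment on the last line hide the ending of the block comment? It looks that way, but in C, C++, and Java it does not: inside a block comment the // has no meaning, and the comment ends at the first */. The reverse case is the real trap. A /* that follows // on the same line never opens a block comment, so the */ that was meant to close it is left dangling.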

XHTML Comments

XHTML has similar potential problems. For example, XML comments cannot go within declarations, tags, or other comments. Also, since XML and HTML use a paired command structure, we must be careful not to mess up the pairing. All of the following are illegal in XML:

<!--

<x12>

Illegal since messes up pairing of x12 tag -->

</x12>

<!--

<B12>

</B12>

-->

There are several commenting errors in the above XML code. On the first line we have a comment in the tag <A00>, which is not allowed. Next, we start a comment on the line before the tag <x12>, which hides that tag. In the last four lines we have comments inside comments (nested comments) which is also not allowed. Comments inside comments (nested) would often be useful and this topic is discussed next.

Nested Comments

One serious problem with multiple-line comments is forgetting to terminate a comment. In C++ we could often have something like this:

/* set variables

a = 0;

/* set maximum size */

maxs = 100;

What is incorrect with the above code? Go back and look at it again. If you missed the error, this shows how easy that error is to miss. The first comment was erroneously not closed, so the statement “a = 0;” gets eaten up (swallowed) by the comment, which runs on until the */ that closes the second comment.

This type of error, called a run-away comment, is a very difficult bug to locate! Comments that are stopped at the end of the line avoid this problem. This problem is the primary reason people argue that multiple line comments are a bad option. Thus there is some debate whether multiple-line comments are a good or bad idea.

Ada does not have multiple-line comments. Instead, an Ada comment starts with two dashes and terminates at the end of the line. I imagine the designers decided against having multiple-line comments to avoid the problem of run-away comments.

There are several solutions to this problem of nested comments. One is for the compiler to warn about all nested comments, that is, any comment that has a “/*” in it. The second solution is to forbid nested comments, which is done in some languages, including C++. There, a /* inside a comment does not start a second comment, and the first */ closes the comment. Another method for avoiding the error of not terminating a comment is for the compiler to check for statement terminators (the semicolon) in comments and provide a warning.

In the above incorrect code, the semicolon at the end of the line “a = 0;” would generate a warning message by the compiler. Otherwise, we can allow nested comments, but the compiler can indicate any comments that do not nest properly, and warn about nested comments. Different languages use different approaches and each approach seems to have its own benefits and drawbacks.
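
Both warnings are simple to produce. The following Python sketch shows one possible way a checker could look for a comment opener or a statement terminator inside a C-style block comment. The function name comment_warnings and the warning texts are invented for this illustration, and string literals are again ignored.

```python
def comment_warnings(src):
    """Scan /* ... */ comments (no nesting) and report suspicious contents.

    Returns a list of (line_number, message) pairs, where line_number is
    the line on which the offending comment starts.
    """
    warnings = []
    i = 0
    while i < len(src):
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            body = src[i + 2:] if end == -1 else src[i + 2:end]
            line_no = src.count("\n", 0, i) + 1
            if "/*" in body:
                warnings.append((line_no, "comment opener inside comment"))
            if ";" in body:
                warnings.append((line_no, "statement terminator inside comment"))
            if end == -1:
                warnings.append((line_no, "comment never closed"))
                break
            i = end + 2
        else:
            i += 1
    return warnings


if __name__ == "__main__":
    code = "/* set variables\na = 0;\n/* set maximum size */\nmaxs = 100;\n"
    for line_no, message in comment_warnings(code):
        print("line", line_no, ":", message)
```

On the run-away example this sketch reports both conditions for the comment that starts on line 1. The semicolon check also fires on any commented-out statement, so a real compiler would probably make it optional.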

For example, in XHTML, comments are opened with <!-- and closed with -->, but otherwise, we cannot insert two consecutive dashes in the comment. Thus

<!-- set variables

<b>careful</b>

<hr>

is a syntax error in this language. So one may jump to the conclusion that nested comments should be outlawed.

But there is another opinion. Besides the fact that nested comments are useful, neat, and elegant, there is another good reason for wanting them. We often need to comment out statements that already have comments:

/* comment out for testing

a = 0;

/* set parameters for end of year */

months = 12;

end of commented testing section */

If we do not allow and handle nested comments, the first comment will end at the end of the second comment, and then the last line is left over. It is like a dangling else, but now we have a dangling comment closing. If we allow nested comments, then everything works fine. The solution of allowing nested comments would be similar to our approach of nested blocks and nested if-then-else statements. We match the closing comment symbol with the closest previous opening comment symbol.
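
The matching rule can be sketched with a depth counter. The Python below is a minimal illustration, assuming /* and */ as the delimiters and ignoring string literals; the name strip_comments and its nesting flag are hypothetical. With nesting turned off it behaves like C, so the two results can be compared.

```python
def strip_comments(src, nesting):
    """Remove /* ... */ comments, optionally letting them nest.

    Returns (code, depth); a non-zero depth means a comment was left open.
    A */ found outside any comment is kept as ordinary text.
    """
    out = []
    depth = 0
    i = 0
    while i < len(src):
        if src.startswith("/*", i) and (nesting or depth == 0):
            depth += 1          # an opener pushes one level
            i += 2
        elif src.startswith("*/", i) and depth > 0:
            depth -= 1          # a closer matches the closest open comment
            i += 2
        else:
            if depth == 0:
                out.append(src[i])
            i += 1
    return "".join(out), depth


if __name__ == "__main__":
    code = (
        "/* comment out for testing\n"
        "a = 0;\n"
        "/* set parameters for end of year */\n"
        "months = 12;\n"
        "end of commented testing section */\n"
        "x = 1;\n"
    )
    print(strip_comments(code, nesting=True))
    print(strip_comments(code, nesting=False))
```

With nesting, only the final assignment survives. Without it, the statement “months = 12;” and the dangling closer leak back into the code.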

The problem of nested comments is a large problem. During testing, debugging, or for early releases of software, we may need to comment out hundreds of lines of code. We hope all these lines have many comments. Few modern languages handle this problem well.

Some languages allow nested comments. REXX and Haskell have nested comments and they nest like any other structure needing nesting. It is interesting that few languages allow nested comments.

Comments for Backward Compatibility

With the web, comments are used to make code backward compatible which is a difficult task since we cannot change history except in science fiction and politics. We use comments to hide JavaScript code from old web browsers as follows:

<script>

<!--

JavaScript code here

//-->

</script>

HTML comments start with <!-- and end with -->. In the above code the second line “<!--” opens an HTML comment, and the “-->” immediately before the closing </script> command closes it. Now the new browsers are instructed to ignore comments inside script blocks. Thus the JavaScript code gets used. The old browsers see a comment and do not process any of the JavaScript commands because they think all that is just a comment. Otherwise, these JavaScript commands might cause errors for the browser.

When we add JavaScript or Cascading Style Sheets to a web page, we also require the closing comment to start on a new line with // and then -->. The // is a regular single-line comment in JavaScript. So the // is used to comment out the closing -->, otherwise we would have a syntax error in our JavaScript program. This use of comments is quite new and quite complicated. Some very clever people figured out all this!

Comments for Hiding Code

While comments are needed for documenting a program, comments are also used to hide code that is needed for debugging or testing but not for production. Here is a commented-out statement:

// cout << "count= " << count << endl;

The above line is useful for debugging, but not needed for production. Often the best thing to do is leave the debugging or testing statements in the program but comment them out. Commenting out code is not just for debugging or testing. A half-implemented procedure in a production version can be left alone in the program by commenting it, without having to remove and then re-add the code later.

Mega-Comments

We need a fourth type of comment, a mega-comment, that can be used to comment out code that contains regular comments. Few languages have this category. XML has come up with a mega-comment to comment out code that avoids the problem of nested comments:

<![IGNORE[

DTD. . .

]]>

which will ignore the DTD line. Any other type of comment can be enclosed within the IGNORE block. When we want it included, we change the first line to

<![INCLUDE[

and the code is included. This allows us to document what code is needed for debugging/testing and to switch back and forth easily. This type of XML comment can also be nested.

Thus we have four types of comments. They are

Full-line comments
End-of-line comments
Block or multiple-line comments
Mega-comments

Few languages have all four categories. Both full-line and end-of-line comments can be done the same way since all they need is a starting indicator since they both terminate at the end of the line. Block comments have a way to indicate the beginning and ending (the delimiters) of the comments. Thus these comments can be used for short comments in the middle of a line of code or for multiple-line comments. Mega-comments are presently rare in languages, but very useful for commenting out code with comments in the code.

In languages that have single-line comments and multiple-line comments, a careful programmer can create her own mega-comment. For example, in C++ we could use only the single-line // comments for documentation, and reserve the /* . . . */ for commenting out lines of code. This avoids the problem of nested comments.

Questions

1. When the Pascal language was first used the keyboards had only a limited character set, so they used (* . . . *) to indicate comments. Soon more characters were added to the keyboards and { } were added. One observant computer scientist (CS) noticed these characters, and Pascal comments were then allowed to use { comments here }. Now we only need to type one character to start a comment instead of two. Shall we give the person who suggested this the CS award of the year? Or shall we tell her or him, it was a bad change? Hint: what happens if a programmer types this single character by mistake?
2. Go back and read the earlier section on “Nested Comments.” In a couple of programming languages that you know, see if the compiler catches nested comments. There are two ways the compiler can find them: the start of another comment, or a statement terminator in the comment. What happens on your compiler? Do you get warnings or errors?
3. So now design the comment system for OPL. If you allow multiple-line comments, how will you prevent the problem of forgetting to close a comment? How will you start and terminate comments?
4. Bjarne Stroustrup has this interesting example of rare code[2] where C and C++ interpret the code differently due to the comments:

int b = a//* divide by 4 */4;

-a;

When comments are deleted, what is the result with C and C++? Remember that C++ has // comments and C (before the 1999 standard) did not.

5. Should we forbid nested comments in OPL? Give some arguments for and against allowing nested comments.
6. Suppose we decide to allow nested comments. What problems do we need to solve? Set up a couple of examples.
7. Should we have some mega-comments in OPL that can be used to comment out lines of code, which may have other types of comments? Design a mega-comment for OPL.
8. The designer of C++, Bjarne Stroustrup, states that block comments do not nest. So what does this mean for compiler writers? Should the compiler issue a warning message and keep going, or issue a severe error and stop compiling? Try some nested comments in a couple of different languages (C++ and Java would be ok), and see what happens.
9. Ada only has end-of-line comments, probably to avoid run-away block comments. How do you feel about that design decision? Give some reasons for and against their decision.
10. JavaScript on the web uses XHTML comments to hide code from old browsers. Two dashes are used to start and end the comments. Are we allowed to use two dashes inside the JavaScript code? What happens if we have a complete XHTML comment inside the JavaScript code?

Answers:

1. My general impression is we want to use two characters to start and stop a comment to avoid the problem of starting a comment by accident. Thus /* is a nice way to start a comment but the single character ! of FORTRAN 90 is questionable. Likewise, we want to use two characters to end a comment if the end of the line does not end the comment. And we want the ending and starting symbols for comments different. But all my opinions here may be wrong.

4. In C++, when we remove comments we get:

int b = a -a;

But in C when we remove comments we get:

int b = a/4; -a;
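
The two results can be checked with a small Python sketch of a comment remover that either knows about // comments or does not. The function name strip_c_comments and its line_comments flag are invented for this illustration. It simply deletes comments and ignores string literals.

```python
def strip_c_comments(src, line_comments):
    """Delete /* ... */ comments, and // comments when line_comments is True."""
    out = []
    i = 0
    while i < len(src):
        if line_comments and src.startswith("//", i):
            end = src.find("\n", i)
            # Skip to the end of the line but keep the newline itself.
            i = len(src) if end == -1 else end
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            i = len(src) if end == -1 else end + 2
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    code = "int b = a//* divide by 4 */4;\n-a;"
    print(repr(strip_c_comments(code, line_comments=True)))   # C++ rules
    print(repr(strip_c_comments(code, line_comments=False)))  # old C rules
```

With // recognized, everything after the a on the first line disappears, so the -a; on the next line finishes the declaration. Without it, the first slash is a division and only the block comment is removed.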

5. If we have many lines of code we want to comment out for testing or debugging, then those lines may include comments. So one approach seems to be to allow nested comments, but have the compiler warn about the occurrences. For: 1. Can comment out code that contains comments. 2. Seems neat or elegant.

Against: 1. Source of nasty errors. 2. Not a statement (like if-then-else) that needs to nest, and many other items do not nest, such as strings.

6. We need to set up rules for ending nested comments, or we end up with dangling comment closers. The problem seems similar to solving the nesting of if-then-else statements and matching the else with the closest previous then.

7. But then XML has invented a special mega-comment used just for commenting out statements with comments. This is an interesting approach.

[1] Early computer languages used 80-column cards. Thus the language documentation states that some items have to be in column so and so (FORTRAN comments start with a C in column 1) or within columns so and so (COBOL statements must be within columns 8-72). After languages went off cards, as our present-day languages have, the terminology changed to positions. But for this book column and position are the same thing.

[2] Stroustrup, Bjarne. The Annotated C++ Reference Manual, Reading, MA: Addison-Wesley Publishing Company. 1990. p. 6.