Best Practices for Software

Overview

The objective of this document is to provide a guideline for writing self-documenting code.

Code that is easy to understand at first sight is easier to write, expand, debug, and maintain.

Discussion is the only way this document will improve.

If you disagree with these best practices, please let us know. Reasonable discussion is in everyone's best interest

Differentiate between the essential, the important, and the desired.

There are some practices which should always be followed and some which should be taken as suggestions. Deviations from a spacing standard may be minor; naming an important system variable "q" is not.

The essential practices in this document:

Comments should explain why, not what.
Comment on the unusual.
Functions which perform an action should have that action indicated in their name.
Variable names should be specific.
Don't break the trunk.

Whitespace

Use tabs instead of spaces.

Some places use spaces, some places use tabs. Neither is better. We just happen to use tabs. Use tabs, because otherwise the code gets very hard to read in places where the tab stops aren't set perfectly.

Use spaces when aligning tables

TODO: fill in

You probably want to set your display to show four spaces per tab.

Tables and comments will look better (and the code will be easier to read) if the tabs line up on your screen like they do on everyone else's screen.

Keep it under 100 columns.

Most people use display windows that only show about 100 characters in width. Don't write comments or code that disappears off the screen, as it is hard to read. If you have a long line, continue it. (See If continuing from a previous line, indent more.)

If continuing from a previous line, indent more.

If you continue one "line" of code onto a second line, indent the second line a bit (typically, one tab stop) more than the first line. If you continue a comment onto a second line, indent the second line a bit (typically, one space) more than the first line.

if (nNumDocumentErrors > 3 || bParseFailure ||
	IsSingleStepMode())
{

// Verify that the com port settings are what the device needs.  If
//  they aren't, ask the user if we can change them.

/*
Verify that the com port settings are what the device needs.  If
 they aren't, ask the user if we can change them.
*/

Comments

Use comments liberally, but don't clutter the source with useless ones. A single good comment is better than five mediocre ones.

Comment as you code.

TODO (never have time to come back and comment code later, ideas aren't as fresh, etc.)

Comments should explain why, not what.

Comments should document decisions. A good programmer can determine what from a quick glance at the code. We've all written for loops, so if you were looking at a big for loop, would you rather see the comment "iterate through our documents" or "check each document for modifications and save if necessary"? Only comment on what when the work is uncommon or difficult to understand.

// check each document for modification and save if necessary
for (i=0; i < arDoc.size(); i++)

// note: using the rarely-seen btree_iterator here; do not confuse with
//  tree_iterator

Comment on the unusual.

If there is an unusual case or a concern you have about the code, put it in a comment. If you do something which deviates from the Best Practices or from the typical coding style, add a comment to explain why. (If you can't think of a convincing reason why, there probably isn't a reason to deviate from the well-understood norm.)

// get the length of the file name string
// note: we can't use strlen here because the bootloader doesn't yet
//  have the standard library loaded
const char *cp;
for (cp = strFileName; *cp; cp++) ;

// == on float is dangerous, but safe here due to the above case
if (fRSSI == 0.0f)

Precede function declarations with a comment describing how to use them.

A function declaration (typically in the header file) should, for all but the most trivial of functions, be prefixed by a comment. Placing a comment like this at the declaration (header file) rather than at the definition (source file) allows the comment to be most visible by the people who need it most (those who are using the function). It also allows advanced editing tools to provide the comment as a tooltip or popup when the user uses or examines that function. The comment should, at the least, contain:

the basic task the function will perform
an explanation of the return type (i.e., "returns non-zero on error")
an explanation of the parameters; if their names are very descriptive, little or no documentation should be required

// opens the given file name and adds the standard Z9 header
// fills phLogFile with the log file's handle
// returns zero on success, non-zero on error
// for explanation of the error return values, see Z9 1.4.1 (Headers)
int CreateZ9LogFile(const char *pszLogFileName, HANDLE *phLogFile);

Use a shared comment style.

If your comments are structured in a standard way, they're easy to understand and search. The recommended format is

// <type> <date> <name> - <comment>

Where <type> is one of the following words:

added
bugfix
todo - indicates work may need to be done here later
bug - indicates there is a bug here, but hasn't been fixed
warning - indicates that the following code is tricky or requires special attention
change
moved
refactored

where <date> is of the form "yyyyMMMdd", and where <name> is typically your first initial and your last name.

// change 2005jul05 jdubovsky - "empty" is from the external interface;
//  we now use the faster internal calls
// oldcode: if (pMsgIn->IsEmpty())
if (pMsg->CheckEmptyBit())

// change 2005jul05 jdubovsky - fixed firmware never reports mode 'dead',
//  so no need to slow this interrupt down with the extra state
/* oldcode:
	if (m_nReportedState == eHandsetDead)
{
	[...]
}
end oldcode */

// todo 2005jul05 jdubovsky - could probably gain large performance
//  benefits from a hashmap here

When commenting out large blocks of code, try to use block comments if possible, or #if 0 otherwise

Using block comments (/* and */) to comment out existing pieces of code is preferred over using preprocessor commands like "#if 0". Syntax-highlighting editors will show the commented-out code in a different color, making it clear that it will not be used. If the code you are commenting out already contains a block comment, you will have to use the less-viewer-friendly #if method.

It naturally follows that you should try to use line comments ("//") over block comments as much as possible, since you can't nest block comments.

When you can't use block comments, use #if 0 and a comment about why. Don't use things #ifdef NOT_DEFINED, because inevitably someone else choses the same name, someone wants to try out their code, switches that define on in the makefile, and suddenly your code is included.

#if 0      // legacy g2 code
  if (strchr(strUserName, ':'))   /* check for realm identifier */
    [...]
#endif

Comment your changes.

If you change a relatively stable source file, either to add a feature or resolve a bug, add a comment as to why the change is needed. If you are replacing a line of code that had an error, consider commenting out the old line to show the change. In all cases, try to add the date and your name to the comment.

// added 2005jul05 jdubovsky -- null messages are no longer added to
//  the message counter
if (!pMsgIn) return 0;

TODO: discuss why this should be used carefully -- the source code repository has most of this info in it

If the change relates to a QA incident, cite and summarize the QA incident in the comment.

If you do work relating to an official incident, cite and summarize the incident in your comment. This provides a link to the relevant bug reports, customer links, and other useful information. Use the standard comment style, but the beginning of your comment should be of the form "QA nnnn" (note the space) so that it can be searched.

// bugfix 2005jul05 jdubovsky - QA 1442 (User-mapped hotkeys are not
//  saved on program exit) - added call to hotkey object's save method
m_HotKeyMap.Save( &fileWorkspace );

If you are working against a spec, cite the spec and section.

If you are decoding or encoding a message, using a standardized value, formula, etc., cite the document, section number & name, and possibly message name if appropriate. Remember that section numbering can change, so constant things like names and message codes may be helpful for later users of your code.

// build up the file header
// see Z9 spec, 1.4.1 (Headers and Footers)
pFile.Write( &m_ulZ9MagicNumber, sizeof(m_ulZ9MagicNumber));

Naming - General

Choosing good names is the most important part of self-documenting programming.

Good names reduce the need for comments, make relationships clear, and use human expectations efficiently.

Use UpperCamelCase for names.

The first letter of a word is capitalized (see variable naming), use upper-case letters as word seperators, and do not use underscores.

void StartShredder();
void *pvUserArg;

Only capitalize the first letter of acronyms.

int EraseBadCdmaFrames();
unsigned int uPortFtpServer;

Naming - Variables

Variable names should be specific.

The more specific a variable name is, the easier it is to understand its role in the code. nCount indicates the variable counts something, but what? nErrorCount or nNumFailedLogins are more descriptive. Single- or few-letter variables should only be used for small-scope trivial tasks, such as iterating in a small loop.

Variable names should have a scope prefix.

Prefixing the variable with its scope helps the reader understand if the variable is persistant, visible to the whole world, etc.

for member variables (of a class), prefix the variable name with "m_".
for global variables, prefix the variable name with "g_".
for static variables, add an "s" to the prefix before the underscore (i.e., "ms_" or "gs_").

class CStorage
{
public:
	int m_nReadIndex;			// the file offset from which we came
	static int ms_nNumDebugDumps;	// num debug dumps for all CStorage
};

long g_lDebugMsgCount;	// global counter of debug messages seen

Use C99 type names for fixed-width variables.

If you need a variable to be a certain size in bits, use the C99 types defined in stdint.h

int8_t cMessageKey;
uint16_t wMessageLength;

Variables whose sizes are critical should have a Systems Hungarian type prefix.

Being able to quickly determine the variable type helps the reader understand its role and limitations. If you're using variables where size is the key to their meaning (i.e., embedded code or message processing), use Systems Hungarian. If you are using more abstract variables where size matters little, use Apps Hungarian.

Systems Hungarian prefixes for simple types:

Size/type Prefix

character ch

unsigned character uc

byte by

short s

unsigned short us

word w

(signed) integer n

unsigned integer u

long l

unsigned long ul

doubleword dw

long long (__int64) ll

quadword qw

float f

double d

C-style string (ASCII Z-string), non-const str

C-style string, const sz ("pointer to string, zero-terminated")

string class (CString, string, etc.) str

void v

pointer p (placed before the type of the item to which you are pointing)

enum (type) E

enum (value) e

(Note that some types are identical on some platforms, such as byte and unsigned character. The prefix used should be consistent with the type or typedef used and the application. For example, "BYTE byCarrierCount" and "unsigned char ucRowsUsed".)

Size/type	Prefix
character	ch
unsigned character	uc
byte	by
short	s
unsigned short	us
word	w
(signed) integer	n
unsigned integer	u
long	l
unsigned long	ul
doubleword	dw
long long (__int64)	ll
quadword	qw
float	f
double	d
C-style string (ASCII Z-string), non-const	str
C-style string, const	sz ("pointer to string, zero-terminated")
string class (CString, string, etc.)	str
void	v
pointer	p (placed before the type of the item to which you are pointing)
enum (type)	E
enum (value)	e

int nLastKnownTime;
float fRSSI;
unsigned long ulBoxWidth;
string strUserName;
const char *pszNamePrompt = "You must enter a user name.";
int *pnCurrentRow;
enum ETransportType { eCar, eTruck, ePlane };

Prefixes for aggregate types:

Type	Prefix
map	map
list	list
deque (double-ended queue)	q or deque (preferred)
queue	q or queue
array (vector)	ar

For aggregate types, the outer-most container should be left-most in the name. For example, an array ("ar") of integers ("n") would have the prefix "arn". For an array of lists, you would use "arlist". For a list of arrays, you would use "listar".

Use the singular name of the contained object in a collection variable. For instance, arnRowIndex, not arnRowIndicies. queuePosition, not queuePositions.

list<int> listnIndexOffset;
deque<CString> dequeCustomerName;
vector<queue<string>> arqstrWaitingCustomer;
list<string> *pliststrColumnName;		// (pointer to a list)
list<string *> listpstrColumnName;		// (list of pointers)

Objects do not require a type prefix, but it is often useful to use one to indicate the most basic (highest in the hierarchy) type that you intend to treat the variable as. For example, a function which processed objects derived from CStorage using only functions available at that level might take an argument of name "storageDeviceReport". This can make the code excessively verbose, so use it with care.

CStorageGpsRaw m_LastRawGps;
void PrintReadIndex(const CStorage *pstorageLoggedData);

Variable whose size isn't paramount should have an Apps Hungarian type prefix.

Being able to quickly determine the variable type helps the reader understand its role and limitations. Hungarian notation gets a bad reputation because many organizations use pure systems Hungarian, even when it offers no benefit. This leads to variable names which are far longer than they need be, which decreases readability. Systems hungarian indicates the data type of the variable; Apps Hungarian indicates the semantics of it.

bool GetLastPanic(char *strBuf, int nSizeBuf);   // does szBuf count characters or bytes?
bool GetLastPanic(char *strBuf, int nMaxCharsInBuf);  // count of chars in buffer -- clear
                                                      //  and acceptable, if a bit long
bool GetLastPanic(char *strBuf, int cbBuf);      // count of bytes in buffer -- very clear
bool GetLastPanic(char *strBuf, int cchBuf);     // count of chars in buffer -- very clear

Example Apps Hungarian prefixes:

Type	Prefix
count	c
row	row or rw
byte	b
character (any width)	ch

Naming - Classes

Class names start with their base class.

The leftmost name in a class name should be the class from which the new class is derived. It naturally follows from this that, left-to-right, the class name should give a full and ordered picture of the derivation of the object.

class CQueue { ... };
class CQueueSecure : public CQueue { ... };
class CQueueSecureFast : public CQueueSecure { ... };

Don't abbreviate the base class.

When including the base class name in a new class's name, do not abbreviate the base class. Keeping the full name makes it easier to see the derivation hierarchy in the class view, file listing, and code.

Good: CDocSource, CDocSourceJava, CDocSourceC

Bad: CDocSource, CDocSrcJava, CDocSrcC

A class's file name should be its class name, minus any class ("C") prefix.

If a class is declared and implemented in files names which are identical (save the "C" class prefix) to the class name, it is easier, when dealing with that class, to find the relevant files.

// file DocSourceJava.h
class CDocSourceJava : public CDocSource

// file DocSourceJava.cpp
CDocSourceJava::CDocSourceJava()

This necessarily requires only having one class per source file, which is a good thing.

Naming - Functions

Functions which perform an action should have that action indicated in their name.

The typical form is a verb followed by a noun or phrase. Be specific. For example: CreateFileDescriptor, PromptUserForDirectory. Generic words such as do and run are generally not good action words. Choosing good function names is second only to commenting in making code easy to understand.

The function should do what its name says and its name should say what it does. If you change the nature of a function sufficiently much that the name no longer meets this condition, rename the function. If the resulting name would be too complicated, that's a good indication you should split the code into two or more functions. If you cannot rename the function because of base-class limitations, it's almost certain that you're making the function do too much.

Attribute set and get functions should have simple names.

Functions which get or set an attribute should be of the form "Get" or "Set" followed by the attribute name. Examples: GetCurrentDirectory, SetUserName. If the function does more than a simple get or set (that is, it initiates a significant action based on the set call), the function is an action function, and the name should reflect that.

Function parameters should follow the normal variable naming conventions.

See Naming - Variables.

Use the const parameter modifier for all function inputs.

If your function uses a parameter only as an input (doesn't modify the object), declare it as const in the parameter list. This makes it obvious what is an input and what is an output, and also provides some insurance against accidentally modifying an input value.

void DisplayAndLog(const char *pszLogFile, const char *pszMessage);
float ComputeRangeInKM(const float fLatitude1, const float fLatitude2);
void CreateRestorePoint(const int nUserIndex, char *strRestorePointName);

Use the const function modifier appropriately but generously.

"Be liberal in what you accept, and conservative in what you produce." If your function is a member function and doesn't modify anything about the class in which it resides (example: an attribute get functions), declare the function as const.

class CStorage
{
	int m_nReadIndex;			// the file offset from which we came
public:
	// return the file offset from which this object was created
	int GetReadIndex() const;
};

If your function and all the ones you are calling really are const, and yet due to poor programming practices one of the functions you're calling wasn't declared const, do the following: if you double-check the functions you are calling and they do not (can not) modify the class, use the const_cast template function to temporarily cast away the const on this when you call it. It's ugly, but the appropriate thing to do. Do not simply remove the const attribute from your function, as it simply propagates the mess. If there is a reasonable chance that the functions you are calling really aren't const, you have to remove the const attribute from your function. (Remember that const and non-const functions are different beasts, and you cannot implement a virtual base class function with one of a different const-ness.)

const char *CWinderToolbar::GetWindowText() const
{
	// note: const_cast because the CWinder class forgot their consts
	return const_cast<CWinderToolbar *>(this)->(GetOwner()->GetText());
}

Documentation

If you change a message, update the ICD.

Any change which alters the contents or meaning of a message should be reflected in the ICD. Examples of such changes are adding new fields, removing old fields, adding values to an existing field, and changing the interpretation of fields. The check-in of the updated ICD should happen at the same time as the check-in of the code (see Atomic Check-ins).

Source Code Management

Don't break the trunk.

If you check in code which doesn't compile to the trunk / head, crashes the product, or has annoying side effects, the work of the entire team comes to a halt. If you had to do any merges, verify (compile and test) them before checking them in. As a general rule, do a diff before every check in and look at the results. [TODO: the last person to check in a broken file has to baby-sit the regular builds?]

Your check-in comments should be detailed, exhaustive, and readable.

Explain all the changes you made and why (see more about good comments in Comments.). Someone should be able to view your check-in comment and know exactly what you added, changed, or removed, and why, without having to go run a diff on the code. Make the comments as close to publishable English as you can.

Bad: "minor change to juicer"

Good: "Juicer module modified to use xfruit class instead of kfruit; faster, thread-safe, and more portable."

If your check-in relates to an official QA incident, cite and summarize the incident in your comment.

If you fix the bug described in QA incident 402 which was about the software's inability to recognize certain odd classes of farm animals, your check-in comment should start with state the incident number and a quick summary of the bug. This greatly reduces the work it takes to track bugs and generate release notes.

Example: "QA 402 (one-horned goats not recognized as animals) - image recognition thread upgraded to v2.1 of AniFind API, which correctly handles one-horned goats."

Your check-ins should be atomic.

One check-in operation should represent one feature, bugfix, etc. No more, no less. Inevitably, when you put two bugfixes in one check-in, one of them will need to be merged into a branch and the other won't. Inevitably, when you have one bugfix but check in the two changed files separately, someone did a Get Latest in the short moment between the two and now has broken code. Keep your check-ins separate but complete.

Only check in things which are unique information.

Check in source code, documentation, test projects, design flowcharts, and anything that is unique information. Don't check in files which are automatically (or can be easily) generated from other files. Don't check in compiled binaries. Don't check in copies of files - use your SCM tool's Share ability, instead. The same thing in two places just begs to get out of sync.

Avoid binary files if there is a text equivalent available.

Try to store documentation in a text format (RTF/XML for Word Documents, etc.). SCM tools are designed to understand text source files, and while they can handle binary files, many of their high-leverage abilities (such as diff) are made useless.

Check in anything original.

If you write some test code to experiment with something, check it into a codeline reserved for exploratory work. There's little harm in archiving code, and often that code is useful later.

Don't be afraid of having to merge changes.

Good SCM tools make merges fairly painless, most of the time. Non-exclusive locks or CVS mode (edit-merge-commit) changes allow developers to work without tripping over one another. The bigger the team gets and the more they focus in one place, the more expensive exclusive locks become. Learn when you need an exclusive lock and when you don't.

Propagating Changes

TODO: come up with a better title for this one

Make original changes in the branch that has evolved the least since branching.

"It is much easier to merge a change from a file that is close to the common ancestor than it is to merge a change from a file that has diverged considerably. This is because the change in the file that has diverged may be built upon changes that are not being propagated, and those unwanted changes can confound the merge process. You can minimize the merge complexity by making original changes in the branch that is the most stable. For example, if a release codeline is branched from a mainline, make a bug fix first in the release line and then merge it into the mainline. If you make the bug fix in the mainline first, subsequently merging it into a release codeline may require you to back out other, incompatible changes that aren't meant to go into the release codeline." (footnote i)

Propagate branch changes early and often.

"When it's feasible to propagate a change from one branch to another (that is, if the change wouldn't violate the target branch's policy), do it sooner rather than later. Postponed and batched change propagations can result in stunningly complex file merges." (footnote ii)

Get the right person to do the merge.

"The burden of change propagation can be lightened by assigning the responsibility to the engineer best prepared to resolve file conflicts. Changes can be propagated by (a) the owner of the target files, (b) the person who made the original changes, or (c) someone else. Either (a) or (b) will do a better job than (c)." (footnote iii)

Stay in sync with the source code repository.

Do a Get Latest / Update as often as possible without interrupting your work. You're more likely to notice bugs and detect conflicts with the code your writing. You will also have less trouble merging changes, should the need arise. At a minimum, do a full Get Latest once a week.

Check in your work as often as possible without breaking the tree or violating other best practices. This is more important when working in CVS mode (edit-merge-commit), but still good policy in checkout-edit-checkin systems for the reasons of detecting conflicts.

Never destroy history.

Don't use destroy commands. Don't promote labels. The history is as valuable as the source code. The best way to fix a bad check-in is to check in the proper code on top of it. The best way to fix a bad label is to re-label the code with a different (unique) label. Exception: if you do something really silly like accidentally check in a gig of windows temporary files, contact your SCM administrator and they can worry about destroying the files.

Don't be afraid to use labels.

Labels are fast and don't take much repository space. Use them to mark change points (you're about to check in a bunch of files which may break something, etc.) and releases. Make the label name concise and the label description detailed, exhaustive, and readable (see above).

Use exclusive check-outs judiciously, and don't keep files checked out for long periods.

A check-out discourages others from modifying a file, which may be a good thing. However, if the changes you are making are for test purposes, could be easily merged, etc., consider using a non-exclusive checkout. If you go on vacation, release your exclusive locks. Better to have to merge your changes than to prevent the rest of the team from working.

Branch when (but only when) necessary.

Branches allow work to continue when policy changes might otherwise prevent it. However, to quote Eric Sink, "don't create a branch unless you are willing to take care of it. A branch is like a puppy."iv There are responsibilities associated with maintaining a branch, and that takes resources. In general, branch when you encounter a conflict in policies, such as when you need to restrict check-ins due to an impending release but part of the team needs to continue development. The typical branch scenarios are:

release of the code
experimental or subteam projects - the changes can be made in a dev branch and merged into the trunk when complete. This gives the programmers all the benefits of SCM but doesn't interfere with the rest of the team's work.

For release branches, it is generally better to branch as late as possible without stalling other development. The later the branch, the fewer merges will be required.

Give each codeline a policy and an owner.

For each codeline, define what is acceptable to be checked in, built, etc. A development codeline should never be released. A release codeline should never be used for development. The owner of the codeline resolves ambiguities, documents irregularities, and has a vested interest in keeping the codeline smooth. Consider having your SCM tool email you with check-in notifications once a codeline goes into strict supervision (typically, for a release).

Backup the repository often.

Full backups at least once a week. Incremental backups daily. At least one recent full backup should be kept off-site. Occasionally test the backups to see if they actually work.

Builds

Use a well-defined, complete, repeatable build process.

A build should be reproducible. This means that everything in the build process, from how to get the code to what tools to use, should be documented. An official build should never be done without completely following the build process. If the process can be automated in a build script, do so to avoid errors.

Build often.

Frequent builds help ensure the codeline can be compiled (no one has checked in any bad files, etc.). These builds can also be used for impromptu testing by developers, previews for documentation work, and other such useful things. Again, if the build process can be automated, consider setting up a computer to build the product daily, around noon. (Noon because, if there are compile errors, it is easier to get them fixed while people are still at work.)

Archive all build results as part of the product.

The results of the build process should be archived as part of the build results. Compiler log outputs, install generator logs, map files, debug symbols, etc., all need to be stored.

Reading Material

Meyer's (?) wolf books, 1 and 2

Possible Later Additions

"Give everything an owner. Every process, policy, document, product, component, codeline, branch, and task in your SCM system should have an owner. Owners give life to these entities by representing them; an entity with an owner can grow and mature. Ownerless entities are like obstacles in an ant trail - the ants simply march around them as if they weren't there."

"Use living documents. The policies and procedures you implement should be described in living documents; that is, your process documentation should be as readily available and as subject to update as your managed source code. Documents that aren't accessible are useless; documents that aren't updateable are nearly so. Process documents should be accessible from all of your development environments: at your own workstation, at someone else's workstation, and from your machine at home. And process documents should be easily updateable, and updates should be immediately available."

TODO: find my footnotes for the above quotes!

Beware pass-by-reference -- they're absolutely neccessary sometimes, but careless use often results in side-effects when people call your code and don't expect an argument's value to change mid-call. Don't avoid pointers just because references offer simpler syntax -- use references where needed, but pointers elsewhere.

Typedefs (and structs?) should have a trailing _t

#define constants and macros are all uppercase, underscores for seperators

Use .cpp and .h file extensions (and preprocessor directives in the h file if cpp/c compatibility is needed).

comments should be extractable by cppdoc / doxygen.

declare a private assignment operator & copy c'tor if you don't want to write one.

destructors should always be virtual

fix all compiler warnings -- they're there for a reason

prefer static_cast and reinterpret_cast over hard typecasts

be responsible for your code, but don't become attached to it

don't change lines just for the sake of formatting -- it fouls up the usefulness of source code control

if you deliberately fall through one case statement into the next, put a comment to that effect

avoid static / global objects that require construction -- order of construction and how it is handled is neither simple nor pleasant

no significant constants in the source code -- use (in order of decreasing preference) const int, enum, or #define instead.

prefer inline functions over macros -- just as fast, easier to debug, and have fewer possible side-effects

avoid globals and statics unless absolutely neccessary

formatting test code {
	int a;
	x = 5 + 7;
};

Table of Contents