Hungarian Naming Convention
by Jerry Huston
Hungarian Naming Convention can be used for variable or identifier
naming in languages such as C, C++, Pascal, Fortran, etc. It was first
used commercially at Microsoft and comes from Charles Simonyi's PhD thesis.
Its use makes an identifier both state its type and imply its usage in
the program. It makes code much easier to follow for the
original programmer at a later date, and easier for more than one
person to work with the code at any time.
Could you give a few more examples of Hungarian notation?
From Jerry Huston
Sure. I'll show you some examples of official card-carrying
Hungarian notation, and examples of my usual interpretation of
it. (I don't want to start any message wars over what's REAL
Hungarian... I know the difference, but prefer my own variation.)
Official:
The standard types are...
f a boolean flag.
ch a one-byte character.
st pascal-type string (also sometimes sc).
sz zero-terminated string.
fn function.
The standard generic types are...
w a word, usually 16 bits.
b a byte, usually 8 bits.
l a long, usually 32 bits.
u an unsigned word, usually 16 bits.
bit a single bit.
v a void. Usually used with the prefix p, as pointer to
void is a meaningful variable type, whereas void isn't.
Some special types...
env an environment.
sb segment base.
ib an offset (combination of index prefix and byte type).
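As a rough illustration of the base types listed above, here is a minimal C sketch. The function names (FContainsCh, WFromBytes) and the use of <stdint.h> fixed-width types for "byte" and "word" are assumptions made for this example only; they are not part of the official list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether the character chFind occurs
 * in the zero-terminated string szText. */
bool FContainsCh(const char *szText, char chFind)
{
    bool fFound = false;                /* f:  a boolean flag */

    for (; *szText != '\0'; ++szText) { /* sz: zero-terminated string */
        if (*szText == chFind) {        /* ch: a one-byte character */
            fFound = true;
            break;
        }
    }
    return fFound;
}

/* Hypothetical helper: builds a 16-bit word from two 8-bit bytes. */
uint16_t WFromBytes(uint8_t bHigh, uint8_t bLow)
{
    /* b: a byte, w: a word */
    uint16_t wResult = (uint16_t)(((unsigned)bHigh << 8) | bLow);
    return wResult;
}
```

The point is only that each name carries its type tag up front, so a mismatch such as assigning a w to a b stands out when reading the code.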
The standard prefixes are...
p a pointer. Not a type itself, but an operation applied
to a type. pch would be a pointer to a character.
lp a 32-bit far pointer.
hp a huge pointer.
np a near pointer.
rg an array (considering it a group that's the range of a
mathematical function).
i index into an array.
c count, such as cch, the first byte in an st.
d difference (between two instances of a type).
h handle, often a pointer to a pointer.
hh huge handle, a huge pointer to a 16-bit pointer within
the same segment pointed to by the huge pointer.
gr group, or pointer to it. A group is different from an
array, in that a group may contain different-sized
objects. Could be a linked-list.
b an offset, typically used in conjunction with a gr,
since an index (i) would be inappropriate for anything
but an array.
mp an array. An abbreviation for map, since an array is a
mapping of the index to the value stored at that place.
dn domain. Used in the rare case when the important part
of the array mapping is the index, not the contents.
e element of an array.
f a bit within a type. Typically used to store one or
more bit flags in unused positions within an integer.
sh a shift amount. Specifies the location of a bit within
a type by the bit number, rather than the bit mask that
the f specifies.
u a union.
a allocation. Distinguishes between an array and a
pointer to it. sz would be a string, asz would be the
allocated space that it's stored in.
v a global.
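The prefixes above combine with a base type to form a complete tag. The following C sketch shows a few of the common ones (p, rg, i, c, f, sh) applied to the ch type. The function names and the fBold/fItalic flags are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: length of a zero-terminated string. */
size_t CchOfSz(const char *sz)
{
    const char *pch = sz;       /* p + ch: pointer to a character */

    while (*pch != '\0')
        ++pch;
    /* The difference between two pch's is a count of characters (cch). */
    return (size_t)(pch - sz);
}

/* Hypothetical helper: index of the first ch in rgch, or -1 if absent. */
int IchFind(const char rgch[], int cch, char ch)
{
    int ich;                    /* i + ch: index into an array of ch */

    for (ich = 0; ich < cch; ++ich) {   /* c + ch: count of characters */
        if (rgch[ich] == ch)            /* rg + ch: array of characters */
            return ich;
    }
    return -1;
}

/* sh names a bit by its position; f names the same bit by its mask. */
enum { shBold = 0, shItalic = 1 };
enum { fBold = 1u << shBold, fItalic = 1u << shItalic };

/* Hypothetical helper: returns 1 if bit number shBit is set in wFlags. */
int FTestBit(unsigned wFlags, int shBit)
{
    return (int)((wFlags >> shBit) & 1u);
}
```

Read this way, an expression like rgch[ich] checks itself: the i and the rg cancel, leaving a ch.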
Some Examples...
pch pointer to a character.
ich index into an array of characters.
rgst an array of pascal-type strings.
bst offset to a particular pascal-type string in a grst.
phpx a near pointer to a huge pointer to an object of
type x.
pich a near pointer to an index into a character array.
en a base type, such as an entry.
hrgn handle to a region.
dx length of a horizontal line (difference between two
x's).
mpmipfn an array of pointers to functions, indexed by
mi's, where mi might be a menu item.
rgrgx two dimensional array of x's (array of arrays).
pv pointer to void (such as an argument to free()).
hrgch handle to an array of characters.
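Two of the examples above, mpmipfn and dx, might look like this in C. This is only a sketch: the MI enum, the PFN typedef, and the Do... handler functions are invented for the illustration, since the list says only that mi "might be a menu item."

```c
/* mi: a hypothetical menu-item type, used as the index of the map. */
typedef enum { miOpen, miSave, miQuit, miMax } MI;

/* pfn: pointer to a function (here, one returning int). */
typedef int (*PFN)(void);

/* Placeholder handlers; the return values just identify each one. */
static int DoOpen(void) { return 1; }
static int DoSave(void) { return 2; }
static int DoQuit(void) { return 3; }

/* mp + mi + pfn: an array of pointers to functions, indexed by mi's. */
static const PFN mpmipfn[miMax] = { DoOpen, DoSave, DoQuit };

/* Look up the handler for a menu item and call it. */
int DispatchMi(MI mi)
{
    return mpmipfn[mi]();
}

/* dx: the difference between two x's, i.e. a horizontal length. */
int DxOfLine(int xLeft, int xRight)
{
    int dx = xRight - xLeft;
    return dx;
}
```

The name mpmipfn reads left to right as "map, from mi, to pfn," which is exactly how the array is used in DispatchMi.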
Fortunately, my own programs and functions are seldom large
enough, with enough different variables, for me to need such an
elaborate scheme to keep track of identifiers. I haven't worked
with the Gospel according to Hungarian enough to look at
something like pgrxchDomPev and be able to say, "Of course,
obviously that's a ..."
So I use a modified, and quite simplified form, that uses some of
the basic tenets of Hungarian notation. Since I work entirely in
C or C++, and write programs only for PCs these days, I use
abbreviations of the C data types rather than the standard
Hungarian type designations.
For example, lowercase i to indicate an integer, not w for word.
I use lowercase c to indicate a character, not b for byte. Some
of the official Hungarian designations do fit my needs, such as u
for unsigned, and sz for zero terminated string.
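A small C sketch of this simplified style follows. The function CountLetter and its variable names are made up for the example; the only point is the i, c, u, and sz prefixes described above.

```c
/* Hypothetical helper: counts how many times cLetter appears in szText. */
int CountLetter(const char *szText, char cLetter)
{
    int iCount = 0;     /* i: an int (official Hungarian would use w or c) */
    int iIndex;         /* i: an int used as an array index */

    for (iIndex = 0; szText[iIndex] != '\0'; ++iIndex) {
        if (szText[iIndex] == cLetter)  /* c: a char, sz: C string */
            ++iCount;
    }
    return iCount;
}

/* Hypothetical helper: keeps only the low byte of an unsigned value. */
unsigned LowByte(unsigned uValue)
{
    unsigned uMask = 0xFFu;     /* u: unsigned, as in official Hungarian */
    return uValue & uMask;
}
```

For instance, CountLetter("banana", 'a') counts the three a's in the string.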
I tend to spell out things that Hungarian would indicate with a
one-character modifier. If I were working in a function with two
pointers to chars (for example converting a Pascal-type string to
a C-type string) I might call them pcCString and pcPasString, and
use an offset called iOffset (which would allow converting
strings that used one or more bytes to store the length of the
Pascal string). Thus, the assignment in that function might look
like,
*pcCString++ = *(pcPasString++ + iOffset);
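To put that assignment in context, here is one way the whole conversion function might look. This is a sketch under assumptions, not the original function: the name PasToCString is invented, the length prefix is assumed to be stored low byte first when it is more than one byte wide, and the caller is assumed to supply a destination buffer big enough for the text plus the terminating zero.

```c
/* Hypothetical sketch: copy a Pascal-type string (length prefix followed
 * by the text) into a zero-terminated C string.
 * iOffset is the number of bytes used to store the length. */
void PasToCString(char *pcCString, const char *pcPasString, int iOffset)
{
    unsigned uLength = 0;
    unsigned uCount;
    int iByte;

    /* Assumption: a multi-byte length is stored low byte first. */
    for (iByte = iOffset - 1; iByte >= 0; --iByte)
        uLength = (uLength << 8) | (unsigned char)pcPasString[iByte];

    /* pcPasString walks from the start of the prefix, so adding iOffset
     * skips over the length bytes and reaches the matching character. */
    for (uCount = 0; uCount < uLength; ++uCount)
        *pcCString++ = *(pcPasString++ + iOffset);

    *pcCString = '\0';          /* terminate the C string */
}
```

With a one-byte length, a call such as PasToCString(szOut, "\005Hello", 1) leaves "Hello" in szOut.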
I think the most important thing is to standardize on something
that makes sense. Your personal method, or your departmental or
company method may not match the one used in a given apps group
at Microsoft, but it can still be better than an undisciplined
approach that doesn't imply at all what the variable is.
A student one time told me about a huge maintenance project that
he inherited when he took a job that had been vacated on short
notice. The woman who had written the application in the first
place had used the names of all her kids, then her nieces and
nephews for variable and function names. When she needed more,
she used names of current and previous pets. Compared to that,
even my loose interpretation of Hungarian is quite structured!
By the way, the name comes from Charles Simonyi, who works for
Microsoft. He wrote a thesis on programmer productivity, a small
part of which was his variable naming convention. Friends first
called it "Reverse Hungarian Notation," as a play on the fact
that Charles is of Hungarian descent, and of course on the famous
Reverse Polish Notation (RPN) of HP's calculators. Eventually that just got shortened to
"Hungarian Convention" or even just "Hungarian."
Some of the developer groups at Microsoft use a much purer form
of Hungarian notation, with prescribed types and modifiers
much more strictly enforced.
(...and some there don't use it much at all.)
In Charles' original thesis, he felt it better to use variable
names that don't imply the data type names used in a particular
language, and I don't quite go along with that for my own use.
For example, instead of the prefixes b and w to indicate byte and
word, I much prefer using c and i to imply char and int. That's
perhaps more natural for me, because I do nearly all my work in
C++ and C.
Also, I tend to be a bit more descriptive about how a particular
datum fits into the function itself, rather than how it relates
to other data -- whether it's a range or a domain, for example.
But the basic *idea* of Hungarian is a great one... that of
making an identifier both state its type and imply its usage in
the program. It makes code much easier to follow for the
original programmer at a later date, and easier for more than one
person to work with the code at any time.