FAQs in section [36]:
- [36.1] What's this "serialization" thing all about?
- [36.2] How do I select the best serialization technique?
- [36.3] How do I decide whether to serialize to human-readable ("text") or non-human-readable ("binary") format?
- [36.4] How serialize/unserialize simple types like numbers, characters, strings, etc.?
- [36.5] How exactly do I read/write simple types in human-readable ("text") format?
- [36.6] How exactly do I read/write simple types in non-human-readable ("binary") format?
- [36.7] How do I serialize objects that aren't part of an inheritance hierarchy and that don't contain pointers to other objects?
- [36.8] How do I serialize objects that are part of an inheritance hierarchy and that don't contain pointers to other objects?
- [36.9] How do I serialize objects that contain pointers to other objects, but those pointers form a tree with no cycles and no joins?
- [36.10] How do I serialize objects that contain pointers to other objects, but those pointers form a tree with no cycles and only "trivial" joins?
- [36.11] How do I serialize objects that contain pointers to other objects, and those pointers form a graph that might have cycles or non-trivial joins?
- [36.12] Are there any caveats when serializing / unserializing objects?
- [36.13] What's all this about graphs, trees, nodes, cycles, joins, and joins at the leaves vs. internal nodes?
[36.1] What's this "serialization" thing all about?
It lets you take an object or group of objects, put them on a disk or send
them through a wire or wireless transport mechanism, then later, perhaps on
another computer, reverse the process: resurrect the original object(s). The
basic mechanisms are to flatten object(s) into a one-dimensional stream of
bits, and to turn that stream of bits back into the original object(s).
Like the Transporter on Star Trek, it's all about taking something complicated
and turning it into a flat sequence of 1s and 0s, then taking that sequence of
1s and 0s (possibly at another place, possibly at another time) and
reconstructing the original complicated "something."
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.2] How do I select the best serialization technique?
There are lots and lots (and lots) of if's, and's and but's, and in reality
there are a whole continuum of techniques with lots of dimensions. Because I
have a finite amount of time (translation: I don't get paid for any of this),
I've simplified it to a decision between
using human-readable ("text") or non-human-readable ("binary") format,
followed by a list of five techniques arranged more-or-less in increasing
order of sophistication.
You are, of course, not limited to those five techniques. You will probably
end up mixing ideas from several techniques. And certainly you can always use
a more sophisticated (higher numbered) technique than is actually needed. In
fact it might be wise to use a more sophisticated technique than is minimally
needed if you believe future changes will require the greater sophistication.
So think of this list merely as a good starting point.
There's a lot here, so get ready!
- Decide between human-readable
("text") and non-human-readable ("binary") formats. The tradeoffs are
non-trivial. Later FAQs show how to write simple
types in text format and how to write
simple types in binary format.
- Use the least sophisticated
solution when the objects to be serialized aren't part of an inheritance
hierarchy (that is, when they're all of the same class) and when they don't
contain pointers to other objects.
- Use the second level of
sophistication when the objects to be serialized are part of an
inheritance hierarchy, but when they don't contain pointers to other
objects.
- Use the third level of
sophistication when the objects to be serialized contain pointers to
other objects, but when those pointers form a
tree with no
cycles and no
joins.
- Use the fourth level of
sophistication when the objects to be serialized contain pointers to
other objects, and when those pointers form a
graph that with no
cycles, and with
joins at the leaves only.
- Use the most sophisticated solution when
the objects to be serialized contain pointers to other objects, and when those
pointers form a graph that might have
cycles or
joins.
Here's that same information arranged like an algorithm:
- The first step is to make an
eyes-open decision between text- and binary-formats.
- If your objects aren't part of an inheritance hierarchy and don't contain
pointers, use solution #1.
- Else if your objects don't contain pointers to other objects,
use solution #2.
- Else if the graph of pointers within
your objects contain neither cycles nor
joins,
use solution #3.
- Else if the graph of pointers within
your objects don't contain cycles and if the
only joins are to terminal (leaf) nodes,
use solution #4.
- Else use solution #5.
Remember: feel free to mix and match, to add to the above list, and,
if you can justify the added expense, to use a more sophisticated
technique than is minimally required.
One more thing: the issues of inheritance and of pointers within the objects
are logically unrelated, so there's no theoretical reason for #2 to be any
less sophisticated than #3-5. However in practice it often (not always) works
out that way. So please do not think of these categories as somehow sacred
they're somewhat arbitrary, and you are expected to mix and match the
solutions to fit your situation. This whole area of serialization has
far more variants and shades of gray than can be covered in a few
questions/answers.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.3] How do I decide whether to serialize to human-readable ("text") or non-human-readable ("binary") format?
Carefully.
There is no "right" answer to this question; it really depends on your goals.
Here are a few of the pros/cons of human-readable ("text") format
vs. non-human-readable ("binary") format:
- Text format is easier to "desk check." That means you won't have to
write extra tools to debug the input and output; you can open the serialized
output with a text editor to see if it looks right.
- Binary format typically uses fewer CPU cycles. However that is
relevant only if your application is CPU bound and you intend to do
serialization and/or unserialization on an inner loop/bottleneck. Remember:
90% of the CPU time is spent in 10% of the code, which means there won't be
any practical performance benefit unless your "CPU meter" is pegged at 100%,
and your serialization and/or unserialization code is consuming a healthy
portion of that 100%.
- Text format lets you ignore programming issues like sizeof
and little-endian vs. big-endian.
- Binary format lets you ignore separations between adjacent values,
since many values have fixed lengths.
- Text format can produce smaller results when most numbers are small
and when you need to textually encode binary results, e.g., uuencode or
Base64.
- Binary format can produce smaller results when most numbers are
large or when you don't need to textually encode binary results.
You might think of others to add as well... The important thing to remember
is that one size does not fit all make a careful decision here.
One more thing: no matter which you choose, you might want to start each file
/ stream with a "magic" tag and a version number. The version number would
indicate the format rules. That way if you decide to make a radical change in
the format, you hopefully will still be able to read the output produced by
the old software.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.4] How serialize/unserialize simple types like numbers, characters, strings, etc.?
The answer depends on your decision
regarding human-readable ("text") format vs. non-human-readable ("binary")
format:
The primitives discussed in those FAQs will be needed for most of the other
FAQs in this section.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.5] How exactly do I read/write simple types in human-readable ("text") format?
Before you read this, make sure to
evaluate all the tradeoffs between
human-readable and non-human-readable formats. The tradeoffs are
non-trivial, so you should resist a knee-jerk reaction to do it the way you
did it on the last project one size does not fit all.
After you have made an eyes-open decision to use human-readable ("text")
format, you should remember these keys:
- You probably want to use iostream's >> and << operators
rather than its read() and write() methods. The >>
and << operators are better for text mode, whereas read() and
write() are better for binary mode.
- When storing numbers, you'll probably want to add a separator to prevent
items from running together. One simple approach is to always add a space
(' ') before each number, that way the number 1 followed by
the number 2 won't run together and look like a 12. Since the
leading space will automatically get soaked up by the >> operator, you
won't have to do anything explicit to extract the leading space in the code
that reads things back in.
- String data is tricky because you have to unambiguously know when the
string's body stops. You can't unambiguously terminate all strings with a
'\n' or '"' or even '\0' if some string might contain
those characters. You might want to use C++ source-code escape-sequences,
e.g., writing '\' followed by 'n' when you see a newline, etc.
After this transformation, you can either make strings go until end-of-line
(meaning they are deliminated by '\n') or you can delimit them with
'"'.
- If you use C++-like escape-sequences for your string data, be sure to
always use the same number of hex digits after '\x' and '\u'.
I typically use 2 and 4 digits respectively. Reason: if you write a smaller
number of hex digits, e.g., if you simply use stream <<
"\\x" << hex << unsigned(theChar),
you'll get errors when the next character in the string happens to be a hex
digit. E.g., if the string contains '\xF' followed by 'A',
you should write "\x0FA", not "\xFA".
- If you don't use some sort of escape sequence for characters like
'\n', be careful that the operating system doesn't mess up your string
data. In particular, if you open a std::fstream without
std::ios::binary, some operating systems translate end-of-line
characters.
- Another approach for string data is to prefix the string's data with an
integer length, e.g., to write "now is the time" as 15:now is the
time. Note that this can make it hard for people to read/write the file,
since the value just after that might not have a visible separator, but you
still might find it useful.
Please remember that these are primitives that you will need to use in the
other FAQs in this section.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.6] How exactly do I read/write simple types in non-human-readable ("binary") format?
Before you read this, make sure to
evaluate all the tradeoffs between
human-readable and non-human-readable formats. The tradeoffs are
non-trivial, so you should resist a knee-jerk reaction to do it the way you
did it on the last project one size does not fit all.
After you have made an eyes-open decision to use non-human-readable ("binary")
format, you should remember these keys:
- Make sure you open the input and output streams using
std::ios::binary. Do this even if you are on a Unix system since it's
easy to do, it documents your intent, and it's one less non-portability to
locate and change down the road.
- You probably want to use iostream's read() and write()
methods instead of its >> and << operators. read()
and write() are better for binary mode; >> and << are
better for text mode.
- If the binary data might get read by a different computer than the one
that wrote it, be very careful about endian issues (little-endian
vs. big-endian) and sizeof issues. The easiest way to handle this is
to anoint one of those two formats as the official "network" format, and to
create a header file that contains machine dependencies (I usually call it
machine.h). That header should define inline functions like
readNetworkInt(std::istream& istr) to read a "network int,"
and so forth for reading and writing all the primitive types. You can define
the format for these pretty much anyway you want. E.g., you might define a
"network int" as exactly 32 bits in little endian format. In any
case, the functions in machine.h will do any necessary endian
conversions, sizeof conversions, etc. You'll either end up with a
different machine.h on each machine architecture, or you'll end up
with a lot of #ifdefs in your machine.h, but either way, all
this ugliness will be buried in a single header, and all the rest of your code
will be clean(er). Note: the floating point differences are the most subtle
and tricky to handle. It can be done, but you'll have to be careful with
things like NaN, over- and under-flow, #bits in the mantissa
or exponent, etc.
- When space-cost is an issue, such as when you are storing the serialized
form in a small memory device or sending it over a slow link, you can compress
the stream and/or you can do some manual tricks. The simplest is to store
small numbers in a smaller number of bytes. For example, to store an unsigned
integer in a stream that has 8-bit bytes, you can hijack the 8th bit of each
byte to indicate whether or not there is another byte. That means you get 7
meaningful bits/byte, so 0...127 fit in 1 byte, 128...16384 fit in 2 bytes,
etc. If the average number is smaller than around half a billion, this will
use less space than storing every four-byte unsigned number in four 8-bit
bytes. There are lots of other variations on this theme, e.g., a sorted array
of numbers can store the difference between each number, storing extremely
small values in unary format, etc.
- String data is tricky because you have to unambiguously know when the
string's body stops. You can't unambiguously terminate all strings with a
'\0' if some string might contain that character; recall that
std::string can store '\0'. The easiest solution is to write
the integer length just before the string data. Make sure the integer length
is written in "network format" to avoid sizeof and endian problems
(see the solutions in earlier bullets).
Please remember that these are primitives that you will need to use in the
other FAQs in this section.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.7] How do I serialize objects that aren't part of an inheritance hierarchy and that don't contain pointers to other objects?
This is the least sophisticated problem, and not surprisingly, it is also the
least sophisticated solution:
- Every class should handle its own serialization and unserialization. You
will typically create a member function that serializes the object to some
sink (such as a std::ostream), and another that allocates a
new object, or perhaps changes an existing object, setting the member
data based on what it reads from some source (such as a
std::istream).
- If your object physically contains another object, e.g., a Car
object might have a member variable of type Engine, the outer object's
serialize() member function should simply call the appropriate
function associated with the member object.
- Use the primitives described earlier to read/write the simple types in
text or
binary format.
- If a class's data structure might change someday, the class should write
out a version number at the beginning of the object's serialized output. The
version number simply represents the serialized format; it should
not get incremented simply when the class's behavior changes. This
means the version numbers don't need to be fancy they usually don't need a
major and minor number.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.8] How do I serialize objects that are part of an inheritance hierarchy and that don't contain pointers to other objects?
Suppose you want to serialize a "shape" object, where Shape is an
abstract class with derived classes Rectangle, Ellipse,
Line, Text, etc. You would declare a
pure virtual function
serialize(std::ostream&) const within class Shape, and make
sure the first thing done by each override is to write out the class's
identity. For example, Ellipse::serialize(std::ostream&) const would
write out the identifier Ellipse (perhaps
as a simple string, but there are several
alternatives discussed below).
Things get a little trickier when unserializing the object. You typically
start with a static member function in the base class such as
Shape::unserialize(std::istream& istr). This is declared to return a
Shape* or perhaps a smart pointer such as
Shape::Ptr. It reads the
class-name identifier, then uses some sort of creational pattern to
create the object. For example, you might have a table that maps from the
class name to an object of the class, then use the Virtual
Constructor Idiom to create the object.
Here's a concrete example: Add a pure virtual
method create(std::istream&) const within base class Shape,
and define each override to a one-liner that uses new to allocate an
object of the appropriate derived class. E.g.,
Ellipse::create(std::istream& istr) const would be { return new
Ellipse(istr); }. Add a static
std::map<std::string,Shape*> object that maps from the class name to a
representative (AKA prototype) object of the appropriate class; e.g.,
"Ellipse" would map to a new Ellipse(). Function
Shape::unserialize(std::istream& istr) would
read the class-name, throw an exception if
it's not in the map (if (theMap.count(className) == 0) throw
...something...), then look up the associated Shape* and call
its create() method: return theMap[className]->create(istr).
The map is typically populated during static initialization. For example, if
file Ellipse.cpp contains the code for derived class Ellipse,
it would also contain a static object whose ctor adds that class to
the map: theMap["Ellipse"] = new Ellipse().
Notes and caveats:
- It adds a little flexibility if Shape::unserialize() passes the
class name to the create() method. In particular, that would let a
derived class be used with two or more names, each with its own "network
format." For example, derived class Ellipse could be used for both
"Ellipse" and "Circle", which might be useful to save space in
the output stream or perhaps other reasons.
- It's usually easiest to handle errors during unserialization by throwing
an exception. You can return NULL if you want, but you will need to
move the code that reads the input stream out of the derived class' ctors into
the corresponding create() methods, and ultimately the result is often
that your code is more complicated.
- You must be careful to avoid the static
initialization order fiasco with the map used by
Shape::unserialize(). This normally means using the
Construct On First Use Idiom for
the map itself.
- For the map used by Shape::unserialize(), I personally prefer the
Named Constructor Idiom over the
Virtual Constructor Idiom it simplifies a few
steps. Details: I usually define a typedef within Shape such
as typedef Shape* (*Factory)(std::istream&). This means
Shape::Factory is a "pointer to a function that takes a
std::istream& and returns a Shape*." I then define the map as
std::map<std::string,Factory>. Finally I populate that map using
lines like theMap["Ellipse"] = Ellipse::create (where
Ellipse::create(std::istream&) is now a
static member function of class
Ellipse, that is, the Named Constructor
Idiom). You'd change the return value in function
Shape::unserialize(std::istream& istr) from
theMap[className]->create(istr) to
theMap[className](istr).
- If you might need to serialize a NULL pointer, it's usually easy
since you already write out a class identifier so you can just as easily write
out a pseudo class identifier like "NULL". You might need an
extra if statement in Shape::unserialize(), but if you chose
my preference from the previous bullet, you can eliminate that special case
(and generally keep your code clean) by defining static member
function Shape* Shape::nullFactory(istream&) { return NULL; }.
You add that function to the map as any other: theMap["NULL"] =
Shape::nullFactory;.
- You can make the serialized form smaller and a little faster if you
tokenize the class name identifiers. For example, write a class name only the
first time it is seen, and for subsequent uses write only a corresponding
integer index. A mapping such as std::map<std::string,unsigned>
unique makes this easy: if a class name is already in the map, write
unique[className]; otherwise set a variable unsigned n =
unique.size(), write n, write the class name, and set
unique[className] = n. (Note: be sure to copy it into a
separate variable. Do not say unique[className] =
unique.size()! You have been warned!) When unserializing, use
std::vector<std::string> unique, read the number n, and if
n == unique.size(), read a name and add it to
the vector. Either way the name will be unique[n]. You can
also pre-populate the first N slots in these tables with the N
most common names, that way streams won't need to contain any of those
strings.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.9] How do I serialize objects that contain pointers to other objects, but those pointers form a tree with no cycles and no joins?
Before we even start, you must understand that the word "tree" does
not mean that the objects are stored in some sort of tree-like data
structure. It simply means that your objects point to each other, and the
"with no cycles" part means if you keep following pointers from one object to
the next, you never return to an earlier object. Your objects aren't "inside"
a tree; they are a tree. If you don't understand that, you
really should read the lingo FAQ
before continuing with this one.
Second, don't use this technique if the
graph might someday contain
cycles or
joins.
Graphs with neither cycles nor joins are very common, even with "recursive
composition" design patterns like Composite or
Decorator. For example, the objects representing an XML document or an
HTML document can be represented as a graph without joins or cycles.
The key to serializing these graphs is to ignore a node's identity and
instead to focus only on its contents. A (typically recursive)
algorithm dives through the tree and writes the contents as it goes. For
example, if the current node happens to have an integer a, a pointer
b, a float c, and another pointer d, then you first
write the integer a, then
recursively dive into the child pointed to by b, then
write the float c, and finally
recursively dive into the child pointed to by d. (You don't have to
write/read them in the declaration order; the only essential rule is that the
reader's order is consistent with the writer's order.)
When unserializing, you need a constructor that takes a std::istream&.
The constructor for the above object would read
an integer and store the result in a, then would allocate an
object to be stored in pointer b (and will pass the
std::istream to the constructor so it too can read the stream's
contents), read a float into
c, and finally will allocate an object to be stored in pointer
d. Be sure to use smart pointers
within your objects, otherwise you will get a
leak if an exception is thrown while reading anything but the first
pointed-to object.
It is often convenient to use the Named Constructor
Idiom when allocating these objects. This has the advantage that you
can enforce the use of smart pointers. To do
this in a class Foo, write a static method such as FooPtr
Foo::create(std::istream& istr) { return new Foo(istr); } (where
FooPtr is a smart pointer to a Foo). The alert reader will
note how consistent this is with the technique discussed in
the previous FAQ the two techniques
are completely compatible.
If an object can contain a variable number of children, e.g., a
std::vector of pointers, then the usual approach is to
write the number of children just before
recursively diving into the first child. When unserializing, just
read the number of children, then use a
loop to allocate the appropriate number of child objects.
If a child-pointer might be NULL, be sure to handle that in both the
writing and reading. This shouldn't be a problem
if your objects use inheritance; see
that solution for details. Otherwise, if the first serialized character in an
object has a known range, use something outside that range. E.g., if the
first character of a serialized object is always a digit, use a non-digit like
'N' to mean a NULL pointer. Unseralization can use
std::istream::peek() to check for the 'N' tag. If the first
character doesn't have a known range, force one; e.g., write 'x'
before each object, then use something else like 'y' to mean
NULL.
If an object physically contains another object inside itself, as opposed to
containing a pointer to the other object, then nothing changes: you still
recursively dive into the child-node as if it were via a pointer.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.10] How do I serialize objects that contain pointers to other objects, but those pointers form a tree with no cycles and only "trivial" joins?
As before, the word "tree" does not mean that the objects are stored
in some sort of tree-like data structure. It simply means your objects have
pointers to each other, and the "no cycles" part means you can follow the
pointers from one object to the next and never return to an earlier object.
The objects are not "inside" a tree; they are a tree. If that's
doesn't make sense, you really should read
the lingo FAQ before continuing with this
one.
Use this solution if the graph contains
joins at the leaf nodes, but those joins can
be easily reconstructed via a simple look-up table. For example, the
parse-tree of an arithmetic expression like (3*(a+b) - 1/a)
might have joins since a variable-name (like a) can show up
more than once. If you want the graph to use the same exact node-object to
represent both occurrences of that variable, then you could use this solution.
Although the above constraints don't fit with those of
the solution without any joins, it's
so close that you can squeeze things into that solution. Here are the
differences:
- During serialization, ignore the join completely.
- During unserializing, create a look-up table, like
std::map<std::string,Node*>, that maps from the variable name to the
associated node.
Caveat: this assumes that all occurrences of variable a should
map to the same node object; if it's more complicated than this, that is, if
some occurrences of a should map to one object and some to another,
you might need to use a more sophisticated
solution.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.11] How do I serialize objects that contain pointers to other objects, and those pointers form a graph that might have cycles or non-trivial joins?
Caveat: the word "graph" does not mean that the objects are stored in
some sort of data structure. Your objects form a graph because they point to
each other. They're not "inside" a graph; they are a graph. If that
doesn't make sense, you really should read
the lingo FAQ before continuing with this
one.
Use this solution if the graph can contain
cycles, or if it can contain more
complicated kinds of joins than are allowed
by the solution for trivial
joins. This solution handles two core issues: it avoids infinite
recursion and it writes/reads each node's identity in addition its
contents.
A node's identity doesn't normally have to be consistent between different
output streams. For example, if a particular file uses the number 3 to
represent node x, a different file can use the number 3 to represent a
different node, y.
There are some clever ways to serialize the graph, but the simplest to
describe is a two-pass algorithm that uses an object-ID map, e.g.,
std::map<Node*,unsigned> oidMap. The first pass populates our
oidMap, that is, it builds a mapping from object pointer to the
integer that represents the object's identity. It does this by recursively
diving through the graph, at each node checking if the node is already in
oidMap, and if not, adding the node and a unique integer to
oidMap and recursively diving into the new node's children. The
unique integer is often just the initial oidMap.size(), e.g.,
unsigned n = oidMap.size(); oidMap[nodePtr] = n. (Yes, we did that in
two statements. You must also. Do not shorten it to a single
statement. You have been warned.)
The second pass iterates through the nodes of oidMap, and at each,
writes the node's identity (the associated integer) followed by its contents.
When writing the contents of a node that contains pointers to other nodes,
instead of diving into those "child" objects, it simply writes the identity
(the associated integer) of the pointer to those nodes. For example, when
your node contains Node* child, simply
write the integer oidMap[child].
After the second pass, the oidMap can be discarded. In other words,
the mapping from Node* to unsigned should not normally
survive beyond the end of the serialization of any given graph.
There are also some clever ways to unserialize the graph, but here again the
simplest to describe is a two-pass algorithm. The first pass populates a
std::vector<Node*> v with objects of the right class, but any child
pointers within those objects are all NULL. This means v[3]
will point to the object whose oid is 3, but any child pointers inside that
object will be NULL. The second pass populates the child pointers
inside the objects, e.g., if v[3] has a child pointer called
child that is supposed to point to the object whose oid is 5,
the second pass changes changes v[3].child from NULL to
v[5] (obviously encapsulation might prevent it from directly accessing
v[3].child, but ultimately v[3].child gets changed to
v[5]). After unserializing a given stream, the vector
v can normally be discarded. In other words, the oids (3,
5, etc.) mean nothing when serializing or unserializing a
different stream those numbers are only meaningful within a given stream.
Note: if your objects contain polymorphic pointers, that is, base
class pointers that might point at derived class objects, then
use the technique described earlier.
You'll also want to read some of the earlier techniques for handling
NULL, writing version numbers, etc.
Note: you should seriously consider the Visitor
pattern when recursively diving through the graph, since serialization
is probably just one of many different reasons to make that recursive dive,
and they'll all need to avoid infinite recursion.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.12] Are there any caveats when serializing / unserializing objects?
One of the things you don't want to do, except in unusual
circumstances, is to make any changes to the node's data during the traversal.
For example, some people feel they can map from Node* to integer by
simply adding an integer as a data member in the Node class. They
also sometimes add a bool haveVisited flag as another data member
within Node objects.
But this causes lots of multi-threading and/or performance problems.
Your serialize() method can no longer be const, so even though
it is logically just a reader of the node data, and even though it logically
makes sense to have multiple threads reading the nodes at the same time, the
actual algorithm writes into the nodes. If you fully understand threading and
reader/writer conflicts, the best you can hope for is to make your code more
complicated and slower (you'll have to block all reader threads whenever
any thread is doing anything with the graph). If you (or
those who will maintain the code after you!) don't fully understand
reader/writer conflicts, this technique can create very serious and very
subtle errors. Reader/writer and writer/writer conflicts are so subtle they
often slip through test into the field. Bad news. Trust me. If you don't
trust me, talk to someone you do trust. But don't cut corners.
There's lots more I could say, such as several simplifications and special
cases, but I've already spent too much time on this. If you want more info,
spend some money.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
[36.13] What's all this about graphs, trees, nodes, cycles, joins, and joins at the leaves vs. internal nodes?
When your objects contain pointers to other objects, you end up with something
computer scientists call a graph. Not that your objects are stored
inside a tree-like data structure; they are a tree-like structure.
Your objects correspond to the graph's nodes AKA vertices, and
the pointers within your objects correspond to the graph's edges. The
graph is of a special variety called a rooted, directed graph.
The root object to be serialized corresponds to the graph's root node,
and the pointers correspond to directed edges.
If object x has a pointer to object y, we say that x
is a parent of y and/or that y is a child of
x.
A path through a graph corresponds to starting with an object,
following a pointer to another object, etc., etc. to an arbitrary depth. If
there is a path from x to z we say that x is an
ancestor of z and/or that z is a descendent of
x.
A join in a graph means there are two or more distinct paths to the
same object. For example, if z is a child of both x and
y, then the graph has a join, and z is a join node.
A cycle in a graph means there is a path from an object back to
itself: if x has a pointer to itself, or to y which points to
x, or to y which points to z which points to
x, etc. A graph is cyclic if it has one or more cycles;
otherwise it is acyclic.
An internal node is a node with children. A leaf node is a
node without children.
As used in this section, the word tree means a rooted, directed,
acyclic graph. Note that each node within a tree is also a tree.
[ Top | Bottom | Previous section | Next section | Search the FAQ ]
E-mail the author
[ C++ FAQ Lite
| Table of contents
| Subject index
| About the author
| ©
| Download your own copy ]
Revised Mar 1, 2006