Yet more dangers of macros in C++

March 27, 2009 at 1:34 am (PT) in Programming

I saw some code:

std::string s = foo();
bar(s.c_str());

I tried changing it to:

bar(foo().c_str());

and things broke. It turns out that bar() was a macro that expanded to:

#define bar(s) do { struct st; st.someString = (s); baz(&st); } while (0)

(If you’re wondering about the do ... while (0), consult the comp.lang.c FAQ.)

This is fine for C code, but in the C++ world, this is dangerous. In this case, foo() returns an anonymous std::string object by value. That anonymous object then is destroyed after its internals are assigned to st.someString but before baz() gets to use it, causing baz() to be called with garbage.

Moral #1: Macros that don’t have perfect function-like semantics shouldn’t look like functions. For example, macros should be clearly indicated by naming them in all uppercase.

Moral #2: Use inline functions when possible. (In this case, however, the macro was provided by a C library.)

C’s gets and fgets

November 27, 2004 at 3:12 pm (PT) in Programming

Everyone knows that the gets function in the C standard library is dangerous to use because it offers no protection against buffer overflows.

What should people use instead?

The typical answer is to use fgets. Unfortunately, although safe, fgets in non-trivial cases is much harder to use properly:

  • How do you determine what a good maximum buffer size is? The reason why using gets is dangerous in the first place is because you don’t know how big the input can be.

  • Unlike gets, fgets includes the trailing newline character, but only if the entire line fits within the buffer. This can be remedied easily by replacing the newline character with NUL, but it’s a nuisance.

  • If the input line exceeded the buffer size, fgets leaves the excess in the input stream. Now your input stream is a bad state, and you either need to discard the excess (and possibly throw away the incomplete line you just read), or you need to grow your buffer and try again.

    Discarding the excess usually involves calling fscanf, and I don’t know anyone who uses fscanf without disdain, because fscanf is hard to use properly too. Furthermore, discarding the line by itself won’t necessarily make you any better off, because you’ve accepted incomplete input and still have to walk the road to recovery.

    Growing the buffer is also a hassle. You either need to grow the buffer exponentially as you fill the buffer to capacity, or you need to read ahead, find out how many more bytes you need, grow the buffer, backtrack, and read the excess bytes again. (The latter isn’t even possible with stdin.) And, of course, this means you also need to handle allocation failure.

In all, this means to use fgets properly, you need to make several more library calls than you want. For example, here’s an implementation that discards line excess:

/** Returns the number of characters read or (size_t) -1 if the
  * line exceeded the buffer size.
  */
size_t fgetline(char* line, size_t max, FILE* fp)
{
    if (fgets(line, max, fp) == NULL)
    {
        return 0;
    }
    else
    {
        /* Remove a trailing '\n' if necessary. */
        size_t length = strlen(line);
        if (line[length - 1] == '\n')
        {
            line[--length] = '\0';
            return length;
        }
        else
        {
            /* Swallow any unread characters in the line. */
            fscanf(fp, "%*[^\n]");
            fgetc(fp); /* Swallow the trailing '\n'. */
            return -1;
        }
    }
}

Ugh!

I personally prefer using C.B. Falconer’s public-domain ggets/fggets functions, which have gets-like semantics and dynamically allocate buffers large enough to accomodate the entire line.

Additional reading: Getting Interactive Input in C

(void) casts in C aren’t totally useless.

August 23, 2004 at 12:37 am (PT) in Programming

Sometimes if I don’t care about a C function’s return value, I explicitly discard it with a (void) cast:

(void) foo();

Why bother? After all, isn’t the return value implicitly discarded simply by not assigning it to anything?

A (void) cast is a comment to the reader stating, “Yes, the author is aware that this function returns something, has considered the consequences of ignoring possible error values, and just doesn’t care.”

Without the cast, the code says, “Did the author intentionally ignore the return value of this function call? Is there some failure case the author forgot to handle? Can something go wrong here?”

Furthermore, if something does go wrong, the (void) cast marks a possible failure point.