I am amazed that programming languages (well, the typical ones, at least) don’t make it easier to manipulate files.

A common way files are read in C is to create a struct that matches the file format and to call fread to read the file into it. Isn’t that easy enough?

Not really. This approach is fine in isolation, but it’s non-portable:

  • Different architectures or compilers may lay out structs differently. A compiler may insert padding bytes between fields to satisfy alignment requirements. Luckily compilers aren’t allowed to do it willy-nilly, and some compilers offer #pragmas to control this.
  • Different architectures have different integer sizes. Appropriate typedefs often can mitigate this, but it’s still imperfect since it requires a small porting effort.
  • Different architectures use different endianness. If a file format stores integers in big-endian byte order but your architecture is little-endian, reading a field straight out of the struct without first swapping its bytes gives you the wrong value.

The typical way to solve these problems is to read a file a byte at a time, copying each byte into the appropriate location within the struct. This is tedious.

Programming languages should provide a mechanism for programmers to declare a struct that must conform to some external format requirement. Programmers should be able to attribute the struct, prohibiting implicit padding bytes and specifying what the size and endian requirements are for each field. For example:

file_struct myFileFormat
{
    uint8 version;
    uint8[3]; // Reserved.
    uint32BE numElements;
    uint32BE dataOffset;
};

When retrieving fields from such a struct, the compiler should generate code that automatically performs the necessary byte swaps and internal type promotions.
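Until a language offers something like this, the hand-written equivalent in C looks roughly like the sketch below. It assumes the layout above occupies exactly 12 bytes on disk (1 version byte, 3 reserved bytes, two big-endian 32-bit integers). The names MyFileHeader, read_u32be, parse_header, and read_header are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* In-memory form of the hypothetical 12-byte header. Its layout in
 * memory doesn't matter because it is never read or written directly. */
typedef struct
{
    uint8_t version;
    uint32_t numElements;
    uint32_t dataOffset;
} MyFileHeader;

enum { HEADER_SIZE = 12 }; /* Assumed on-disk size. */

/* Assemble a big-endian 32-bit value one byte at a time. The result
 * is the same regardless of the host's byte order. */
static uint32_t read_u32be(const unsigned char* p)
{
    return ((uint32_t) p[0] << 24)
         | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)
         |  (uint32_t) p[3];
}

/* Decode a header from raw bytes. Returns 0 on success or -1 if the
 * buffer is too short. */
int parse_header(const unsigned char* buf, size_t len, MyFileHeader* out)
{
    if (len < HEADER_SIZE)
    {
        return -1;
    }
    out->version = buf[0];
    /* buf[1] through buf[3] are reserved and skipped. */
    out->numElements = read_u32be(buf + 4);
    out->dataOffset = read_u32be(buf + 8);
    return 0;
}

/* Read the raw bytes from a file, then decode them. */
int read_header(FILE* fp, MyFileHeader* out)
{
    unsigned char buf[HEADER_SIZE];
    size_t n = fread(buf, 1, sizeof buf, fp);
    return parse_header(buf, n, out);
}
```

That’s three functions and a magic number for a four-field header, and every new field means another hand-computed offset.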

One-dimensional Cube

January 4, 2005 at 3:21 am (PT) in Rants/Raves, Reviews

On the other side of the movie spectrum, I watched the rather disappointing Cube last night.

I try not to write about the nitpicks I have with movies, especially with non-mainstream movies like Cube, because who really wants to read someone complaining? But I just can’t help myself.

The bad math soured a lot of the experience. For example (spoilers ahead):

(more…)

Brad Bird is not a gun

January 4, 2005 at 3:12 am (PT) in Rants/Raves

Well, I finally watched The Incredibles. Even though I found it to be totally predictable, it managed to be really fun along the way. Who cares if it’s full of clichés when they’re done that well?

I think it’s probably the best movie I’ve seen all year (okay, that’s not saying much… best movie I’ve seen in the past year, then), and it might turn out to be one of my all-time favorites.

Brad Bird is my new hero. (He made one of my other all-time favorite movies, The Iron Giant.) He’s not a gun—he’s Superman!

C’s gets and fgets

November 27, 2004 at 3:12 pm (PT) in Programming

Everyone knows that the gets function in the C standard library is dangerous to use because it offers no protection against buffer overflows.

What should people use instead?

The typical answer is to use fgets. Unfortunately, although safe, fgets in non-trivial cases is much harder to use properly:

  • How do you determine what a good maximum buffer size is? Using gets is dangerous in the first place precisely because you don’t know how big the input can be.

  • Unlike gets, fgets includes the trailing newline character, but only if the entire line fits within the buffer. This can be remedied easily by replacing the newline character with NUL, but it’s a nuisance.

  • If the input line exceeds the buffer size, fgets leaves the excess in the input stream. Now your input stream is in a bad state, and you either need to discard the excess (and possibly throw away the incomplete line you just read), or you need to grow your buffer and try again.

    Discarding the excess usually involves calling fscanf, and I don’t know anyone who uses fscanf without disdain, because fscanf is hard to use properly too. Furthermore, discarding the line by itself won’t necessarily make you any better off, because you’ve accepted incomplete input and still have to walk the road to recovery.

    Growing the buffer is also a hassle. You either need to grow the buffer exponentially as you fill the buffer to capacity, or you need to read ahead, find out how many more bytes you need, grow the buffer, backtrack, and read the excess bytes again. (The latter isn’t even possible with stdin.) And, of course, this means you also need to handle allocation failure.

In all, this means that to use fgets properly, you need to make several more library calls than you’d like. For example, here’s an implementation that discards line excess:

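#include <stdio.h>
#include <string.h>

/** Returns the number of characters read (0 at end-of-file or on
  * error) or (size_t) -1 if the line exceeded the buffer size.
  */
size_t fgetline(char* line, size_t max, FILE* fp)
{
    if (max == 0 || fgets(line, (int) max, fp) == NULL)
    {
        return 0;
    }
    else
    {
        /* Remove a trailing '\n' if necessary. */
        size_t length = strlen(line);
        if (length > 0 && line[length - 1] == '\n')
        {
            line[--length] = '\0';
            return length;
        }
        else if (length < max - 1)
        {
            /* The buffer isn't full, so nothing was cut off; the
             * last line of the input just had no '\n'. */
            return length;
        }
        else
        {
            /* Swallow any unread characters in the line. */
            fscanf(fp, "%*[^\n]");
            fgetc(fp); /* Swallow the trailing '\n'. */
            return (size_t) -1;
        }
    }
}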

Ugh!

I personally prefer using C.B. Falconer’s public-domain ggets/fggets functions, which have gets-like semantics and dynamically allocate buffers large enough to accommodate the entire line.
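To give a feel for the grow-the-buffer approach, here’s a minimal sketch. It is not Falconer’s code and doesn’t match the ggets interface; the name read_line, its return convention, and the starting size of 64 are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from fp into a malloc'd buffer and
 * strips the trailing '\n'. Returns NULL at end-of-file (with nothing
 * read), on a read error, or on allocation failure. The caller must
 * free() the result. */
char* read_line(FILE* fp)
{
    size_t capacity = 64; /* Arbitrary starting size. */
    size_t length = 0;
    char* line = malloc(capacity);
    if (line == NULL)
    {
        return NULL;
    }

    /* Each fgets call appends to whatever has been read so far. */
    while (fgets(line + length, (int) (capacity - length), fp) != NULL)
    {
        length += strlen(line + length);
        if (length > 0 && line[length - 1] == '\n')
        {
            line[length - 1] = '\0';
            return line;
        }

        if (length == capacity - 1)
        {
            /* The buffer is full; double it and keep reading. */
            char* bigger = realloc(line, capacity * 2);
            if (bigger == NULL)
            {
                free(line);
                return NULL;
            }
            line = bigger;
            capacity *= 2;
        }
    }

    /* fgets returned NULL: end-of-file or error. Keep a final line
     * that had no '\n', but only if the stream didn't fail. */
    if (length > 0 && !ferror(fp))
    {
        return line;
    }
    free(line);
    return NULL;
}
```

The sketch cuts corners: a NUL byte in the input confuses strlen, and the cast to int assumes no single read exceeds INT_MAX bytes. Even so, it’s about as long as the discard version above.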

Additional reading: Getting Interactive Input in C

Performance evaluations

November 26, 2004 at 4:52 pm (PT) in Personal

I got my performance evaluation a few weeks ago.

VMware’s employee rating system has the following choices:

  • Exceptional
  • Outstanding
  • Great
  • Needs Improvement
  • Unsatisfactory

Supposedly I’m doing “great”, but I’m not sure if that means I’m actually doing great or if that means I don’t need improvement. The lack of a “mediocre”/“adequate”/“satisfactory” rating throws things off.

Artsy fartsy

October 30, 2004 at 7:45 pm (PT) in Personal

One of my coworkers had a birthday last week, and to honor the occasion, the rest of us decided to surprise her by vandalizing her office with chintzy birthday decorations. I unexpectedly was given the task of writing “Happy Birthday” in big block letters across her whiteboard. Naturally my lettering came out awful, and what’s more, I didn’t even use a good style. (For some reason it didn’t occur to me to use the lopsided, cartoony letters that seem to permeate everything I do. Instead I used a quick, harried, angular, and totally unreadable, undisciplined mess that I rightfully abandoned many years ago.)

Anyhow, at this point I realized just how out of practice I am (which is all the worse since I was never very good to begin with). I hadn’t done anything remotely creative or artistic in months. (The last time was when I made the ambigrams.)

So last weekend I started getting back into the groove of fiddling with Photoshop, and now I’m once again in my wanna-be artsy mood. I wonder how long it’ll last (and more importantly, if I’ll get anything accomplished).

Pretension

October 23, 2004 at 8:47 pm (PT) in Art

Bleah. I guess I’m feeling narcissistic. Variations of my ambigram:

On a related note, other ambigram sites I’ve come across:

Pitfalls of C++’s subscript operator

September 23, 2004 at 12:59 am (PT) in Programming

What’s wrong with this C++ code?

#include <iostream>
#include <string>

class Foo
{
public:
    Foo() { }
    operator bool() { return true; }
    std::string operator[](const std::string& s) { return s; }
};

int main(void)
{
    Foo foo;
    std::cout << foo["hello world!"] << std::endl;
    return 0;
}

Answer:

(more…)

Microsoft Outlook hates me.

September 22, 2004 at 10:02 pm (PT) in Personal

… but can you blame it?

Software piracy

September 17, 2004 at 11:49 pm (PT) in Rants/Raves

Apparently some software developers think it’s a good anti-piracy measure to sabotage a user’s computer. There’s one scheme that makes pirated games buggier, and there are some programs that delete your data if pirated.

That’s just retarded.

  • You can’t lose something you never had. Anti-piracy advocates always love to claim that the software industry loses billions of dollars each year to piracy. Did developers actually lose money? Did nasty pirates break into their bank accounts and withdraw all their cash? The only thing that companies can claim to lose is potential sales. Do they really believe that every pirate would have purchased their software if it were uncrackable?

    (Or maybe what the advocates really mean is that the software industry wastes oodles of money each year developing new and ultimately futile anti-piracy schemes. That I’d believe.)

  • Who says there’s no such thing as bad publicity? If you want your software to act buggy if it thinks it’s pirated, go ahead, but don’t expect word-of-mouth reviews to be positive. If anything, the presence of bugs, intentional or not, will make people less likely to give you money. Will people be able to distinguish real bugs from the deliberate copy-prevention bugs? Will people think your game is fun if its physics are amiss and their bullets keep missing?

    And if you destroy people’s data, forget it. You’ve just guaranteed that you never, ever will see a dime from them or from anyone within complaining distance. Congratulations, you’ve alienated potential customers.

  • Free advertising is better than bad publicity. To some degree, piracy can help software companies. Piracy can help build name recognition. It can lead to de facto standards. Would Microsoft Windows and Microsoft Office be as widespread if users didn’t copy them from their workplaces? Piracy levels the cost of Windows down to $0, and at that price, it’s hard for Windows not to outcompete Linux. Microsoft doesn’t gain money directly from this, but it does gain market- and mind-share. As other examples, look at Adobe Photoshop, 3DS Max, and Maya.

    Even if you could make your software uncrackable, are your competitors’ products uncrackable too? Do you really want to drive your users—paying or not—to them?

    Software developers should be happy that people are using their products at all.

In the end, I think the best defense is a strong offense. The best anti-piracy measure is to make a good product that’s worth buying. People take pride in doing things that are worthwhile. (And for goodness’ sake, make sure your product is easy to buy!)