dropt 1.1.0

May 6, 2012 at 1:40 pm (PT) in Programming

I’m releasing dropt 1.1.0 today.

dropt is a C library for parsing command-line options. Yes, there are a lot of existing ones already, but I wasn’t satisfied with those that I had come across:

(more…)

8 years at VMware

April 26, 2012 at 11:21 pm (PT) in Personal, Programming

A few weeks ago, one of my coworkers complained about doing maintenance on a project that he had moved away from. I told him that authoring code is like having a child: you can’t say you’re tired of it and abandon it. If you brought it into this world, you should take some responsibility for it. If you’re not prepared to do that, don’t have that baby.

I was joking, of course, but perhaps it’s not a completely ridiculous comparison (although I suspect that my friends who are actual parents might disagree).

Today marks my eight-year anniversary at VMware. For the past 8 years, I’ve spent 40 hours per week (well, probably more) developing VMware Workstation, watching it grow and trying to imbue it with whatever knowledge I have. A number of people tell me that I’ve been at VMware for too long and should move on, but I’m not ready to let go yet.

Cygwin is evil

August 21, 2009 at 2:06 am (PT) in Programming, Rants/Raves

Cygwin, a port of various Unix utilities to provide a Unix-like environment on Windows, has been around for a long while. It’s well-known; sites such as Lifehacker give tips about using it. My tip is: avoid Cygwin unless absolutely necessary.

Cygwin-based tools depend on cygwin1.dll, and cygwin1.dll is obnoxious because:

  • It’s DLL-hell squared. You can’t simultaneously use different Cygwin-based tools that depend on different versions of cygwin1.dll. Normally Windows programs can avoid DLL-hell by storing dependent DLLs in the programs’ own directories, but cygwin1.dll goes out of its way to search for other instances of itself.
  • To avoid that problem, the Cygwin authors discourage developers from bundling cygwin1.dll with their applications and instead want developers to include the Cygwin installer, which automatically fetches the current version from the Internet. Unfortunately, the Cygwin installer is horrible. The UI is non-standard and is completely bewildering. There is no uninstaller. Making end-users download and run the monolithic Cygwin installer just to get a small command-line tool also violates the Unix philosophy of having small tools for specific tasks.
  • The approach of bundling the Cygwin installer is fundamentally flawed anyway. Even if each application includes the installer, there’s no guarantee that the current version of cygwin1.dll is compatible with all of them. Installing one application could break existing ones. Did I mention it being DLL-hell squared?

So what are people to do?

  • If you want common Unix command-line utilities, check UnxUtils first, which is a collection of ones that have been ported to run natively on Windows.
  • If you want to compile a program written for Linux, try using the MinGW compiler first. For command-line programs, there’s a good chance that MinGW can compile it, and the generated binary won’t have any Cygwin dependencies. (Yes, Cygwin’s version of gcc has an option to not require cygwin1.dll, but it basically puts it into a MinGW mode anyway.)
  • If you need a full Linux environment, install Linux in a virtual machine (shameless plug) or use andLinux. (andLinux doesn’t support 64-bit versions of Windows yet, however.)
  • If you want an X Server, try Xming.

I should note that Cygwin is still a necessary evil for stable versions of bash and sshd. I don’t know of any good alternative implementations of those.

Yet more dangers of macros in C++

March 27, 2009 at 1:34 am (PT) in Programming

I saw some code:

std::string s = foo();
bar(s.c_str());

I tried changing it to:

bar(foo().c_str());

and things broke. It turns out that bar() was a macro that expanded to:

#define bar(s) do { struct some_struct st; st.someString = (s); baz(&st); } while (0)

(If you’re wondering about the do ... while (0), consult the comp.lang.c FAQ.)

This is fine for C code, but in the C++ world, this is dangerous. In this case, foo() returns a temporary std::string object by value. That temporary is destroyed at the end of the full expression that created it, which is the assignment to st.someString inside the macro. By the time baz() runs, st.someString points to freed memory, so baz() is called with garbage.

Moral #1: Macros that don’t have perfect function-like semantics shouldn’t look like functions. For example, macros should be clearly indicated by naming them in all uppercase.

Moral #2: Use inline functions when possible. (In this case, however, the macro was provided by a C library.)

Programming ethics

January 7, 2006 at 2:49 am (PT) in Programming, Rants/Raves

A couple of weeks ago I read about a scam anti-virus program sold by some no-name software company. The software reported false positives to trick hapless people into thinking that they were infected with something and into buying its useless product. A few days ago, Mark Russinovich of Sysinternals wrote about bogus spyware removers.

I’m so disgusted that I wonder if there should be a programming ethics board that allows programmers to become certified or licensed voluntarily. Shouldn’t people writing so-called anti-virus software take some form of Hippocratic Oath? Such a system wouldn’t be too different from the driver signing that Microsoft does, except it’d be a general system for individual developers, not for particular binaries. Hobbyists still would be able to create, distribute, and sell unlicensed programs, but anyone wanting to establish trust could advertise that they’re licensed. A signing authority could verify that licenses are active and authentic. Obtaining a license could require verification of developers’ personal information, allowing them to be identified and held accountable if they break the code (pun intended). Qualification exams even could test for recognition of buffer overflows and other unsafe practices.

On the other hand, what would the punishment be? If the licensing fee is too low, it might be worthwhile for dishonest developers to obtain licenses just to break them. If the licensing fee is too high, no one would participate. And, of course, it’s unclear how to distinguish between intentionally malicious code and simply negligent code.

Recently in a programming forum I frequent, someone posted some sample code he had written and asked for a critique. He would be providing this code to prospective employers.

If you’re trying to get a job programming, providing sample code is good. However, it’s a little surprising what some people consider to be good sample code.

Your goal should be not only to demonstrate that you can write code, but that you can write maintainable code. Write-once, read-never code is worse than useless; it’s fragile and wastes the time of anyone else who ever tries to modify it.

Quick and easy things you can do to improve your code samples:

  • Make your code readable. More whitespace is better than less. More braces are better than fewer. People judge books by covers; make your code look good.
  • Document your code. Document your functions’ contracts: What are their inputs? What are their outputs?
  • Use defensive programming techniques. Aggressively use assertions to check that inputs are valid. Enforce functions’ contracts.
  • Handle errors. (Okay, this one is neither quick nor easy.) In my opinion, handling errors well is probably one of the hardest things about programming. It complicates resource management, it makes code ugly, it’s tedious, no one wants to do it, but it has to be done. If you’re going to submit sample code, take the time to handle errors. Use single-entry, single-exit (SESE) patterns in C and RAII patterns in C++. At the very least, use comments to acknowledge where you ignore errors.
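
As a rough illustration of the points above, here is a minimal sketch of a small C function with a documented contract, assertions on its inputs, and a single exit point that releases resources. The function name count_lines and its contract are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/** Counts the '\n' characters in the file at `path`.
  *
  * Contract: `path` and `count_out` must be non-NULL.
  * Returns true on success and stores the count in `*count_out`.
  * Returns false if the file could not be opened or read, in which
  * case `*count_out` is left untouched.
  */
bool count_lines(const char* path, size_t* count_out)
{
    bool success = false;
    FILE* fp = NULL;
    size_t count = 0;
    int c;

    /* Enforce the contract. */
    assert(path != NULL);
    assert(count_out != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
    {
        goto cleanup;
    }

    while ((c = fgetc(fp)) != EOF)
    {
        if (c == '\n')
        {
            count++;
        }
    }

    if (ferror(fp))
    {
        goto cleanup;
    }

    *count_out = count;
    success = true;

cleanup:
    /* Single exit: every path releases the file handle here. */
    if (fp != NULL)
    {
        fclose(fp);
    }
    return success;
}
```

Because every failure jumps to the same cleanup label, adding another resource later means adding one more release in one place instead of hunting down every early return.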

(Of course, only do the above if you also intend to continue doing them in practice. Misrepresenting yourself is dishonest, okay?)

Back into Palm OS programming?

April 25, 2005 at 1:10 am (PT) in Personal, Programming

I bought a Treo 650 this week, and it’s awesome. It’s even inspiring me to do some programming for Palm OS again. Unfortunately, getting back into that groove is really hard.

I wrote a lot of great code while I was at Sony, but of course all that code is Sony-owned and outside of my grasp. To do any Palm OS development work again, I’d need to rewrite everything from scratch, which is demotivating because I’d be redoing work that I had done already and—since I’m now rusty at this—work that I had done better. It makes me feel like my life is progressing backwards.

I am amazed that programming languages (well, the typical ones, at least) don’t make it easier to manipulate files.

A common way files are read in C is to create a struct that matches the file format and to call fread to read the file into it. Isn’t that easy enough?

Not really. This approach is fine in isolation, but it’s non-portable:

  • Different architectures or compilers may lay out structs differently. Compilers may insert padding bytes between members to satisfy alignment requirements. Luckily compilers aren’t allowed to do it willy-nilly, and some compilers offer #pragmas to control this.
  • Different architectures have different integer sizes. Appropriate typedefs often can mitigate this, but it’s still imperfect since it requires a small porting effort.
  • Different architectures use different endianness. If a file format stores integers in big-endian byte order but your architecture is little-endian, reading a field out of the struct without first swapping its bytes yields the wrong value.

The typical way to solve these problems is to read a file a byte at a time, copying each byte into the appropriate location within the struct. This is tedious.

Programming languages should provide a mechanism for programmers to declare a struct that must conform to some external format requirement. Programmers should be able to attribute the struct, prohibiting implicit padding bytes and specifying the size and endianness of each field. For example:

file_struct myFileFormat
{
    uint8 version;
    uint8[3]; // Reserved.
    uint32BE numElements;
    uint32BE dataOffset;
};

When retrieving fields from such a struct, the compiler should generate code that automatically performs the necessary byte swaps and internal type promotions.
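
For comparison, here is a minimal sketch of what the manual byte-at-a-time approach might look like in today’s C for the layout above (one version byte, three reserved bytes, then two big-endian 32-bit integers). The names my_file_header, read_u32_be, parse_header, and read_header are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* In-memory form of the header; its layout need not match the file. */
struct my_file_header
{
    uint8_t version;
    uint32_t num_elements;
    uint32_t data_offset;
};

/* On-disk size: 1 version byte + 3 reserved bytes + 4 + 4. */
enum { MY_FILE_HEADER_SIZE = 12 };

/* Assembles a big-endian 32-bit integer from four bytes. */
uint32_t read_u32_be(const unsigned char* p)
{
    return ((uint32_t) p[0] << 24)
         | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)
         |  (uint32_t) p[3];
}

/* Decodes a header from a raw buffer of MY_FILE_HEADER_SIZE bytes. */
void parse_header(const unsigned char* raw, struct my_file_header* out)
{
    assert(raw != NULL);
    assert(out != NULL);

    out->version = raw[0];
    /* raw[1] through raw[3] are reserved and ignored. */
    out->num_elements = read_u32_be(raw + 4);
    out->data_offset = read_u32_be(raw + 8);
}

/* Reads and decodes a header from fp. Returns 1 on success, 0 on failure. */
int read_header(FILE* fp, struct my_file_header* out)
{
    unsigned char raw[MY_FILE_HEADER_SIZE];

    if (fread(raw, 1, sizeof raw, fp) != sizeof raw)
    {
        return 0;
    }
    parse_header(raw, out);
    return 1;
}
```

The shifts work on values rather than on memory, so the same code produces the same result on little-endian and big-endian hosts. The cost is that every field offset is spelled out by hand, which is exactly the tedium that a declarative file_struct would remove.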

C’s gets and fgets

November 27, 2004 at 3:12 pm (PT) in Programming

Everyone knows that the gets function in the C standard library is dangerous to use because it offers no protection against buffer overflows.

What should people use instead?

The typical answer is to use fgets. Unfortunately, although safe, fgets in non-trivial cases is much harder to use properly:

  • How do you determine what a good maximum buffer size is? Using gets is dangerous in the first place precisely because you don’t know how big the input can be.

  • Unlike gets, fgets includes the trailing newline character, but only if the entire line fits within the buffer. This can be remedied easily by replacing the newline character with NUL, but it’s a nuisance.

  • If the input line exceeded the buffer size, fgets leaves the excess in the input stream. Now your input stream is in a bad state, and you either need to discard the excess (and possibly throw away the incomplete line you just read), or you need to grow your buffer and try again.

    Discarding the excess usually involves calling fscanf, and I don’t know anyone who uses fscanf without disdain, because fscanf is hard to use properly too. Furthermore, discarding the line by itself won’t necessarily make you any better off, because you’ve accepted incomplete input and still have to walk the road to recovery.

    Growing the buffer is also a hassle. You either need to grow the buffer exponentially as you fill the buffer to capacity, or you need to read ahead, find out how many more bytes you need, grow the buffer, backtrack, and read the excess bytes again. (The latter isn’t even possible with stdin.) And, of course, this means you also need to handle allocation failure.

In all, this means to use fgets properly, you need to make several more library calls than you want. For example, here’s an implementation that discards line excess:

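#include <stdio.h>
#include <string.h>

/** Reads a line into `line`, stripping the trailing '\n'.
  * Returns the number of characters read or (size_t) -1 if the
  * line exceeded the buffer size (the excess is discarded).
  * Returns 0 at end-of-file or on error; use feof()/ferror() to
  * distinguish that from an empty line.
  */
size_t fgetline(char* line, size_t max, FILE* fp)
{
    if (fgets(line, (int) max, fp) == NULL)
    {
        return 0;
    }
    else
    {
        /* Remove a trailing '\n' if necessary. */
        size_t length = strlen(line);
        if (length > 0 && line[length - 1] == '\n')
        {
            line[--length] = '\0';
            return length;
        }
        else if (feof(fp))
        {
            /* The final line had no trailing '\n' but still fit. */
            return length;
        }
        else
        {
            /* Swallow any unread characters in the line. */
            fscanf(fp, "%*[^\n]");
            fgetc(fp); /* Swallow the trailing '\n'. */
            return (size_t) -1;
        }
    }
}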

Ugh!

I personally prefer using C.B. Falconer’s public-domain ggets/fggets functions, which have gets-like semantics and dynamically allocate buffers large enough to accommodate the entire line.
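
To give a feel for the buffer-growing approach, here is a minimal sketch that doubles the buffer until the whole line fits. It is not Falconer’s code and does not share his interface; the name read_line and the initial capacity of 16 are arbitrary choices for this illustration, and it assumes lines are shorter than INT_MAX bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/** Reads an entire line from fp into a newly allocated buffer,
  * without the trailing '\n'. Returns NULL at end-of-file, on a
  * read error, or on allocation failure. The caller must free()
  * the result.
  */
char* read_line(FILE* fp)
{
    size_t capacity = 16;
    size_t length = 0;
    char* line;

    assert(fp != NULL);

    line = malloc(capacity);
    if (line == NULL)
    {
        return NULL;
    }

    /* Append to whatever has been read so far. */
    while (fgets(line + length, (int) (capacity - length), fp) != NULL)
    {
        char* bigger;

        length += strlen(line + length);
        if (length > 0 && line[length - 1] == '\n')
        {
            line[--length] = '\0';
            return line;
        }

        /* No newline yet: grow the buffer exponentially and retry. */
        bigger = realloc(line, capacity * 2);
        if (bigger == NULL)
        {
            free(line);
            return NULL;
        }
        line = bigger;
        capacity *= 2;
    }

    /* fgets failed: keep a final line that lacked a trailing '\n'. */
    if (length > 0 && !ferror(fp))
    {
        return line;
    }
    free(line);
    return NULL;
}
```

Even this small version needs three different failure paths, which is a fair summary of why a ready-made function is attractive.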

Additional reading: Getting Interactive Input in C

Pitfalls of C++’s subscript operator

September 23, 2004 at 12:59 am (PT) in Programming

What’s wrong with this C++ code?

#include <iostream>
#include <string>

class Foo
{
public:
    Foo() { }
    operator bool() { return true; }
    std::string operator[](const std::string& s) { return s; }
};

int main(void)
{
    Foo foo;
    std::cout << foo["hello world!"] << std::endl;
    return 0;
}

Answer:

(more…)