MadBlog
Wednesday 29 November 2006

pearls

Refactoring code is (sometimes) a real bit of fun. I've seen a lot of:

 if (!foo) {
     return NULL;
 } else {
     return foo;
 }

or quite a lot of:

 (<boolean expression>) ? true : false

but today I found one I've never seen before:

 foo ? -foo : 0

On C and strings APIs

The default C str* functions have very poor semantics and a whole lot of consistency problems. That's a fact, and a well-known one.

For those who still don't know them, and only to cite a few:

  • strlen returns a size_t but the *printf functions return an int;
  • str*{cpy,cat} return a pointer to the destination, which is useless;
  • strncpy can produce things that are not strings (because they are not NUL-terminated);
  • ...

A good string API MUST treat any string as a buffer plus its allocated size (not length: I said size, meaning including the terminating NUL). Moreover, it should return information that avoids the stupid problem of strcat, which has to compute the destination length again and again. There is one function of the usual C string API that gets it right, if you forget about the int return type problem: snprintf[1]. Why? Because:

  1. it prevents most buffer overflows because you always pass the buffer size;
  2. unlike strncpy the resulting buffer contains a true NUL-terminated string;
  3. it returns the length the string would have had (not counting the terminating NUL) if there had been enough space in the buffer, so that a correct reallocation costs very little extra code;
  4. if you support negative buffer sizes, you can chain as many of those calls as you like, and be safe!
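
To make point 3 concrete, here is a minimal sketch of the "format, then grow if needed" pattern with plain snprintf. The helper name format_pair and the initial size of 16 are illustrative assumptions, not something taken from the patch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (illustration only): builds "key=value" on the heap.
 * Returns a malloc'ed string the caller must free, or NULL on failure. */
char *format_pair(const char *key, const char *value)
{
    size_t size = 16;               /* arbitrary first guess */
    char *buf = malloc(size);
    if (!buf)
        return NULL;

    /* snprintf returns the length the full string needs, NUL excluded. */
    int len = snprintf(buf, size, "%s=%s", key, value);
    if (len < 0) {
        free(buf);
        return NULL;
    }

    if ((size_t)len >= size) {      /* truncated: we now know the exact size */
        size = (size_t)len + 1;
        char *bigger = realloc(buf, size);
        if (!bigger) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        snprintf(buf, size, "%s=%s", key, value);
    }
    return buf;
}
```

One retry is always enough here, because the first call already reported the exact length required.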

And if you want proof that it makes code more concise, just go read that patch. EDIT: Here m_strcpy is a function that respects the 4 previous points: m_strcpy(dest, dlen, src) copies src into dest like its brother strcpy, but into a buffer of size dlen, and in fact returns the length of src, à la snprintf. And if you do m_strcpy(dest, -1, src) it does not break, so I can use it without needing to check whether my virtual result length overflowed or not. With those functions it's just obvious how to build new ones with exactly the same semantics.
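
For readers without the patch at hand, a function with those semantics could look roughly like the sketch below. This is a guess at the shape, not the actual code from the patch: in particular the ptrdiff_t types are an assumption, chosen only because the size has to be signed for the -1 case to make sense.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of an m_strcpy-like function (signature assumed, not the real one).
 * Copies at most dlen - 1 bytes of src into dest and NUL-terminates the
 * result whenever dlen > 0. A zero or negative dlen writes nothing at all.
 * Always returns strlen(src), truncated or not, a la snprintf. */
ptrdiff_t m_strcpy(char *dest, ptrdiff_t dlen, const char *src)
{
    size_t len = strlen(src);

    if (dlen > 0) {
        size_t room = (size_t)dlen - 1;     /* keep one byte for the NUL */
        size_t n = len < room ? len : room;

        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return (ptrdiff_t)len;
}
```

The caller detects truncation by comparing the return value with dlen: a result greater than or equal to dlen means the copy did not fit.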

For those who wonder: yes, I've forked mutt the hard way, and am trying to deobfuscate its code (I was shocked to see how bad it was, given how much I like mutt as a MUA). And what you see here is that these string API semantics are very easy to use, safe, and extensible. For example, in that patch, rfc822_write_address and rfc822_write_address_single, which had the bad property of truncating the result without any way for the caller to know, now use the usual snprintf semantics for no pain at all.

With such APIs there is no need to care about the ending '\0' or about off-by-ones: you always speak of buffer sizes, and when your current virtual write position in the buffer is pos, you just need to call pos += your_strXXX(buf + pos, buflen - pos, ...) and you'll be safe. If you want to know whether a realloc is needed, you just have to check whether pos >= buflen, the same as what you have to do for snprintf.
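
The chaining idiom can be sketched as follows. Both put_str (a stand-in for any your_strXXX with these semantics) and write_mailbox are hypothetical names made up for the example; they are not the mutt functions. One caveat worth stating: once pos exceeds buflen, buf + pos points past the end of the buffer. It is never dereferenced because the remaining size is then negative, but strictly speaking the C standard does not bless computing such a pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for your_strXXX: bounded copy that returns
 * strlen(src) and writes nothing when size <= 0. */
static ptrdiff_t put_str(char *dest, ptrdiff_t size, const char *src)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t room = (size_t)size - 1;
        size_t n = len < room ? len : room;

        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return (ptrdiff_t)len;
}

/* Hypothetical example: writes "name <addr>" into buf.
 * Returns the length the complete string needs, so the caller knows the
 * output was truncated exactly when the result is >= buflen. */
ptrdiff_t write_mailbox(char *buf, ptrdiff_t buflen,
                        const char *name, const char *addr)
{
    ptrdiff_t pos = 0;

    /* No intermediate checks: once pos passes buflen, the size goes
     * negative and the remaining calls only count. */
    pos += put_str(buf + pos, buflen - pos, name);
    pos += put_str(buf + pos, buflen - pos, " <");
    pos += put_str(buf + pos, buflen - pos, addr);
    pos += put_str(buf + pos, buflen - pos, ">");
    return pos;
}
```

A caller that wants the full result simply reallocates to pos + 1 bytes and calls the function again when pos >= buflen.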

I really wonder why such APIs are not common in the C world. I see way too much code using the usual C string functions, which are just a PITA.

Notes

[1] and not printf like I said in the first place...