How to malloc() the Right Way

Having been in the game for a while now, I’ve seen different styles of malloc()ing data. With new students, I almost universally see:

int *array = (int *)malloc(sizeof(int) * N);

There are two problems that need addressing here:

  1. The cast to (int *)
  2. Using sizeof(int)

If you like either of these, bear with me, I have reasons for changing both to the below:

int *array = malloc(N * sizeof *array);

If you aren’t already aware, the sizeof operator can be applied to an expression. The expression is not evaluated (array isn’t actually dereferenced in the above), only the type of the expression is examined. The type of *array is int, so sizeof *array is equal to sizeof(int), and is also computed at compile-time.

Why you shouldn’t cast

First of all, you don’t need to. All data pointer types in C are implicitly convertible to and from void * with respect to constness. This should be enough of a reason not to use it, but keep reading if you’re unconvinced.

Second, and this is vital for students, but especially teachers: code should have as few casts as possible. A cast is a sign that something dangerous or strange it happening. A cast indicates that the normal rules of the type system can’t do what the programmer needs to do. I haven’t seen many good reasons for type-casting outside of low-level systems code. Teachers, we should not teach students to use casts without a second thought. The problem becomes more pronounced with they use a cast somewhere to shut up the compiler, their code breaks, and no one knows why. malloc() is not unusual, strange, or dangerous in C. It shouldn’t be casted, it’s a natural part of the flow of any significant C program.

Third, the cast can actually prevent the compiler from issuing a warning if stdlib.h isn’t included. C89 doesn’t require a declaration for all functions, as long as it can find them at link-time. Simply put, this means you can call malloc without including stdlib.h. The compiler will produce an implicit declaration: int malloc(int). This is invalid in C99, but most compilers still allow it.

Why you shouldn’t use sizeof(type) either

If the type changes, the malloc line won’t need many changes. Let’s examine the first example where one allocates an array of int. Now, imagine the programmer later realizes the type needs to be long, not int.

int *array = (int *)malloc(sizeof(int) * N); /* before */
long *array = (long *)malloc(sizeof(long) *N); /* after */

Three changes. In the latter case we have

int *array = malloc(N * sizeof *array); /* before */
long *array = malloc(N * sizeof *array); /* after */

One change. No changes to the right side of the =, which is important because often the call to malloc() isn’t right next to the variables declaration. This makes it clearer that using sizeof(int) and casting prevent your code from being very DRY. For all that is said about DRY code, the easiest way to think about it, in my opinion, is that you want to design your program so that if you need to make a change, you only need to change one thing. Repeating the type in three places makes your code harder to modify and maintain.

A more convincing example

Let’s consider a typical type of struct, a size and a pointer to the contained data. Along with that struct I’ll create a function

myarray.h

#ifndef MY_ARRAY_H_
#define MY_ARRAY_H_

#include <stddef.h>
/* Array type, stores its length alongside the data */
struct my_array {
  size_t size;
  int *data;
};

struct my_array *new_my_array(size_t sz);
void free_my_array(struct my_array *);

#endif

myarray.c

#include "myarray.h"
#include <stdlib.h>
struct my_array *new_my_array(size_t sz) {
  /* space for the struct itself */
  struct my_array *arr = malloc(sizeof *arr);
  /* space for the contained data */
  arr->data = malloc(sz * sizeof *arr->data);
  arr->size = sz;
  return arr;
}

void free_my_array(struct my_array *arr) {
  free(arr->data);
  free(arr);
}

Note: for the more advance C programmers out there, I know this could be done with a flex-array, but it’s an example, there could be two arrays, it could be C89, whatever.

The usage of this should be pretty obvious: struct my_array *arr = new_my_array(N);. Now consider the same problem: the programmer decides int should be long. As I’ve written this, the only change needs to be in myarray.h. int *data becomes long *data, and the allocation in myarray.c is still correct. There is no cast that needs changing and since sizeof uses the variable, it’s transitively modified by the change in the header.

If, on the other hand, the malloc line in new_my_array used a cast and a sizeof(int), the code would still compile, but would be wrong. In this example the change would be required across two files, which is problematic enough, but it could be worse. What if the array is growable? Now there’s another malloc or realloc somewhere that needs to be found. What if the array can grow in more than one place? What if the array can shrink too?

Objections

But what if I want it to work in C++?

C++ doesn’t allow implicitly converting from void* to another data pointer without a cast. This is a valid question, but there are issues with it, mostly arising from the fact that C and C++ are two different languages

Right away, if you have a C++ project that has malloc in it, in that case malloc is a strange thing to use, and should require a cast. C++ has new for dynamic allocation. If you’re allocating an array, you’re better off using std::vector, std::string, or one of the smart pointers if you really think that’s best.

If you actually really need to malloc in C++ (which you probably don’t). Then a C-style cast isn’t the right way to do it. It’s a sledgehammer, whereas C++ has a lot of tools to do what you actually mean. In the malloc case, what you really want is:

int *arr = static_cast<int *>(std::malloc(N * sizeof *arr));

If you’re thinking “that’s clunky and ugly,” I’d agree, but I’m also not someone who uses malloc in C++ (unless I really really need it). If you’re a C++ programmer and you don’t understand the difference between static_cast, dynamic_cast, reinterpret_cast, and const_cast, then seriously, start reading up. A student once asked me: “what’s the right time to use a C-style cast?” and as I told her, “when you’re programming in C.”

That’s not what I mean! I want my library to work in both C and C++

For that you should be using extern “C”.Note: What follows is only tangentially related to the original point of this post

You have written a C library, and you want to link it with C++ code. This is quite normal, and there are two rules of C++ that let us do so. C++ allows one to prefix extern "C" on a function declaration, meaning the name will not be mangled and the linker knows to look for a C-style named function, rather than a C++ style name (which would be mangled). The declaration then appears as extern "C" int f(int);. However, we can’t just throw around extern "C" in C code, because the C language knows nothing about it. The other rule we have says that a conforming C++ compiler must define the preprocessor symbol __cplusplus. Combining these two, one can create a declaration that works in C++ and C:

#ifdef __cplusplus
extern "C"
#endif
void myfunc(int);

In the myarray example, the whole thing can be wrapped in an extern "C" block.

#ifndef MY_ARRAY_H_
#define MY_ARRAY_H_
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Array type, stores its length alongside the data */
struct my_array {
  size_t size;
  int *data;
};

struct my_array *new_my_array(size_t sz);
void free_my_array(struct my_array *);

#ifdef __cplusplus
} /* close the extern "C" block */
#endif

#endif

Voila, now your C and C++ code can share a header, be compiled as their own language, and still be linked.

$ gcc -std=c89 -Wall -c myarray.c
$ g++ -std=c++11 -Wall -c main.cpp
$ g++ myarray.o main.o -o main

This whole concept may merit more explaining in another post

I don’t really understand what you just said, but I meant that I want to be able to run my code through a C++ compiler as well as a C compiler

Well, in that case you’ll find yourself restricted to using a subset of C89 and C99. You’ll also have more problems to deal with than you realize, there are a lot of things that are perfectly valid C but when put through a C++ compiler, they crash and burn.

If you still prefer to cast malloc()‘s return in C, or use sizeof(type). I’m interested as to why, so please, do tell.

Advertisements