Slices and Strings in C
I like the way Go strings and slices are implemented, and I think they’re a good fit for C. Lately I’ve been working on an interpreter, and bit by bit I wrote a bunch of utilities that, gathered together, started to look like a standard library alternative I call libsy—a pun on libc, since they sound almost the same. In a few posts, I want to share what I've learned and where I ended up. Let’s start with the string definition:
typedef struct { char *a; isize len; } Str;
Very simple, just a pointer to the beginning of the string (a stands for 'address') and its length in bytes. Strings should always be a valid UTF-8. Slices have a similar structure to strings, except they have a capacity that shows how many items the underlying array can store. Here is Bytes that is used to keep kind of binary data:
typedef struct { char *a; isize len; isize cap; } Bytes;
Each slice type should be defined individually, and to make it easier, I use this macro:
#define DEFINE_SLICE(item, slice) \ typedef struct { \ item *a; \ i32 len; \ i32 cap; \ } slice DEFINE_SLICE(i32, i32s); DEFINE_SLICE(Str, Strs);
I generally prefer to stay away from macros, but since I have slices for too many types and they always have the same form, I guess it's fine.
For the use cases I have in mind, 32-bit slice indexes should be more than good enough, and since I don't care about 16-bit systems, I'm using them as a default option for slices. Slices are value types, they get passed a lot as function arguments and using 32-bit instead of 64-bit ints helps a bit with register pressure.
Now to the more interesting part: operations. Let's start with IDX—it's a macro to get an index in the slice with bounds checking:
i32 array[10] = {}; i32s slice = { array, 5, 10 }; i32 first = IDX(slice, 0); IDX(slice, 10) = 2 * first;
as you can see, you can read from or assign to IDX. If the index you've provided is out of bounds of slice, the program will crash with an error message like this:
IDX(slice, 10): index is too big (len=5)
Here is how macro is defined:
#define IDX(s, i) \
((s).a[({ \
isize IDX_i = (i); \
isize IDX_len = (s).len; \
if (IDX_i < 0) { \
Panicf("IDX(%s, %zd): negative index", #s, IDX_i); \
} \
if (IDX_i >= IDX_len) { \
Panicf( \
"IDX(%s, %zd): index is too big (len=%zd)", \
#s, IDX_i, IDX_len \
); \
} \
IDX_i; \
})])
A few things to comment:
- This uses ({ ... }) syntax that allows to execute statements inside an expression, the last statement becomes the value of the whole expression. This is non-standard, but exists in GCC and Clang for a long time.
- As you can see, you can define variables inside this block and they will stay local to scope. I prefer to prefix this variables with a name of a macro, so that there's no possible clash with the variables used macro parameters.
- Index i is used once, which means things like IDX(slice, expensiveFunction()) would work as you expect.
- Slice s, however, is used twice, and so don't put anything complicated here. I guess it's possible to make it work by wrapping the whole thing in ({ }) and storing the whole slice in a local variable, so that (s) is computed once, but it makes things more complicated, because ({ }) returns r-value and you wouldn't be able to assign to it. You can, however return a pointer and immediately dereference it, but I feel like it's not necessary.
I use IDX almost everywhere, it is simple to use, makes programs safer, and easier to debug. I also have the REV(slice, index) macro that accesses slice in reverse order and LAST(slice) that gives the last element in slice or panics, if empty.
Let's now talk about how to create a slice and append to it. I generally try to follow 'Zero is Initialization' principle, so Bytes b = {}; is a valid zero-size slice of bytes. I have an APPEND(slice, value) macro to add items to the end of the slice, which grows slice, if necessary:
i32s squares = {}; for (i32 i = 0; i < 10; i++) { APPEND(squares, i * i); }
Here is how it's defined:
#define APPEND(c, value) \
do { \
assert((c).len <= (c).cap); \
if ((c).len == (c).cap) { \
i32 APPEND_cap = (c).cap == 0 ? 1 : (c).cap * 2; \
typeof((c).a) APPEND_a = aligned_alloc( \
alignof(*(c).a), APPEND_cap * sizeof(*(c).a) \
); \
if ((c).a) { \
memcpy(APPEND_a, (c).a, (c).len * sizeof(*(c).a)); \
free((c).a); \
} \
(c).a = APPEND_a; \
(c).cap = APPEND_cap; \
} \
(c).a[(c).len++] = (value); \
} while (0)
APPEND tries to mimic std::vector or golang's append behavior by doubling underlying array size, when there's no space left, hence it's O(1) amortized. There is a lot of hassle with aligned_alloc and manual memcpy because unfortunately there is no aligned_realloc, and some of my types need to be aligned for more than 8 bytes—default alignment for glibc realloc.
APPEND is a special macro, because it assumes exclusive ownership of the slice, while other macro don't really modify slice itself. In a way, it has more strict usage requirements than append in Go, because C is not garbage collected and free would be really unhappy, if you pass a pointer from the middle of the allocated space.
I also have APPEND_LOCAL for slices allocated on stack. APPEND_LOCAL will never try to grow the array, so if there is no space left, it will panic with a helpful message.
i32 buf[10]; i32s squares = { buf, 0, 10 }; for (i32 i = 0; i < 11; i++) { APPEND_LOCAL(squares, i * i); // panics on the last iteration }
Another useful macro is SUBSLICE(slice, begin, end) that does what slice[begin:end] do in other languages:
#define SUBSLICE(slice, begin, end) ((typeof(slice)){ \
.a = &(slice).a[begin], \
.len = ({ \
isize SUBS_begin = (begin); \
isize SUBS_end = (end); \
isize SUBS_len = (slice).len; \
if (SUBS_begin < 0 || \
SUBS_begin > SUBS_end || \
SUBS_end > SUBS_len \
) { \
Panicf( \
"SUBSLICE(%s, %zd, %zd): " \
"bad index (slice len = %zd)", \
#slice, SUBS_begin, SUBS_end, SUBS_len \
); \
} \
(SUBS_end) - (SUBS_begin); \
}) \
})
This has similar trade-offs as IDX, it's best to not provide any complex arguments to this macro. SUBSLICE written in a way that it works with both strings and slices, for slices capacity is initialized with zero, which will trigger asserts in APPEND and APPEND_LOCAL.
These few macros make strings and slices in C more convenient to use while keeping control explicit. They don’t hide allocation or ownership rules, but they keep code safe and concise. In the next parts, I plan to cover arenas, interfaces, and more.