PDR: C Tutorial for C++ programmers

Go up to the Tutorials table of contents page

Originally by Jeremy W. Sheaffer

This document assumes that you have a strong working knowledge of the most basic and important aspects of C++. It is designed to introduce you to important concepts in C - most of them also available in C++, but rarely used there - with which you likely have little familiarity.

Hello World, Hello World, Hello World!

A basic C program looks like the following:

// hello world x3
#include <stdio.h>
int main() {
    for (int i = 0; i < 3; i++) {
        printf("hello world!\n");
    }
    return 0;
}

A few things to note about this program: - The <stdio.h> was #included (stdio stands for Standard I/O library), which is where printf() (and, later, scanf()) live. These are the basic input and output routines in C, analogous to cout and cin in C++. More on these functions are below - There are no namespaces in C

To use malloc(), which is the C version of new, you will need to include the <stdlib.h> file (stdlib is the standard library) - more on malloc() is also below.

Input and Output

C does not have iostreams or stream operators, such as << (for cout) and >> (for cin). For most I/O, we use the printf() family of functions (for output) and the scanf() family (for input).

int printf(const char *format, …)

printf() takes a format string, containing verbatim text that you want to display and conversion specifiers which describe to printf() how to interpret and display the remaining arguments. The conversion specifiers may contain flags, which control such things as field width, precision, and format. All conversion specifiers begin with a %. To print an actual %, your format string must contain %%. Other special characters will have to be escaped with a backslash (i.e., to print a newline, we enter \n; for a backslash, we enter \\).

For example, to print an integer, we would enter:

int x = 3;
printf("this is an int: %d\n", x);

The %d part tells printf to format the appropriate parameter (x, in this case) as an integer, and insert it at that spot in the string.

Commonly used conversion specifiers are:

Specifier	Meaning
d	converts an int
f	converts a float
c	converts a char
s	converts a string
p	converts a pointer

and flags:

Flag	Meaning
`l` (a lower-case ‘L’)	Instead of an int or float, convert a long int or double
`ll` (two lower-case ’L’s)	…, convert a long long int or long double
`0` (zero)	Zero pad the conversion to fill the field width
`-`	Left justify the field
’ ’ (a space)	Leave a blank space where an omitted ‘+’ would go
`+`	Display a ‘+’ for positive numbers
Non-zero integer	Minimum field width
`.` followed by a non-zero integer	Number of digits of precision

Some examples:

printf("pi = %.0f\n", pi);         /* output: pi = 3    */
printf("pi = '%+10.4f'\n", pi);    /* pi = '   +3.1416' */
printf("Hello from %s, which begins at %p!\n",  __PRETTY_FUNCTION__, main);  /* Hello from main, which begins at 0x4004f4! */

This last one uses the __PRETTY_FUNCTION__ macro, which is a string representation of the function name. And the ‘main’ in there is printing out the hexadecimal address of where the main() function begins in the x86 code section.

Variable argument lists

We’ve seen variable argument lists in the context of the C calling convention (that’s why the parameters are pushed onto the stack in reverse order). The ... portion of the printf() function signature signifies that it take a variable number of arguments. I/O functions in C commonly combine this notation with a format string. Any use of variable argument lists must include clues so that the callee can correctly identify the number and type of its arguments. An interesting example of a function which uses variable argument lists in a less obvious way is open():

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

but C does not support function overloading! While the open() manual page displays the two prototypes above, the actual signature of open() is:

int open(const char *pathname, int flags, ...);

The variable argument list (mode) is only used if flags includes the bit O_CREAT, giving open() the information it needs to decide whether or not to access arguments beyond flags.

int scanf(const char *format, …)

scanf() converts input, rather than output. Its format strings look very similar to those of printf(), with a few more complex conversions, which we will not cover in this tutorial as they are not often used. The key point to remember to differentiate usage of scanf() and printf() is that while printf() converts values (as in pass-by-value), scanf() converts input and thus needs a place to store it (pass-by-address, which is really passing a pointer by value; C does not have pass-by-reference, only C++ does).

scanf() will precisely match all non-whitespace characters in the format string. Whitespace is handled specially, in that each individual instance of whitespace in the format string is treated like a single space, and a single space in the format string matches one or more whitespace characters in the input. Conversion usually stops at the first whitespace character. All matched input is discarded, save that which is converted.

Be careful of buffer overflow! scanf() doesn’t know how large your buffer is. If you don’t know either, don’t use scanf(). We prefer to use fgets() followed by sscanf() for this reason.

Some examples:

int age;
char grade;
char school[3];

printf("AGE: ");
scanf("%d", &age);    /* Converts input to an integer and stores it in age */

printf("GRADE: ");
scanf("%c", &grade);  /* Converts input to a letter grade (probably 'A') and stores it in grade */

printf("SCHOOL: ");
scanf("%s", school);  /* Converts input to a string and stores it in school */

The third example above almost certainly overflows the buffer. scanf() will copy input into school until it sees the next whitespace character. If the input is “The University of Virginia”, it will save “The”, and overflow the buffer by one byte (due to the fact that all C-style strings have a zero byte that terminates the string). If the input is “UVA”, it will save “UVA”, but still overflow the buffer. Using the field width flag %2s can solve this buffer overflow problem, but then we will only save “Th” or “UV”. In order to convert whitespace, you must use the more complex conversion. It’s more common to use fgets() for this type of input.

Dynamic memory management

In C++ you allocate storage from the heap with new, and you return it to the heap with delete. new and delete are operators, which can even be overloaded, and which, by default, automatically call constructors or destructors for you.

Heap control in C is a bit lower level, and is done primarily through two functions: malloc() and free().

malloc()

Allocation from the heap is achieved through a call to malloc(), which returns a pointer to the newly allocated space on success or NULL on failure (i.e., if you are out of memory). It is important to check the return value of malloc() before attempting to access the returned storage. The storage allocated by malloc() is not initialized in any way! You will need to explicitly run initialization subroutines on data structures that you allocate in C.

malloc() has the prototype:

void* malloc(size_t size)

You must explicitly tell it how much storage you require. The sizeof operator is useful here.

free()

free() is used to deallocate storage originally allocated with malloc(), calloc(), or realloc(). free() in C is analgous to delete in C++. Its prototype is:

void free(void* ptr)

It is an error to call free():

on any address that was not returned by one of the malloc() family functions above
on a NULL pointer
twice on the same address

Any one of these errors will corrupt your heap. This corruption will manifest in unusual ways which will be very difficult to debug and which will not have any obvious relationship with the root cause (if you are lucky, it will cause a segmentation fault).

The easiest way to debug a memory error is not to make it in the first place. With care, this is easier than it sounds. Firstly, know when you need dynamic allocation; don’t use it if you don’t have to. Secondly, as you would do in C++, write constructors and destructors for all of your data structures, and be consistent about using them. These functions should handle your heap control and error checking explicitly, so that they are implicit in the code that uses the storage.

If you do manage to develop memory errors, AddressSanitizer and LeakSanitizer work just as they did in C++.

Examples

#include <stdio.h>
#include <stdlib.h>

int main() {
    // dynamically allocate an array of ints
    int* p = (int*) malloc(sizeof(int) * 5);
    if (p == NULL) {
        // memory allocation failed; handle the error somehow
        return 1;
    }

    // Initialize p[1] to 10. Everything else is still uninitialized.
    // Trying to access any other index without first initializing it is undefined behavior!
    p[1] = 10;
    printf("%d\n", p[1]);

    // free up that array
    free(p);

    return 0;
}

Derived types

Like C++, C allows programmers to derive types from existing types. There are several kinds of derived types, including structures, unions, and enumerated types.

struct

C does not have classes, but it does have structures. Early versions of C++ were called C with Classes; the C++ “compiler” was little more than a preprocessor, and class syntactical constructs were converted to structures so that the output could be compiled by a C compiler.

A structure definition has the following format:

struct name {
    type1 member1;
    type2 member2;
    /* ... */
    typen membern;
};

In the above, name is optional. If omitted, the structure is said to be anonymous; in general, you probably don’t want that. Within the curly braces, struct name is an incomplete type or forward reference and can be used to build recursive structures. struct name is fully defined after the closing curly brace, and instances can be declared at that point.

The following might be a good definition for a list item data structure. We’ll look at more of the list class further on:

struct list_item {
    struct list_item* prev
    struct list_item* next;
    void* datum;
};

Now whenever you want a variable of type list_item, you declare it using struct list_item – for example, struct list_item my_item. If the extra struct doesn’t look right to you, this is where the typedef keyword can come in handy.

// typedef comes before struct
typedef struct list_item {
    struct list_item* prev;
    struct list_item* next;
    void* datum;
} list_item_t; // what you want to call your struct goes after the closing brace

With the typedef, we can now refer to our struct as simply list_item_t. Therefore, variable declarations become list_item_t my_item. Note that the struct list_item* prev inside of the struct declaration cannot be replaced with list_item_t* prev, as the typedef hasn’t been finished by that point! We don’t know the name of the typedef until after the closing brace.

union

A union is a type for which the compiler allocates space sufficient only for the largest member, not for all members. At any moment in a union instance’s lifetime, only one member is valid. It is the responsibility of the programmer to ensure that the data is accessed correctly. The syntax of a union declaration is identical to that of a struct declaration, save the keyword.

An anonymous union provides useful syntactic sugar as a structure member. Take the following example:

struct example_struct {
    int type;
    union {
        int i;
        float f;
        double d;
    };
};

struct example_struct s;
s.type = 1;

switch (s.type) {
case 0:
    printf("%d\n", s.i);
    break;
case 1:
    printf("%+2.3f\n", s.f);
    break;
case 2:
    scanf("%lf", &s.d);
    break;
}

C with Classes used unions and function pointers to implement polymorphism.

Function pointers

Function pointers are useful in many instances. The most common of those include function tables (arrays of function pointers), implementation of code in which you may not statically know what function will be called (like qsort() and bsearch(); technically all instances in which you should use function pointers fall under this heading), and implementation of object oriented code in C.

Syntactically, a function pointer looks like a function prototype, except that the “function name” (actually the name of the pointer) is wrapped in parenthesis with a pointer star at the beginning. For example:

int (*pprintf)(const char* format, ...)

defines a pointer that can point to printf() (not particularly useful). A function’s name, without parenthesis, evaluates to its address, thus we can assign the address of printf() to pprintf with the statement:

pprintf = printf;

and you can call printf() through the pprintf pointer with:

pprintf("Hello, World!\n");

qsort()

qsort(), short for quick sort, can sort arrays of data of arbitrary type. In implementing qsort(), the subroutine has no knowledge of how to compare the arbitrary elements being sorted, thus qsort() takes a pointer to a comparison function. Its full prototype looks like this:

void qsort(void* base, size_t nmemb, size_t size, int (*compar)(const void*, const void*));

The comparison function takes pointers to two elements of the array starting at base and returns negative, zero, or positive if the first is smaller than, equal to, or larger than the second, respectively. The array starting at base has nmemb elements of size size.

Object-Oriented C

With list_item_t defined as above, consider the following:

struct list {
    list_item_t* head;
    list_item_t* tail;
    unsigned int length;
    int (*compare)(const void* key, const void* with);
    void (*datum_delete)(void*);
};

This is a type definition for an object-oriented doubly-linked list class in C. list_item_t contains a pointer to a void type. In other words, it can point to anything, so if we want to do sorted insert, we need a comparison function, just as in qsort(). Furthermore, if we destroy the list, we probably want the data contained in it to be returned to the heap. If that requires a only shallow delete, free() can be passed to the constructor, otherwise, a destructor for the stored type must be passed. If the list is not to be sorted, or if the data is not to be destroyed with the list, one or both of compare and delete_datum can be NULL.

Note that, for the exercise below, you need not implement object-oriented C code; your code can be pure C.

Exercise

This exercise is to be developed in C, and compiled using clang (NOT clang++!). The exercise is to implement a very simple linked list. Your program should do the following:

Read in an integer, which we’ll call n
Read in n more ints, and put those into a linked list
- The linked list must be dynamically allocated
Print out that linked list (we don’t care about the order!)
Properly deallocate the linked list

That’s it - the point is to have you use many of the features of C discussed here (printf(), scanf(), structs, malloc(), and free()). We aren’t looking for multiple subroutines, a full list class, etc. – a long main() function is just fine. Don’t make this more complicated than necessary!

The program should be in a file called linkedlist.c. A sample execution run might look like the following:

Enter how many values to input: 4
Enter value 1: 2
Enter value 2: 4
Enter value 3: 6
Enter value 4: 8
8
6
4
2