C Survival Notes
Daniel C. Silverstein, cubes@pipeline.com, February 1, 1999

These notes and code from these notes are available at
http://www.CSUA.Berkeley.EDU/~dans/HelpSessions/CSurvival/

Special thanks are due to Brian R. Gaeke, brg@csua.berkeley.edu, whose notes these are partly based upon or borrowed from.

The Holy Books of CS (Or Time to Go Buy Another Book):
-----------------------------------------------------

There are a number of books which I lovingly refer to as the Holy Books of CS. These are almost ubiquitous books that will continually haunt you throughout your CS career, until, of course, you buckle and buy your own copy. Besides, most good computer scientists and/or programmers have a copy of each of the Holy Books of CS in their libraries, so everyone else in computer science will laugh at you if you don't. Like most religious texts, the Holy Books of CS are the subject of no small amount of debate, and different sects occasionally recognize different books.

By the standards of the Berkeley CS sect, you've probably encountered at least one Holy CS Book already. That's Abelson and Sussman, or SICP. If you committed the cardinal sin of selling SICP after finishing 61A, then the local area CS deities (i.e. the L&S CS Admissions committee) will smite you for your insolence.

Anyway, it's time to buy "The C Programming Language", Prentice Hall 2nd ed., by Brian W. Kernighan and Dennis M. Ritchie, the next installment in the Time-Life Holy CS Books collection. If you're not fully satisfied, there's always the Psych Major. This book is commonly known by a number of names, including K&R, K&R C, and, my personal favorite, the C Bible (not to be confused with a different book that is actually titled the C Bible). As with a typical bible, K&R C is, to the average reader, sometimes cryptic and terse. If you're capable of learning C entirely by reading through K&R, you should probably come up here and help teach this Help Session.
Think of K&R as a reference, not a tutorial.

If K&R isn't doing it for you, and you're just not getting C, there are lots of books available that claim to teach you C. I think it's better to save the $30 to $50 a typical C tutorial book will cost you, and, instead, find someone who does get C (no getc pun intended) and ask them to answer your questions when you are stumped. If you feel guilty about learning without spending at least 50 bucks, give your $50 to the kind person you keep pestering with your questions. Those of you who are antisocial can stop by the CSUA office, 343 Soda, and, if someone is in, he or she will probably be able to answer your question or direct you to the appropriate reference. The CSUA won't mind if you want to give it your $50 either. You can also try the HKN/UPE office in 345 Soda, but, in my horribly biased opinion, members of those organizations won't be much help :)

Also, the libraries of the C language are pretty well documented (on occasion, painfully so) in the Unix man pages. Typing "man someClibraryfunction" at the Unix command prompt will often yield screenfuls of information about the function, and, possibly, several lines of information that you actually want or need.

Program Structure:
-----------------

A C program is composed of global declarations and functions. The first function that is called by your program is always "main", and it takes two arguments: argc, which is an integer, and argv, which is an array of strings. You can have as many other functions in your program as you want.

Comments in C are anything between a /* and a */. Note that, in the ANSI C that K&R describes, you cannot use //-style single line comments as in Java and C++.

Here is a simple program (simpleprog.c):

    /* This is an example of a poor comment */
    #include "simpleprog.h"
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        foo();
        return 0;
    }

    void foo (void)
    {
        printf("Fooing\n");
    }

Let's pick this program apart.
The first two lines after the comment, which begin with #include, are called preprocessor directives. There are a number of other preprocessor directives, and I may talk about some of them later. When the compiler sees a #include directive, it replaces the line with the contents of the file specified between the quotes or the angle brackets (< and >). Whether you use quotes or angle brackets depends on the kind of file you are including. Use angle brackets for C standard libraries. stdio.h is the STandard Input/Output library. printf is declared in stdio.h. Use quotes for other files, usually ones that you've written. I'll explain simpleprog.h in a moment.

Next we have

    int            main    (int argc, char *argv[])
    return type    name    arguments

If you've taken 61B, this should look pretty familiar. It means that "main" is a function that returns an integer, and, as mentioned above, takes an integer named "argc" and an array of strings "argv" of unknown length as arguments.

Lastly, we have another function, foo. It takes no arguments, and has no return value. The statement

    printf("Fooing\n");

displays Fooing on the standard output and advances to the next line (the "\n" is a mnemonic in C for the newline character). I'll talk more about printf and standard output in a little bit.

You may be wondering why main is not of type void. The answer is that the return value of main can be used to return an error code. In C, 0 means false, and thus, no error. Any non-zero value is true, and the particular value can be used to represent the nature of the error.

Now, why is simpleprog.h included at the beginning? The answer is that, in C, you should declare a function before you call it. simpleprog.h contains the function declarations for the functions defined in simpleprog.c (actually, there's only one, foo). It's a little ridiculous to have a .h file for one function, but larger C programs should definitely be broken up.
It makes the code easier to read, and it can help to explain what a particular piece of code does. Here's what simpleprog.h looks like:

    extern void foo(void);

As you can see, a function declaration looks a great deal like the first line of a function. The only major difference is that instead of being followed by statements contained between { }'s, one just ends it with a semicolon. Something worth mentioning is that parameter names don't have to match between a function's declaration and its definition. In fact, you can leave the parameter names out of a declaration completely. What does matter is that the number and type of parameters match.

The keyword extern is used here to tell the compiler that, although the function is being declared here, it will be defined in another file. extern is also used if your code is in more than one file, and you want to use a variable that appears in one of your other files. The variable is defined in only one of the files, and the others reference it via an extern declaration.

Data Types:
----------

Some of the intrinsic data types in C are:

    int       an integer (size depends on your compiler, normally 32 bits)
    char      a character (8 bits), e.g. 'a', 'b', 'c', . . .
    short     a small integer (normally 16 bits)
    long      a large integer (at least 32 bits)
    double    a double precision floating point number
    float     a single precision floating point number

Declaring variables is pretty straightforward. Once again, this should be old ground for those of you who have taken 61B:

    int foo;    /* an integer named foo */
    char x;     /* a character named x */

You can also add the "unsigned" qualifier to allow for larger values (the ranges shown assume a 32 bit long):

    long a;             /* -2147483648 to 2147483647 */
    unsigned long b;    /* 0 to 4294967295 */

Furthermore, you can declare arrays of each variable type:

    int user_id[60];    /* user_id[0], user_id[1], . . . user_id[59] are
                           integers which can be accessed individually */

There's also a data type that means "no value". You saw it above in simpleprog.c.
It's called "void", but you can't declare variables to be of type void. You use it to say that a function returns no value and/or takes no arguments.

C has strings, but no explicit string type. It stores strings as arrays of characters. C strings have an extra character at the end, written '\0'. This character is called the null terminator. It has ASCII value 0. Thus, one can say:

    char foo[] = "mumble";    /* foo: m|u|m|b|l|e|'\0' */

You can initialize variables when you create them using the syntax shown above. In C, initial variable values are unspecified; you cannot count on them being initialized to something convenient like 0. Variables very likely contain garbage when you declare them, so you must initialize them yourself. According to K&R, there are a few situations when the compiler should initialize variables. If you are in a fraternity, and you would like to haze your pledges, give them a copy of K&R and tell them to find these situations. They are, by far, the exception, not the rule. Thus, always initialize your variables before use, e.g.

    int meaning = 42;

If you've taken 61B, you may remember the keyword static from Java. In Java, static variables were the same in all instances of a class. This is useful if you want to keep track of the number of a particular kind of object in existence (increment a static variable whenever an object of that kind is created, decrement when it is destroyed). There are also static methods, which could be called without an existing instance of the class that contains them. Instead of being called in the usual manner, objectName.method(arguments);, they are called as className.method(arguments);

static means something slightly different in C. static variables in C maintain their values across function calls.
So, if you declare a variable inside a function to be static, and you call the function twice, the variable will have the same value at the beginning of the second call as it did when the first call ended. Also, static variables are only initialized once (though there's absolutely nothing stopping you from changing the value of a static variable after it's been initialized). Thus, if you have the following function (from static.c):

    void staticExample(void)
    {
        static int timesCalled = 0;
        printf("staticExample has been called %d times.\n", ++timesCalled);
    }

and you call staticExample twice, it will print out

    staticExample has been called 1 times.
    staticExample has been called 2 times.

Note the use of the ++ operator. By placing it before timesCalled, timesCalled is incremented and then used (in this case, displayed). Had it been written timesCalled++, timesCalled would have been used and then incremented. The -- operator operates in much the same way.

If you use static outside of a function, it has the effect of hiding the variable that you use it on from anything outside of the file it is located in. Functions can be declared static as well, with similar effect.

Converting Between Types:
------------------------

The various C libraries feature a few functions to convert between types:

i = atoi(x) converts x, which is a string (that is, a character pointer pointing to the first character of the string, but more on that later), to an integer and returns this integer value.

sprintf(x, "%d", i) converts i, an integer, into a string, x. The output goes into x, which must be a character array large enough to hold the result, according to the format, which is the second argument.

Declaring Your Own Types:
------------------------

C is not object oriented, and consequently, there are no classes in C.
You can, however, aggregate a bunch of normal intrinsic variables into one single data structure using what's called a "struct":

    struct user {
        char name[80];
        int age;
        long id;
    };

Note the semicolon following the closing }. You can now declare variables of type "struct user", just as you can declare variables of type "int", and you can access its fields (like instance variables in a Java/C++ class) via the canonical "dot" syntax, like so:

    struct user bob;
    sprintf(bob.name, "Bob");
    bob.age = 20;
    bob.id = 3141759;

C also has a feature called typedef for creating new data type names. You might find this feature useful some time in the near future if you grow tired of all of the bureaucracy and pressure associated with the CS major. (Those of you who are in EECS probably won't be able to identify with this sentiment, but bear with me nonetheless.) Anyway, suppose you decide that, instead of pursuing CS, you are going to pursue the much more enjoyable and far more easily attainable pastime of drinking. Alas, you find that the lifestyle of a wino doesn't really suit you. There's just too much of a gap between it and the lifestyle you were expecting to have as a programmer. Besides, you're cold all the time because the Daily Cal is so bad it isn't even fit for bedding. Being a resourceful type, you notice that a great many CS wannabes have decided to follow you in the pursuit of drink, and this presents a wonderful start-up opportunity. Now, you don't want to run some ghetto-style liquor store where you'll have to sell to underage Stanford students. No, you want to be the source; you want to be at the top of the liquor industry food chain. But, in your former life, you were into CS, not chemistry. Here's how you can use your knowledge of C to make alcohol:

    typedef int Liquor;
    typedef float Wine;
    typedef char Beer;

    /* Note that there is nothing to stop you from typedeffing more than
       one thing to any one existing data type, I just haven't done so.
       Now it is possible to write functions with return types Liquor,
       Wine, and Beer. You can also use these types for things like
       declarations and casts */

    Liquor distill(void) { . . . }
    Wine ferment(void) { . . . }
    Beer brew(void) { . . . }

Admittedly, this example is highly contrived, and I wouldn't recommend doing something like it in practice. In fact, unless you are writing for the obfuscated C contest, I recommend against creating additional names for the standard C types. Although situations do exist where doing so serves a useful purpose, things like the above tend to make code more confusing and harder to understand.

A more common and acceptable use of typedef is the following hack, which allows you to avoid typing "struct structType structVariableName" when creating structs, and, instead, simply type "structType structVariableName" (from hashtable.h):

    typedef struct HashTable {
        unsigned int (*hashFunction) (void *);
        int (*equalFunction) (void *, void *);
        struct HashBucket **data;
        int size;
    } HashTable;

Then, in "philspel.c", a pointer to a HashTable is declared simply as:

    HashTable *dictionary;

Output Methods:
--------------

I/O is something which is similar enough from one language to the next to make picking up the little minutiae that separate the various languages a royal pain. Output is arguably simpler in C, so I'll cover it first. Examples from this section are in "output.c".

Output, which is normally done with System.out in Java and cout in C++, is accomplished in the C language with the C library function printf(). You've already seen it used in some of the examples above. The first argument is a "format", and then come the (optional) extra arguments, which are translated to strings automatically. As was also mentioned above, you need to include stdio.h to use the printf function. So, if you wanted to print a string, use:

    printf("This is a test.\n");

which will output

    This is a test.
    <--

Note that, in addition to the visible text, there was a newline (the \n), which is indicated by the arrow.

Also, printf can print the contents of variables. If, for example, you have a string variable like:

    char t[] = "fun";

then you can write

    printf("This is %s.\n", t);

which will output

    This is fun.

The %s means to insert the next string argument, in this case, t, into the output where the %s appears in the format. This is called a "format specifier", and it is perhaps best explained with a few more examples. If you have the following declarations:

    char apples[] = "apples";
    char oranges[] = "oranges";
    char pears[] = "pears";

then the line

    printf("I have %s, %s, and %s.\n", apples, oranges, pears);

gives

    I have apples, oranges, and pears.

but

    printf("I have %s, %s, and %s.\n", oranges, pears, apples);

gives

    I have oranges, pears, and apples.

You can use other kinds of arguments with printf as well:

    %d     integer
    %ld    long
    %f     double
    %u     unsigned
    %lu    unsigned long
    %c     char

You can change the format specifier to change how a value is output too:

    %x     integer as hexadecimal
    %lx    long integer as hexadecimal
    %o     integer as octal

There are lots of options to printf, especially for determining how wide things are in their printed representations.

If you don't want to bother with printf, you can say putchar('a'); to print a single character, or puts("stuff"); to print the string "stuff" followed by a newline (the quotes will not be printed).

There are many variations on printf, such as sprintf, which was mentioned above. printf always writes to stdout (standard output). You don't normally change stdout from inside a C program; the operating system passes it in. Normally, stdout is the screen. On Unix systems, you can use the > symbol, which is known as a file redirection operator, to change stdout as follows:

    someprog > outputFile

This will cause everything sent to stdout to be written to outputFile. Note that it will not appear on the screen. You can name outputFile anything you like.
If it doesn't exist, the system will create it for you.

There is another variant on printf called fprintf, which operates exactly like printf, except for the fact that it takes a FILE pointer (more on pointers later) as its first argument. When writing philspel, you can use fprintf to write messages on stderr by saying something like

    fprintf(stderr, "Danger. Danger. Danger will Robinson.\n");

Input Methods:
-------------

C features a number of ways to acquire input. As with printf, the input functions described here are declared in stdio.h. Examples from this section are in "input.c".

Use fgets to read into a string buffer:

    char user_name[80];
    printf("What is your name? ");
    fgets(user_name, 80, stdin);

This will read at most 79 characters (stopping early at the end of the line) from the standard input into the buffer named user_name, and add a '\0' after them to make a string. Note that fgets requires you to specify what file to read from.

You can also use scanf, the analog to printf, to insert one or more user inputs into non-string variables. For example:

    int x;
    printf("Enter a number between 1 and 10: ");
    scanf("%d", &x);

will place the next number the user types into x (the & sign passes the memory address of x to scanf, which is necessary because C does not have references; I'll say more about the & operator when I talk about pointers). scanf does no error checking for you, although its return value does tell you how many items it managed to read. If you expect to be bored for roughly five seconds in the near future, you can while away the tedious seconds by breaking scanf with unexpected input.

When writing philspel, you may also find the functions getc and getchar to be useful. getc takes a FILE pointer and returns the next character from that file. If it encounters an error, or hits the end of the file, it returns the special value EOF (if you are typing at the command prompt, you can usually signal end of file with ctrl-d). getchar does basically the same thing, but it does not take any arguments because it only reads from stdin. Thus, writing getc(stdin) is equivalent to getchar().

As with stdout, it is possible to redirect stdin on Unix systems.
Use the < symbol to do so. The easy way to remember which symbol redirects stdin and which redirects stdout is to think of the < and > symbols as arrowheads. When you run a program using the redirection operators, the < symbol will point toward, or into the program, and the > symbol will point away, or out of the program.

Pointers:
--------

Hypothetically speaking, everyone here who has taken 61A should understand what a pointer is. The concept of a pointer is relatively simple, but many people seem to have trouble with C's pointer syntax. If you find yourself confused, try to keep the following analogy in mind.

Think of memory as a warehouse containing rows and rows of shelves with identical looking boxes on them, like the one in that X-Files episode where Mulder finds the cure for Scully's cancer in a hidden warehouse below some DOD building. I must take this moment to point out that fox.com is quite possibly the ugliest commercial web site I have ever seen (it actually rivals some of the ugliest personal web sites I've ever seen). thex-files.com is, thankfully, much more tastefully designed, and deserves high marks for aesthetics.

Anyway, think of a typical variable as some random box on some random shelf. If you happen to be standing in front of the box, then you can open it and see what's inside. Think of a pointer as a big arrow. When first created, a pointer just points to some random box. One can set a pointer to be the location of a particular box, so that you have a big arrow pointing to a box of your choosing. Once this is done, you can follow the arrow to the box, and, thus, find out what's in the box.

In C, a pointer is declared as follows:

    int *ptr;

Thus, a pointer is declared by specifying the type, and then writing the pointer's name preceded by a *. The statement above can also be written

    int* ptr;

The compiler won't complain, but I recommend doing it the first way because it emphasizes the fact that "ptr" is a pointer.
It is possible to declare pointers to pointers, e.g., int **ptrtoptr. In fact, you can pile the *'s in front of a pointer's name when declaring, and each one will give you another level of pointing (pointers to pointers to pointers to pointers, and such). Also, unlike regular variables, you can declare a pointer of type void, and what this means is that the pointer is a generic pointer that may point to any type of data. When you dereference or perform pointer arithmetic (I'll explain both terms momentarily) on a void type pointer, you must cast it to the appropriate type. There are cases when you don't have to cast a void pointer, but you should always do it because it is good style and less error prone.

So we have a pointer to an integer with the highly original name, ptr, with which we can do the following:

    int someBox;
    ptr = &someBox;    /* ptr points to someBox, we could have
                        * done this when we declared ptr */
    someBox = 3;
    printf("The value is %d.\n", *ptr);

which gives

    The value is 3.

The & symbol means "the address of" or "the location of". The * symbol preceding a pointer anywhere other than where it is declared is the dereferencing operator. When you write ptr, you're talking about the arrow pointing to a box. When you write *ptr, you're talking about the contents of the box that the arrow points to (in this case, the arrow points to someBox). If you have a pointer to a pointer, you must use a * for each level of dereferencing that you want.

Used properly, pointers are really useful, but they can also be the source of some very perverse code (sometimes intentionally so) and some very nasty bugs. Note that, were you to write *ptr = &someBox prior to writing ptr = &someBox, you would be doing something horrible, because you would be overwriting some random location in memory, since pointers, like other variables in C, are not guaranteed to be initialized upon creation.
The only time when you can safely write something that looks like *ptr = &someBox before ptr has been initialized is when you first declare a pointer, because there the * is part of the declaration rather than a dereference. Thus, it would have been acceptable to write int *ptr = &someBox to create ptr and initialize it to &someBox. Also, were you to write ptr = someBox, you would be doing something perverse. The only time when it would be acceptable to write ptr = someBox would be if someBox was set to the value of some memory address you would like to get at.

C functions pass arguments by value. This means that when you call a function, it works with copies of whatever you pass in. Changing these copies won't change the things you passed in. Furthermore, C functions can only return one value. This raises the question, how do you write a C function that can change the value of more than one variable? The answer lies in pointers:

    int a = 1, b = 1, *alpha = &a, *beta = &b;

    void pointless(int a, int b)
    {
        a = 0;
        b = 0;
    }

    void poignant(int *a, int *b)
    {
        *a = 0;
        *b = 0;
    }

    pointless(a, b);
    /* After calling pointless, a and b are unchanged */
    poignant(alpha, beta);
    /* After calling poignant, both a and b have been changed */

You can declare linked lists of integers as follows:

    struct node {
        struct node *next;
        long data;
    };

Then you can allocate a new "struct node" element using

    struct node *head;
    head = (struct node *) malloc(sizeof(struct node));

The "malloc" function, which is declared in stdlib.h, allocates a new chunk of memory; its argument is a certain number of bytes. The number of bytes you want, in this case, is the amount of memory required to represent a "struct node" structure, which is given by the C built-in operator "sizeof". That basically does what you need, but there's still a slight wrinkle. In ANSI C, malloc returns a generic pointer, "void *" (older, pre-ANSI compilers declared it as returning a pointer to char, or "char *").
The cast, (struct node *), makes it explicit that you are going to use this space to hold "struct node" variables, so that the returned storage space is treated correctly.

If the system can't get any more memory for you, the return value from malloc will be NULL. The NULL pointer is equal to zero, and it's a good thing to initialize pointer variables to, as it will prevent you from doing some of the nasty things that you can unintentionally do with pointers.

The same idea works for arrays whose size you don't know until run-time:

    int *buffer = NULL, bufferSize = 0, index = 0;

    /* Somewhere during the course of the program, bufferSize gets set */

    buffer = (int *) malloc(bufferSize * sizeof(int));

This newly-allocated space can then be accessed just like an array having bufferSize elements, numbered 0 to bufferSize - 1. Be careful not to write off the end of the array, because C won't tell you that you are doing it, and you are likely to crash your program with a "segmentation fault" error.

Use the "free" function to deallocate the space when you're done using it. You should always pass the location returned originally by malloc to free. Consequently, don't change the value of the pointer that you set equal to a malloc call's return value (in this case, buffer). If you want to perform pointer arithmetic (I'll talk about that in a moment), make a copy of the original pointer, and mess with the copy.

    /* Store some values in some of the space we've allocated */
    for (index = 0; index < 10; index++) {
        buffer[index] = index;
    }

    /* Display the values we just stored */
    for (index = 0; index < 10; index++) {
        printf("%d ", buffer[index]);
    }
    printf("\n");

    free(buffer);

Note that array style notation is being used here. This highlights the relationship between pointers and arrays. In fact, in most expressions, the name of an array acts as a pointer to the first element (i.e. element[0]) in the array.
You can basically use pointer and array notation interchangeably. In general, stick to array notation for arrays (using pointer notation for an array can make your code harder to read), but for pointers use whatever is most convenient for the particular situation.

Also, assuming bufferSize was set to 500, we don't come anywhere near the last element of the array, which would be buffer[499]. If you wrote something into buffer[500], the compiler would not complain. You would just spend hours debugging after your program core dumps. Also, notice that using array notation does not affect the value of the pointer, buffer.

Furthermore, it's not safe to access space after you've freed it, but there's nothing wrong with malloc'ing it again later. It's a good idea to set your pointers to NULL after freeing them. That way, you can be certain you are never using space that you've deallocated.

Next, I'd like to talk about pointer arithmetic. Adding 1 to a pointer makes it point to the next element in memory. C saves us from the trouble of worrying about how big the individual types are. So you can say:

    char foo[] = "mumble", *fooptr = foo;
    fooptr = fooptr + 1;    /* fooptr points to foo[1] */
    fooptr++;               /* fooptr points to foo[2] */
    fooptr--;               /* fooptr points to foo[1] */

Finally, I'd like to talk about FILE pointers. This was added after the Help Session took place, so if you got a copy during the Help Session, this part was not included.

The FILE type is defined in stdio.h, and a FILE pointer basically works like a normal pointer. The only thing unusual about FILE pointers is that you call the C library function, fopen, to assign them, and you must call another C library function, fclose, when you are finished working with the file you opened.
For example:

    FILE *dictFile;

    if ((dictFile = fopen(name, "r")) != NULL) {
        /* perform operations on the file */
        fclose(dictFile);
        dictFile = NULL;
    }

fopen takes two arguments, a string containing the file name, and a second string which describes the mode you would like to open the file in. Here, the "r" means read mode. K&R describes the various possible file modes. One should test that the fopen call returns a non-NULL value, because a return value of NULL indicates that the attempt to open the file failed. Once the file is open, we can use it as an argument to things like getc, fscanf, and fprintf, basically anywhere you can say stdin, stdout, or stderr. Once you are finished, close the file with fclose, and set the FILE pointer to NULL as seen above.

Also, although you normally set a FILE pointer to the return value of fopen, there's nothing to stop you from assigning it as you would a normal pointer:

    FILE *redundant = stdin;

Assert Statements:
-----------------

It's possible to do simple checking of your code using "assert". Basically, if you assert something in your code, and it's not true, your program will exit with an error message saying exactly where the assert failed. assert is really cool. Used properly, it can save you a great deal of debugging time and pain. You must include the C library file assert.h to use assert. For example (from assertion.c):

    int c = 2;
    assert(c == 1);

will cause your program to fail. A common use of this is checking that malloc was able to get the memory you asked for, and did not return a NULL pointer.