The first thing Jim tries is to write a simple, one-file Hello, world program such as this hello.c file.
Since Jim has a somewhat old machine (486DX2, 66 MHz, 28 MB RAM), he is still using the old and obsolete "operating system" called MS-DOS. He has used it for years and does not feel comfortable with Windows, which demands a 512 MHz CPU and 256 MB of RAM, especially when everyone around him says that upgrading to Windows is a waste of money and time.
He starts his programming career with a very simple program:
File hello.c:
    #include <stdio.h>

    int main(void)
    {
        printf("Hello, world \n");
        return(0);
    }
But his first attempt to run the program fails:
    C:\PROGS> hello
    Bad command or filename
    C:\PROGS> hello.c
    Bad command or filename
    C:\PROGS> _
Maybe the system is unable to find the file. Let's investigate it a bit.
    C:\PROGS> dir
     Volume in drive C has no label
     Volume serial number is 1234-5678
     Directory of C:\PROGS
    .            <DIR>     01-04-04  11:01a
    ..           <DIR>     01-04-04  11:01a
    HELLO    C          89 01-04-04  11:01a
            1 file(s)             89 bytes
                           125533456 bytes free
    C:\PROGS> _
So our file is there, but why does it not run? We are in MS-DOS, and MS-DOS always tries to execute files in the current directory, and hello.c is in the current directory, so why does the system pretend that it cannot be found???
Well, the problem is that computers don't understand C source code. C is a programmer's language, and programmers are humans (usually). Computers can execute only something called "machine code", which is completely different from the languages used by human programmers. Machine code is basically a sequence of numbers. As an example, here is a small machine code program for a 486DX2 machine running MS-DOS that prints the "Hello, world !" message (to enter it you need a "HEX editor", a program that opens a file and displays it as a bunch of two-digit hexadecimal numbers that you can edit):
BA 08 01 B4 09 CD 21 C3 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 20 21 0D 0A 24
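If no HEX editor is at hand, the same 25 bytes can also be written with a short shell command. This is only a sketch under assumptions that go beyond the text: it needs a POSIX shell (which plain MS-DOS does not have, so the file would have to be created elsewhere and copied over), and the file name hello.com is chosen here for illustration. The bytes are given as octal escapes because those are the ones POSIX printf is guaranteed to understand.

```shell
# The first eight bytes (BA 08 01 B4 09 CD 21 C3) are the instructions,
# written here as octal escapes.
printf '\272\010\001\264\011\315\041\303' > hello.com

# The remaining bytes are the text itself, followed by CR, LF and the "$"
# that marks the end of the string for MS-DOS.
printf 'Hello, world !\r\n$' >> hello.com

# Show the result as two-digit hexadecimal numbers for comparison.
od -An -tx1 hello.com
```

The od output should list the same numbers as the sequence above, only in lowercase.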
So the system completely ignores our hello.c file. Our program has to be translated from the programmer's native language to the machine's native language before it can run. This process is called compilation, and C language files are usually compiled by a program called CC (C Compiler). Luckily we have one installed:
    C:\PROGS> cc hello.c
    C:\PROGS> hello
    Bad command or filename
    C:\PROGS> _
Still wrong. Did the compiler refuse to compile our program and forget to tell us why, or is the problem somewhere else?
    C:\PROGS> dir
     Volume in drive C has no label
     Volume serial number is 1234-5678
     Directory of C:\PROGS
    .            <DIR>     01-04-04  11:01a
    ..           <DIR>     01-04-04  11:01a
    HELLO    C          89 01-04-04  11:01a
    HELLO    OBJ      1426 01-04-04  11:31a
            2 file(s)           1515 bytes
                           125514323 bytes free
    C:\PROGS> _
So the compiler produced something, but the system thinks that the result is not executable. In MS-DOS only *.EXE and *.COM files (plus *.BAT batch scripts) are treated as executables; attempts to execute anything else are refused. So here is a stupid attempt to repair the problem:
    C:\PROGS> rename hello.obj hello.exe
    C:\PROGS> hello
    _
Ehm, our machine not only didn't show us the longed-for "Hello, world !", it also displayed no prompt, so we have no place to type further commands! The only cure for this situation is the big red button (RESET). Now Jim is somewhat wiser: after renaming .OBJ to .EXE and attempting to execute the "executable", the machine ignores everything other than the big red button :).
So since it isn't possible to run the stupid OBJ directly, it is better to rename it back to *.obj and investigate what else can be done with it. After hours spent reading manuals, Jim realizes that after compilation a linking stage has to be done to produce a working executable.
    C:\PROGS> rename hello.exe hello.obj
    C:\PROGS> link hello.obj
    C:\PROGS> dir
     Volume in drive C has no label
     Volume serial number is 1234-5678
     Directory of C:\PROGS
    .            <DIR>     01-04-04  11:01a
    ..           <DIR>     01-04-04  11:01a
    HELLO    C          89 01-04-04  11:01a
    HELLO    OBJ      1426 01-04-04  11:31a
    HELLO    EXE      6423 01-04-04  12:04p
            3 file(s)           7938 bytes
                           125495434 bytes free
    C:\PROGS> _
Wow, it seems we made it. Finally there is an EXE file available. But does it actually work? Since our last attempt resulted in a crash, we are not so trusting this time. But the file is an EXE, so we hope it will work:
    C:\PROGS> hello
    Hello, world
    C:\PROGS> _
BINGO !!!
After some time Jim had heard all the praise for GNU/Linux, so he decided to try it out. After a successful installation (with the help of John, a seasoned hacker and his friend) he was surprised that this piece of software runs solidly even on his ancient crap. So now he is about to find out how to write Hello, world on GNU/Linux. John told him that in GNU/Linux the compiler is called gcc and programs don't need an extra linking step.
    $ gcc hello.c
    $ hello
    bash: hello: command not found
    $ _
And what went wrong now ???
    $ ls
    a.out*  hello.c
    $ _
Now we have only two files in the directory. We know for sure that hello.c is not executable, so there is only one file left to try (1):
    $ a.out
    Hello, world !
    $ _
Wow, it works. But why does it have such a strange name? Let's fix that.
    $ mv a.out hello
    $ hello
    Hello, world !
    $ _
Later Jim realized that it is possible to tell gcc how it should name the result:
    $ gcc -o hello hello.c
    $ ls
    hello*  hello.c
    $ hello
    Hello, world !
    $ _
These two sessions show the first point of confusion: the need for an explicit link step. Almost every C compiler only compiles, producing something called a relocatable object file, which must be processed by another program, called a linker, to produce the actual executable. Even GCC does that; however, when the user requests the compilation of a single file, it calls the linker for him. But even GCC is not flawless, because instead of naming the resulting binary after the source code file (the common case) it gives it that strange a.out name (an oddity coming from the ancient history of computers).
To be honest, the need for a link step after compilation is in fact not a design flaw, even if for now it looks like a waste of time spent entering one additional command. The advantages of this aspect of compiler design (having a separate linking step) will show up later.
A normal expectation is that when I write a single-file program and tell the compiler to compile it, the compiler will compile it, link it and name the result after the source code file it was compiled from.
However, these flaws are not so problematic. Normally when I compile the same file 50 times, I get tired of constantly telling the compiler which file I want compiled, and I write a shell script that calls the compiler and tells it about the file for me. The flaws above can be bypassed by simply placing all the commands needed to produce the binary into that shell script and later invoking only the shell script. By giving that shell script a short name, recompiling the program can be only two keystrokes away from the command prompt.
If these flaws were the only ones in the design of the C programming language, I would simply have my compiler default the output name to the name of the source code file and otherwise behave just like GCC, and I would not bother designing my own programming language. But there are more tedious flaws that cannot be bypassed so simply.