The hidden threat to freedom

A programming language is a means to express a computational process in a form that computers can understand. Nothing less, but also nothing more. The expression of a computational process in such a language is called a program, and the files containing it are called the program's source code.

However, computers cannot execute such programs directly. Before a program can be executed, it must first be translated into the particular computer's "native language", called machine code. This compilation process produces a binary executable file (or binary for short), which can finally be executed. Binaries are machine and OS dependent: they cannot easily be transferred to a machine with a different CPU (i.e. one speaking a different "native language") or with a different OS installed.
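To make this concrete, here is a minimal C source file and, below it, the commands that turn it into a binary and run it (the compiler name cc and the file names are conventional examples):

    /* hello.c -- the source code of a trivial program */
    #include <stdio.h>

    int main(void)
    {
        printf("Hello, world!\n");
        return 0;
    }

    $ cc hello.c -o hello    # compile the source into a binary
    $ ./hello                # execute the binary
    Hello, world!

The resulting file hello contains machine code for the CPU and OS it was compiled on; copied to a machine with a different CPU or OS, it will simply not run, while hello.c can be compiled anywhere a C compiler exists.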

This concept of source code and binary can be extended to cover other data used by the software. For example, a game may have its graphics stored in the PCX or PNG file formats to allow easy editing with standard image editing tools, while the game itself is designed to use a custom format that lets it load and use the graphics but omits information useful only when editing them. In that case the PCX/PNG files are the source code, and the file in the custom format used by the game's binary is the binary.
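A build rule for such data might look like the following sketch (png2gfx is an imaginary converter tool, used here only for illustration):

    # "compile" the editable PNG art into the game's custom
    # runtime format using an imaginary converter tool
    graphics.gfx: graphics.png
            png2gfx graphics.png graphics.gfx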

However, even when we have a binary that the operating system can execute directly, we are not done. We still need to integrate the program into the rest of the operating system so that users can reach it quickly and simply every time they need it.
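On GNU/Linux desktops following the freedesktop.org conventions, for example, part of this integration means installing a .desktop entry so the program appears in the system menus; a minimal sketch for a hypothetical program mygame:

    [Desktop Entry]
    Type=Application
    Name=My Game
    Exec=mygame
    Icon=mygame
    Categories=Game;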

Programming language specifications do not say how to build a program from its source code or how to integrate it into the operating system; that is outside their scope. It is up to the developer to choose the means of putting the program's binaries in their right places. Proprietary software developers have it simple: they create another program that does the whole installation job and distribute everything as a single package, so users are not bothered with the details of the installation process. This scheme is also adopted by some binary distributions of free software.

But free software is about freedom, and freedom requires access to the source code. And the source code is where the hell begins. There is no standard way to describe which binaries a distribution contains and where they belong, so each developer has to invent one whenever the need arises. Moreover, different operating systems use different schemes for the locations of the various binary files (programs, libraries, data files, etc.), so the developer has to teach the application's sources which operating system it will be installed on and how to integrate with it.
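A typical hand-written workaround is an install rule in a Makefile that hard-codes this layout knowledge into the project; a minimal sketch, assuming a conventional Unix-like layout (PREFIX is a convention, not a standard, and the names are illustrative):

    PREFIX = /usr/local

    install: myprog
            install -m 755 myprog $(PREFIX)/bin/myprog
            install -m 644 myprog.1 $(PREFIX)/man/man1/myprog.1

Every project reinvents a rule like this, and each one breaks on a system whose layout differs from the one its author assumed.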

Free software also introduced optional features, a concept not widely known in the proprietary software world. Optional features allow users of the system to leave out code supporting features they don't want or need, saving memory and disk space and sometimes also execution time or other resources. This was never considered possible before, and programming languages were not prepared for it, so kludges were introduced to allow it (in C, for example, the C preprocessor is abused so that pieces of code implementing unwanted features can be left out of the resulting binary). Suddenly invoking the build system over the sources is not sufficient; one also needs to decide which features to include in the result and which to leave out. I call this entire process of preparing the package for use software factoring.
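In C, the kludge typically looks like this sketch, where an optional sound feature is cut out of the binary at compile time (WITH_SOUND is an illustrative macro name, not a standard one):

    #include <stdio.h>

    int main(void)
    {
    #ifdef WITH_SOUND
        /* this code ends up in the binary only when the
           feature was selected at build time */
        printf("initializing sound\n");
    #endif
        printf("game running\n");
        return 0;
    }

Building with cc -DWITH_SOUND game.c includes the feature; building with plain cc game.c leaves it out of the binary entirely.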

The result is that almost every source tree you can find on the Internet today contains a fairly large program with one simple purpose: to compile the code. Its size is roughly proportional to the size and complexity of the program it is designed to compile. If you have ever tried to compile a free software package, you have most likely met it in person: it is the program named configure, which scans the build environment and writes the build instructions used by the rest of the build process. This compilation program, with nontrivial logic that may look like artificial intelligence to some, is a great thing, and I honor anyone able to develop and/or maintain such a thing in their software. Even more I appreciate the people who managed to write a piece of free software able to generate the clumsy configure scripts from nicely formatted source files. The real problem is the deep integration of this compilation program with the rest of the source code. This problem is not easily visible to software developers (especially the seasoned ones), but it may be as grave a threat to software freedom as software patents, if not a bigger one.
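That generator is GNU Autoconf; a minimal input file, conventionally named configure.ac (the project name here is illustrative), from which the autoconf program produces a configure script thousands of lines long, might look like:

    AC_INIT([myprog], [1.0])
    AC_PROG_CC                    # find a working C compiler
    AC_CONFIG_FILES([Makefile])   # produce Makefile from Makefile.in
    AC_OUTPUT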

What's the matter? Let's demonstrate it with an example. When Richard Stallman, as a seasoned software developer, looks at the source code of GCC, for example, he will notice nothing wrong with it. It is almost no problem for him to "enter" the source code and start fixing things or adding new ones. But when I, as a newbie, look at the same GCC source code, I am scared. What scares me is the plethora of nearly omnipresent preprocessor directives, and the compilation scripts written partly in a language I have never seen before and partly in a language that, although I know it, is so messed up that I am unable to learn anything from it. So I don't know how the software is built up from the sources, which source files belong together, or where to find the beginning of a particular program in the software distribution, namely the function main(). And I cannot ask the computer for help (even though I am a programmer myself), because it is impossible to extract information about the code layout from the source code automatically.

It is possible that this is a much worse threat to software freedom than software patents or the other things we are fighting against today. I think so for these reasons:

- It is not easily visible
For developers, especially experienced ones, it is really hard to see that anything is wrong with the understandability of their code. They already have their "mental map" of the software, so they know where things are, how they work, and how they act together. But for a "stranger" it may be nearly impossible to build such a "mental map" and thereby understand how the software works.
- It devalues freedom
Less than 1% of computer users have minds and motivation strong enough to successfully build a "mental map" of a larger software package such as GCC. Since this "mental map" is a precondition for the ability to study and change the software, more than 99% of users are unable to exercise their right to study and change it. They have to delegate the work to someone skillful enough to solve the problem, and if they cannot find anyone, they are tempted to switch to proprietary software.
- It overloads the developers
The need to cope with the complexity of the software build process puts an additional burden on developers, diverting their development power from useful work. Less useful work gets done, which means things take longer to finish.
- It threatens software projects at early stages
Software projects that have few users from the <1% pool of skillful people are at risk. If such a project is abandoned by its original developer, it will most probably die, and the ideas behind it will be lost forever.

As you can see, the situation is serious. This hole needs to be filled before the enemy exploits it and kicks free software out of business. And this is what OSFIL aims to do. OSFIL provides a standard way to describe a program's software factoring and software integration needs, so developers don't need to invent their own kludges, which too often turn out to be incompatible with each other.
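To give a feel for the idea, here is a purely hypothetical illustration of what such a declarative description could look like; this notation is invented for this article and is not OSFIL's actual format:

    # hypothetical notation, not OSFIL's actual format
    program mygame
        sources:   main.c graphics.c
        feature:   sound (optional, default on, adds sound.c)
        data:      graphics.gfx (built from graphics.png)
        integrate: menu entry "My Game", category Games

A tool reading such a description could perform the factoring and the integration itself, on any operating system it knows, without each package carrying its own compilation program.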