OSHS librarian system details

The OSHS librarian system handles programs and libraries very differently from traditional operating systems.

Static versus shared

Most modern operating systems have two kinds of libraries: static and shared (or dynamic). The two kinds use different file extensions and different data formats. A static library is a collection of code that is copied into the program's binary as needed. With a dynamic library, however, no code is copied; a link to the library is stored in the program instead. The code in the library is then shared by all programs that contain a link to it.

By contrast, OSHS has neither static nor shared libraries, just libraries. These libraries contain enough data to allow the linker to treat them as either static or shared, according to the user's request.
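
The idea of one library format serving both link modes can be illustrated with a small Python model. This is only a sketch: the Library and Program classes, the link() function and the "static"/"shared" mode names are hypothetical and do not describe the actual OSHS linker interface.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    name: str
    symbols: dict  # symbol name -> code bytes (placeholder content)


@dataclass
class Program:
    code: dict = field(default_factory=dict)   # code copied into the binary
    links: dict = field(default_factory=dict)  # symbol name -> library name


def link(program, library, needed, mode):
    """Resolve the `needed` symbols against `library` in the requested mode.

    Hypothetical model: "static" copies the code into the program,
    "shared" records only a reference to the library.
    """
    if mode not in ("static", "shared"):
        raise ValueError("unknown link mode: " + mode)
    for sym in needed:
        if sym not in library.symbols:
            raise KeyError(library.name + " does not define " + sym)
        if mode == "static":
            program.code[sym] = library.symbols[sym]
        else:
            program.links[sym] = library.name
    return program
```

The same Library object is passed in both cases; only the user's choice of mode decides whether code is copied or referenced.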

Position independent code

This technology is used mainly for shared libraries. It allows the code in the shared library file to be mmap()ed at different virtual addresses in different processes.

I consider this technology somewhat antique. It originates from a time when computers had very small virtual address spaces, and it was not possible to fit all shared libraries into the address space of a single process. Even if only the libraries actually in use were kept in memory, together they were still too large for the address space. Therefore the code in a shared library is designed to function properly at any virtual address without prior modification. This type of code is called position independent code, or PIC for short.

The price of position independence is, of course, a performance penalty. Calls between the library and external code must go through a "position table" (called the Global Offset Table, or GOT for short), which contains the exact location of each piece of code in the library that is addressed from outside or whose address is taken. This table is specific to the process, and its virtual address is the same in each process. Sometimes even these tables are located at different virtual addresses in different processes, and a "global library table" lists the virtual address of the "position table" for each library in memory, but such solutions are less common because of their much greater performance hit.
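
The indirection through the table can be modelled in a few lines of Python. This is a toy sketch, not real loader code: the symbol names, offsets and load addresses are made up for illustration.

```python
# Hypothetical offsets of two exported routines inside one library image.
LIB_OFFSETS = {"open_file": 0x100, "close_file": 0x180}


def build_got(load_base):
    """Build the per-process table: symbol -> absolute address.

    The library code itself is identical in every process; only this
    table depends on where the library happens to be mapped.
    """
    return {name: load_base + off for name, off in LIB_OFFSETS.items()}


def call_via_got(got, name):
    """Model an external call: one extra table lookup before the jump."""
    return got[name]


# The same library mapped at two different (made-up) bases.
got_a = build_got(0x7F000000)
got_b = build_got(0x40000000)
```

Every call from outside pays for the lookup in call_via_got(), which is the overhead the text refers to.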

On some platforms it is possible to link even non-PIC code into a shared library. However, this usually costs the system a large performance hit during program loading, greater memory consumption, and sometimes even security risks (the code pages must be writable while they are being patched). This is because the dynamic linker still tries to place the library at addresses that vary among processes, and uses so-called text relocations to fix the places that contain absolute addresses inside the code or static data of the library. This increases memory consumption because the final form of the code differs for different virtual addresses, so separate memory pages must be used to hold each copy. Other platforms deliberately refuse to put genuinely non-PIC code into a shared library.
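
Why text relocations defeat page sharing can be shown with a small Python sketch. The 8-byte "code image", the relocation list and the load bases are invented for the example; real relocation records carry more information than a bare offset.

```python
import struct


def apply_text_relocations(image, reloc_offsets, load_base):
    """Return a private, patched copy of `image`.

    Each 32-bit little-endian slot listed in `reloc_offsets` holds a
    library-relative address and is rebased by `load_base`.
    """
    patched = bytearray(image)
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (value + load_base) & 0xFFFFFFFF)
    return bytes(patched)


# Made-up code: an opcode-like word followed by one absolute address slot.
image = struct.pack("<II", 0xE8000000, 0x20)
relocs = [4]  # byte offset of the slot that needs patching

in_proc_a = apply_text_relocations(image, relocs, 0x10000000)
in_proc_b = apply_text_relocations(image, relocs, 0x20000000)
```

The two patched copies differ in the relocated slot, so the pages holding them cannot be shared between the two processes.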

Modern computers all have virtual address spaces large enough to hold every library installed in the system and still leave enough room for the program's own code. Nevertheless, current operating systems keep the same antique PIC design for shared libraries that was used in those early days. The reason is security: it is somewhat harder for an attacker to exploit a library if the address of the routine they want to abuse is not known when the program being exploited is linked.

OSHS does not follow the PIC design. Libraries in OSHS have "floating code", just like relocatable object modules. The code in the library is organized so that it is still possible to mmap() the library directly (although this mmap()ing differs substantially from the mmap()ing of an ordinary file). Once the library is loaded into memory (or mmap()ed), its virtual address remains fixed until it is unloaded. This makes the system slightly easier to crack, but it also improves performance by removing the overhead of PIC. OSHS uses different tactics to defend against system cracking.
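
The "load once, keep the address fixed" behaviour can be sketched as a toy Python model. The SharedLibraryArea class, its method names and the simple bump allocation are assumptions made for illustration; they are not the OSHS implementation.

```python
class SharedLibraryArea:
    """Toy model of an area where each loaded library gets one fixed address."""

    def __init__(self, start):
        self.next_free = start
        self.loaded = {}  # library name -> (base, {symbol: absolute address})

    def load(self, name, size, symbol_offsets):
        """Load a library, or return its existing base if already resident."""
        if name in self.loaded:
            return self.loaded[name][0]  # the address stays fixed while loaded
        base = self.next_free
        self.next_free += size  # simplification: freed space is never reused
        # Relocate once: library-relative offsets become absolute addresses.
        absolute = {sym: base + off for sym, off in symbol_offsets.items()}
        self.loaded[name] = (base, absolute)
        return base

    def resolve(self, name, symbol):
        """Return the absolute address of a symbol; no per-process table."""
        return self.loaded[name][1][symbol]

    def unload(self, name):
        del self.loaded[name]
```

Because relocation happens once at load time, every process can call the resolved address directly instead of going through a table on each call.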

Using shared libraries

Every modern operating system has a sophisticated program loader, which reads the program file, determines which libraries the program requires, loads these libraries, links all the pieces together and finally transfers execution to the entry point of the loaded program.

The problem with such sophisticated loaders is that they require sophisticated configuration. The loader must know where to search for libraries and how to perform the search. The usual way to cope with this is to declare a "directory of libraries" and place all libraries in active use there. To let users use their own shared libraries, however, some kind of PATH-like environment variable is consulted during the search (in GNU/Linux, for example, this variable is called LD_LIBRARY_PATH). Even so, the user's freedom is severely limited. For example, using development versions of some libraries for one application and stable versions of the same libraries for the rest of the system causes trouble so easily that people avoid shared libraries during development, or they hardcode paths into their binaries, making those binaries unusable on systems with a different setup.
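
The kind of search such a loader performs can be sketched in Python. This is a deliberately simplified model: find_library() and its parameters are hypothetical, empty path entries are simply skipped, and real dynamic linkers consult further sources (such as paths embedded in the binary and a lookup cache) in a platform-specific order.

```python
import os


def find_library(name, env_path, default_dirs, exists=os.path.isfile):
    """Return the first existing candidate path for `name`, or None.

    `env_path` plays the role of an LD_LIBRARY_PATH-style variable and is
    searched before `default_dirs`. `exists` is injectable for testing.
    """
    dirs = [d for d in env_path.split(os.pathsep) if d] + list(default_dirs)
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

The result depends on the environment of whoever starts the program, which is exactly why mixing development and stable versions of the same library is fragile.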

In OSHS the program loader looks at the program binary and nothing else. The libraries must be loaded into memory before any attempt is made to execute the main program. Once loaded, libraries remain in memory until they are unloaded or the system is rebooted. There are several memory regions for libraries. One is for libraries installed system-wide, and another is used for libraries built by a particular user. There is also a region for plugins, which is specific to each process code image but is shared among the processes that execute the same code image.

With this design the program loader does not need to care about library directories or anything else related to locating and reading library files. The system library must ensure that all libraries are ready before it tries to execute the program. The execution server provides library loading and unloading services to help the system library with dynamically linked libraries.
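
A loader that relies only on libraries already resident in memory can be sketched as follows. This is a minimal Python model under stated assumptions: the regions are plain dictionaries mapping library names to addresses, the load_program() and LibraryNotLoaded names are hypothetical, and giving the user's region precedence over the system-wide one is an assumption, not something the design above specifies.

```python
class LibraryNotLoaded(Exception):
    """Raised when a required library was not loaded beforehand."""


def load_program(required, system_region, user_region):
    """Bind each required library to a copy that is already in memory.

    No directories are searched and no files are read here.
    Assumption: the user's region is consulted before the system-wide one.
    """
    bound = {}
    for name in required:
        if name in user_region:
            bound[name] = user_region[name]
        elif name in system_region:
            bound[name] = system_region[name]
        else:
            raise LibraryNotLoaded(name)
    return bound
```

If a library is missing, the loader simply fails; making it available beforehand is the job of the system library and the execution server.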

Naming the files

In fact, OSHS does not distinguish between a program and a library the way traditional systems do. Both kinds of file share the same extension and are referred to as binary files, or just binaries.

OSHS binaries are programs with multiple entry points. One of these is called the main entry point, and it is where control is transferred when the program is executed. The other entry points are functions published for use by other binaries; they are referred to as non-main entry points.

The program portion of a binary is the code at the main entry point and whatever that code references. It is loaded into an area that is specific to a process image executing that program. The library portion of the binary is the code at any of its non-main entry points and whatever code they reference. This portion is loaded into the area for shared libraries, which is shared among all processes in the system. If a piece of code is referenced by both the main entry point and some non-main entry point, it is placed into the library portion of the binary, since it has to be accessible to other binaries that want to use that non-main entry point.
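
The split described above amounts to a reachability computation over the references between pieces of code. The following Python sketch illustrates it; the function names and the representation of references as a dictionary of lists are assumptions made for the example.

```python
def reachable(start_points, references):
    """Return every code piece reachable from `start_points`."""
    seen = set()
    stack = list(start_points)
    while stack:
        piece = stack.pop()
        if piece not in seen:
            seen.add(piece)
            stack.extend(references.get(piece, ()))
    return seen


def split_binary(main_entry, non_main_entries, references):
    """Divide a binary into (program portion, library portion).

    Code reachable from any non-main entry point goes to the library
    portion, even if the main entry point references it as well.
    """
    library_part = reachable(non_main_entries, references)
    program_part = reachable([main_entry], references) - library_part
    return program_part, library_part
```

For example, with references {"main": ["parse", "log"], "api_open": ["log", "io"], "log": ["fmt"]}, a main entry point "main" and a non-main entry point "api_open", the piece "log" ends up in the library portion because both entry points reach it.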