Hiutale

Revisiting booting (2026-01-25)

I have been struggling with the linker. I sort of had a working hack, but in some upgrade it broke, and I couldn't fix it. I lack the motivation for a deep dive into C, so I will pivot and try another way, and simplify the memory management at the same time.

The previous way

When dealing with assembler, with code at the instruction level, I found quite quickly that one needs tools at the instruction level (yes, in hindsight that seems obvious). So I wrote an interpreter to debug programs before creating binaries (and using the linker).

Using the linker, which is of course a deeply C-influenced tool, gave the benefit of being able to use a debugger. That brought several things: one could debug a crashed program, it would decode assembler, and one could step through instructions and see calls and functions. All at no extra tooling cost, by "just" outputting ELF.

This sort of worked, but ELF is not simple, with so many flags, and then it broke. I tried and tried, but I am no low-level genius, so I gave up. And then, much later, I came up with another way that I will describe below.

I just quickly want to recall the memory model that came with ELF. Most importantly, it was completely separate from the run-time model. Ie functions had to be laid out in the binary at sort of random but known places. Addressing was always going to be static, ie I wasn't going to have the C linker try to rewrite addresses of functions while it relocates code. It was not what I would call a clean solution.

Booting in general

In super general terms, booting is the process of getting to an executable that has the vm functionality. For C-Ruby that involves compiling a lot of C, but may also involve precompiling some grammar, or transforming a grammar description to C.

In a way our compiler is the Ruby vm, which lets us program in Ruby, but comes with the mental difficulty of having to separate compile-time and run-time while staying in the same language. With some monkey-patching I used this to advantage by reusing the run-time code at compile-time, but that also blurs the line (maybe you felt that difficulty as having to re-read the last sentence).

So at the highest level, this new idea is not to create an ELF file, but to create a binary executable directly. More precisely, to use a memory-mapped file to create what Smalltalk might have called an image, ie a file that includes all objects, where some of those objects hold binary executable code.

Currently I'm planning to have a tiny boot program (made in C) that boots the file by loading it into memory and jumping to a known place. It seems that during testing even that could be avoided, and just jumping into the memory should be enough (if I get an ARM laptop to work).

Object graph and memory model

This leaves us with two main problems. The first is to determine the object graph that makes up the vm. This has already been solved of course, but only to a degree, and not 100% satisfactorily. The second is "flattening" all objects into a file, with the related layout issues. This was the ELF layer, but it can now be done better and integrated into the runtime.
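To make the flattening step concrete, here is a minimal sketch, not the project's actual code. It assumes 64-bit little-endian words, represents each object as a plain Ruby array of slots (an Integer is a raw word, anything else is a reference to another object in the list), and ignores padding. The names flatten and WORD_BYTES are made up for illustration.

```ruby
WORD_BYTES = 8 # assumption: 64-bit words

# Hypothetical helper: lay the objects out back to back and return the image bytes.
def flatten(objects)
  # First pass: give every object a byte offset inside the image.
  offsets = {}.compare_by_identity
  position = 0
  objects.each do |object|
    offsets[object] = position
    position += object.length * WORD_BYTES
  end

  # Second pass: write the words, replacing references by offsets.
  image = "".b
  objects.each do |object|
    object.each do |slot|
      word = slot.is_a?(Integer) ? slot : offsets.fetch(slot)
      image << [word].pack("Q<")
    end
  end
  image
end

a = [7, nil]
b = [a, 42]   # first slot of b points at a
a[1] = b      # and a points back at b, so the graph has a cycle
image = flatten([a, b])
p image.unpack("Q<*")
```

Because offsets are assigned before anything is written, cycles in the graph are not a problem. In the image the references are offsets from the start of the file; a loader that maps the file at a different address would have to add its base.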

The object graph

All the objects (including code objects), in Ruby called the object space, make up a graph that needs to be saved to a file (or flattened).

The main problem with this is determining the needed objects and classes, ie finding the minimal set of classes and objects needed to create a vm. Specifically, this means classes have to be created with dependencies in mind, to avoid reeling in non-essential classes and objects, especially now in the beginning.
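One way to picture "determining the needed objects" is a reachability walk from a root object. The following is only a sketch under assumptions: it follows instance variables and the elements of arrays and hashes, and the helper name reachable_objects and the Node class are invented for the example.

```ruby
# Hypothetical helper: collect every object reachable from +root+.
def reachable_objects(root)
  seen = {}.compare_by_identity
  todo = [root]
  until todo.empty?
    object = todo.pop
    next if seen.key?(object)
    seen[object] = true
    # Instance variables are edges of the graph.
    object.instance_variables.each do |name|
      todo << object.instance_variable_get(name)
    end
    # Containers hold further edges.
    case object
    when Array then todo.concat(object)
    when Hash  then object.each { |key, value| todo << key << value }
    end
  end
  seen.keys
end

class Node
  def initialize(name, children = [])
    @name = name
    @children = children
  end
end

leaf   = Node.new("leaf")
root   = Node.new("root", [leaf])
unused = Node.new("unused")
objects = reachable_objects(root)
puts objects.include?(unused) # the unused node is never reached
```

Anything that is not reachable from the root never makes it into the set, which is the property one wants when keeping the image minimal. A real version would also have to follow each object's class, which this sketch leaves out.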

The module for this is called core and should have no external dependencies. It includes strings and hashes, files and everything needed for the vm, and does not necessarily have a one-to-one relation with the Ruby core classes.

The second problem, alluded to earlier, is the compile-time/run-time distinction. Especially in oo programming it becomes very cumbersome to even think of the data as separate from the functionality. So while we would only need the data of objects, creating that data is really the job of the class of an object. But the functionality is "really" only available at run-time. So what to do?

As of writing, this dichotomy is solved by monkey-patching, which is neither clean (in terms of separation) nor especially easy to understand. What would be easier to understand is to make the distinction explicit. There are several distinct problems, let's work through them.

Namespacing

The namespacing conflict arises from needing to create classes that already exist in the vm that is executing, eg a String or Hash. That perceived need in the current implementation comes from trying to achieve a consistent object graph, ie only objects from classes defined in the project. This may be unnecessary, I shall review. It certainly has helped in the interpretation phase, debugging before going to binary, which has been super helpful (debugging binaries is painful).

Object creation

The second problem comes from creating objects of these classes, be they classes existing in the running vm, or classes needed for the vm being created. These should either hold their data in memory we create, or keep their data in a form that transfers easily to the vm image.

Objects defined by the vm must manage their own memory, unlike eg in MRI, where this is handled in a different language (still the implementation language of course, but abstracted out of the Ruby realm). Since they cannot do this at compile-time, we use a FakeMemory implementation.
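As an illustration of the idea only (this is not the project's FakeMemory, and the class name SketchMemory is made up), a compile-time stand-in for memory can be as small as a fixed-size array of words with bounds checks:

```ruby
# Hypothetical compile-time stand-in for a chunk of memory.
class SketchMemory
  attr_reader :size

  def initialize(size)
    @size = size
    @words = Array.new(size, 0)
  end

  # Read the word at +index+.
  def [](index)
    check(index)
    @words[index]
  end

  # Write +value+ into the word at +index+.
  def []=(index, value)
    check(index)
    @words[index] = value
  end

  private

  # Unlike a Ruby Array, memory does not grow: anything out of range is an error.
  def check(index)
    return if (0...@size).cover?(index)
    raise IndexError, "word #{index} outside 0...#{@size}"
  end
end

memory = SketchMemory.new(8)
memory[0] = :type   # by the layout described below, word 0 would hold the type
memory[1] = 42
puts memory[1]
```

The point of refusing to grow is that an object written against such a class behaves at compile-time the way it would on real, fixed-size memory at run-time.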

Memory Layout

We have chunked all data into power-of-two multiples of 8 words (8, 16, 32 and so on). The object layout is such that only the first word is fixed: it is a pointer to the type. The rest is up to the object.
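The chunking rule can be written down in a few lines. This is a sketch with a made-up name, padded_words, and it assumes the requested word count already includes the type word:

```ruby
# Smallest power-of-two multiple of 8 words that holds +words+ words.
def padded_words(words)
  size = 8
  size *= 2 while size < words
  size
end

[1, 8, 9, 17, 33].each do |words|
  puts "#{words} -> #{padded_words(words)}"
end
```

So an object needing 9 words lands in a 16-word chunk, and one needing 33 words lands in a 64-word chunk.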

This has somewhat special side-effects for dynamically growing objects, and I guess there are two main categories of those. Objects that get more instance variables, but according to Aaron those are rare. And then there are arrays and strings that grow, and those are common. In a C implementation one can throw the underlying C array away, but we don't want that kind of continuous redirection. So we have to extend, and change the type. Details of this will be a bit gory, but are still a way off, as we are still only compiling.

Which leaves us with the method case, where we really are growing the code while creating it, and currently have taken a bit too much care not to waste space. The method could be simpler for sure.
