Hiutale

Onion boy projects ( 2026-03-28 )

Work on SOM has progressed well. I'll summarize below, but my main focus is to look forward, to apply the lessons learned, and to expound on the need for a new language at the C level. And beyond that, some feelings about Futamura's projections.

SOM-st in Go

This time, also to save time, I decided on transpiling, not compiling. I landed on Go, mostly because of the built-in GC. Rust was considered (too complicated), while C++ was not, although it could have been a good fit. Maybe it should have been Crystal, but that didn't come up in time. Progress was fast: currently the basics work, like calling, integers, blocks and some primitives.

I tried transpiling because I noticed in the previous iteration that much of the work looked very similar to a static language. That is, I was translating Ruby into what was basically ifs, calls and switch statements. And now I just translated SOM into the same, just as Go text. It really works very well, but there are clear limitations. These might not matter so much for SOM, but they will for Ruby.
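To make "ifs, calls and switch statements" concrete, here is a minimal sketch of what a transpiled message send could look like as Go text. The names `Value` and `send` are hypothetical and not taken from the actual SOM transpiler; the SOM source in the comment is only illustrative.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for any SOM object in generated code.
type Value interface{}

// send is what a message send might turn into: a switch on the receiver's
// Go type, then a switch on the selector.
func send(receiver Value, selector string, args ...Value) Value {
	switch r := receiver.(type) {
	case int:
		switch selector {
		case "+":
			return r + args[0].(int)
		case "<":
			return r < args[0].(int)
		case "max:":
			// SOM (illustrative): max: other = ( self < other ifTrue: [ ^other ]. ^self )
			if send(r, "<", args[0]).(bool) {
				return args[0]
			}
			return r
		}
	}
	// No matching type or selector: the dynamic error case.
	panic("doesNotUnderstand: " + selector)
}

func main() {
	fmt.Println(send(3, "+", 4))    // prints 7
	fmt.Println(send(3, "max:", 9)) // prints 9
}
```

Everything here is decided when the Go compiler runs. Nothing in this shape allows a new selector to appear later, which is the limitation mentioned above.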

The main problem is that SOM is just a little too simple. Apart from the unfamiliar block syntax, it really is very similar to C++. That is, in the examples and tests there is basically no metaprogramming, no true dynamic behaviour. This is good for the current approach, as I don't think I could extend the transpiling to run-time. So it proves the static language "theory", that a dynamic language needs to be expressed in a lower, static, form. But it also points strongly to the need for a new tool, hence this post.

The natural hierarchy of languages (the onion part)

I just want to expand on that last point before going forward. For me, languages up to now come in distinct layers. (The numbers are just for reference, not a definition.)

  • 1 - Binary. Not many people program in this anymore, but it is still the lowest level, and the only one for which we actually have physical machines.
  • 2 - Assembler, meaning anything that can be directly and unambiguously translated into binary. This does not necessarily have to have an external text representation, but it is a necessary layer, so we can generate the binary.
  • 3 - Static languages like C, Go, Rust and so many others. They translate very directly into assembler. There is no (or hardly any) dynamic behaviour at run-time, no ambiguities.
  • 4 - Dynamic languages like Ruby, Python, JavaScript, Smalltalk. No type system (ducks are involved, apparently), dynamic method resolution, closures you can pass around, and loading and changing code at run-time.

Up to now, dynamic languages have been interpreted in order to implement their dynamic nature. The point I want to make is that this is because of what is lacking in the layer below. That is, it is perfectly possible to translate a dynamic language into a static one. In fact I claim that it is even necessary at a mental level, to bridge the levels of abstraction: you need to peel the onion one layer at a time. But it has not been done in a real way, i.e. with two existing languages (like Ruby and C, or JS and Go), because the build systems of static languages are not made for it.

In other words, it is not the static language that makes the transpiling impossible, but the assumption of the existing compilers (and linkers) that after compiling that's that, no more change. There are other reasons I think we need a new language, but this is the main one. Of course one could use LLVM, but that is both too much and too little.
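A small Go sketch of that "no more change" assumption. The names `Method`, `methods` and `alias` are made up for illustration. The method table is ordinary data and can change at run-time, but it can only ever point at bodies that the compiler and linker produced before the program started.

```go
package main

import "fmt"

// Method is a compiled function body. In a static binary the set of these
// is fixed once the compiler and linker have finished.
type Method func(self, arg int) int

// methods maps selectors to bodies. The map can change at run-time,
// the bodies it points to cannot be created at run-time.
var methods = map[string]Method{
	"+": func(self, arg int) int { return self + arg },
	"*": func(self, arg int) int { return self * arg },
}

// alias binds a new selector to an already compiled body and reports
// whether the existing selector was found.
func alias(newName, existing string) bool {
	m, ok := methods[existing]
	if ok {
		methods[newName] = m
	}
	return ok
}

func main() {
	alias("plus:", "+")
	fmt.Println(methods["plus:"](2, 3)) // prints 5
	// Not expressible here: turning new source text that arrives at
	// run-time into a new Method. That needs the compiler (and linker)
	// inside the running program.
}
```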

Kide (crystal in Finnish)

Ok, so we may need a new language to extend the build into run-time. Let's just specify that a bit more and move on to the other reasons. (Then the name issue.)

  • To compile (and link) at run-time means it has to be self-hosted, so we can compile the compiler into the binary.
  • Also we would need to link at run-time, i.e. resolve references to existing (run-time) objects, not to constant, future (compile-time) objects.
  • We would want a coherent memory model for all layers.
  • We would want a calling convention that will make sense in the dynamic layer.

Before I go through those in more detail, here are some other reasons why I think a language is the way to go.

Downward integration

Every higher language needs a way to access functionality in the lower layer that is no longer expressible in the higher layer. A simple example in C is a system call. This needs to be expressed in assembler, C just doesn't have any constructs to do so. Or, in SOM/Ruby, an integer addition. In an ideal compiler language this would be easy, i.e. it would be easy to integrate parameterised code of the layer below. Parameterised is the key here, as simply switching compiler context is not enough. The act of compiling is very much an interweaving of the static structure with dynamic values. I think of this a bit like string interpolation in Ruby, where the code would be fixed and variable names or e.g. register names are "passed" in.
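The string interpolation analogy can be sketched in Go like this. The function `intAdd` and the ARM-like instruction syntax are assumptions for illustration only, not part of any existing compiler: the instruction text is fixed, and the compiler passes in the register names it has chosen.

```go
package main

import "fmt"

// intAdd emits lower-layer code for an integer addition. The instruction
// text is fixed; only the register names are "interpolated" by the compiler.
// The ARM-like syntax is purely illustrative.
func intAdd(dest, left, right string) string {
	return fmt.Sprintf("add %s, %s, %s", dest, left, right)
}

func main() {
	// The compiler decides at compile time which registers hold the operands.
	fmt.Println(intAdd("r0", "r1", "r2")) // prints: add r0, r1, r2
}
```

In a language built for this, the template would presumably be checked by the compiler rather than being an opaque string, which is what simply switching compiler context does not give you.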

DSL

In both previous projects I always found myself generating structures for a lower layer from a higher one: creating code by creating structures at compile time that handle data at run-time. This "interweaving" of compile time and run-time is very difficult to keep track of, especially since the structures are abstract, in a way, "by that time". I even wrote a DSL to get back to more readable code in Kide. But since we would now have total control of the language and compiler, it is entirely feasible to create this kind of DSL in the compiler (not in the general language). While this may sound abstract (mostly because compiler building is not easy), I have a good vision and am looking forward to experimenting.

Compiler semantics

There is a way to generalise this last idea, like so: writing a specific language as a compiler tool (not a general purpose language) allows for very specific optimisation towards the goal at hand, i.e. writing a dynamic compiler.

While I am not 100% sure what this will all mean, I can say from the two previous compilers that it would be very useful to control this step more. The work has of course been done in the other projects, but as I mentioned, it was clumsy and difficult, so I think this would help greatly. But more waffling below.

Features

So, coming back to that list, here is what I think this means in slightly more concrete terms.

Calling

Static languages are usually designed for speed and then some secondary thing, e.g. safety for Rust, concurrency for Go, etc. They are not designed for implementing a dynamic language, and this shows a bit in the calling convention. Specific assumptions that don't hold, and could be different, include:

  • Methods are quite large and use lots of registers. By large I mean that it is a good thing to put arguments in registers, because a method will use them there, and not e.g. call something else immediately.
  • There is only one function active at a time, and it has all registers at its disposal. A CPU may well have 32 or more registers, and the times of large functions with many variables (remember, without calling) are over. Also, so-called cooperative multitasking has gone out of fashion. Maybe rightly so, but if it could be compiler controlled, then maybe...
  • Methods return to their caller. With closures that is not so. Also, there is the idea of units smaller than a thread.
  • Method data is best stored on the stack. This is tightly tied to not having garbage collection, but also to the assumptions of large methods and no object orientation.

I have previously used linked lists. I think this is still a valid approach, especially for a dynamic language, and I think that using the same already at the static level is just helpful in many ways.
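As a rough sketch of the linked-list idea, under the assumption that frames are ordinary heap objects: the type `Frame` and the functions `call` and `depth` below are hypothetical names, not code from the previous projects. Each frame links to its caller, and the frame in which a return continues is stored separately, so it need not be the caller.

```go
package main

import "fmt"

// Frame is a heap-allocated activation record. Frames form a linked list
// through Caller instead of living on a contiguous machine stack.
type Frame struct {
	Method string
	Locals []int
	Caller *Frame // who called us
	Return *Frame // where a return continues; usually, but not always, Caller
}

// call creates a new frame for method, linked to the current one.
func call(current *Frame, method string, locals ...int) *Frame {
	return &Frame{Method: method, Locals: locals, Caller: current, Return: current}
}

// depth walks the caller chain, e.g. for a backtrace.
func depth(f *Frame) int {
	n := 0
	for ; f != nil; f = f.Caller {
		n++
	}
	return n
}

func main() {
	top := call(nil, "main")
	find := call(top, "find")   // the method that defines the block
	each := call(find, "each:") // iterates and invokes the block
	block := call(each, "block")
	// A non-local return (^ inside a block) returns from the defining
	// method, so it continues wherever find itself would return to.
	block.Return = find.Return
	fmt.Println(depth(block), block.Return.Method) // prints: 4 main
}
```

Because frames are plain objects, the garbage collector can manage them like everything else, and a closure can keep its defining frame alive simply by pointing at it.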

Memory

Static languages, not being object oriented, tend to take a von Neumann view of the machine. There is the memory, go for it, chunk it up as you want, interpret it as you wish (well, with restrictions recently).

Since the end game is a dynamic object-oriented language, random bits and bytes are not really the focus, objects are. And there is no reason for a static language not to have the same object memory model as a dynamic one. In fact it would be super helpful if one could just implement a call at the dynamic level using the static semantics (call and object).

Taking this further, it means an object should be an object, whichever level has created it. It has a type, and that determines its layout. Maybe some more info will be added (for optimisation or GC?), but having ints or floats unattached to objects should not be part of the static level.

That previous sentence sort of makes my assumption explicit that there is no reason for the static level not to be object oriented.
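A minimal Go sketch of "an object has a type and that determines its layout". All names (`Type`, `Object`, `Raw`, `NewInt`, `New`, `Get`) are assumptions made for illustration. Note that the integer is itself an object with a type; the raw machine word only exists inside it.

```go
package main

import "fmt"

// Type describes the layout of an object: a name and its named slots.
type Type struct {
	Name  string
	Slots []string
}

// Object is the single memory unit at every level: a type plus slots that
// point to other objects. Raw is the payload of leaf types such as Integer.
type Object struct {
	Type  *Type
	Slots []*Object
	Raw   int64
}

// IntegerType has no slots; its value lives in Raw.
var IntegerType = &Type{Name: "Integer"}

// NewInt wraps a machine integer in an object, so no int is unattached.
func NewInt(v int64) *Object { return &Object{Type: IntegerType, Raw: v} }

// New allocates an object whose number of slots is determined by its type.
func New(t *Type, values ...*Object) *Object {
	o := &Object{Type: t, Slots: make([]*Object, len(t.Slots))}
	copy(o.Slots, values)
	return o
}

// Get finds a slot by name through the layout stored in the type.
func (o *Object) Get(name string) *Object {
	for i, s := range o.Type.Slots {
		if s == name {
			return o.Slots[i]
		}
	}
	return nil
}

func main() {
	pointType := &Type{Name: "Point", Slots: []string{"x", "y"}}
	p := New(pointType, NewInt(3), NewInt(4))
	y := p.Get("y")
	fmt.Println(p.Type.Name, y.Type.Name, y.Raw) // prints: Point Integer 4
}
```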

Compile and Link model

Now we come to the biggie: assumptions made during compiling. These (of course?) fall into two distinct stages, before self-hosting and after. Before self-hosting is achieved we are, like any static language, just compiling a binary, i.e. all the work happens at compile time. Code and data have to be "linked" to create an executable.

Only after the compiler can compile itself this way do things get interesting, in the sense that code then has to be generated against partially existing data. While new structures are created the same way, references to existing objects, i.e. globals like classes etc., must be resolved differently. And knowing this from the start lets us plan for this distinction.
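The distinction between the two stages can be sketched as two linkers. This is only an illustration with hypothetical names (`Ref`, `linkCompileTime`, `linkRunTime`): before self-hosting a missing global is created for the future binary, whereas at run-time it has to be found among the objects that already exist.

```go
package main

import (
	"errors"
	"fmt"
)

// Object stands in for any existing run-time object, e.g. a class.
type Object struct{ Name string }

// Ref is a symbolic reference inside freshly compiled code.
type Ref struct {
	Symbol string
	Target *Object // nil until linked
}

// linkCompileTime models the stage before self-hosting: globals only exist
// in the image being built, so a missing one is created on the spot.
func linkCompileTime(refs []*Ref, image map[string]*Object) {
	for _, r := range refs {
		obj, ok := image[r.Symbol]
		if !ok {
			obj = &Object{Name: r.Symbol}
			image[r.Symbol] = obj
		}
		r.Target = obj
	}
}

// linkRunTime models the stage after self-hosting: globals already exist
// in the running program and must be found, not created.
func linkRunTime(refs []*Ref, live map[string]*Object) error {
	for _, r := range refs {
		obj, ok := live[r.Symbol]
		if !ok {
			return errors.New("unresolved: " + r.Symbol)
		}
		r.Target = obj
	}
	return nil
}

func main() {
	live := map[string]*Object{"Integer": {Name: "Integer"}}
	ref := &Ref{Symbol: "Integer"}
	err := linkRunTime([]*Ref{ref}, live)
	// The new code now points at the very same class object as old code.
	fmt.Println(err == nil, ref.Target == live["Integer"]) // prints: true true
}
```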

Let's project!

All this led me to revisit Futamura's projections, especially the second.

Compilers are (not) free

I came into contact with the projections through an excellent talk by Tom Stuart, called "Compilers for Free", after which I really thought you could build a compiler by building an interpreter and a partial evaluator. It took me a few years to see that compilers are not free, because the compiler you create compiles into the same language your interpreter is written in. So in the example of Truffle, Truffle will always generate Java code and never get below that, never create binary.

I only recently came across a better description of the projections, one that describes them as a discovery of equivalence. And that I can understand much better, because the work of compiling, with data from the program at compile time and the structure/data at run-time, is what I imagine partial evaluation to be.
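For reference, the three projections in the usual textbook notation, where mix is the partial evaluator (a conventional name, not something from this project) and the first line states the equivalence an interpreter and a compiled program must satisfy:

```latex
\begin{align*}
\mathit{interpreter}(\mathit{source}, \mathit{input}) &= \mathit{target}(\mathit{input}) \\
\mathit{target}   &= \mathit{mix}(\mathit{interpreter}, \mathit{source})
  && \text{(first projection)} \\
\mathit{compiler} &= \mathit{mix}(\mathit{mix}, \mathit{interpreter})
  && \text{(second projection)} \\
\mathit{cogen}    &= \mathit{mix}(\mathit{mix}, \mathit{mix})
  && \text{(third projection)}
\end{align*}
```

Read this way, the target is a specialised version of the interpreter, so in the usual setup it is expressed in whatever language the interpreter was written in. That is the "not free" part described above.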

Onions to the rescue?

Since I want to create binary from Ruby, it looked like one would have to build a Ruby interpreter in assembler, and that's where I stopped thinking about this in 2017.

But now, having done some work on Ruby and SOM and having got to the point expounded above, I am thinking maybe, just maybe, there is a way. In Kide I basically built an interpreter for assembler, just for debugging purposes. And a compiler into assembler from (partial) Ruby.

As said above, the current idea is to build a static language and compile that. On the way, that would be compiled to assembler, and since we can interpret assembler, we have half of the projection for a static language. Then we "just" have to build an interpreter for Ruby in the static language, evaluate and ...

Boy, this is vague, but it feels real. I've started formalizing some ideas in a mathematical notation, hoping that will bring some clarity.
