This time, also to save time, I decided on transpiling, not compiling. I landed on Go, mostly because of the built-in GC. Rust was considered (too complicated), while C++ wasn't, although it could have been a good fit. Maybe it should have been Crystal, but that didn't come up in time. Progress was fast: currently the basics work, like calls, integers, blocks and some primitives.
I tried transpiling because I noticed in the previous iteration that much of the work looked very similar to a static language, i.e. I was translating Ruby into what was basically ifs, calls and switch statements. And now I just translated SOM into the same, only as Go text. It works really well, but there are clear limitations. These might not matter much for SOM, but they will for Ruby.
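To make the "ifs, calls and switch statements" concrete, here is a minimal Go sketch of what a transpiled SOM method could look like. The transpiler itself is not shown in this post, so the runtime types (`Object`, `Integer`, `Boolean`, `Fib`), the `send` function and the generated method name are all hypothetical: dynamic sends become a switch over receiver type and selector, and an inlined `ifTrue:` block becomes a plain Go `if`.

```go
package main

import "fmt"

// Object is a hypothetical runtime value: anything that knows its class.
type Object interface{ Class() string }

type Integer struct{ V int }
type Boolean bool
type Fib struct{} // a user-defined SOM class with one method, fib:

func (Integer) Class() string { return "Integer" }
func (Boolean) Class() string { return "Boolean" }
func (Fib) Class() string     { return "Fib" }

// send is dynamic dispatch reduced to switch statements on the
// receiver's type and the selector, with integer primitives inline.
func send(recv Object, selector string, args ...Object) Object {
	switch r := recv.(type) {
	case Integer:
		if len(args) == 1 { // all Integer primitives in this sketch are binary
			arg := args[0].(Integer).V
			switch selector {
			case "+":
				return Integer{r.V + arg}
			case "-":
				return Integer{r.V - arg}
			case "<":
				return Boolean(r.V < arg)
			}
		}
	case Fib:
		if selector == "fib:" {
			return fibMethod(r, args[0])
		}
	}
	panic(fmt.Sprintf("%s does not understand %s", recv.Class(), selector))
}

// fibMethod is one possible transpilation of the SOM method
//
//	fib: n = ( n < 2 ifTrue: [ ^n ]. ^(self fib: n - 1) + (self fib: n - 2) )
//
// The block becomes an if, and every message send becomes a call to send.
func fibMethod(self Fib, n Object) Object {
	if send(n, "<", Integer{2}).(Boolean) {
		return n
	}
	a := send(self, "fib:", send(n, "-", Integer{1}))
	b := send(self, "fib:", send(n, "-", Integer{2}))
	return send(a, "+", b)
}

func main() {
	fmt.Println(fibMethod(Fib{}, Integer{10}).(Integer).V) // prints 55
}
```

Nothing here needs to be interpreted at run-time, which is why this style works for SOM's largely static test programs.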
The main problem is that SOM is just a little too simple. Apart from the unfamiliar block syntax, it really is very similar to C++, i.e. in the examples and tests there is basically no metaprogramming and no truly dynamic behaviour. This is good for the current approach, as I don't think I could extend the transpiling to run-time. So it supports the static language "theory", that a dynamic language needs to be expressed in a lower, static form. But it also points strongly to the need for a new tool, hence this post.
I just want to expand on that last point before going forward. For me, languages up to now come in distinct layers. (The numbers are just for reference, not a definition.)
Up to now, dynamic languages have been interpreted in order to implement their dynamic nature. The point I want to make is that this is due to a lack in the layer below. I.e. it is perfectly possible to translate a dynamic language into a static one. In fact I claim that it is even necessary, at a mental level, to bridge the levels of abstraction: you need to peel the onion one layer at a time. But it has not been done in a real way, i.e. with two existing languages (like Ruby and C, or JS and Go), because the build systems of static languages are not made for it.
In other words, it is not the static language that makes the transpiling impossible, but the existing compilers' (and linkers') assumption that after compiling that's that, no more change. There are other reasons I think we need a new language, but this is the main one. Of course one could use LLVM, but that is too much and too little.
OK, so we may need a new language to extend the build into run-time. Let's specify that a bit more and move on to the other reasons (then the name issue).
Every higher language needs a way to access functionality in the layer below that is no longer expressible in the higher layer. A simple example in C is a system call. This needs to be expressed in assembler; C just doesn't have any constructs to do so. Or, in SOM/Ruby, an integer addition. In an ideal compiler language this would be easy, i.e. it would be easy to integrate parameterised code of the layer below. Parameterised is the key here, as simply switching compiler context is not enough. The act of compiling is very much interweaving the static structure with dynamic values. I think of this a bit like string interpolation in Ruby, where the code is fixed and variables, e.g. register names, are "passed" in.
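As an illustration of the string-interpolation analogy, here is a minimal Go sketch using the standard `text/template` package: the assembler for an integer addition is fixed, and the register names are the parameters. The instruction sequence (x86-64, AT&T syntax) and the names `addTmpl`, `addArgs` and `emitAdd` are hypothetical; a real compiler would presumably build instruction structures rather than text.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// addTmpl is fixed "lower layer" code for an integer addition:
// dest = left + right. Only the register names are parameters.
var addTmpl = template.Must(template.New("add").Parse(
	"movq {{.Left}}, {{.Dest}}\naddq {{.Right}}, {{.Dest}}\n"))

// addArgs holds the values "passed in" at compile time.
type addArgs struct {
	Left, Right, Dest string
}

// emitAdd renders the template for one concrete register allocation.
func emitAdd(left, right, dest string) (string, error) {
	var b strings.Builder
	if err := addTmpl.Execute(&b, addArgs{left, right, dest}); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	code, err := emitAdd("%rdi", "%rsi", "%rax")
	if err != nil {
		panic(err)
	}
	fmt.Print(code)
}
```

Switching the allocation to other registers changes only the arguments, never the template, which is the "parameterised" part the paragraph asks for.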
In both previous projects I always found myself generating structures for a lower layer from a higher one: creating code by creating structures at compile time that handle data at run-time. This "interweaving" of compile time and run-time is very difficult to keep track of, especially since the structures are abstract: they only come into play later, "by that time". I even wrote a DSL in kide to get back to more readable code. But since we would now have total control of the language and compiler, it is entirely feasible to create this kind of DSL in the compiler (not the general language). While this may sound abstract (mostly because compiler building is not easy), I have a good vision and am looking forward to experimenting.
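One common way to express this interweaving is closure compilation, sketched here in Go: a compile-time structure (a tiny expression tree) is walked once and produces run-time structures (closures) that handle data the compiler never saw. The types `Expr`, `Const`, `Var` and `Add` are hypothetical and not taken from kide.

```go
package main

import "fmt"

// Expr is a compile-time structure: a node of a tiny expression tree.
type Expr interface {
	// Compile runs at "compile time" and returns a function that
	// runs at "run time" against an environment of variable values.
	Compile() func(env map[string]int) int
}

type Const struct{ V int }
type Var struct{ Name string }
type Add struct{ L, R Expr }

func (c Const) Compile() func(map[string]int) int {
	v := c.V // fixed at compile time
	return func(map[string]int) int { return v }
}

func (x Var) Compile() func(map[string]int) int {
	name := x.Name // the name is known now, the value only later
	return func(env map[string]int) int { return env[name] }
}

func (a Add) Compile() func(map[string]int) int {
	l, r := a.L.Compile(), a.R.Compile() // subtrees are compiled once
	return func(env map[string]int) int { return l(env) + r(env) }
}

func main() {
	// Compile time: walk the tree once.
	f := Add{Var{"x"}, Const{2}}.Compile()
	// Run time: the generated structure handles data it never saw.
	fmt.Println(f(map[string]int{"x": 40})) // prints 42
}
```

The difficulty the paragraph describes is visible even here: inside each `Compile`, some variables live at compile time (`v`, `name`, `l`, `r`) and others only exist at run-time (`env`).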
There is a way to generalise this last idea: writing a specific language as a compiler tool (not a general-purpose language) allows for optimising very specifically towards the goal at hand, i.e. writing a dynamic compiler.
While I am not 100% sure what this will all mean, I can say from the two previous compilers that it would be very useful to control this step more. The work has of course been done in the other projects, but as I mentioned, it was clumsy and difficult, so I think this would help greatly. But more waffling below.
So, coming back to that list, here is what I think this means in slightly more concrete terms.
Static languages are usually designed for speed plus some secondary goal, e.g. safety for Rust, concurrency for Go. They are not designed for implementing a dynamic language, and this shows a bit in the calling convention. Specific assumptions that don't hold and could be different include:
I have previously used linked lists, and I think this is still a valid approach, especially for a dynamic language; using the same already at the static level is just helpful in many ways.
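Assuming the linked lists refer to call frames (the paragraph sits under the calling convention), here is a minimal Go sketch: each call links a heap-allocated frame to its caller instead of pushing onto a contiguous machine stack, so a frame can outlive its call, for instance when a block captures it. `Frame`, `call` and `backtrace` are hypothetical names.

```go
package main

import "fmt"

// Frame is a heap-allocated activation record linked to its caller,
// rather than a slot on a contiguous machine stack.
type Frame struct {
	Caller *Frame
	Method string
	Args   []int
	Locals map[string]int
}

// call creates the callee frame and links it to the caller.
func call(caller *Frame, method string, args ...int) *Frame {
	return &Frame{Caller: caller, Method: method, Args: args, Locals: map[string]int{}}
}

// backtrace walks the linked list from the innermost frame outwards.
func backtrace(f *Frame) []string {
	var names []string
	for ; f != nil; f = f.Caller {
		names = append(names, f.Method)
	}
	return names
}

func main() {
	root := call(nil, "main")
	inner := call(call(root, "run"), "plus", 40, 2)
	fmt.Println(backtrace(inner)) // prints [plus run main]
}
```

Because frames are ordinary objects, the garbage collector decides when they die, not the return instruction.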
Static languages, not being object oriented, tend to take a von Neumann view of the machine: there is the memory, go for it, chunk it up as you want, interpret it as you wish (well, with restrictions lately).
Since the end game is a dynamic object-oriented language, random bits and bytes are not really the focus; objects are. And there is no reason for a static language not to have the same object memory model as a dynamic one. In fact it would be super helpful if one could simply implement a call at the dynamic level using the static semantics (call and object).
Taking this further, it means an object should be an object, whichever level has created it. It has a type, and that determines its layout. Maybe some more info will be added (for optimisation/GC?), but ints or floats unattached to objects should not be part of the static level.
That previous sentence makes explicit my assumption that there is no reason for the static level not to be object oriented.
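A minimal Go sketch of that uniform object model, under the assumption that a type is just a list of named slots: every object carries a pointer to its type, the type alone fixes the layout, and even an integer is a full object. `Type`, `Object`, `New`, `Get`, `IntegerType` and `PointType` are hypothetical; the Go `int` inside an Integer stands in for the raw machine word.

```go
package main

import "fmt"

// Type describes a layout: the names of an object's slots.
type Type struct {
	Name  string
	Slots []string
}

// Object is the single memory unit at every level: a type plus slots.
type Object struct {
	Type  *Type
	Slots []any
}

var IntegerType = &Type{Name: "Integer", Slots: []string{"value"}}
var PointType = &Type{Name: "Point", Slots: []string{"x", "y"}}

// New allocates an object whose slot count is dictated by its type.
func New(t *Type, values ...any) *Object {
	if len(values) != len(t.Slots) {
		panic("layout mismatch for " + t.Name)
	}
	return &Object{Type: t, Slots: values}
}

// Get looks a slot up by name, using only the type's layout.
func (o *Object) Get(name string) any {
	for i, s := range o.Type.Slots {
		if s == name {
			return o.Slots[i]
		}
	}
	panic(o.Type.Name + " has no slot " + name)
}

func main() {
	two := New(IntegerType, 2) // even an integer is a full object
	p := New(PointType, two, New(IntegerType, 3))
	fmt.Println(p.Type.Name, p.Get("x").(*Object).Get("value")) // prints Point 2
}
```

Tagging or unboxing small integers would then be an optimisation hidden behind `Type`, not a separate kind of value at the static level.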
Now we come to the biggie: assumptions made during compiling. These (of course?) fall into two distinct stages, before self-hosting and after. Before self-hosting is achieved we are, like any static language, just compiling a binary, i.e. all the work happens at compile time. Code and data have to be "linked" to create an executable.
Only once the compiler can compile itself this way do things get interesting, in the sense that code then has to be generated against partially existing data. While new structures are created the same way, references to existing objects, i.e. globals like classes, must be resolved differently. And knowing this from the start lets us plan for the distinction.
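A minimal Go sketch of planning for that distinction: the resolver emits a symbolic, link-time reference for anything not yet in the running image (everything, before self-hosting) and a direct reference for globals that already exist. `Ref`, `Symbol`, `Live`, `Resolver` and the integer "address" are hypothetical stand-ins.

```go
package main

import "fmt"

// Ref is how generated code refers to a global such as a class.
type Ref interface{ String() string }

// Symbol: the global does not exist yet, so the reference stays
// symbolic and is filled in later by linking.
type Symbol struct{ Name string }

func (s Symbol) String() string { return "link-time symbol " + s.Name }

// Live: the global already exists in the running image, so code can
// point at it directly.
type Live struct {
	Name string
	Addr int // stand-in for the object's real address
}

func (l Live) String() string { return fmt.Sprintf("live object %s @%d", l.Name, l.Addr) }

// Resolver decides per global which kind of reference to emit.
type Resolver struct {
	image map[string]int // globals present in the running image
}

func (r Resolver) Resolve(name string) Ref {
	if addr, ok := r.image[name]; ok {
		return Live{name, addr}
	}
	return Symbol{name}
}

func main() {
	before := Resolver{image: map[string]int{}}                // not self-hosted yet
	after := Resolver{image: map[string]int{"Integer": 4096}} // running image
	fmt.Println(before.Resolve("Integer"))
	fmt.Println(after.Resolve("Integer"))
	fmt.Println(after.Resolve("NewClass")) // new structures: same as before
}
```

Before self-hosting the image is simply empty, so one code path covers both stages.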
All this led me to revisit the Futamura projections, especially the second.
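For reference, the three projections in their usual textbook form, with $[\![p]\!](x)$ meaning "run program $p$ on input $x$", $\mathit{int}$ an interpreter for the source language written in some language $L$, and $\mathit{mix}$ a partial evaluator, also written in $L$, that takes a program and its static input:

```latex
% s: source program, d: its run-time input
\begin{aligned}
\text{1st: } & t = [\![\mathit{mix}]\!](\mathit{int}, s), && [\![t]\!](d) = [\![\mathit{int}]\!](s, d) \\
\text{2nd: } & \mathit{comp} = [\![\mathit{mix}]\!](\mathit{mix}, \mathit{int}), && [\![\mathit{comp}]\!](s) = t \\
\text{3rd: } & \mathit{cogen} = [\![\mathit{mix}]\!](\mathit{mix}, \mathit{mix}), && [\![\mathit{cogen}]\!](\mathit{int}) = \mathit{comp}
\end{aligned}
```

Since $\mathit{mix}$ outputs programs in $L$, the target $t$, the compiler $\mathit{comp}$ and $\mathit{cogen}$ are all $L$ programs.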
I came across the projections through an excellent talk by Tom Stuart called "Compilers for Free", after which I really thought you could build a compiler by building an interpreter and a partial evaluator. It took me a few years to see that compilers are not free, because the compiler you create compiles into the same language your interpreter is written in. So in the case of Truffle, partial evaluation only ever produces code at the Java level; getting below that, to binary, is left to the Graal compiler, which has to exist already.
Only recently did I come across a better description of the projection, one that describes it as a discovery of equivalence. That I can understand much better, because the work of compiling, interweaving data from the program at compile time with the structure/data at run-time, is what I imagine partial evaluation is.
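A toy illustration of the first projection in Go. Rather than a general partial evaluator, `specialize` is written by hand for this one interpreter: the program is static (known at compile time), the input `x` is dynamic (known at run-time), so the loop and the switch are evaluated away and only residual code remains. That residual code is Go, the interpreter's own language, which is exactly the limitation described above. The toy language (`Op`, `add`, `mul`) is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Op is one instruction of a toy language working on an accumulator.
type Op struct {
	Code string // "add" or "mul"
	Arg  int
}

// interpret: program and input both arrive at run time.
func interpret(prog []Op, x int) int {
	acc := x
	for _, op := range prog {
		switch op.Code {
		case "add":
			acc += op.Arg
		case "mul":
			acc *= op.Arg
		}
	}
	return acc
}

// specialize mirrors interpret, but with the program static: the loop
// and switch run now, and only the arithmetic on x is emitted as code.
func specialize(prog []Op) string {
	var b strings.Builder
	b.WriteString("func compiled(x int) int {\n\tacc := x\n")
	for _, op := range prog {
		switch op.Code {
		case "add":
			fmt.Fprintf(&b, "\tacc += %d\n", op.Arg)
		case "mul":
			fmt.Fprintf(&b, "\tacc *= %d\n", op.Arg)
		}
	}
	b.WriteString("\treturn acc\n}\n")
	return b.String()
}

func main() {
	prog := []Op{{"mul", 2}, {"add", 2}}
	fmt.Println(interpret(prog, 20)) // prints 42
	fmt.Print(specialize(prog))      // residual Go source for this program
}
```

Running the emitted `compiled(20)` gives the same 42 as `interpret(prog, 20)`; that equality is the "equivalence" the projection describes.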
Since I want to create binary from Ruby, it looked like one would have to build a Ruby interpreter in assembler, and that's where I stopped thinking about this in 2017.
But now, having done some work on Ruby and SOM and having reached the point expounded above, I am thinking maybe, just maybe, there is a way. In kide I basically built an interpreter for assembler, just for debugging purposes, and a compiler from (partial) Ruby into assembler.
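For a feel of what an "interpreter for assembler" involves, here is a minimal Go sketch of a register-machine interpreter. The three-register instruction set (`movi`, `add`, `subi`, `jnz`) is hypothetical and not the one used in kide; the point is only that a fetch/dispatch loop over instructions is enough to execute, and so debug, generated assembler.

```go
package main

import "fmt"

// Instr is one instruction of a toy three-register assembler.
type Instr struct {
	Op   string // "movi", "add", "subi", "jnz"
	A, B int    // register indices or immediates, depending on Op
}

// run interprets prog until the program counter falls off the end
// and returns the final register file.
func run(prog []Instr, regs [3]int) [3]int {
	for pc := 0; pc < len(prog); pc++ {
		in := prog[pc]
		switch in.Op {
		case "movi": // regs[A] = B
			regs[in.A] = in.B
		case "add": // regs[A] += regs[B]
			regs[in.A] += regs[in.B]
		case "subi": // regs[A] -= B
			regs[in.A] -= in.B
		case "jnz": // if regs[A] != 0, jump to instruction B
			if regs[in.A] != 0 {
				pc = in.B - 1 // the loop's pc++ then lands on B
			}
		default:
			panic("unknown op " + in.Op)
		}
	}
	return regs
}

func main() {
	// Sum 1..5 into r0, counting r1 down from 5.
	prog := []Instr{
		{"movi", 0, 0},
		{"movi", 1, 5},
		{"add", 0, 1}, // instruction 2: loop body
		{"subi", 1, 1},
		{"jnz", 1, 2},
	}
	fmt.Println(run(prog, [3]int{})[0]) // prints 15
}
```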
As said above, the current idea is to build a static language and compile that. Along the way it would be compiled to assembler, and since we can interpret assembler, we have half of the projection for a static language. Then we "just" have to build an interpreter for Ruby in the static language, evaluate and ...
Boy, this is vague, but it feels real. I've started formalizing some ideas in a mathematical notation, hoping that will bring some clarity.