I remember coming across a few other meta-assemblers in the '80s and '90s, so this is definitely not "uncharted territory", but it's good to see another one show up.
Of course, in the other direction there are meta-disassemblers used for analysis in tools like Ghidra.
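In a spec-driven (meta-)disassembler, the decoder itself is generic and the instruction set lives in a data table. Below is a minimal Python sketch of that idea. It assumes a toy two-entry table for 14-bit words laid out like the PIC mid-range CALL/GOTO encodings (3-bit opcode, 11-bit address). `PATTERNS` and `disassemble_word` are hypothetical names, and this is not how Ghidra's processor specifications actually work:

```python
# Hypothetical pattern table: (mask, match, mnemonic, operand mask).
# Layout assumed from the PIC mid-range CALL/GOTO format:
# a 3-bit opcode followed by an 11-bit address in a 14-bit word.
PATTERNS = [
    (0x3800, 0x2000, "CALL", 0x07FF),  # 10 0kkk kkkk kkkk
    (0x3800, 0x2800, "GOTO", 0x07FF),  # 10 1kkk kkkk kkkk
]


def disassemble_word(word):
    """Decode one 14-bit word by trying each table entry in order."""
    for mask, match, mnemonic, operand_mask in PATTERNS:
        if (word & mask) == match:
            return f"{mnemonic} 0x{word & operand_mask:03X}"
    return f"DW 0x{word:04X}"  # not in the table: emit as raw data


print(disassemble_word(0x2923))  # GOTO 0x123
print(disassemble_word(0x2005))  # CALL 0x005
```

Supporting another instruction set means swapping the table, not the decoder.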
2ton_jeff 3 hours ago
Very cool, and I like the idea of a "meta-assembler." The most recent version of flat assembler (fasm 2) is built with fasmg, which is also a "meta-assembler" of sorts: it doesn't directly support a specific instruction set and is instead a very powerful macro assembler. I'm keen to see where the functionality of the two implementations overlaps.
Oh neat! Thanks for the link, I hadn't heard of fasmg before.
It looks like fasmg builds up from the byte level, so it would only work for architectures that use 8-bit words. Torque builds up from the bit level, so it can assemble code for architectures like those in PIC microcontrollers, which use word sizes of 12 or 14 bits.
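To illustrate what building up from the bit level means, here is a minimal Python sketch (not Torque syntax) that packs bit fields MSB-first into a single 14-bit word. `pack_fields` is a hypothetical helper, and the example layout is an assumption modeled on the PIC mid-range GOTO encoding (`10 1kkk kkkk kkkk`):

```python
def pack_fields(fields, word_bits):
    """Pack (value, width) pairs MSB-first into one word of `word_bits` bits.

    Raises ValueError if a value doesn't fit its field or if the field
    widths don't add up to the word size.
    """
    total = sum(width for _, width in fields)
    if total != word_bits:
        raise ValueError(f"fields use {total} bits, word has {word_bits}")
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value  # shift earlier fields left, append this one
    return word


# Hypothetical 14-bit GOTO: opcode 0b101, then an 11-bit target address.
goto_0x123 = pack_fields([(0b101, 3), (0x123, 11)], word_bits=14)
print(f"{goto_0x123:014b}")  # 10100100100011
```

The word width is only a parameter, so 12-bit and 14-bit words need no special casing.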
However, fasmg does allow a lot more control over the syntax of the language. The documentation shows some pretty powerful string manipulation that's used to parse real x86 assembly code, which makes sense given its purpose. Torque doesn't allow overriding the syntax to that degree; the macro invocation syntax is baked into the assembler.
One thing that intrigues me about fasmg is how it handles circular dependencies in expressions [0] (search for 'circular'). Currently in Torque it isn't possible to use a label reference inside a predicate, because the predicate evaluating one way could insert additional code, moving the label and causing the predicate to evaluate the other way [1]. But in fasmg it's possible to use the result of an expression as part of its own calculation.
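The oscillator problem can be reproduced with a toy multi-pass layout loop in Python. This is a generic fixed-point iteration written for illustration, not a description of how fasmg or Torque resolve labels; the item format, `resolve`, and both sample programs are hypothetical. Each pass sizes every item using the label addresses from the previous pass and stops once the addresses stop changing. If a layout repeats without settling, the program is oscillating:

```python
from typing import Callable, Dict, List, Tuple, Union

Labels = Dict[str, int]
# An item is either ("label", name) or ("emit", size_fn), where size_fn
# computes the item's size in bytes from the previous pass's label table.
Item = Tuple[str, Union[str, Callable[[Labels], int]]]


class OscillationError(Exception):
    pass


def resolve(program: List[Item], max_passes: int = 32) -> Labels:
    """Re-run layout passes until label addresses stop changing."""
    labels: Labels = {}
    seen: List[Labels] = []
    for _ in range(max_passes):
        new_labels: Labels = {}
        addr = 0
        for kind, value in program:
            if kind == "label":
                new_labels[value] = addr
            else:
                addr += value(labels)  # size depends on last pass's labels
        if new_labels == labels:
            return labels  # fixed point: the layout is self-consistent
        if new_labels in seen:
            raise OscillationError(f"layout cycles without settling: {new_labels}")
        seen.append(new_labels)
        labels = new_labels
    raise OscillationError("no fixed point within the pass limit")


def relaxation_program(gap: int) -> List[Item]:
    """A forward jump that is 2 bytes if the target is within a signed
    8-bit offset, else 3 bytes, followed by `gap` bytes of code."""
    def jump_size(labels: Labels) -> int:
        offset = labels.get("target", 0) - labels.get("after_jump", 0)
        return 2 if -128 <= offset <= 127 else 3
    return [("emit", jump_size), ("label", "after_jump"),
            ("emit", lambda labels: gap), ("label", "target")]


# A predicate on a label that sits *after* the code the predicate controls:
# emit 4 bytes of padding only while "end" is below 10.
oscillator: List[Item] = [
    ("emit", lambda labels: 4 if labels.get("end", 0) < 10 else 0),
    ("emit", lambda labels: 8),
    ("label", "end"),
]

print(resolve(relaxation_program(100)))  # {'after_jump': 2, 'target': 102}
print(resolve(relaxation_program(200)))  # {'after_jump': 3, 'target': 203}
try:
    resolve(oscillator)
except OscillationError as err:
    print(err)
```

The jump-relaxation program converges, growing the jump to its long form when the target is too far away, while the padding predicate makes "end" flip between 12 and 8 on every pass and never settles.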
I reached out to the author of fasmg about your post and your interest in circular dependencies, and he pointed me toward two posts he wrote specifically to explain what he believes is unique to fasm/fasmg and allows it to handle circular dependencies of many kinds: [0] types of multi-pass assembly, and [1] related pitfalls.
Thank you so much! It looks like the issue I described is what he calls the 'oscillator problem' [0]. This is an absolute goldmine, I'm going to be reading for days.
Thank you! My main inspiration was the Uxn assembly language [0], which is itself heavily inspired by Forth. I loved how easy it was to build something that looks like a high-level language by just stacking up macros, and I wanted to have that with embedded development too.
Rust isn't involved past implementing the Torque executable; you write your program with the Torque language and then run the assembler on it to convert it to machine code. You can see the whole process of running code on the TRS-80 from start to finish here [1].
For laying out structs, I'd build a macro that expands to the memory representation of the struct. If I wanted a struct representing, say, a 2D point with signed 16-bit little-endian integers for the x and y coords, I would build it from scratch like this (this is a valid program, you can assemble it with Torque):
Creating a DSL within an existing language wasn't something I'd ever considered. Because Torque is a standalone executable, it's really easy to use and share; people don't have to install a whole language toolchain in order to use it.
Regarding constraint solving and jumping, Torque already throws an error if you try to pack too large a value into too small a field. This works really well for things like relative jumps, because jumping too far will create a value that can't fit in the instruction. I'm planning on adding an error-throwing token to the language that could be used alongside expressions and conditions to further constrain the values accepted by a macro, but I'm really happy with the simplicity of the language so far.
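A minimal Python sketch of that kind of range check. It assumes the Z80 `JR` encoding (opcode `0x18` followed by a signed 8-bit displacement measured from the end of the 2-byte instruction), since the original TRS-80 models are Z80-based. `encode_signed`, `jr`, and `FieldOverflow` are hypothetical names, not part of Torque:

```python
class FieldOverflow(Exception):
    pass


def encode_signed(value, width):
    """Two's-complement encode `value` into `width` bits, or raise."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise FieldOverflow(
            f"{value} does not fit in a signed {width}-bit field ({lo}..{hi})")
    return value & ((1 << width) - 1)


def jr(addr, target):
    """Z80-style relative jump: the displacement is measured from the
    address just after the 2-byte instruction."""
    disp = target - (addr + 2)
    return bytes([0x18, encode_signed(disp, 8)])


print(jr(0x4000, 0x4010).hex())  # 180e
print(jr(0x4000, 0x3FF0).hex())  # 18ee
```

A jump from `0x4000` to `0x4082` needs a displacement of 128, one past the limit, so `jr` raises instead of silently truncating the offset.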
The actual internal representation isn't what I'd call an 'IR' per se, nothing like what you'd find in a C compiler. It's all very pedestrian; the syntax tree is baked down across multiple passes, with macros acting as a glorified copy-paste system.
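The "glorified copy-paste" model of macros can be sketched in a few lines of Python: every macro name is replaced by a copy of its body, recursively, until only literal tokens remain. The macro table and `expand` are hypothetical, the sketch ignores macro arguments, and this is not Torque syntax:

```python
# Hypothetical parameterless macros: each name maps to a list of tokens,
# which may themselves be macro names. "0x.." tokens are raw bytes
# (Z80 NOP = 0x00, INC A = 0x3C).
MACROS = {
    "nop": ["0x00"],
    "inc-a": ["0x3C"],
    "inc-a-twice": ["inc-a", "inc-a"],
}


def expand(tokens, macros, depth=0, max_depth=64):
    """Replace each macro name with a copy of its body, recursively,
    until only non-macro tokens remain."""
    if depth > max_depth:
        raise RecursionError("macro expansion too deep (self-referencing macro?)")
    out = []
    for tok in tokens:
        if tok in macros:
            out.extend(expand(macros[tok], macros, depth + 1, max_depth))
        else:
            out.append(tok)
    return out


print(expand(["nop", "inc-a-twice"], MACROS))  # ['0x00', '0x3C', '0x3C']
```

The depth limit turns a macro that expands to itself into an error rather than an endless loop.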
Thanks for the interest and the links, every one of those linked projects is new to me.
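For the 2D point struct described above (signed 16-bit little-endian x and y), here is a language-neutral sketch in Python of the bytes such a macro would expand to, plus field addresses computed from named offsets. It assumes x is stored before y; the constant and function names are hypothetical stand-ins for the offset-naming macros:

```python
import struct

# Assumed layout: x then y, each a signed 16-bit little-endian integer.
POINT_FORMAT = "<hh"
POINT_X_OFFSET = 0  # byte offset of x within the struct
POINT_Y_OFFSET = 2  # byte offset of y (x is 2 bytes wide)


def point(x, y):
    """Return the in-memory bytes of a point; raises struct.error
    if either coordinate is outside the signed 16-bit range."""
    return struct.pack(POINT_FORMAT, x, y)


def field_address(struct_addr, offset):
    """Address of a field = address of the struct + the field's offset."""
    return struct_addr + offset


print(point(1, -2).hex())                          # 0100feff
print(hex(field_address(0x5000, POINT_Y_OFFSET)))  # 0x5002
```

`struct.pack` rejects out-of-range values, similar in spirit to Torque refusing a value that is too large for its field.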
https://board.flatassembler.net/topic.php?t=19389
https://flatassembler.net/download.php
[0] https://flatassembler.net/docs.php?article=fasmg
[1] https://benbridle.com/projects/torque/roadmap.html
[0] https://board.flatassembler.net/topic.php?t=20249
[1] https://board.flatassembler.net/topic.php?t=21060
[0] https://board.flatassembler.net/topic.php?p=178828#178828
Did you get inspiration from other assemblers or macro processors?
You have it running on a TRS-80, how does that work? I had no idea Rust could target a TRS-80.
I am getting hints of Forth, Lisp and TCL.
How would you go about laying out structs in memory?
I am sure you considered an internal DSL; what caused you to go with something standalone?
Any thoughts on adding a constraint solver like Z3, and allowing end users to set constraints on things like the size of a jump?
I could see taking this and growing it into a compiler by making macro(macro(macro(txt))).
Is there an internal IR?
Projects for inspiration
https://github.com/mattbierner/Template-Assembly
Specifying representations of machine instructions https://dl.acm.org/doi/pdf/10.1145/256167.256225
https://www.semanticscholar.org/paper/Specifying-representat...
Typed Assembly Language (TAL) https://www.cs.cornell.edu/talc/
If you haven't come across it yet, you are in for a treat: META II https://en.wikipedia.org/wiki/META_II has spawned a whole trove of clones.
https://en.wikipedia.org/wiki/OMeta
https://github.com/DalekBaldwin/clometa
If I want the address of a field, I can add an offset to the struct address, using macros to name the offset values:
[0] https://wiki.xxiivv.com/site/uxn.html
[1] https://benbridle.com/articles/torque-programming-the-trs-80...
"Assemblers tend to be poorly documented"
I wish everything in programming were as well documented as assemblers and ISAs.
And 6502 ;-) (and probably most of the ancient ones)
Would be interesting to target the RISC CPU of https://www.projectoberon.net with it.
Pretty cool project!