To expand on what I said earlier, I have a suggestion for the sequence
of steps for any port to a new platform.
1) Write a disassembler. For RISC platforms with orthogonal
instruction sets, this is pretty easy (about a day to get decent
disassembly). If the object format is not supported (i.e. not 32-bit
ELF/COFF or OMF), it might take significantly longer.
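To give a feel for why an orthogonal RISC instruction set is quick to disassemble, here is a minimal sketch in C that assumes a MIPS-style fixed 32-bit encoding. The function name disasm_word and the handful of opcodes covered are chosen purely for illustration; none of this is taken from the Open Watcom disassembler.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: decode one MIPS-style 32-bit instruction word
   into text. Only four instructions are handled; anything else is
   printed as raw data. */
int disasm_word(uint32_t word, char *buf, size_t len)
{
    unsigned op    = (word >> 26) & 0x3f;   /* bits 31..26 */
    unsigned rs    = (word >> 21) & 0x1f;   /* bits 25..21 */
    unsigned rt    = (word >> 16) & 0x1f;   /* bits 20..16 */
    unsigned rd    = (word >> 11) & 0x1f;   /* bits 15..11 */
    unsigned funct = word & 0x3f;           /* bits 5..0   */
    int      imm   = (int16_t)(word & 0xffff); /* sign-extended immediate */

    if (op == 0x00 && funct == 0x21)
        return snprintf(buf, len, "addu $%u, $%u, $%u", rd, rs, rt);
    if (op == 0x09)
        return snprintf(buf, len, "addiu $%u, $%u, %d", rt, rs, imm);
    if (op == 0x23)
        return snprintf(buf, len, "lw $%u, %d($%u)", rt, imm, rs);
    if (op == 0x2b)
        return snprintf(buf, len, "sw $%u, %d($%u)", rt, imm, rs);
    return snprintf(buf, len, ".word 0x%08lx", (unsigned long)word);
}
```

Because every instruction has the same width and the fields sit at fixed bit positions, the decoder is little more than shifts, masks and a lookup. A real disassembler would drive this from a table rather than a chain of if statements.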
2) Port the debugger. You'll be glad you did. If the platform is Linux
based, it should be real easy. If it's some other Unix, it still
shouldn't be too hard. If it's something altogether different, it could
take a while. If the debug format isn't supported (i.e. not DWARF), that
will take additional time. The debugger will reuse the disassembler you
did earlier.
At this point, you should be able to remotely debug to your target
platform, and if it's Unix, it shouldn't be too hard to port the entire
debugger front end to it using the platform's native compiler. If the
only other debugging alternative is gdb, porting wd will save you tons
of time later on. These first two steps may be quite useful on their own.
3) Port the assembler. Again, if it's a RISC platform, and if it
doesn't have a "smart" assembler like MIPS for instance (with tons of
pseudo-instructions), that's not such a huge amount of work. Again, if
the object format is unsupported (i.e. not 32-bit ELF/COFF), expect
additional effort. The assembler doesn't have to be complete, but it
should be good enough to generate the few assembly routines for clib
(things like longjmp aren't really doable in C). The inline assembler
will be reused later by the C/C++ compiler, which may make interfacing
with system calls very easy (see bld/clib/linux/h/sysmips.h for an example).
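As a rough illustration of the assembler side, the sketch below encodes MIPS-style R-type and I-type instructions and then expands one pseudo-instruction, "li", into a lui/ori pair. The function names are hypothetical and not taken from the Open Watcom assembler; the point is only that plain instructions map one-to-one onto bit fields, while each pseudo-instruction needs its own expansion logic.

```c
#include <stdint.h>

/* Pack an R-type instruction (opcode field is 0). */
uint32_t encode_rtype(unsigned rs, unsigned rt, unsigned rd,
                      unsigned shamt, unsigned funct)
{
    return ((uint32_t)(rs & 0x1f) << 21) |
           ((uint32_t)(rt & 0x1f) << 16) |
           ((uint32_t)(rd & 0x1f) << 11) |
           ((uint32_t)(shamt & 0x1f) << 6) |
           (funct & 0x3f);
}

/* Pack an I-type instruction; the immediate is truncated to 16 bits. */
uint32_t encode_itype(unsigned op, unsigned rs, unsigned rt, int imm)
{
    return ((uint32_t)(op & 0x3f) << 26) |
           ((uint32_t)(rs & 0x1f) << 21) |
           ((uint32_t)(rt & 0x1f) << 16) |
           ((uint32_t)imm & 0xffff);
}

/* Expand the pseudo-instruction "li rt, value" into lui + ori.
   This sketch always emits two words; a real assembler would emit
   a single instruction when the value fits in 16 bits. */
int expand_li(unsigned rt, uint32_t value, uint32_t out[2])
{
    out[0] = encode_itype(0x0f, 0, rt, (int)(value >> 16));     /* lui */
    out[1] = encode_itype(0x0d, rt, rt, (int)(value & 0xffff)); /* ori */
    return 2;
}
```

For example, encode_itype(0x09, 29, 29, -8) yields the word for "addiu $29, $29, -8", and expand_li(2, 0x12345678, out) fills out with the two words that load that constant into register 2.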
4) Port the C front end. This should not be too difficult; the biggest
platform specific thing to worry about will be functions with variable
arguments, and it's almost a given those will cause headaches. The C
compiler will be an empty shell at this point, but you'll need it to
exercise the codegen.
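Since variable arguments are the likely trouble spot, a small test function along the following lines can help shake out problems early. This is only a sketch with a made-up name (sum_mixed); it mixes int and double arguments because those often differ in size, alignment, or the registers used to pass them.

```c
#include <stdarg.h>

/* Hypothetical varargs test: expects `count` pairs of (int, double)
   after the fixed argument and returns the sum of all of them. */
double sum_mixed(int count, ...)
{
    va_list ap;
    double  total = 0.0;
    int     i;

    va_start(ap, count);
    for (i = 0; i < count; i++) {
        total += va_arg(ap, int);     /* integer argument */
        total += va_arg(ap, double);  /* floating-point argument */
    }
    va_end(ap);
    return total;
}
```

A call such as sum_mixed(2, 1, 0.5, 2, 0.25) should return 3.75. If the result is off, the va_list layout or the argument-passing convention is a good first suspect.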
5) Now comes the really hard part, the codegen. You'll reuse what you
learned earlier when porting the assembler. If the target is an
orthogonal RISC platform, you will be able to clone one of the existing
RISC codegens. If it's something weird or, god forbid, something as
awful as x86, you really have your work cut out for you. This part is by
far the most difficult because it requires understanding of how the
codegen works. Taking a quick look at bld/cg/doc/mipsnotes.txt may be a
good idea. Every CPU architecture seems to have its quirks (delay slots,
not so orthogonal instruction sets, alignment and/or operand size
restrictions, and who knows what sorts of other weirdness) and there are
no generic solutions. Note that vast swathes of the codegen won't
require any changes whatsoever as they're completely generic; the code
that translates intermediate code (sort of pseudo-assembly) is where the
effort will be concentrated.
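To make the "translate pseudo-assembly into target instructions" idea concrete, here is a deliberately tiny sketch. The ir_op and ir_ins types and the emit_mips function are invented for this example and do not reflect how the Open Watcom codegen is actually structured; the output is MIPS-style assembly text, and the jump case shows one typical quirk, a branch delay slot filled with a nop.

```c
#include <stdio.h>

/* Hypothetical three-operation intermediate code. */
enum ir_op { IR_ADD, IR_LOAD, IR_JUMP };

struct ir_ins {
    enum ir_op op;
    int dst;    /* destination register (unused for IR_JUMP) */
    int src1;   /* first source, base register, or jump target register */
    int src2;   /* second source register, or byte offset for IR_LOAD */
};

/* Translate one intermediate instruction into assembly text. */
int emit_mips(const struct ir_ins *ins, char *buf, size_t len)
{
    switch (ins->op) {
    case IR_ADD:
        return snprintf(buf, len, "addu $%d, $%d, $%d\n",
                        ins->dst, ins->src1, ins->src2);
    case IR_LOAD:
        return snprintf(buf, len, "lw $%d, %d($%d)\n",
                        ins->dst, ins->src2, ins->src1);
    case IR_JUMP:
        /* The instruction after a jump executes before the jump
           takes effect, so pad the delay slot with a nop. */
        return snprintf(buf, len, "jr $%d\nnop\n", ins->src1);
    }
    return -1;
}
```

Always emitting a nop is the simplest correct choice; filling the delay slot with a useful instruction is an optimization that can come later.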
6) Port the clib. Again, if the target is a Linux or Unix platform,
it's probably not going to be terribly difficult. If it's something
oddball that doesn't even remotely resemble POSIX, expect trouble. This
step will probably overlap with the codegen port. Getting as far as
puts()-based hello world is not *that* difficult, but the fun starts
with printf(), and math stuff is guaranteed to be "interesting".
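The reason a puts()-based hello world comes early is that it needs almost nothing beyond a working write system call. A minimal sketch, assuming a POSIX-like target where write() is available (the name puts_fd is made up for this example and is not an Open Watcom clib function):

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical puts()-like routine: write the string and a newline
   to the given file descriptor. Returns 0 on success, -1 on failure.
   Partial writes are treated as failures to keep the sketch short. */
int puts_fd(int fd, const char *s)
{
    size_t len = strlen(s);

    if (write(fd, s, len) != (ssize_t)len)
        return -1;
    if (write(fd, "\n", 1) != 1)
        return -1;
    return 0;
}
```

Calling puts_fd(1, "hello world") prints to standard output. printf() is a different story because it drags in format parsing, varargs handling and number conversion all at once.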
7) Port the linker. Same thing as earlier - if the object and
executable formats are supported (ELF, PE, etc.), most of the work's
done already. If it's something else, there's no telling how much effort
exactly it'll take.
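A quick way to find out which case you are in is to look at the first few bytes of an object file produced by the target's native tools. The sketch below checks for the ELF magic number and the 32-bit/64-bit class byte; the enum and function names are hypothetical, and other formats (COFF, OMF, PE) would need their own checks.

```c
#include <stddef.h>

enum obj_kind { OBJ_UNKNOWN, OBJ_ELF32, OBJ_ELF64 };

/* Hypothetical helper: classify a file by the start of its header.
   ELF files begin with 0x7f 'E' 'L' 'F'; the fifth byte is the
   class (1 = 32-bit, 2 = 64-bit). */
enum obj_kind classify_object(const unsigned char *hdr, size_t len)
{
    if (len < 5 || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F')
        return OBJ_UNKNOWN;

    switch (hdr[4]) {
    case 1:  return OBJ_ELF32;
    case 2:  return OBJ_ELF64;
    default: return OBJ_UNKNOWN;
    }
}
```

If the result is anything other than 32-bit ELF (or another format the tools already handle), budget extra time for the disassembler, assembler and linker steps.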
By now you should be able to generate basic executables, at first
using syscalls directly and later using the clib. Now's where you will
really appreciate the debugger because the cg will be buggy and the
programs are going to die a lot.
8) Run (and pass) tests. At the end of this step, the compiler should
be able to pass the tests in bld/ctest/regress and - quite possibly with
extra tweaking - the tests in bld/clib/qa.
It may also be instructive to start cross compiling some utilities for
the target platform - disassembler, linker, etc. Once the port is solid
enough, it should be able to generate a functioning compiler. By the
time that compiler is good enough to compile itself and still work, you
should have a pretty solid codegen plus tools and basic runtime library
support.
9) Optionally continue with the C++ and perhaps F77 compiler and other
tools. The profiler is an especially good candidate, and if you have a
debugger, you're almost done with the profiler. Most of the command line
tools are pretty easy to port, but the GUI tools are an entirely different
kettle of fish. Porting all the GUI tools might conceivably require as
much effort as porting the compiler itself.
A lot of these steps can be done in parallel, for instance #1 and #3
are independent of each other. There are also two completely different
porting strategies: Either port the OW tools to the target platform
first, using the native compiler, or cross compile from a supported
platform. I'd recommend the latter, but it's a question of personal
preference. The host tools should definitely be solid, because otherwise
you'll be juggling way too many balls at the same time. Porting the
codegen/clib/linker/etc. all at the same time is hard enough. It is
recommended to use the target platform's tools at least in the initial
stages - e.g. generate your own object files but use the existing linker.
This works in the other direction as well - it's possible to port e.g. the
linker first and use existing tools to generate object files.
Needless to say, the above assumes that the target platform runs an
actual OS. If it's some kind of restricted embedded system, the work
will be in some ways easier (perhaps no need to port clib) and in other
ways much harder (far more difficult to test what you've done).
I should probably put this stuff in the wiki at some point...
Michal