When writing a hardware compiler for C, we must deal with the traditional bugaboo of any advanced compiler: the necessity of reconstructing an author's intent from their code. C is a sequential programming language designed for a general computational model; mapping it to hardware is often non-trivial. The standard tools of the parallelizing compiler must be deployed to reconstruct loop structure, induction variables, aliasing, and data dependencies from the source, and to perform loop unrolling and other optimizations that recover the parallelism which the sequential structure of the input has hidden. Nevertheless, we can come up with a reasonably straightforward semantics for the translation of C code to hardware. This semantics is a superset of that described in [4].
First, we declare that all straight-line code executes in zero time. In the example of figure 1, the externally visible value of signal a jumps directly to 5 when the block is executed; it does not transition through 3. Conceptually one can imagine that there is an entirely separate variable representing the externally visible value -- a_ext, say -- which is assigned the value of a just once, at the end of the block. This rule allows us to synthesize straight-line code as a combinational logic block.
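The zero-time rule can be illustrated with a minimal C sketch. It assumes that the block of figure 1 assigns 3 and then 5 to a. The global a_ext and the helper visible_a are hypothetical names used here only to model the externally visible signal; they are not part of the semantics itself.

```c
/* Hypothetical model of the externally visible signal. */
static int a_ext = 0;

/* One straight-line block. Intermediate values of `a` stay inside
 * the combinational logic; only the final value is committed. */
void straight_line_block(void)
{
    int a;      /* internal working value */

    a = 3;      /* intermediate value, never observed externally */
    a = 5;      /* final value of the block */

    a_ext = a;  /* single commit at the end of the block */
}

/* What an outside observer sees between blocks. */
int visible_a(void)
{
    return a_ext;
}
```

An observer calling visible_a before and after straight_line_block sees the initial value and then 5, and never 3.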
Branching and looping code then naturally define sequence points, where we can determine an ordering to the combinational blocks. These sequence points delimit separate states in a synthesizable state machine. Thus, the hardware described by figure 2 corresponds to a three-state machine, with states corresponding to pre-loop, in-loop, and post-loop conditions. We remain in the in-loop state until we exit the loop, either via break or the loop test. The externally visible values of a are: 0, while in the pre-loop state; 1, 3, 6, 10, and 15, while in the in-loop state; and 20 while in the post-loop state. It takes 6 synchronous clock cycles for a to transition from 0 to 20. 3
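The three-state machine can be simulated in software. The sketch below is an illustration under stated assumptions, not a reproduction of figure 2: it assumes the loop adds i to a for i = 1 to 5, that the post-loop block adds 5, and that the loop runs at least once. The names machine_clock and run_machine are hypothetical.

```c
enum state { PRE_LOOP, IN_LOOP, POST_LOOP, DONE };

struct machine {
    enum state st;  /* block to execute on the next clock */
    int a;          /* externally visible at the end of each cycle */
    int i;          /* induction variable, internal */
};

void machine_reset(struct machine *m)
{
    m->st = PRE_LOOP;
    m->a = 0;
    m->i = 0;
}

/* One clock cycle: run the combinational block of the current state,
 * then pick the next state at the sequence point. */
void machine_clock(struct machine *m)
{
    switch (m->st) {
    case PRE_LOOP:              /* assumed: a = 0; i = 1; */
        m->a = 0;
        m->i = 1;
        m->st = IN_LOOP;        /* assumes at least one iteration */
        break;
    case IN_LOOP:               /* assumed loop body: a += i; i++; */
        m->a += m->i;
        m->i++;
        if (m->i > 5)           /* loop test decides the next state */
            m->st = POST_LOOP;
        break;
    case POST_LOOP:             /* assumed: a += 5; */
        m->a += 5;
        m->st = DONE;
        break;
    case DONE:
        break;
    }
}

/* Clock the machine until it finishes, recording the visible value
 * of `a` after every cycle. Returns the number of values recorded. */
int run_machine(int *trace, int cap)
{
    struct machine m;
    int n = 0;

    machine_reset(&m);
    while (m.st != DONE && n < cap) {
        machine_clock(&m);
        trace[n++] = m.a;
    }
    return n;
}
```

Under these assumptions run_machine records seven values (0, 1, 3, 6, 10, 15, 20), so six clock edges separate the first visible value from the last, matching the count given above.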
Note that unrolling this loop will change the externally visible states of the machine, and so unrolling should proceed cautiously. For predictability, it is best not to unroll any loops by default, and instead to let the programmer explicitly annotate the functions they want unrolled. Performance considerations, however, may instead suggest that all internal states be opaque, allowing the compiler maximum scheduling flexibility. Our work inclines toward the latter view.
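The effect of unrolling on visible states can be sketched by recording a at every sequence point. The loop body here is an assumption (a += i for i = 1 to 5, followed by a += 5), and commit is a hypothetical helper that stands in for a state boundary. The sketch further assumes that once the loop is fully unrolled, the whole function becomes one straight-line block.

```c
#define MAX_TRACE 16

struct trace {
    int v[MAX_TRACE];  /* externally visible values, in order */
    int n;             /* number of values recorded */
};

/* Hypothetical helper: models a state boundary where `a` becomes visible. */
static void commit(struct trace *t, int a)
{
    if (t->n < MAX_TRACE)
        t->v[t->n++] = a;
}

/* Rolled form: each loop iteration is a separate state. */
void rolled(struct trace *t)
{
    int a, i;

    t->n = 0;
    a = 0;
    commit(t, a);               /* pre-loop state */
    for (i = 1; i <= 5; i++) {
        a += i;
        commit(t, a);           /* one in-loop state per iteration */
    }
    a += 5;
    commit(t, a);               /* post-loop state */
}

/* Fully unrolled form: no branches remain, so the body is a single
 * straight-line block with one visible result. */
void unrolled(struct trace *t)
{
    int a;

    t->n = 0;
    a = 0;
    a += 1; a += 2; a += 3; a += 4; a += 5;
    a += 5;
    commit(t, a);
}
```

Both forms end with the same final value, but the rolled form exposes seven values while the unrolled form exposes only one, which is why an observer that depends on intermediate states can tell the two apart.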
Our implemented semantics maps functions to hardware objects. The input and output ports of the synthesized hardware correspond to the function parameters and return value, respectively. Nested functions correspond to hierarchical hardware composition; recursive functions would thus describe infinitely extended hardware, and are disallowed.4 Using a function for a frequently reused subprogram does not save hardware in the same way it saves code size, but it can serve as a useful hint to an optimizing hardware compiler that this particular logic block may be profitably multiplexed on its idle cycles.
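As a small illustration of the function-to-object mapping, consider the sketch below. The functions add8 and sum3 are hypothetical examples, and the 8-bit width is an arbitrary choice made so that the datapath has a definite size.

```c
/* Leaf hardware object: two input ports (x, y), one output port
 * (the return value). Models an 8-bit adder. */
unsigned add8(unsigned x, unsigned y)
{
    return (x + y) & 0xFFu;
}

/* Parent hardware object: three input ports, one output port.
 * Each call site of add8 corresponds to one instance of the adder
 * (hierarchical composition), unless the compiler chooses to
 * multiplex a single instance over two cycles. */
unsigned sum3(unsigned p, unsigned q, unsigned r)
{
    return add8(add8(p, q), r);
}
```

Here sum3(1, 2, 3) yields 6. A naive mapping would instantiate two adders inside sum3; the fact that both call sites name the same function is the hint that one adder could be shared across cycles instead.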
Static variables and their associated state map in an obvious way to registers. Arrays, pointers, and their associated memory model can sometimes be localized and mapped to a RAM structure in hardware (especially for small fixed-size arrays), but pointer arithmetic often requires a full-fledged memory interface to implement properly. The behavior of these constructs is implementation-dependent, and their use is discouraged. The general rule is that non-obvious mappings are disallowed: if the programmer cannot easily visualize how a construct will map into hardware, then it is very difficult for the compiler and programmer to agree on the correctness of the result.
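The contrast between obvious and non-obvious mappings can be sketched in C. All three functions below are hypothetical examples, and the comments describe one plausible mapping rather than the behavior of any particular compiler.

```c
/* Static variable: persists across invocations, so it maps
 * naturally to a register that is updated on each activation. */
unsigned accumulate(unsigned in)
{
    static unsigned acc = 0;

    acc += in;
    return acc;
}

/* Small fixed-size array accessed by a bounded index: the storage
 * can be localized, for example as a small RAM or ROM block. */
unsigned lookup_square(unsigned idx)
{
    static const unsigned char squares[8] = { 0, 1, 4, 9, 16, 25, 36, 49 };

    return squares[idx & 7u];   /* index masked to the table size */
}

/* Arbitrary pointer arithmetic: the compiler cannot tell which
 * storage `p + k` refers to, so a general memory interface would be
 * needed. This is the kind of construct the rule above discourages. */
int deref_offset(const int *p, int k)
{
    return *(p + k);
}
```

Calling accumulate(2) and then accumulate(3) returns 2 and then 5, which shows the register holding state between invocations. The first two functions are easy to picture as hardware; the third is not, because its meaning depends on whatever memory the caller passes in.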