2. Barracuda Project

This chapter introduces the Barracuda project and gives only as much detail as is needed for an overview of the functionality of the STAR12 MCU.
Section 1 lists the overall functionality of the module and general user information. Section 2 contains design-related information, such as design-specific targets and source code samples.

2.1 Architecture of the Barracuda MCU

The Barracuda MCU architecture consists of two main blocks: a CORE with a 16-bit CPU12, and a peripheral interface block with 2 x Serial Communication Interface (SCI), a Serial Peripheral Interface (SPI), I2C, BDLC (network interface), a Pulse Width Modulator (PWM), an Enhanced Capture Timer (ECT), a Clock and Reset Generator (CRG), a Keyboard Wakeup Unit (KWU) and 4 x CAN interface (MSCAN).
A STAR12 MCU bus connects the two blocks. Transfers between interface modules are performed via an IPbus. An IPbridge converts signals between the STAR12 bus and the IPbus.
The ports of the peripheral interface modules do not lead directly to pins. A dedicated Port Integration Module (PIM) contains mainly analogue functionality to control drive strength, detect plugging or enable pull-up resistors. In this way the analogue port functions have been completely separated from the interface modules.
Additionally, the Barracuda architecture includes on-chip memory (Flash memory, EEPROM, RAM) and two Analog-to-Digital Converters (ADC).

Fig.2.1 Architecture of the Barracuda MCU

The Barracuda is a 16-bit Microcontroller Unit (MCU) of the MCS912D family. It consists of a 16-bit central processing unit (CPU12) with 256K bytes of on-chip Flash EEPROM, 16K bytes of RAM, 2K bytes of EEPROM, two asynchronous serial communications interfaces (SCI), a serial peripheral interface (SPI), an IIC bus, an enhanced capture timer (ECT), two 8-channel, 10-bit analog-to-digital converters (ADC), an eight-channel pulse-width modulator (PWM), a BDLC J1850 interface and up to four CAN modules. The Barracuda interfaces to 16-bit memory and can operate in 8-bit narrow mode to interface 8-bit wide memory and reduce system cost. An on-chip PLL allows power consumption and performance to be adjusted to suit operational requirements. Furthermore, Keyboard Wakeup logic is available for 12 I/O ports. Table 1 lists the features of the Barracuda MCU.

Table 1 Barracuda Features [6] (HCS12 T-Board)
Module | Description
CORE (CPU, FSC, VSC) | 16-bit CPU12, compatible with the M68HC11 instruction set, with a 20-bit ALU, instruction queue and indexed addressing.
2 x SCI | The asynchronous serial communications interfaces allow full-duplex operation with standard mark/space non-return-to-zero (NRZ) format, 13-bit baud rate selection and a programmable 8-bit or 9-bit data format. They also feature separately enabled transmitter and receiver, programmable transmitter output polarity, two receiver wakeup methods, receiver framing error detection, hardware parity checking and 1/16 bit-time noise detection.
SPI | The serial peripheral interface module allows full-duplex, synchronous, serial communication between the MCU and peripheral devices. Software can poll the SPI status flags, or the SPI operation can be interrupt driven. Other features are: master mode and slave mode, bi-directional mode, slave select output, mode fault error flag with CPU interrupt capability, double-buffered operation, serial clock with programmable polarity and phase, and control of SPI operation during wait mode.
I2C | The Inter-IC Bus (IIC or I2C) is a two-wire, bidirectional serial bus for data exchange between devices. The interface is designed to operate at up to 100 kbps with maximum bus loading and timing. The device is capable of operating at higher baud rates, up to a maximum of clock/20, with reduced bus loading. Features are: multi-master operation, software programmable for one of 256 different serial clock frequencies, software-selectable acknowledge bit, interrupt-driven byte-by-byte data transfer, arbitration lost interrupt with automatic mode switching from master to slave, calling address identification interrupt, start and stop signal generation/detection, repeated start signal generation, acknowledge bit generation/detection and bus busy detection.
BDLC | The J1850 interface is a serial communication module that allows the user to send and receive messages across a Society of Automotive Engineers (SAE) J1850 serial communication network. The user's software handles each transmitted or received message on a byte-by-byte basis, while the BDLC performs all of the network access: arbitration, message framing and error detection. Features include: 10.4 kbps Variable Pulse Width (VPW) bit format, digital noise filter and collision detection.
PWM | The Pulse-Width Modulator has eight channels, each with a programmable period and duty cycle as well as a dedicated counter. A flexible clock select scheme allows four different clock sources to be used with the counters. Each of the modulators can create independent continuous waveforms with software-selectable duty rates from 0% to 100%. The PWM outputs can be programmed as left-aligned or center-aligned outputs.
ECT | The Enhanced Capture Timer features 16-bit buffer registers for four Input Capture (IC) channels, four 8-bit pulse accumulators with 8-bit buffer registers associated with the four buffered IC channels, a 16-bit modulus down-counter with 4-bit prescaler, and four user-selectable delay counters to increase input noise immunity. The timer is configurable as two 16-bit pulse accumulators and supports only 16-bit access on the IPbus.
CRG | The Clock and Reset Generator features a crystal oscillator, a Phase Locked Loop (PLL) frequency multiplier, a System Clock Generator (CGEN), a system clock switch, system clocks off during WAIT mode, and a System Reset Generator (RGEN) with Power-on Reset (POR) and Computer Operating Properly (COP) watchdog timer with time-out clear window.
2 x KWU | The Keyboard Wakeup Unit controls the two ports H and J. Data and DDR registers allow access as a 16-bit port. There are 16 Key Wake-Up (KWU) channels to wake up the chip from STOP mode. For each pin that has its interrupt enabled, an active edge brings the part out of STOP. Digital filtering is included to prevent pulses shorter than a specified value from waking the part from STOP.
4 x CAN | The CAN modules are CAN 2.0 A, B software compatible, with four receive and three transmit buffers, a flexible identifier filter programmable as 2 x 32 bit, 4 x 16 bit or 8 x 8 bit, four separate interrupt channels for Rx, Tx, error and wake-up, a low-pass filter wake-up function and loop-back for self-test operation.
2 x ADC | The 12-channel, 10-bit Analog-to-Digital Converter also works as a peripheral interface module. It does not require external sample and hold circuitry. The 12 analog input channels are multiplexed internally. It features: minimum 7 µs 10-bit single conversion time, internal transfer buffer amplifier, programmable sample time, left justified / unsigned result data and conversion completion interrupt generation.
PIM | The Port Integration Module establishes the interface between the peripheral modules and the I/O pads for all ports of the interface modules. Each I/O pin can be configured by several register bits that allow input/output selection, drive strength reduction, enabling and selection of pull resistors, and interrupt enable and status flags.
Memory | On-chip memory will be available in different configurations: 32K, 58K, 128K bytes of Flash EEPROM; 1K, 2K bytes of EEPROM; 2K, 4K, 8K and 16K bytes of RAM.

2.2 Module Structure

The first step is to analyze the design that is to be emulated in FPGAs. The module structure must be known in order to substitute the analogue modules and to translate the remaining modules into a gate model based on the component library of the FPGA vendor.
The modules used for evaluation (modules from the JUPITER project) were designed to be mapped to a Motorola logic cell library. FPGAs provide only a subset of these logic cells and, above all, provide neither hard macros nor analogue features. To map the same design onto an FPGA without changing its functionality, unsupported cells have to be replaced by cells available in the FPGA library.
In the course of this thesis, the complete design has been analyzed using the Verilog RTL code to find cells that cannot be synthesized. Most of them were hard-instantiated cells, e.g. drivers, inverters, buffers and flip-flops. They have been replaced by simple Verilog modules as shown in Table 2.

Table 2: Substitution of non-synthesizable cells
Unsupported cell | Substituted by RTL model
driver | wire
hard-instantiated cells (inverter, DFF, gated-clock cells, RS-FF) | RTL code with the same functionality
internal SRAM | SRAM Megafunction of the FPGA that emulates memory using its internal dual-port RAM
analogue modules | moved to top level

One major constraint of the Barracuda preSilicon Emulation project was not to change the source code of the modules, since we worked on old modules that were to be replaced by the final Barracuda modules. Therefore, the author searched for generic approaches that allow all actions to be repeated within minutes.
For example, an analogue submodule has to be moved to top level. Normally, the interfaces of all modules above the substituted module in the hierarchy, including the top-level module, would have to be changed. An alternative is to perform the same action with script commands of the Synopsys synthesis tool.
Example 1 shows a method, developed in the course of this thesis, that moves submodules to top level without changes to the RTL code by using Synopsys synthesis script commands.
First, the module to be moved, scg_cus, is ungrouped, keeping the names of all submodules. Second, the content of the module is substituted by a dummy module and the hierarchy is flattened to move all modules to top level. At top level, module scg_cus is removed and replaced by the dummy module. Afterwards, all modules that were not originally at top level have to be grouped again to restore the hierarchy.

Example 1:
current_design vsc_kd128_1_0
ungroup -simple_names "MMC"
/* ungroup -simple_names "KEEPERS" */
group {PI, REG, BUF, CORE} -design_name "mmv_kd1298_1_0" -cell_name "MMC"

/* substitute content of module "vsc_kd128_1_0/scg_1_0/scg_cus"              */
/* by content of module         "vsc_kd128_1_0/scg_1_0/scg_cus_dummy"        */
/* and move module              "vsc_kd128_1_0/scg_1_0/scg_cus" to top-level */
/* ------------------------------------------------------------------------- */
remove_design -hier scg_cus
rename_design scg_cus_dummy scg_cus

current_design vsc_kd128_1_0
ungroup -simple_names "SCG"
ungroup -simple_names "CUS"
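
The effect of ungrouping and regrouping on the hierarchy can be illustrated outside the synthesis tool. The following Python sketch is only a toy model under stated assumptions: the hierarchy is a nested dictionary, the function names ungroup and move_to_top are hypothetical, and only the path scg_1_0/scg_cus is taken from the text, while the cell names scg_logic and mmc_1_0 are invented placeholders. The substitution by a dummy module is not modelled.

```python
# Toy model of a design hierarchy: each cell maps to a dict of its sub-cells.
# Only the path scg_1_0 -> scg_cus comes from the text; "scg_logic" and
# "mmc_1_0" are invented placeholders.


def ungroup(design, cell):
    """Dissolve `cell` by one level: its children become children of `design`."""
    children = design.pop(cell)
    for name, sub in children.items():
        if name in design:
            raise ValueError(f"name clash while ungrouping: {name}")
        design[name] = sub
    return design


def move_to_top(top, path):
    """Bring the cell at `path` (names below the top level) up to top level.

    Every ancestor on the path is ungrouped; afterwards the siblings that
    were pulled up along the way are grouped back under their original
    parent, so only the target cell changes its place in the hierarchy.
    """
    *ancestors, target = path
    saved = []  # (ancestor name, children that must return to it)
    for depth, anc in enumerate(ancestors):
        next_on_path = path[depth + 1]
        siblings = [n for n in top[anc] if n != next_on_path]
        saved.append((anc, siblings))
        ungroup(top, anc)
    if target not in top:
        raise KeyError(f"{target} did not reach top level")
    inner = None  # most recently restored ancestor, nested into its parent
    for anc, siblings in reversed(saved):
        members = siblings + ([inner] if inner else [])
        top[anc] = {name: top.pop(name) for name in members}
        inner = anc
    return top


if __name__ == "__main__":
    design = {"scg_1_0": {"scg_cus": {}, "scg_logic": {}}, "mmc_1_0": {}}
    move_to_top(design, ["scg_1_0", "scg_cus"])
    print(sorted(design))
```

In this model scg_cus ends up next to scg_1_0 at top level, while scg_logic is returned to scg_1_0, which mirrors the regrouping step described above.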

2.3 RTL Synthesis

Synthesis was performed with the Synopsys Design Compiler. To reduce work and simplify design changes, a set of fully generic synthesis scripts has been developed in the course of this thesis. For every synthesis step, a dedicated, design-independent script performs a standard procedure.
Design-specific information, e.g. path names and module names, has been completely separated from these scripts. Synopsys Design Compiler is configured using a global setup script.
Advantages of this approach:

This approach has saved me a lot of time while working on the Barracuda project, which has at least 15 top-level modules, each with a set of 3 to 10 submodules.
The synthesis script in Example 2 analyzes and elaborates a design using a simple list of file names as input.

Example 2:
/* A N A L Y Z E */
sh test -f SYNMODEL
if (dc_shell_status == 0) {
   remove_variable vlist > /dev/null
   vlist = execute(-s, sh echo `cat SYNMODEL `)
}

analyze -f verilog -lib WORK vlist > reports + "/" + DESIGN_TOP_preROUTE + "_anal.rpt"
if (dc_shell_status == 0) {
   echo "Error - Analyze Failed"
}

/* E L A B O R A T E */
sh test -f DESIGN
if (dc_shell_status == 0) {
   remove_variable dlist > /dev/null
   dlist = execute(-s, sh echo `cat DESIGN `)
}

foreach (dsn, dlist) {
   elaborate dsn -arch "verilog" -lib WORK -update  > reports + "/" + DESIGN_TOP_preROUTE + "_elab.rpt"

   if (dc_shell_status == 0) {
      list dsn
      echo "Error: specified DESIGN not found"
   }
}

First, to make the scripts design independent, a method to separate module-specific names had to be found.
The execute(-s, sh echo `cat SYNMODEL `) command reads the content of the file fpga4.list, a simple list of file names that represent the modules to be processed.

Design-specific information is stored in the variable SYNMODEL. Some exception handling has proved very helpful for debugging and for catching invalid input. The following commands test whether the necessary input exists and give a warning if an error occurs during execution of the scripts.
sh test -f SYNMODEL
if (dc_shell_status == 0) {    commands...    }
The intended action is repeated for each module listed in the file fpga4.list.
Additionally, executing the commands in a loop makes the scripts independent of the size of the design. The complete set of scripts can be found in Appendix B.
foreach (dsn, vlist)
	elaborate dsn -arch "verilog" -lib WORK -update  > reports + "/" + DESIGN_TOP_preROUTE + "_elab.rpt"

        if (dc_shell_status == 0) { echo "Error: specified DESIGN not found" } 
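
The pattern behind these scripts (check that the list exists, read it, apply one step to every entry and report failures) is independent of the synthesis tool. The following Python sketch shows it under stated assumptions: the function names read_list and run_for_each and the file name modules.list are hypothetical, and the step callback merely stands in for a tool command such as analyze or elaborate.

```python
import os
import tempfile


def read_list(list_file):
    """Return the whitespace-separated entries of list_file, or None if missing."""
    if not os.path.isfile(list_file):  # plays the role of `sh test -f ...`
        return None
    with open(list_file) as handle:
        return handle.read().split()  # one flat list, like echo `cat file`


def run_for_each(list_file, step):
    """Apply step(name) to every entry and return the names that failed.

    `step` is a stand-in for a tool command; it must return True on success.
    """
    entries = read_list(list_file)
    if entries is None:
        raise FileNotFoundError(f"list file not found: {list_file}")
    failed = []
    for name in entries:
        if not step(name):
            print(f"Error: step failed for {name}")
            failed.append(name)
    return failed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "modules.list")  # hypothetical file name
        with open(path, "w") as handle:
            handle.write("sci.v spi.v\niic.v\n")
        # pretend that every file except spi.v is processed successfully
        print(run_for_each(path, lambda name: name != "spi.v"))
```

Because the loop only depends on the list file, adding or removing a module means editing the list, not the script.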

2.4 Putting all together

One way of FPGA emulation is to implement all modules in one FPGA to test the functionality.
Another way is to partition the design and implement each part in a different FPGA.
To find the best way, the author first analyzed all modules to determine the number of logic cells required by each module. The Barracuda project also requires diagnosis of internal bus signals, so the modules of the project have been distributed over multiple devices. It turned out that there was no other way in any case, because the number of logic elements required by the Barracuda design exceeds by far the number of logic cells available even in the biggest Flex10k FPGA.
After all self-contained top-level modules of the design had passed the ALTERA FPGA design tool successfully, it turned out that the CORE module fitted in one Flex10kE200 FPGA and the interface modules (SCI, SPI, I2C and BDLC) in one Flex10kE100. Large parts of the FPGA resources were still free, so we decided to add all other modules (KWU, TIMER and PWM) to the design.
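
The capacity side of this decision can be sketched as a simple packing problem. The Python example below uses a first-fit-decreasing heuristic, which is not the procedure used in this thesis, and all logic cell counts and the device capacity are invented placeholders rather than measured values. It also ignores pin counts and the requirement to observe internal bus signals.

```python
def partition(modules, capacity):
    """First-fit decreasing: place each module in the first device with room.

    modules  -- dict of module name -> required logic cells (placeholder data)
    capacity -- logic cells available per device (placeholder value)
    """
    devices = []  # each device: {"free": remaining cells, "modules": [names]}
    by_size = sorted(modules.items(), key=lambda item: item[1], reverse=True)
    for name, cells in by_size:
        if cells > capacity:
            raise ValueError(f"{name} does not fit into a single device")
        for dev in devices:
            if dev["free"] >= cells:
                dev["modules"].append(name)
                dev["free"] -= cells
                break
        else:
            # no existing device has room, so open a new one
            devices.append({"free": capacity - cells, "modules": [name]})
    return devices


if __name__ == "__main__":
    # invented cell counts, for illustration only
    demo = {"CORE": 9000, "SCI": 1200, "SPI": 800, "I2C": 700, "BDLC": 1500}
    for number, dev in enumerate(partition(demo, capacity=10000), start=1):
        print(f"FPGA{number}: {dev['modules']} ({dev['free']} cells free)")
```

Such an estimate only shows how many devices are needed at minimum; the final assignment in the project also had to respect the pin budget of each FPGA.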

The architecture of the Barracuda MCU includes an IPbus interface between the CPU and the interface modules.
The second major task of this thesis was the development of an IPbus interface for each interface module, since the modules taken from the old (JUPITER) project had a STAR12 MCU bus interface.
The basic function of the IPbridge is to convert signals, e.g. the read/write signal is converted to separate read and write signals, and to latch the data and address buses.
For each interface module of the Barracuda design (SCI, SPI, I2C, BDLC, KWU, CRG, PIM, ECT) an IPbridge has been designed in the course of this thesis to allow the modules to be put together. After analyzing each module, a pin table has been created that shows which signal of the old STAR12 bus interface has to be transformed into which signal of the IPbus interface. Table 3 shows the STAR12 MCU bus <=> IPbus signal conversion, i.e. the input and output signals of the IPbridge module, with signal names and descriptions. The primary function of each signal is described first, followed by the secondary function if applicable. After the signal conversion had been worked out on paper, the Verilog RTL code of each IPbus interface was developed and synthesized.
The complete list of IPbus interfaces is shown in Appendix B.

Table 3: STAR12 bus <=> IPbus signal conversion
STAR12 bus signal | Bits | Description | IPbus function | IPbus signal | Direction
core_clk34 | 1 | System Clock 34 | module clock | clk34 | I
core_clk41 | 1 | System Clock 41 | module clock | bus_clk | I
core_rst_t3 | 1 | Hardware Reset | asynchronous reset | hard_rst_b | I
 | | Software Reset | synchronous reset | soft_rst_b | I
rdb_t2 | 16 | Read Data Bus | output data bus; always driven | data_rd | O
core_wdb_t4 | 16 | Write Data Bus | input data bus | data_wr | I
core_ab_t2 | 4 | Address Bus | system address bus | addr | I
core_sz8_t2 | 1 | Size 8 Signal (for 8-bit accesses) | enable byte accesses; one bit for each byte in the data buses | |
core_rw_t2 | | Read Write Signal | read signal | read_en_b | I
 | | | write signal | write_en_b | I
core_stop_t2 | | | module should enter doze mode | doze_mode | I
 | | | module should enter freeze mode | freeze_mode | I
core_stop_t2 | 1 | Stop Signal | module should enter stop mode | stop_mode | I
 | | | module should enter supervisor mode | supervisor_mode | I
core_bdmact_t2 | 1 | Background Debug Mode active | module should enter test mode | test_mode | I
core_wait_t2 | 1 | Wait Signal | module should enter wait mode | wait_mode | I
core_smod_t2 | 1 | Special Mode | | smodT4 | I
core_scanmod | 1 | Scanmode Signal | | scanmod | I
ffxx | | | interrupt acknowledge | int_ack, int_vector, rd_int_vector_b | I
Module plug signals
dlc_puerst_plug | 1 | Determines reset state of DLCPUE1 bit of DLCSCR register | | | I
 | 1 | Enable second driver for rdb_t2 | | | I
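
The two basic functions of the IPbridge, splitting the read/write signal and latching the buses, can be illustrated with a small behavioural model. The Python sketch below is not the Verilog RTL developed in this thesis. It assumes that core_rw is high for a read, that the _b suffix of read_en_b and write_en_b marks active-low signals, and that a module select input exists; the name select and the class IPBridgeModel are hypothetical. The bus widths follow Table 3.

```python
class IPBridgeModel:
    """Behavioural toy model of a STAR12-to-IPbus bridge (not the thesis RTL)."""

    def __init__(self):
        self.addr = 0        # latched 4-bit address (IPbus signal addr)
        self.data_wr = 0     # latched 16-bit write data (IPbus signal data_wr)
        self.read_en_b = 1   # active low (assumption), 1 = inactive
        self.write_en_b = 1  # active low (assumption), 1 = inactive

    def clock(self, select, core_rw, core_ab, core_wdb):
        """Model one clock edge and return (read_en_b, write_en_b).

        select   -- hypothetical module select, 1 while the module is addressed
        core_rw  -- read/write signal, assumed 1 = read and 0 = write
        core_ab  -- address bus value
        core_wdb -- write data bus value
        """
        if select:
            # latch both buses so they stay stable for the IPbus access
            self.addr = core_ab & 0xF
            self.data_wr = core_wdb & 0xFFFF
            # split the single read/write line into two active-low enables
            self.read_en_b = 0 if core_rw else 1
            self.write_en_b = 1 if core_rw else 0
        else:
            # module not addressed: both enables inactive, latches keep values
            self.read_en_b = 1
            self.write_en_b = 1
        return self.read_en_b, self.write_en_b


if __name__ == "__main__":
    bridge = IPBridgeModel()
    print(bridge.clock(select=1, core_rw=0, core_ab=0x3, core_wdb=0xBEEF))
    print(hex(bridge.addr), hex(bridge.data_wr))
```

A write access activates only write_en_b and a read access only read_en_b, while the latched address and data remain unchanged between accesses.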

Another major constraint of this project was to make all pins of the Barracuda design visible on the board, but it turned out that this exceeds by far the number of pins available on the ALTERA FPGA used.
The author's proposal not to implement port signals that control analogue port functions, e.g. input buffer enable or pull-up enable signals, proved to be the right way. More than 50 pins could be saved by leaving these signals open.
After the pin-out of the STAR MCU bus and the interface modules had been determined, the third major task of this thesis was the development of wrapper modules for each FPGA that have the final pin-out of the FPGAs, see Fig.2.2 to Fig.2.5.

Fig.2.2 Pin layout FPGA #1

The pin-out of FPGA1 includes the STAR MCU bus and a port interface. Originally, a constraint of the project was to keep some pins reserved for future improvements of the design, e.g. additional interrupt signals that have to be routed on the board from FPGA2/3/4 to FPGA1 (CORE). However, all pins of FPGA1 have been used for MCU bus, memory and port interface signals.
One way to solve this problem would have been to implement the CORE in two FPGAs. The submodules of the CORE would have been separated at some boundary of the module's internal hierarchy. It turned out that the interfaces of the submodules had considerably more signals than the interface of the top-level module. Moreover, the CORE module was timing critical, which means that an implementation in two FPGAs would cause additional delay, since formerly internal signals would have to be routed on the prototype board.
At this stage of the project, all signals known after analyzing the old JUPITER modules could be implemented in the FPGA while keeping around 20 reserved pins free for future use. We decided to use only one FPGA for the CORE module, but we had to keep in mind the problem of too few pins being available for future use.

Fig.2.3 Pin layout FPGA #2

Fig.2.4 FPGA #3 (FPGA #4)

It proved impossible to implement all peripheral interface modules in one FPGA, since the number of required logic cells exceeds the size of one FPGA.
The Port Integration Module (PIM) contains the port functionality of each interface module. If some interface modules are located in a different FPGA than the PIM, the problem arises of where to route the port signals of these modules.
One way could be to split the PIM or to leave it out, as this module consists mainly of analogue components that cannot be implemented in an FPGA.
Another way would be to route the port signals between FPGAs. The author's proposal to keep the PIM as it is and implement it in FPGA2 has been realized, because the PIM includes some registers and has its own IPbus interface. This made some inter-module signals between FPGA2 and FPGA3/4 necessary. FPGA3 and FPGA4 have no peripheral ports. The port signals can_ind, can_dout and can_oen of each CAN0/1/2/3 module (FPGA3/4) lead to the PIM, located in FPGA2.
Figure 6 shows the module layout of the Barracuda ProtoBoard. Each FPGA is connected to the STAR12 MCU bus.

Fig.2.5 FPGA#5