Domesday Duplicator Software

Overview

The Domesday Duplicator is a completely open-source and open-hardware solution.  All required files to construct the hardware and all of the source-code is available on Github.

The Github repository contains the following items:

  • Kicad schematics and PCB design
  • FPGA Verilog HDL code for the DE0-Nano
  • GPIF II state-machine design for the FX3
  • FX3 firmware for the Cypress FX3 board
  • Linux (Ubuntu 16.04 LTS) GUI capture application

The Github repository is accessible via the following link: Domesday Duplicator Github

The development environment for all parts of the Domesday Duplicator software is Ubuntu 16.04.4 LTS with the exception of the Cypress FX3 GPIF design utility (which, despite Cypress stating Linux compatibility for their product, does not run on Linux (and is an essential part of their development tool-chain)).

FPGA code

Purpose

The DE0-Nano FPGA board is used to bridge the Domesday Duplicator’s ADC hardware with the Cypress FX3 USB 3.0 board.  The code provides data manipulation and conversion, error checking and a 32K word FIFO buffer to allow buffering in case of short constrictions of USB bandwidth to the host computer.  The FPGA code also contains a test data generation function that allows testing of the Domesday Duplicator with known test data (that is verified as intact once received by the host application).

Development environment

The development environment for the FPGA code is Intel Quartus Prime Version 17.1.0 Build 590 10/25/2017 SJ Lite Edition running on Ubuntu 16.04.3 LTS.

Programming the DE0-Nano

In order to program the DE0-Nano so that the Domesday Duplicator software is loaded and executed on power up, it is necessary to program the EPCS64 serial configuration device.  Instructions for programming the serial device can be found in the DE0-Nano user manual available from Terasic.

Firstly load the project and use Processing->Start Compilation to begin the compilation process and generate the .sof programming file.  To temporarily program the DE0-Nano simply use Tools->Programmer and flash the DE0-Nano board using the .sof file.

To program the DE0-Nano permanently follow the instructions given on page 146 of the DE0-Nano user guide – section 9.1 – Programming the Serial Configuration Device.

The DE0-Nano User Manual is available from this link.

Source code modules

DomesdayDuplicator.v

This module is the top-level verilog module and contains the hardware mapping information for the communication between the FPGA and the ADC as well as the communication between the FPGA and the FX3.

The module also includes instantiation code for the Intel IP PLL functions which generate the required 80MHz clock (for FPGA to FX3 communication) and 28.8 MHz/35.4 MHz clocks (for ADC to FPGA communication – i.e. the sample rate clock).

The top-level module includes two sub-modules ‘dataGenerator’ and ‘fx3StateMachine’.  The purpose of these modules is described below.

dataGenerator.v

The data generator module is responsible for generating data either from the ADC or (if in test mode) internally.  When in test mode the generator outputs a repeating sequence of 10-bit numbers (0 to 1023).

The module passes all 10-bit data (test or ADC) through the convertTenToSixteenBits module described below.

Data from the ADC is read on the negative edge of the ADC clock and passed into the write-side of the dual-clock FIFO (at the sampling rate).  The dual clock FIFO is implemented using the Intel IP DCFIFO.  If in test mode ADC data is ignored and a sequence is passed to the write-side of the FIFO instead.  Data is only written to the FIFO if the collectData flag is set.

The dual-clock FIFO is 10-bits wide and 32767 words deep.

Data is read from the 80MHz read-side of the dual-clock FIFO on the positive-edge of the FX3 clock (80MHz) and passed via the convertTenToSixteenBits module to the data output.  Data is only read from the FIFO when the readData flag is set.

convertTenToSixteenBits.v

The converTenToSixteenBits module takes unsigned 10-bit data as input and outputs a scaled and signed 16-bit value (little endian).

fx3StateMachine.v

The fx3StateMachine module implements the required mirror state-machine for the GPIF II implementation (detailed below).  The state-machine has two states:

  1. state_waitForRequest – The state-machine waits for the GPIF II state-machine to indicate a transfer is about to begin.
  2. state_sendPacket – The state-machine waits for 8192 clock cycles (whilst data is transferred) before returning to the waitForRequest state.

Cypress FX3 firmware

Purpose

The Cypress FX3 firmware provides a DMA driven data transfer between the FPGA and the USB 3 compatible host computer.  A GPIF state-machine design is used to automatically read data from the FPGA and transmit it via USB 3 with minimal interaction of the FX3’s ARM processor.

In addition the FX3 provides a set of generic input and output GPIOs to and from the FPGA.  Currently the firmware monitors the inputs from the FPGA and produces debugging information (via the serial console) if any signal is set, this is to assist with debugging the FPGA code.  Outputs to the FPGA are available via a vendor-specific USB command (0xB6) that accepts a byte containing bit-flags and sets the outputs to the FPGA according to the state of the flags (used to send capture configuration settings from the GUI application to the FPGA).

Development environment

The FX3 firmware is developed in the Cypress EZ USB Design Suite for Linux and cyusb_linux_1.0.4.  The GPIF II state-machine design is developed in Cypress GPIF II Designer 1.0 (which is only available for Windows).

Programming the FX3

The Cypress FX3 is programmed using the cyusb_linux utility included with the Cypress Linux SDK.  To program the device please see the following steps:

  1. Close/short jumper J4 (PMODE) on the FX3 Superspeed board
  2. Power off the FX3 and then power it on again
  3. Load the cyusb_linux application
  4. Highlight the FX3 bootloader device at the top of the window
  5. Click on the ‘program’ tab
  6. Select I2C EEPROM
  7. Click on ‘select file’ and select the FX3 programming file from disk (use the .img file from the Release directory)
  8. Click on ‘Start download’ to write the programming file to the device
  9. Wait for programming to complete
  10. Remove the jumper from J4
  11. Power off the FX3 and then power it on again
cyusb_linux Screenshot

GPIF II

IO Matrix

The following diagram shows the IO matrix configuration for the FX3 GPIF implementation:

Cypress GPIF I/O Matrix view

The purpose of the signals are as follows:

  • CLK – This is the GPIF clock supplied by the FPGA (80 MHz)
  • dataAvailable – This signal indicates that there is sufficient data in the FPGA’s FIFO buffer for a transfer
  • Databus – The 16-bit data bus from the FPGA to the FX3
  • nReset – (not) reset condition signal from the FX3 to the FPGA
  • collectData – Flag from the FX3 to the FPGA that indicates if the FPGA should collect ADC data
  • readData – Flag from the FX3 to the FPGA indicating that a transfer is about to begin
  • input0 – Generic input GPIO from the FPGA to the FX3
  • input1 – Generic input GPIO from the FPGA to the FX3
  • input2 – Generic input GPIO from the FPGA to the FX3
  • input3 – Generic input GPIO from the FPGA to the FX3
  • outputE0 – Generic output GPIO from the FX3 to the FPGA (with ‘early’ low-latency)
  • outputD0 – Generic output GPIO from the FX3 to the FPGA (with ‘delay’ normal latency)
  • outputD1 – Generic output GPIO from the FX3 to the FPGA (with ‘delay’ normal latency)
  • outputD2 – Generic output GPIO from the FX3 to the FPGA (with ‘delay’ normal latency)
  • outputD3 – Generic output GPIO from the FX3 to the FPGA (with ‘delay’ normal latency)

State machine

The following diagram shows the GPIF II state machine design for the FX3 GPIF implementation:

Cypress GPIF State-machine view

The state machine is designed to use the automatic transfer feature of the FX3 where the incoming data from the FPGA is automatically moved to the USB interface by the GPIF module with minimal interaction with the FX3’s ARM processor.  The design uses two GPIF ‘threads’ to ensure minimum delay between transfers.  The GPIF design is configured with 16Kbyte buffers and data is automatically committed to the USB interface once a buffer is full.

As the GPIF interface is synchronous both threads enter a wait state until the FPGA signals that enough data is available for a transfer.  Once this flag is received the GPIF changes to the ‘request’ state where it signals to the FPGA that a transfer is about to start (the TH0_REQUEST and TH1_REQUEST states repeat for 3 clock cycles to allow time for the FX3 to send the signal and the FPGA to receive it).

Once the state-machine enters the read state, 8192 16-bit words of data are transferred between the FPGA and the FX3 (filling the available 16Kbyte buffer), the state-machine then commits the data to the USB interface and returns to a wait state.  This design allows for a deterministic transfer with minimal signalling complexity whilst allowing for the non-deterministic nature of the DMA ready state on the FX3 (the buffer ‘ready’ is unpredictable due to the reliance on the host computer to transfer data in a timely manner).  By using a combination of deterministic and non-deterministic states the GPIF design provides an asynchronous data transfer via the synchronous interface with minimal overhead to ensure high-bandwidth of data transfer.

Since the FPGA to USB interface is 80MHz (more than double the rate of capture) this design allows the USB interface to ‘catch-up’ rapidly whenever there is a drop in the bandwidth across the USB interface.

Source code modules

cyfx_gcc_startup.S

This file contains the proprietary start-up code necessary for the FX3 to function – Cypress why-oh-why would you not release this open-source?

cyfxtx.c

This file contains the Cypress USB 3.0 Platform source file – again under a proprietary license (and necessary for the FX3 to function)

domesdayDuplicator.c

This file contains the main functions for the Domesday Duplicator firmware.  All functions are heavily commented; please see the Github repository for details.

domesdayDuplicator.h

This file contains the definitions and header file information for domesdayDuplicator.h

domesdayDuplicatorGpif.h

This file contains the state-machine definition code generated by the GPIF II designer application.  The original GPIF II designer project is also included in the Github repository.

usbDescriptor.c

This file contains the USB descriptor information for the Domesday Duplicator USB device.

Vendor specific USB commands

The firmware handles two vendor specific USB commands 0xB5 and 0xB6.

Vendor specific command 0xB5 – Start/Stop data collection

The vendor specific command 0xB5 accepts a value of either 0 or 1.  If called with a value of 1 it causes the firmware to flag the FPGA to begin sample data collection.  Calling the command with a value of 0 causes the firmware to flag the FPGA to stop sample data collection.

Vendor specific command 0xB6 – Configuration bit flag

The vendor specific command 0xB6 accepts a byte value from the host that is interpreted as a bit-flag.  Only bits 0-4 are used (bits 5-7 are ignored and can be set to any value).  The setting of the bit-flags are mapped to output GPIO pins as follows:

  • Bit 0 – GPIO22
  • Bit 1 – GPIO23
  • Bit 2 – GPIO24
  • Bit 3 – GPIO25
  • Bit 4 – GPIO26

The command is used by the host application to set the capture configuration of the FPGA.  The exact interpretation of the flags is dependent on the FPGA programming (and is kept deliberately generic in the FX3 firmware).

Serial debug

The FX3 Superspeed explorer board provides a USB 2.0 service debug output.  Connecting this output to a suitable machine with a serial terminal allows monitoring of the debug information from the FX3 firmware.

The FX3 serial debug output is 115200 bps 8N1.

If you have problems getting the ttyACM device configured for use with the Cypress FX3 board, create (as root) the following file:

/etc/udev/rules.d/49-cypressserial.rules

Then place the following configuration in the rules file:

# 04b4:0007 - Cypress FX3 serial debug port
ATTRS{idVendor}=="04b4", ATTRS{idProduct}=="0007", ENV{ID_MM_DEVICE_IGNORE}="1"
ATTRS{idVendor}=="04b4", ATTRS{idProduct}=="0007", ENV{MTP_NO_PROBE}="1"
SUBSYSTEMS=="usb", ATTRS{idVendor}=="04b4", ATTRS{idProduct}=="0007", MODE:="0666"
KERNEL=="ttyACM*", ATTRS{idVendor}=="04b4", ATTRS{idProduct}=="0007", MODE:="0666"

Then issue the following command to re-read the USB configuration rules:

sudo udevadm control --reload-rules

Linux GUI capture application

Purpose

The Linux GUI application provides a capture front-end for the user.  The application also provides a high-speed multi-threaded USB implementation that allows extremely high-speed data transfer from the FX3 in real-time.  In addition, a multi-buffer disk IO implementation deals with writing the large amounts of capture data to disk in a timely manner.  The Linux application is also capable of sending vendor-specific USB commands to the Domesday Duplicator in order to control and configure the capture device.

Domesday Duplicator Ubuntu GUI application

Development environment

The Linux GUI application is developed using Qt Creator 4.6.0 for Linux (Based on Qt 5.10.1 GCC 64-bit).  Although QT is a cross-platform development tool, the application is only tested under Ubuntu 16.04.4 LTS.  The underlying USB library is libUSB (note that the Cypress ‘library’ is not used as it is simply shim-code over libUSB designed to taint source code with Cypress’ own proprietary licensing).

The software requires the libusb-1.0 library that can be installed using the following command:

sudo apt-get install libusb-1.0

The software also requires the freeglut3 library that can be installed using the following command:

sudo apt-get install freeglut3-dev

Source code modules

main.cpp

This module contains the QT application start up code.

mainwindow.cpp

This module contains the functions for the main window class.

mainwindow.h

This is the header file for mainwindow.cpp.

mainwindow.ui

This module contains the GUI form design for the main window.

usbdevice.cpp

This module contains the USB device interface class.

usbdevice.h

This is the header file for usbdevice.cpp

QtUsb

The Linux GUI application uses a modified version of the QtUSB library available as GPL open-source here.  As well as adding bulk, multi-threaded USB I/O and disk I/O, the implementation also contains bug-fixes to the original library and additional functions for supporting USB vendor specific commands.  Thanks go to Fabien Poussin for this library and for releasing it with an open-source licence.

Multi-threaded USB transfer architecture

The following diagram shows the approximate structure of the multi-threaded architecture used by the GUI application to achieve the required USB and disk bandwidth:

Linux GUI transfer architecture

The USB interface is processed using multiple transfer threads which are ‘in flight’ at any one time (configurable from the definitions in the bulk-transfer code).  The collection of transfers are called the ‘queue’.  This causes minimal latency when reading data from the USB device.  Each thread causes a ‘callback’ once the transfer is complete.  The callback function stores the thread in the current queue buffer and then re-launches the transfer for the next queue.  Once the last transfer in the queue is complete the queue buffer is stored in the next available disk buffer slot.  The disk buffers are much larger than the queues as, for optimal disk write performance, it is more efficient to write larger blocks of data (rather than many smaller writes).  Once the ‘queue limit’ of the disk buffer is reached, the buffer is marked as ‘ready for writing’.  A separate disk write thread monitors the disk buffers and, when ready, marks the buffer as ‘writing’ and commits the data to the SSD drive of the host PC.

Along with the main application thread, the GUI uses around 18 concurrent threads to achieve the required performance.  This has the additional benefit of being very suitable for multi-core processors; allowing the application to gain the required end-to-end performance.