- 1 Overview
- 2 FPGA code
- 3 Cypress FX3 firmware
- 4 Linux GUI capture application
Overview
The Domesday Duplicator is a completely open-source and open-hardware solution. All of the files required to construct the hardware, and all of the source code, are available on Github.
The Github repository contains the following items:
- Kicad schematics and PCB design
- FPGA Verilog HDL code for the DE0-Nano
- GPIF II state-machine design for the FX3
- FX3 firmware for the Cypress FX3 board
- Linux (Ubuntu 16.04 LTS) GUI capture application
The Github repository is accessible via the following link: Domesday Duplicator Github
Please note that the software is still under heavy development and this section will be updated once a release version is ready.
The development environment for all parts of the Domesday Duplicator software is Ubuntu 16.04.3 LTS, with the exception of the Cypress FX3 GPIF design utility. Although Cypress states that its product is Linux compatible, this utility (an essential part of the development tool-chain) does not run on Linux.
FPGA code
The DE0-Nano FPGA board is used to bridge the Domesday Duplicator’s ADC hardware with the Cypress FX3 USB 3.0 board. The code provides data manipulation and conversion, error checking and a 32K word FIFO buffer that absorbs brief drops in USB bandwidth to the host computer. The FPGA code also contains a test data generation function that allows the Domesday Duplicator to be tested with known data, which the host application verifies as intact once received.
The development environment for the FPGA code is Intel Quartus Prime Version 17.1.0 Build 590 10/25/2017 SJ Lite Edition running on Ubuntu 16.04.3 LTS.
Programming the DE0-Nano
In order to program the DE0-Nano so that the Domesday Duplicator software is loaded and executed on power up, it is necessary to program the EPCS64 serial configuration device. Instructions for programming the serial device can be found in the DE0-Nano user manual available from Terasic.
First, load the project and use Processing->Start Compilation to begin the compilation process and generate the .sof programming file. To program the DE0-Nano temporarily, use Tools->Programmer and flash the DE0-Nano board using the .sof file.
To program the DE0-Nano permanently, follow the instructions given on page 146 of the DE0-Nano user guide (section 9.1, Programming the Serial Configuration Device).
The DE0-Nano User Manual is available from this link.
Source code modules
This module is the top-level Verilog module and contains the hardware mapping information for the communication between the FPGA and the ADC as well as the communication between the FPGA and the FX3.
The module also includes the instantiation code for the Intel IP PLL function that generates the required 64MHz clock (for FPGA to FX3 communication) and 32MHz clock (for ADC to FPGA communication).
The top-level module includes two sub-modules ‘dataGenerator’ and ‘fx3StateMachine’. The purpose of these modules is described below.
The data generator module is responsible for generating data either from the ADC or (if in test mode) internally. When in test mode the generator outputs a repeating sequence of 10-bit numbers (0 to 1023).
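As a rough illustration of how a host could check the repeating test sequence described above, the following C++ sketch generates the 0 to 1023 ramp and looks for the first break in it. The function names and the `start` parameter are hypothetical, and the sketch assumes the received samples have already been mapped back to their 10-bit values; it is not taken from the project's source code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Produce `count` samples of the repeating 10-bit ramp 0, 1, ..., 1023, 0, ...
// `start` (hypothetical parameter) selects the first value of the ramp.
std::vector<std::uint16_t> makeTestSequence(std::size_t count, std::uint16_t start = 0)
{
    std::vector<std::uint16_t> samples;
    samples.reserve(count);
    std::uint16_t value = start & 0x3FF;
    for (std::size_t i = 0; i < count; ++i) {
        samples.push_back(value);
        value = (value + 1) & 0x3FF; // wrap from 1023 back to 0
    }
    return samples;
}

// Return the index of the first sample that breaks the ramp,
// or -1 if every sample follows its predecessor correctly.
long firstSequenceError(const std::vector<std::uint16_t>& samples)
{
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const std::uint16_t expected = (samples[i - 1] + 1) & 0x3FF;
        if (samples[i] != expected) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Because the check only compares each sample with its predecessor, it works wherever in the ramp the capture happens to start.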
The module passes all 10-bit data (test or ADC) through the convertTenToSixteenBits module described below.
Data from the ADC is read on the negative edge of the ADC clock and passed into the 32MHz write-side of the dual-clock FIFO. The dual-clock FIFO is implemented using the Intel IP DCFIFO. In test mode, ADC data is ignored and the test sequence is passed to the write-side of the FIFO instead. Data is only written to the FIFO if the collectData flag is set.
The dual-clock FIFO is 10 bits wide and 32767 words deep.
Data is read from the read-side of the dual-clock FIFO on the positive edge of the FX3 clock (64MHz) and passed via the convertTenToSixteenBits module to the data output. Data is only read from the FIFO when the readData flag is set.
The convertTenToSixteenBits module takes unsigned 10-bit data as input and outputs a scaled and signed 16-bit value (little endian).
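The text does not give the exact scaling used by the module, so the following C++ sketch shows only one plausible mapping: it assumes the 10-bit value is centred on mid-scale (512) and multiplied by 64 to fill the signed 16-bit range. The function names are hypothetical and the Verilog implementation may differ.

```cpp
#include <cstdint>

// Assumed mapping (not confirmed by the source): centre the unsigned 10-bit
// sample on zero, then scale by 64 so it spans the signed 16-bit range.
std::int16_t convertTenToSixteen(std::uint16_t sample10)
{
    const int centred = static_cast<int>(sample10 & 0x3FF) - 512; // -512..511
    return static_cast<std::int16_t>(centred * 64);               // -32768..32704
}

// Split a signed 16-bit value into little-endian bytes (low byte first).
void toLittleEndian(std::int16_t value, std::uint8_t out[2])
{
    const std::uint16_t bits = static_cast<std::uint16_t>(value);
    out[0] = static_cast<std::uint8_t>(bits & 0xFF);
    out[1] = static_cast<std::uint8_t>(bits >> 8);
}
```

Under this assumption the six low-order bits of every output word are zero, so the original 10-bit sample can be recovered exactly on the host.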
The fx3StateMachine module implements the required mirror state-machine for the GPIF II implementation (detailed below). The state-machine has two states:
- state_waitForRequest – The state-machine waits for the GPIF II state-machine to indicate a transfer is about to begin.
- state_sendPacket – The state-machine waits for 8192 clock cycles (whilst data is transferred) before returning to the waitForRequest state.
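The two states listed above can be modelled in software to make the timing easier to follow. The C++ sketch below is a behavioural model only, not the Verilog from the repository; the type and member names are hypothetical, and it assumes the request is signalled by the readData flag and that one word is sent on each clock cycle of the send state.

```cpp
#include <cstdint>

// Number of clock cycles spent in the send state, as stated in the text.
constexpr std::uint32_t kCyclesPerPacket = 8192;

enum class Fx3State { WaitForRequest, SendPacket };

// Behavioural model of the mirror state-machine (hypothetical names).
struct Fx3StateMachineModel {
    Fx3State state = Fx3State::WaitForRequest;
    std::uint32_t cyclesInPacket = 0;

    // Advance the model by one clock cycle.
    // Returns true if this cycle was spent in the send state.
    bool clock(bool readData)
    {
        switch (state) {
        case Fx3State::WaitForRequest:
            if (readData) {
                state = Fx3State::SendPacket;
                cyclesInPacket = 0;
            }
            return false;
        case Fx3State::SendPacket:
            ++cyclesInPacket;
            if (cyclesInPacket == kCyclesPerPacket) {
                state = Fx3State::WaitForRequest; // packet complete
            }
            return true;
        }
        return false;
    }
};
```

Counting cycles rather than watching a separate end-of-transfer signal keeps the interface to a single request flag, which matches the minimal signalling described for the GPIF II design.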
Cypress FX3 firmware
The Cypress FX3 firmware provides a DMA-driven data transfer between the FPGA and the USB 3 compatible host computer. A GPIF state-machine design is used to automatically read data from the FPGA and transmit it via USB 3 with minimal involvement of the FX3’s ARM processor.
The FX3 firmware is developed in the Cypress EZ USB Design Suite for Linux and cyusb_linux_1.0.4. The GPIF II state-machine design is developed in Cypress GPIF II Designer 1.0 (which is only available for Windows).
Programming the FX3
The Cypress FX3 is programmed using the cyusb_linux utility included with the Cypress Linux SDK. To program the device, follow these steps:
- Close/short jumper J4 (PMODE) on the FX3 Superspeed board
- Power off the FX3 and then power it on again
- Load the cyusb_linux application
- Highlight the FX3 bootloader device at the top of the window
- Click on the ‘program’ tab
- Select I2C EEPROM
- Click on ‘select file’ and select the FX3 programming file from disk
- Click on ‘Start download’ to write the programming file to the device
- Wait for programming to complete
- Remove the jumper from J4
- Power off the FX3 and then power it on again
The following diagram shows the IO matrix configuration for the FX3 GPIF implementation:
The purpose of each signal is as follows:
- CLK – This is the GPIF clock supplied by the FPGA (64MHz)
- dataAvailable – This signal indicates that there is sufficient data in the FPGA’s FIFO buffer for a transfer
- bufferError – Set when the FPGA’s FIFO buffer is about to overflow
- GPIO_22_CTL05 – Unused (for debugging purposes)
- Databus – The 16-bit data bus from the FPGA to the FX3
- nReset – (not) reset condition signal from the FX3 to the FPGA
- collectData – Flag from the FX3 to the FPGA that indicates if the FPGA should collect ADC data
- readData – Flag from the FX3 to the FPGA indicating that a transfer is about to begin
- testMode – Flag from the FX3 to the FPGA indicating that the FPGA should generate known test data (for test mode)
The following diagram shows the GPIF II state machine design for the FX3 GPIF implementation:
The state machine is designed to use the automatic transfer feature of the FX3 where the incoming data from the FPGA is automatically moved to the USB interface by the GPIF module with minimal interaction with the FX3’s ARM processor. The design uses two GPIF ‘threads’ to ensure minimum delay between transfers. The GPIF design is configured with 16Kbyte buffers and data is automatically committed to the USB interface once a buffer is full.
As the GPIF interface is synchronous both threads enter a wait state until the FPGA signals that enough data is available for a transfer. Once this flag is received the GPIF changes to the ‘request’ state where it signals to the FPGA that a transfer is about to start (the TH0_REQUEST and TH1_REQUEST states repeat for 3 clock cycles to allow time for the FX3 to send the signal and the FPGA to receive it).
Once the state-machine enters the read state, 8192 16-bit words of data are transferred from the FPGA to the FX3, filling the available 16Kbyte buffer. The state-machine then commits the data to the USB interface and returns to a wait state. This design allows for a deterministic transfer with minimal signalling complexity whilst allowing for the non-deterministic nature of the DMA ready state on the FX3 (the buffer ‘ready’ is unpredictable because it relies on the host computer to transfer data in a timely manner). By using a combination of deterministic and non-deterministic states, the GPIF design provides an asynchronous data transfer via the synchronous interface, with minimal overhead to ensure high-bandwidth data transfer.
Since the FPGA to USB interface is 64MHz (compared to the capture rate of 32MHz) this design allows the USB interface to ‘catch-up’ rapidly whenever there is a drop in the bandwidth across the USB interface.
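The figures quoted in this section can be combined into a few back-of-envelope calculations. The C++ sketch below uses only numbers stated in the text (32MHz capture, 64MHz transfer, 8192-word buffers, a 32767-word FIFO and 16-bit words); the function names are hypothetical, and the results ignore the request and wait cycles, so they are best-case estimates.

```cpp
// Figures taken from the text.
constexpr double kCaptureHz = 32.0e6;       // ADC to FIFO write rate (words/s)
constexpr double kTransferHz = 64.0e6;      // FIFO to FX3 read rate (words/s)
constexpr double kWordsPerBuffer = 8192.0;  // one 16Kbyte GPIF buffer
constexpr double kFifoDepthWords = 32767.0; // dual-clock FIFO depth
constexpr double kBytesPerWord = 2.0;       // 16-bit data bus

// Sustained data rate the USB link must carry on average.
constexpr double sustainedBytesPerSecond() { return kCaptureHz * kBytesPerWord; }

// Time for the ADC to produce one buffer's worth of samples.
constexpr double bufferFillSeconds() { return kWordsPerBuffer / kCaptureHz; }

// Time to move one buffer from the FPGA to the FX3.
constexpr double bufferTransferSeconds() { return kWordsPerBuffer / kTransferHz; }

// Longest stall an initially empty FIFO can absorb before it is full.
constexpr double maxStallSeconds() { return kFifoDepthWords / kCaptureHz; }

// Time to drain a backlog while capture continues: the FIFO empties at the
// difference between the read rate and the write rate.
constexpr double catchUpSeconds(double backlogWords)
{
    return backlogWords / (kTransferHz - kCaptureHz);
}
```

With these figures the link must sustain 64,000,000 bytes per second, a buffer takes 256 microseconds to fill but only 128 microseconds to transfer, and a full FIFO represents roughly one millisecond of captured data.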
Source code modules
This file contains the proprietary start-up code necessary for the FX3 to function – Cypress why-oh-why would you not release this open-source?
This file contains the Cypress USB 3.0 Platform source file – again under a proprietary license (and necessary for the FX3 to function)
This file contains the main functions for the Domesday Duplicator firmware. All functions are heavily commented; please see the Github repository for details.
This file contains the definitions and header file information for domesdayDuplicator.h
This file contains the state-machine definition code generated by the GPIF II designer application. The original GPIF II designer project is also included in the Github repository.
This file contains the USB descriptor information for the Domesday Duplicator USB device.
The FX3 Superspeed explorer board provides a USB 2.0 service debug output. Connecting this output to a suitable machine with a serial terminal allows monitoring of the debug information from the FX3 firmware.
Linux GUI capture application
The Linux GUI application provides a capture front-end for the user. The application also provides a high-speed multi-threaded USB implementation that allows extremely high-speed data transfer from the FX3 in real-time. In addition, a multi-buffer disk IO implementation deals with writing the large amounts of capture data to disk in a timely manner. The Linux application is also capable of sending vendor-specific USB commands to the Domesday Duplicator in order to control and configure the capture device.
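The text does not list the vendor-specific commands themselves, so the following C++ sketch only illustrates the general shape of a USB vendor request: the standard 8-byte setup packet with the request type set to vendor, host-to-device, device recipient (0x40). The request code and function name are hypothetical placeholders, not the values used by the Domesday Duplicator firmware.

```cpp
#include <array>
#include <cstdint>

// bmRequestType for a vendor request, host-to-device, addressed to the device.
constexpr std::uint8_t kVendorHostToDevice = 0x40;

// Hypothetical request code for illustration only; the real firmware
// defines its own values.
constexpr std::uint8_t kHypotheticalRequest = 0x01;

// Build the 8-byte USB setup packet. Multi-byte fields are little-endian.
std::array<std::uint8_t, 8> makeVendorSetupPacket(std::uint8_t request,
                                                  std::uint16_t value,
                                                  std::uint16_t index,
                                                  std::uint16_t length = 0)
{
    return {{
        kVendorHostToDevice,                      // bmRequestType
        request,                                  // bRequest
        static_cast<std::uint8_t>(value & 0xFF),  // wValue, low byte
        static_cast<std::uint8_t>(value >> 8),    // wValue, high byte
        static_cast<std::uint8_t>(index & 0xFF),  // wIndex, low byte
        static_cast<std::uint8_t>(index >> 8),    // wIndex, high byte
        static_cast<std::uint8_t>(length & 0xFF), // wLength, low byte
        static_cast<std::uint8_t>(length >> 8)    // wLength, high byte
    }};
}
```

An application using libUSB would not normally assemble this packet by hand; the same request type, request, value, index and length fields are passed as arguments to `libusb_control_transfer`, which builds the packet internally.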
The Linux GUI application is developed using Qt Creator 4.4.1 for Linux (Based on Qt 5.9.2). Although Qt is a cross-platform development tool, the application is only tested under Ubuntu 16.04.3 LTS. The underlying USB library is libUSB (note that the Cypress ‘library’ is not used as it is simply a shim-code over libUSB designed to taint source code with Cypress’ own proprietary licensing).
Source code modules
This module contains the Qt application start-up code.
This module contains the functions for the main window class.
This is the header file for mainwindow.cpp.
This module contains the GUI form design for the main window.
This module contains the USB device interface class.
This is the header file for usbdevice.cpp.
The Linux GUI application uses a modified version of the QtUSB library available as GPL open-source here. As well as adding bulk, multi-threaded USB I/O and disk I/O, the implementation also contains bug-fixes to the original library and additional functions for supporting USB vendor specific commands. Thanks go to Fabien Poussin for this library and for releasing it with an open-source licence.
Multi-threaded USB transfer architecture
The following diagram shows the approximate structure of the multi-threaded architecture used by the GUI application to achieve the required USB and disk bandwidth:
The USB interface is serviced by multiple transfer threads which are ‘in flight’ at any one time (the number is configurable from the definitions in the bulk-transfer code). The collection of transfers is called the ‘queue’. Keeping several transfers in flight minimises latency when reading data from the USB device. Each thread causes a ‘callback’ once its transfer is complete. The callback function stores the thread’s data in the current queue buffer and then re-launches the transfer for the next queue. Once the last transfer in the queue is complete, the queue buffer is stored in the next available disk buffer slot.
The disk buffers are much larger than the queues because, for optimal disk write performance, it is more efficient to write a few large blocks of data than many smaller ones. Once the ‘queue limit’ of the disk buffer is reached, the buffer is marked as ‘ready for writing’. A separate disk write thread monitors the disk buffers and, when one is ready, marks it as ‘writing’ and commits the data to the SSD drive of the host PC.
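The hand-off between queue buffers and disk buffers can be sketched as a small state model. The C++ below is a simplified, single-threaded illustration with hypothetical class and method names; it is not the application's code. A real implementation would need synchronisation (for example a mutex) between the USB callbacks and the disk write thread, which is omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class BufferState { Filling, ReadyForWriting, Writing };

struct DiskBuffer {
    BufferState state = BufferState::Filling;
    std::size_t queuesStored = 0;
    std::vector<std::uint8_t> data;
};

// Hypothetical pool of disk buffers used in rotation.
// Assumes bufferCount >= 1 and queuesPerBuffer >= 1.
class DiskBufferPool {
public:
    DiskBufferPool(std::size_t bufferCount, std::size_t queuesPerBuffer)
        : buffers_(bufferCount), queueLimit_(queuesPerBuffer) {}

    // USB side: append one completed queue to the buffer being filled.
    // Returns false if that buffer is not free (the writer has fallen behind).
    bool storeQueue(const std::vector<std::uint8_t>& queueData)
    {
        DiskBuffer& buf = buffers_[fillIndex_];
        if (buf.state != BufferState::Filling) {
            return false;
        }
        buf.data.insert(buf.data.end(), queueData.begin(), queueData.end());
        if (++buf.queuesStored == queueLimit_) {
            buf.state = BufferState::ReadyForWriting; // queue limit reached
            fillIndex_ = (fillIndex_ + 1) % buffers_.size();
        }
        return true;
    }

    // Disk side: claim the next buffer if it is ready; returns its index or -1.
    long claimForWriting()
    {
        DiskBuffer& buf = buffers_[writeIndex_];
        if (buf.state != BufferState::ReadyForWriting) {
            return -1;
        }
        buf.state = BufferState::Writing;
        return static_cast<long>(writeIndex_);
    }

    // Disk side: release a buffer after its data has been written out.
    void finishWriting(std::size_t index)
    {
        DiskBuffer& buf = buffers_[index];
        buf.data.clear();
        buf.queuesStored = 0;
        buf.state = BufferState::Filling;
        writeIndex_ = (writeIndex_ + 1) % buffers_.size();
    }

    const DiskBuffer& buffer(std::size_t index) const { return buffers_[index]; }

private:
    std::vector<DiskBuffer> buffers_;
    std::size_t queueLimit_;
    std::size_t fillIndex_ = 0;
    std::size_t writeIndex_ = 0;
};
```

In this model a `false` return from `storeQueue` corresponds to the disk falling behind the USB stream, which is the condition the larger disk buffers are intended to make rare.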
Along with the main application thread, the GUI uses around 18 concurrent threads to achieve the required performance. This design is also well suited to multi-core processors, which helps the application reach the required end-to-end performance.