A FAST, CUSTOM FPGA-BASED SIGNAL PROCESSOR AND ITS APPLICATIONS TO INTRA-TRAIN BEAM STABILISATION

John Adams Institute for Accelerator Science at the University of Oxford, Oxford, UK
1also at CERN, Geneva, Switzerland

Abstract

A custom 9-channel feedback controller has been developed for low-latency applications in beam-based stabilisation. Fast 14-bit ADCs and DACs are used for high-resolution signal conversion and a Xilinx Virtex-5 FPGA is used for core high-bandwidth digital computation. The sampling, and fast digital logic, can be clocked in the range 200 to 400 MHz, derived from an external or internal source. A custom data acquisition system, based around LabVIEW, has been developed for real-time control and monitoring at up to 460 kbps transfer rates, and is capable of writing and reading from EPICS data records. Details of the hardware, signal processing, and data acquisition will be presented. Two examples of applications will also be presented: a position and angle bunch-by-bunch feedback system using strip-line beam position monitors to stabilise intra-train positional jitter to below the micron level with a latency less than 154 ns; and a phase feedforward system using RF cavity-based phase monitors to stabilise the downstream rms phase jitter to below 50 fs with a total latency less than the 380 ns beam time-of-flight.

INTRODUCTION

Many modern particle accelerators and future colliders require the generation and preservation of low emittance beams with a high degree of stability. Future electron-positron collider designs, such as the International Linear Collider (ILC) [1] and the Compact Linear Collider (CLIC) [2], call for beam spot sizes of 5 nm and below at the interaction point (IP), in order to maximise the luminosity. In order to achieve the design luminosity, $2 \times 10^{34} \text{cm}^{-2} \text{s}^{-1}$ in the case of ILC, in the presence of ground motion and facilities noise, a fast feedback is envisaged, operating at the interaction point, to correct the incoming position error of one beam with respect to the other, within the duration of the bunch train. Prototypes of such systems have been developed by the Feedback On Nanosecond Timescales (FONT) group. Initially, a purely analogue system was developed for room temperature RF cavity-based linear collider designs, such as the NLC, and achieved a system latency of $\sim 23$ ns, on a 56 ns duration bunch train at the KEK Accelerator Test Facility (ATF) in 2005 [3]. Following the choice of superconducting RF for the ILC, with $\sim 1000$ bunches separated by $\sim 500$ ns, a digital IP feedback system prototype was developed, using a custom FPGA-based digital feedback controller. This has been tested and used extensively at ATF, and has been employed in the beam stabilisation efforts at the upgraded ATF extraction line, ATF2. The ‘FONT5’ feedback controller has also found applications in other beam stabilisation systems requiring low latency, for example, the CLIC drive beam phase feedforward demonstration at the CLIC Test Facility (CTF3). Details of the feedback controller, including data acquisition, will be presented, as well as an overview of, and key results from, the applications above.

FONT5 FEEDBACK CONTROLLER

Figure 1 shows the PCB and front panel of the FONT5 digital feedback controller. The board is based around a Xilinx Virtex-5 FPGA (XC5VLX50T) [4], with a maximum speed of 550 MHz and 2160 kb of integrated block memory. The board has nine analogue input channels using 14-bit ADCs [5], capable of digitising at up to 400 MSPS. However, only the most significant 13 bits are connected to the FPGA, in order to reduce routing congestion and hence ease timing closure in the FPGA fabric. The ADCs have a low latency (3.5 clock cycles), making them very suitable for fast feedback applications. The nine channels are arranged as three banks of three, with each bank sharing a common ADC clock. Offset DACs are provided to trim the ADC pedestals. The board also features four 14-bit DACs [6], with a maximum conversion speed of 210 MHz and a latency of 0.5 clock cycles. As for the ADCs, only the upper 13 bits are connected to the FPGA.

An on-board 40 MHz oscillator is provided for clocking slow logic and ancillary functions, as well as a fast comparator for an external system clock, usually in the range 200 to 400 MHz. A fast system clock can either be sourced externally, with an optional PLL-based jitter filter, or synthesised internally using a digital clock manager. Two programmable-level digital inputs are provided, which are normally used for trigger inputs, as well as several buffered and non-buffered I/Os. Communication to the FPGA is made via an RS-232 connection, running at up to 460.8 kbps. A total of 128 7-bit control registers are used to communicate commands and variables to the FPGA, and up to 1024 samples per channel can be stored in Block RAM and transmitted via a UART, alongside read-backs of the control registers and status bytes. ADC data is displayed and saved to file using custom DAQ software written in LabVIEW. This software can also set control registers, and load RAM tables on the FPGA. Data and settings can be published as EPICS process variables, and information from other systems can be incorporated via EPICS channel access.

Figure 1: FONT5 digital feedback controller.

POSITION AND ANGLE STABILISATION AT ATF2

The ATF2 project at KEK, in Japan, provides an energy-scaled mock-up of the compact final-focus system for ILC, in the extraction line of the ATF. It aims to, firstly, demonstrate production of a 37 nm vertical spot size at the focus, and, secondly, demonstrate stabilisation of the beam spot at the nanometre scale. The FONT5 system is a two-phase vertical position correction system upstream in the extraction line, Fig. 2, designed to stabilise the beam to the micron level at the entrance to the final-focus. The system consists of three stripline beam position monitors (BPMs) on movers, P1, P2 and P3, and two stripline kickers, K1 and K2. The BPMs, P2 and P3, and kickers, K1 and K2, are approximately orthogonal in betatron phase, and each kicker is driven by a linear combination of both BPMs, such that the beam jitter is corrected at both phases. The other BPM, P1, is used as a diagnostic for the incoming beam jitter, and for measuring the resolution. High-sensitivity analogue front-end signal processors down-mix the high frequency BPM signals to below 100 MHz for digitisation. A measurement resolution of ~300 nm has been demonstrated, at a bunch charge of approximately 1 nC, for a linear range of ±500 µm [7]. The use of BPM movers, with a range of ±1.5 mm, ensures that the BPM measurement is not limited by changes in beam orbit in the extraction line.

The BPM processors produce three output signals: two sum signals in quadrature phase (in-phase and quadrature), and one difference signal. The difference between opposing strips in the BPM is proportional to both beam intensity and position offset, whereas the sum is proportional to intensity alone; a difference-over-sum algorithm is therefore employed to remove the dependence on intensity. The difference output is guaranteed to be matched in time with the in-phase sum signal, and so a quadrature phase difference signal is not necessary. The quadrature phase sum signal is highly sensitive to timing jitter of the beam signal with respect to the phase of the local oscillator, which itself is sourced from the machine RF. Therefore, by characterising the sensitivity of each BPM to the variation in phase, it is possible to remove offline any contribution to the apparent position jitter caused by phase jitter; for example, that originating from the synchrotron oscillation of the bunches. In practice, for use in the real-time feedback system, phase shifters placed between the BPM and the processor are used to remove any residual path difference between the processor inputs, and hence minimise the sensitivity to phase jitter. The drive signal is applied to the kickers using custom kicker drive amplifiers [9].
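The difference-over-sum normalisation described above can be illustrated with a short Python sketch. This is a minimal floating-point model, not the firmware: the function name, the calibration constant `k` and the guard threshold `min_sum` are assumptions introduced purely for illustration.

```python
import numpy as np

def position_from_bpm(diff, sum_i, k=1.0, min_sum=1e-9):
    """Estimate the beam offset as k * diff / sum_i.

    diff   : difference signal (proportional to charge * position)
    sum_i  : in-phase sum signal (proportional to charge only)
    k      : hypothetical calibration constant converting the ratio to a length
    """
    diff = np.asarray(diff, dtype=float)
    sum_i = np.asarray(sum_i, dtype=float)
    if np.any(np.abs(sum_i) < min_sum):
        raise ValueError("sum signal too small to normalise")
    return k * diff / sum_i

# The same offset measured at two bunch charges: the raw difference doubles
# with the charge, but the normalised position is unchanged.
low_charge = position_from_bpm(diff=50.0, sum_i=1000.0, k=500.0)
high_charge = position_from_bpm(diff=100.0, sum_i=2000.0, k=500.0)
```

Both calls return the same position, which is the property that makes the measurement independent of bunch intensity.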

For each BPM, the three signals from the BPM processors are digitised by the ADCs on the FONT5 board, with a common ADC clock used for all signals from each BPM, as the signals from different BPMs will arrive asynchronously at the FONT5 board. The signals from the BPM processor have a width of around 5 ns, and so to reliably capture these, a fast ADC clock is used, derived from the machine RF, such that the sampling remains phase-locked to the beam. In practice, a frequency of 357 MHz is used, which corresponds to the sub-harmonic bunching frequency. The Virtex-5 IODELAY [10] elements are used to ensure that the signal waveforms are sampled at the peak of the in-phase sum signal. These provide a 64-tap delay line, with a tap resolution of 75 ps. The ADC clocks are provided via the FPGA, and a ‘data ready’ signal from the ADCs is used in a feedback to adjust the centre of the data-eye to the capture edge of the system clock. The output delays on the ADC clocks are then scanned to find the peak of the signal, with the data and ‘data ready’ input delays being modified in concert.
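The scan of the ADC clock delay for the signal peak can be pictured as a simple search over tap settings. The Python sketch below is illustrative only: `read_sample` stands in for a hardware read-back of the in-phase sum amplitude at a given tap, and the Gaussian-like test pulse is an invented stand-in for the real waveform. Only the tap count and tap resolution are taken from the text.

```python
import math

N_TAPS = 64      # taps per delay line (from the text)
TAP_PS = 75.0    # tap resolution in ps (from the text)

def find_peak_tap(read_sample, n_taps=N_TAPS):
    """Return the tap index at which read_sample(tap) is largest."""
    return max(range(n_taps), key=read_sample)

def fake_sample(tap, peak_ps=1500.0, sigma_ps=1000.0):
    """Hypothetical pulse shape used in place of real ADC data."""
    t_ps = tap * TAP_PS
    return math.exp(-0.5 * ((t_ps - peak_ps) / sigma_ps) ** 2)

best_tap = find_peak_tap(fake_sample)
delay_range_ps = (N_TAPS - 1) * TAP_PS   # total adjustable delay
clock_period_ps = 1e12 / 357e6           # one period of the 357 MHz ADC clock
```

With these numbers the adjustable range of about 4.7 ns exceeds one 357 MHz clock period of about 2.8 ns, so a scan of this kind can place the sampling point anywhere within a clock cycle.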

A timing signal from the extraction kicker is used as a trigger, around 1 ms before beam is extracted from the damping ring. As this signal is not guaranteed to be stable with respect to the bunching frequency, a secondary counter is used, the period of which corresponds to the revolution period of the damping ring. The primary counter is therefore used to select the correct cycle of the secondary counter, in which the bunches will occur; and the timing of the bunches is hence locked to the secondary counter. Data is acquired, and the feedback system is active, during this cycle of the secondary counter, which corresponds to 462 ns. The ATF can extract up to three bunches with an ILC-like bunch spacing, with a \( \sim 310 \text{ ns} \) duration kicker pulse.

The firmware for the feedback application consists primarily of charge normalisation, gain application, a 'delay loop' to provide memory of the correction signal for subsequent bunches, and FIR filtering, to account for droop in the output stages and amplifier. Gain and charge normalisation are applied via look-up tables implemented in Block RAM, which can be loaded in real-time from the LabVIEW DAQ software. All nine channels of ADC data, as well as the kicker drive signals applied to the DACs, and the values of the control registers and status bytes, are read out to the DAQ at 460.8 kbps, at the 3.12 Hz pulse repetition rate. These data are then published as EPICS process variables via the National Instruments EPICS I/O Server [11].
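The role of the delay loop can be seen in a toy model of a three-bunch train. The Python sketch below is an idealised illustration under stated assumptions, not the firmware: it uses floating-point division in place of the Block RAM look-up tables, assumes a kick response of unity in arbitrary position units, omits the FIR droop correction, and every name in it is hypothetical.

```python
def simulate_train(offsets, charge, gain, use_delay_loop=True):
    """Return the position seen by each bunch in a train under feedback.

    offsets : uncorrected position of each bunch (arbitrary units)
    charge  : bunch charge (arbitrary units); cancels after normalisation
    gain    : feedback gain; -1.0 is ideal in this idealised model
    """
    drive = 0.0                      # kick currently applied to the beam
    seen = []
    for x in offsets:
        pos = x + drive              # bunch is deflected by the existing kick
        diff, total = pos * charge, charge    # idealised BPM outputs
        correction = gain * (diff / total)    # charge normalisation and gain
        if use_delay_loop:
            drive += correction      # remember the correction already applied
        else:
            drive = correction       # no memory: earlier correction is lost
        seen.append(pos)
    return seen

with_loop = simulate_train([10.0, 10.0, 10.0], charge=2.0, gain=-1.0)
without_loop = simulate_train([10.0, 10.0, 10.0], charge=2.0, gain=-1.0,
                              use_delay_loop=False)
```

In this model the train with the delay loop sees positions of 10, 0 and 0, whereas without it the third bunch returns to 10, because the second bunch measures a corrected position and the kick is withdrawn.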

Figure 3 shows the result of the feedback operation as measured at P2, P3 and at MFB1FF, a location with high vertical beta-function approximately 30 m downstream used to witness the correction. A minimum latency of approximately 140 ns has been demonstrated previously [12], and for these tests a beam consisting of two bunches separated by 182 ns was used. The feedback tests therefore involve measuring the vertical position of bunch one and correcting the vertical position of bunch two. The system was typically operated in an ‘interleaved’ mode, whereby the feedback correction was toggled on and off on alternate machine pulses; the feedback ‘off’ pulses thereby provide a continual ‘pedestal’ measure of the uncorrected beam position. The position jitter is reduced from 1.6 \( \mu \text{m} \) to 610 nm at P2, and from 1.8 \( \mu \text{m} \) to 520 nm at P3. This factor of \( \sim 3 \) reduction in jitter is successfully preserved out to MFB1FF, with the beam jitter being stabilised from 30 \( \mu \text{m} \) to below 10 \( \mu \text{m} \).
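The interleaved analysis amounts to splitting the pulse sequence into alternating feedback-off and feedback-on samples and comparing their spreads. The Python sketch below uses synthetic Gaussian data rather than measured positions; the function name, the choice of which pulse comes first, and the input widths of 1.6 and 0.6 (in microns) are assumptions for illustration.

```python
import numpy as np

def interleaved_jitter(positions, first_is_on=False):
    """Return (jitter_off, jitter_on) as sample standard deviations.

    positions   : bunch-two positions on consecutive machine pulses
    first_is_on : True if the first pulse had the feedback switched on
    """
    positions = np.asarray(positions, dtype=float)
    even, odd = positions[0::2], positions[1::2]
    on, off = (even, odd) if first_is_on else (odd, even)
    return np.std(off, ddof=1), np.std(on, ddof=1)

# Synthetic stand-in for data: alternate uncorrected and corrected pulses.
rng = np.random.default_rng(0)
pulses = np.empty(2000)
pulses[0::2] = rng.normal(0.0, 1.6, 1000)   # feedback off
pulses[1::2] = rng.normal(0.0, 0.6, 1000)   # feedback on
jitter_off, jitter_on = interleaved_jitter(pulses)
reduction = jitter_off / jitter_on
```

Because the off and on pulses alternate, slow drifts in the machine affect both samples almost equally, which is the motivation for the interleaved mode.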

Figure 3: Distribution of the vertical position of bunches 1 and 2 in P2, P3 and MFB1FF with (red) and without (blue) application of the feedback correction. Values of the position jitter are quoted for each BPM.

DRIVE BEAM PHASE STABILISATION AT CTF3

The two-beam acceleration concept for CLIC [2] places strict requirements on the phase stability of the drive beam. To limit luminosity loss from emittance growth due to energy jitter to less than 1%, the phase of the drive beam needs to be stable to within 0.2 degrees (at 12 GHz), or \( \sim 50 \text{ fs} \), with respect to the main beam. To this end, it is envisaged to have a phase feedforward (PFF) system at each of the drive beam decelerator sections along the CLIC linacs. For each system, the path length through a four-bend magnetic chicane is varied using fast electromagnetic kickers situated around the bending magnets. The phase offset is measured at the entrance to a turn-around loop, with the correction chicane at the exit of the loop. The system latency is therefore designed to be less than the time it takes the beam to traverse the loop.
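The correspondence between the phase tolerance and the quoted timing tolerance follows from the RF period. A short Python check is given below; the function name is an assumption, and the only inputs are the 0.2 degree tolerance and the 12 GHz frequency stated above.

```python
def phase_deg_to_fs(phase_deg, freq_hz=12e9):
    """Convert a phase in degrees at freq_hz to a time in femtoseconds."""
    period_fs = 1e15 / freq_hz          # one RF period, about 83.3 ps at 12 GHz
    return phase_deg / 360.0 * period_fs

tolerance_fs = phase_deg_to_fs(0.2)     # about 46 fs
```

The result of about 46 fs is consistent with the rounded figure of roughly 50 fs.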

A prototype of such a system has been tested at CTF3, at CERN. This system uses three high precision 12 GHz RF cavity-based phase monitors and two stripline kickers [13,14], a high-power, high bandwidth amplifier system, and the FONT5 digital feedback controller. The system layout is shown in Fig. 4, where only part of the CTF3 facility is shown for clarity. Two phase monitors are located upstream of the TL1 transfer line into the CTF3 combiner ring; one of these is used as the phase input to the PFF system, the other to cross-check the monitor performance. The kickers are located downstream, prior to the first and last dipole in a four-bend dog-leg chicane in the TL2 transfer line between the combiner ring and the CLIC Experimental Area (CLEX). By varying the voltages applied to the two kickers the beam can be deflected onto longer or shorter paths through the chicane, hence correcting the phase. The third phase monitor is located downstream of the correction chicane, in CLEX, to witness the phase correction. For this demonstration, uncombined beam, bunched at 3 GHz, has been used, and the time-of-flight of the beam from the upstream monitor to the correction chicane, including half a turn of the combiner ring, is \( \sim 380 \text{ ns} \); thereby defining the latency constraint for the correction system. The demonstration aims to achieve 0.2 degrees phase stability, at a bandwidth above 30 MHz.

Figure 4: Simplified schematic of the PFF system. Red and blue lines depict orbits for bunches arriving late and early at the first phase monitor, \( \phi \), respectively. The trajectory through the TL2 chicane is changed using two kickers, \( K \).

As well as minor modifications to the lattice to accommodate the correction kickers [15], new optics were required for TL2 to maximise the change in phase as a function of applied kick ($R_{52}$), whilst minimising the effect of energy jitter on phase ($R_{56}$), and ensuring orbit closure after the chicane [16]. The phase monitor noise was measured by comparing the residuals between the two upstream monitors, which yielded a best resolution of 0.14 degrees (at 12 GHz), with a typical performance of around 0.2 degrees. The modular amplifier design, utilising SiC FETs, provides up to 20 kW output power, with 50 MHz bandwidth. This provides ±700 V of drive to the kickers, which, coupled with the optics design, gives ±5.5 degrees of phase change in the chicane.

Custom application firmware was also written for the FONT5 boards to process the down-mixed phase monitor signals, as well as apply gain and offset control, in order to provide the correct magnitude of drive, and to centre the limited correction range with respect to the measured upstream phase. Programmable delay, using 32-tap shift registers, was also included to accurately match the timing of the correction signal to the arrival of the beam in TL2. The overall system latency is dominated by delay in the cables between the kicker amplifiers and kickers, which is constrained by the routing of available cable trays at around 175 ns. The FONT5 takes a minimum of 20 clock cycles (at 357 MHz) for the signal processing, with seven clock cycles of timing slack, taken up by the digital delays.
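The latency contributions quoted above can be added up against the \( \sim 380 \text{ ns} \) time-of-flight constraint. The Python sketch below covers only the items stated in the text; the remainder it computes is simply what is left for contributions that are not itemised here, and the variable names are assumptions.

```python
CLOCK_HZ = 357e6            # FONT5 system clock used for this application
TIME_OF_FLIGHT_NS = 380.0   # approximate beam time-of-flight constraint

def cycles_to_ns(n_cycles, clock_hz=CLOCK_HZ):
    """Convert a number of clock cycles to nanoseconds."""
    return n_cycles / clock_hz * 1e9

processing_ns = cycles_to_ns(20)    # minimum signal processing, about 56 ns
digital_delay_ns = cycles_to_ns(7)  # timing slack taken up digitally, about 20 ns
cable_ns = 175.0                    # amplifier-to-kicker cables

accounted_ns = processing_ns + digital_delay_ns + cable_ns
remaining_ns = TIME_OF_FLIGHT_NS - accounted_ns   # left for items not listed
```

On these figures roughly 250 ns is accounted for, leaving about 130 ns for the remaining elements of the signal chain.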

An example, illustrating the operation of the phase feedforward system, is given in Fig. 5. The mean upstream and downstream phase measurements over 75 machine pulses are compared, both with and without operation of the PFF system. The section of the pulse which is correctable within the limits of the amplifier is shown by the black vertical lines; outside of these limits the amplifier is in saturation and produces a roughly constant phase offset with respect to the uncorrected phase. Within the time region marked by the black lines, the PFF system reduces the rms downstream phase jitter from 1.68 ± 0.02 degrees to 0.26 ± 0.01 degrees at 12 GHz.
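The saturation behaviour described above can be reproduced with a simple clipped feedforward model. The Python sketch below is an idealisation under loud assumptions: the downstream phase is taken to be perfectly correlated with the upstream phase, the gain is unity, and the function and parameter names are hypothetical. Only the ±5.5 degree correction range is taken from the text.

```python
import numpy as np

def pff_correction(upstream_deg, gain=1.0, offset_deg=0.0, range_deg=5.5):
    """Return the phase correction in degrees, clipped to the available range.

    upstream_deg : measured upstream phase along the pulse
    gain         : feedforward gain (assumed; unity is ideal in this model)
    offset_deg   : offset used to centre the correction range
    range_deg    : maximum correction of either sign
    """
    wanted = -gain * (np.asarray(upstream_deg, dtype=float) - offset_deg)
    return np.clip(wanted, -range_deg, range_deg)

upstream = np.array([-8.0, -2.0, 0.0, 3.0, 9.0])
correction = pff_correction(upstream)
residual = upstream + correction    # idealised downstream phase after correction
```

Inside the range the residual is zero in this model; outside it the correction is pinned at ±5.5 degrees, so the corrected phase differs from the uncorrected phase by a constant, as observed in saturation.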

ACKNOWLEDGMENTS

We are grateful for the help and support of our colleagues and collaborators at KEK-ATF, CERN-CTF3, INFN Frascati, IFIC Valencia, and the ATF2 Collaboration. Work supported by the European Commission under the FP7 Research Infrastructures project EuCARD, grant agreement no. 227579.

REFERENCES

[9] TMD Technologies Ltd., www.tmd.co.uk