
The following article is from RF Baihuatan, by Liao Qiwen et al.
Translated by Hualink Technology Sunny Li
For high-speed PAM4 data transmission in 5G communication, several key technologies for clock and data recovery are proposed, including baud-rate sampling, edge selection, optimal sampling and threshold adjustment. At single-channel rates above 50 Gbit/s, these technologies effectively reduce chip hardware overhead, system power consumption and bit error rate (BER), and improve the robustness of the chip. The technologies were verified through chip design and wafer fabrication in a 65 nm complementary metal oxide semiconductor (CMOS) process. Test results show that the recovered clock has 1.08 ps root mean square (RMS) time-domain jitter. At the maximum recovered data rate of 51 Gbit/s, the chip achieves a PAM4 signal BER of 3.4×10⁻⁹ and an energy efficiency of 6.27 pJ/bit.
1 Clock and data recovery in optical communication
With the rise of big data and cloud computing, data volumes are growing explosively and the demand for data communication bandwidth keeps increasing, while traditional electrical interconnection can no longer meet the demand for high-speed information transmission. To achieve higher transmission rates, optical interconnection, a data transmission scheme that uses light waves as the transmission medium and optical signals as the information carrier, has attracted growing attention. Optical interconnection can be understood as the form that traditional optical communication technology takes when it replaces electrical communication technology in short-reach scenarios. It retains all the technical advantages of optical communication and adds high integration, low power consumption and low cost, so it can realize ultra-low-power, long-reach, ultra-high-speed, high-density data communication. It is also free of electromagnetic interference and offers short delay, long life, safety and reliability. Optical interconnection therefore represents the future development direction of long- and medium-distance data communication technology.
The core of optical interconnection is integrated photonic technology and integrated circuit technology, mainly including high-speed laser chips, high-speed modulator chips, integrated optical waveguides, integrated driver chips, photodetector chips and integrated digital signal processing.
An optical interconnection system is an electro-optical hybrid integrated system in which a high-speed laser and a high-speed modulator load the electrical signal onto the light wave, so a high-speed driver chip is needed for the laser and the modulator to convert the electrical signal into an optical signal. When the non-return-to-zero (NRZ) data rate exceeds 40 Gbit/s, the bandwidth of electro-optic conversion becomes the bottleneck that limits the speed of optical interconnection. Four-level pulse amplitude modulation (PAM4) is increasingly widely used because it doubles the data rate in the same bandwidth. When the single-channel data rate reaches 25 Gbit/s, both the receiver and the transmitter need a clock and data recovery (CDR) circuit to recover high-quality data from the high-loss signal before the drive circuit loads the data onto the light wave. As shown in Figure 1, PAM4 CDR circuits are core components of both the electro-optical conversion at the transmitting end and the opto-electrical conversion at the receiving end. At the electro-optical interface, the high-speed serial electrical signal is seriously degraded after passing through the high-loss circuit board, and a PAM4 CDR restores the signal to obtain a low-jitter clock and data. At the opto-electrical interface, because of the insertion loss of the electro-optic modulator and the fiber transmission loss, the lossy signal received by the photodetector also needs data recovery by a CDR.
A CDR differs from a phase-locked loop (PLL) mainly in the design of the phase detector. The basic components of a CDR are a phase detector (PD), a charge pump (CP), a loop filter and a voltage-controlled oscillator (VCO). The phase detector samples the input data with the VCO output clock to obtain a control signal, which drives the charge pump to generate a control voltage that adjusts the oscillation frequency of the VCO, thereby recovering a high-quality clock and data. Compared with an NRZ CDR, a PAM4 CDR is much harder to design, for two reasons.
First, high-speed multi-amplitude signals must be quantized. Compared with an NRZ signal, a PAM4 signal places higher demands on signal-to-noise ratio and circuit linearity. Quantizing a PAM4 signal requires three thresholds, and because the amplitude between adjacent levels is reduced to one third of that of an NRZ signal, the tolerance for threshold error is greatly reduced, so choosing appropriate thresholds is difficult. In addition, because of the deterministic jitter of PAM4 signals, the sampling clock window shrinks at high speed and the requirement on clock jitter tightens, which makes the clock path harder to design.
Second, the clock must be aligned with data that has multiple types of amplitude transition. A PAM4 signal has 12 level transitions, far more than the 2 of an NRZ signal, so aligning the clock to the data edges is much more complex. Suitable transition edges must therefore be selected for clock and data alignment to simplify the phase detector design and reduce the bit error rate. A high-speed PAM4 CDR circuit must thus consider not only the bandwidth, jitter and power consumption of traditional high-speed circuit design, but also the characteristics of data in the new modulation format.
2 Key technologies for PAM4 clock and data recovery
In optical communication, clock and data recovery performs two functions. First, the data is sampled several times with a locally generated multi-phase clock to locate the edges of the data bits, and the PLL aligns the clock edge with them, recovering a clock with the same frequency and phase as the data. Second, the optimal phase of the synchronized clock samples the input data at the highest signal-to-noise ratio, and the sampling results are output as the recovered data. Figure 2(a) shows a traditional oversampling CDR, in which each bit is sampled with two clock phases, a data phase (Dn) and an edge phase (Edge), and the edge-phase clock is eventually aligned with the zero crossing of the data edge. It works by comparing the value sampled at the edge phase with the data Dn of the previous period and the data Dn+1 of the current period. When Edge and Dn are the same (both logic 1 or both logic 0), the edge phase is ahead of the data transition and the decision is "Early". When Edge and Dn+1 are the same, the edge phase lags the data transition and the decision is "Late". The phase-locked loop then adjusts the clock phase in the opposite direction until the clock phase and the data edge are aligned. An oversampling CDR therefore needs two sampling phases per bit, that is, sampling at twice the symbol rate of the data, which increases hardware overhead and system power consumption.
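The Early/Late rule of the oversampling CDR can be sketched in a few lines of Python. This is only an illustration for binary (NRZ-like) samples. The function name and the "hold" result for the no-transition case are assumptions added here and are not taken from the article.

```python
def early_late_decision(d_prev: int, edge: int, d_next: int) -> str:
    """Early/late decision of a 2x-oversampling phase detector.

    d_prev : data sample Dn of the previous period (0 or 1)
    edge   : sample taken at the edge phase between the two data samples
    d_next : data sample Dn+1 of the current period (0 or 1)
    """
    if d_prev == d_next:
        # Assumption: without a data transition there is no phase information.
        return "hold"
    if edge == d_prev:
        # The edge sample still shows the old bit: the clock edge is ahead.
        return "early"
    # The edge sample already shows the new bit: the clock edge lags.
    return "late"
```

For example, `early_late_decision(0, 0, 1)` reports an early clock, while `early_late_decision(0, 1, 1)` reports a late one.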
2.1 Baud rate sampling technology
In this paper, we propose a novel sampling technique that needs only one sampling clock phase per data bit to align the clock with the data edge. Specifically, an integrating circuit accumulates the charge on the capacitor plates between two samples, which corresponds to calculating the shaded area covered by the data bits in Figure 2(a). When the data on the differential input flips, the shaded area is positive before the zero crossing and negative after it (the signs are mirrored about the horizontal axis for a transition in the opposite direction). When the clock is aligned with the data edge, the positive and negative areas are equal and the integral is exactly zero. When the clock is ahead of the data, the positive area is larger than the negative area and the integral is positive. When the clock lags, the integral is negative. Because the clock only needs to run at the symbol rate of the data, the technique is called baud-rate sampling. This type of CDR saves hardware overhead and reduces system power consumption.
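The area argument can be checked numerically with a minimal sketch. It assumes an ideal falling step from +1 to -1 at time 0 and an integration window exactly 1 UI wide. The function names, the step waveform and the window width are illustrative assumptions, not the circuit described in the article.

```python
def integrate_window(clock_offset_ui: float, n_steps: int = 1000) -> float:
    """Integrate an ideal +1 -> -1 step (edge at t = 0) over a 1 UI window.

    clock_offset_ui is the position of the window centre relative to the
    data edge, in UI. A negative value means the clock is early.
    """
    dt = 1.0 / n_steps
    start = clock_offset_ui - 0.5
    total = 0.0
    for i in range(n_steps):
        t = start + (i + 0.5) * dt          # midpoint of each sub-interval
        total += 1.0 if t < 0.0 else -1.0   # assumed ideal step waveform
    return total * dt


def phase_decision(integral: float) -> str:
    """Map the sign of the integral to a phase decision (falling edge)."""
    if integral > 0:
        return "early"
    if integral < 0:
        return "late"
    return "aligned"
```

With this model the integral is zero for an aligned clock, positive for an early clock and negative for a late clock, which matches the sign convention in the text. For a rising edge the sign would have to be inverted.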
2.2 Edge selection technique
Figure 2(b) shows the transient characteristics of PAM4 signal edges. Changes in the logic level of a PAM4 differential signal produce multiple zero crossings (moments at which the differential signal voltage is zero). These zero crossings are not evenly distributed around the data edge. Their positions depend on the preceding and following bit types, so they cluster around several separate time points. The CDR loop tries to align the clock phase (for example, the rising edge) with the data transition edge. The locked clock phase is then spread over multiple time points, which is equivalent to a random walk within that interval and causes timing jitter. The strategy adopted in this paper is to use only the signal edges with the highest probability of occurrence, that is, only transitions between logic +3 and -3 and between logic +1 and -1, as the criterion for clock/data synchronization. With this strategy, all selected data edges align to the same time point, which avoids the random walk of the clock phase after CDR lock caused by multiple zero crossings and the time-domain jitter it introduces.
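A small sketch makes the clustering of zero crossings concrete. It assumes every transition is a linear ramp that takes the same time regardless of its amplitude, which is a simplification chosen here for illustration. All names are hypothetical.

```python
from fractions import Fraction

LEVELS = (-3, -1, 1, 3)  # PAM4 logic levels used in the article


def zero_crossing(a: int, b: int):
    """Zero-crossing position (0 = start, 1 = end) of a linear ramp a -> b.

    Returns None when both levels have the same sign (no zero crossing).
    """
    if a * b > 0:
        return None
    return Fraction(a, a - b)


def crossings_by_transition():
    """Zero-crossing position for each of the 12 PAM4 level transitions."""
    return {(a, b): zero_crossing(a, b)
            for a in LEVELS for b in LEVELS if a != b}


def symmetric_transitions():
    """Transitions kept by edge selection: +3 <-> -3 and +1 <-> -1."""
    return [(a, b) for (a, b) in crossings_by_transition() if a == -b]
```

Under this assumption the four symmetric transitions all cross zero at the midpoint, the asymmetric transitions that change sign cross at 1/4 or 3/4 of the transition time, and the remaining four transitions do not cross zero at all.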

2.3 Optimal sampling and threshold adjustment
Although a PAM4 signal doubles the equivalent data rate of an NRZ signal in the same hardware bandwidth, it brings additional design challenges in signal processing and hardware implementation. First, PAM4 signals are more sensitive to inter-symbol interference (ISI), and a transition between logic ±3 takes longer than one between logic ±1. With the same rise/fall time (or hardware bandwidth) at the bit edges, the former therefore leaves less settling time. In extreme cases, short bits between ±3 do not settle fully to their ideal voltage, whereas transitions between ±1 have enough time to settle. Figure 3 illustrates this situation. If the traditional oversampling principle is adopted and the data is sampled with a clock phase 0.5 UI away from the edge, the highest SNR of the input signal cannot be obtained because the highest-amplitude signal has not fully settled, and the actual maximum-SNR sampling point is offset from the 0.5 UI point in the figure.
Second, nonlinear distortion of the PAM4 signal is a problem specific to multi-amplitude modulation. Because of loss in the electrical input channel or nonlinearity introduced by electro-optical conversion, the three eyes of a PAM4 signal may have different eye heights and eye widths. If the CDR then uses linearly spaced thresholds (the horizontal black dotted lines in Figure 3) to decide the logic level, the decision error rate rises. For example, in an optical transmitter based on a directly modulated vertical cavity surface emitting laser (VCSEL), the rise and fall times of the optical eye diagram are asymmetric because of the nonlinearity of the laser itself. When the VCSEL bias current is high, the optical power of the logic-high level is compressed and is no longer proportional to the input current, so the eye height of logic ±3 in the PAM4 signal is slightly less than 3 times that of logic ±1.
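The effect of threshold adjustment can be illustrated with a short sketch. The article does not state how the thresholds are chosen, so placing each threshold midway between adjacent measured levels is an assumption made here, and the compressed level 2.6 is a hypothetical value.

```python
def midpoint_thresholds(levels):
    """Place one threshold midway between each pair of adjacent levels."""
    ordered = sorted(levels)
    return tuple((lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:]))


def decide(sample: float, thresholds) -> int:
    """Return the symbol index 0..3 (0 = lowest level) for a sample."""
    return sum(sample > t for t in thresholds)


# Linearly spaced thresholds for ideal levels -3, -1, +1, +3.
UNIFORM = midpoint_thresholds((-3, -1, 1, 3))

# Hypothetical compressed eye: the top level reaches only 2.6 instead of 3.
COMPRESSED_LEVELS = (-3.0, -1.0, 1.0, 2.6)
ADJUSTED = midpoint_thresholds(COMPRESSED_LEVELS)
```

A sample of 1.9 taken from the compressed top level is decided as the second-highest symbol with the uniform thresholds, because it lies below 2.0, but as the highest symbol with the adjusted thresholds.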

3 System-level design
Based on the key PAM4 CDR technologies above, we designed and implemented a CDR chip with a 50 Gbit/s PAM4 input. Its system architecture is shown in Figure 4. The 25 GBaud PAM4 signal enters the CDR and reaches the PD module. The PD consists of four time-interleaved quarter-rate channels in parallel, each working at 6.25 Gbit/s. Each PD channel consists of a front-end circuit (PD-FE), a retiming register, a PAM4 decoder and a logic circuit module. The PD-FE uses three parallel deciders to quantize the four-level PAM4 signal and outputs the result as a three-bit thermometer code. The decoder converts the thermometer code into binary code, consisting of a most significant bit (MSB) with a weight of 2 and a least significant bit (LSB) with a weight of 1 (the MSB_6G and LSB_6G signals in Figure 4).
Note that after time-interleaved sampling in the PD, MSB_6G and LSB_6G are 6.25 Gbit/s NRZ data. A serializer converts the data 4:1 into a single 25 Gbit/s channel, which is output to the driver for use by optical devices or test instruments. Whereas the traditional serializer structure requires multiple retiming stages, we use the quadrature-phase data recovered by the CDR, which naturally meets the relative delay requirements of each channel's data during conversion.
As shown in Figure 4, in addition to the three deciders, the PD-FE contains a new type of integrator that integrates between adjacent data sampling points and generates the UP/DN signals indicating clock phase lead or lag. These signals control the current that the CP of the phase-locked loop (PLL) delivers to the loop filter (LPF), adjusting the clock phase in closed loop. The CDR chip integrates an inductor-capacitor voltage-controlled oscillator (LCVCO) working in the 12.5 GHz band, whose output is divided by 2 (/2) by a frequency divider to form a 4-phase 6.25 GHz clock for PD sampling.
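The rates quoted for the architecture follow from simple arithmetic, sketched below. The function and key names are hypothetical. The numbers are the ones given in the text.

```python
def clock_plan(bit_rate_gbps: float = 50.0,
               bits_per_symbol: int = 2,
               n_channels: int = 4,
               vco_divider: int = 2) -> dict:
    """Derive the symbol rate, per-channel rate and clock frequencies."""
    baud_rate = bit_rate_gbps / bits_per_symbol   # PAM4 carries 2 bits/symbol
    channel_rate = baud_rate / n_channels         # quarter-rate PD channels
    vco_freq = channel_rate * vco_divider         # VCO runs before the /2
    phases = [k * 360 / n_channels for k in range(n_channels)]
    return {
        "baud_rate_gbaud": baud_rate,
        "channel_rate_gbaud": channel_rate,
        "sampling_clock_ghz": channel_rate,
        "vco_ghz": vco_freq,
        "phases_deg": phases,
    }
```

With the default arguments this gives 25 GBaud at the input, 6.25 GBaud per channel, a 12.5 GHz VCO and sampling phases of 0°, 90°, 180° and 270°.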
The phase-locked loop of the CDR consists of the PAM4 PD, CP, low-pass filter (LPF), VCO, /2 divider and duty cycle corrector (DCC). Its bandwidth is designed to be adjustable from 5 MHz to 15 MHz by using on-chip registers to configure the charge and discharge currents of the charge pump and the resistance and capacitance of the loop filter.

4 Circuit module-level design
The most critical circuit module in PAM4 clock and data recovery is the PD, because it operates at the highest rate in the whole CDR, 25 GBaud, and is responsible for the logic decision of PAM4 signals with inter-symbol interference and nonlinear distortion. As shown in Figure 4, the core of the PD is its front-end circuit (PD-FE in the figure), which is composed of three parallel data paths and one edge path, each containing a decider. Because a PAM4 signal has four logic levels, the deciders determine the logic level against three voltage thresholds: threshold 0, which decides the sign, and thresholds ±2, which identify the ±3 logic levels. The output of each PD-FE path is first amplified by a sense amplifier (SA) and then sent to an SR latch for logic level regeneration. The resulting 3-bit data signals (DataA, B, C) form a thermometer code, which must be converted into binary code before data output. In addition, two-valued (bang-bang) logic generates a signal indicating the relative position of the current data and the clock edge, which guides the phase-locked loop to align the data and the clock edge in real time.
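The three-threshold decision and the thermometer-to-binary conversion can be sketched as follows. The bit order (low, mid, high) and the straight binary mapping are assumptions made for illustration. The article does not say which of DataA, B and C belongs to which threshold, and a real design might use a different code.

```python
THRESHOLDS = (-2.0, 0.0, 2.0)  # decision thresholds named in the text


def slice_pam4(voltage: float):
    """Three parallel deciders: return thermometer bits (low, mid, high)."""
    return tuple(int(voltage > t) for t in THRESHOLDS)


def thermometer_to_binary(low: int, mid: int, high: int):
    """Convert a thermometer code to (MSB, LSB), assuming straight binary.

    -3 -> 00, -1 -> 01, +1 -> 10, +3 -> 11, so the symbol index is
    2 * MSB + LSB.
    """
    msb = mid                            # the sign decision at threshold 0
    lsb = high | (low & (1 - mid))       # set for the -1 and +3 levels
    return msb, lsb
```

For example, an input of +1 gives the thermometer code (1, 1, 0), which decodes to MSB = 1 and LSB = 0.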
Figure 5 illustrates the working principle of the baud-rate sampling technique proposed in this paper, where ɸ0, ɸ90, ɸ180 and ɸ270 are the 0°, 90°, 180° and 270° phases within one cycle of the sampling clock. Within one clock cycle, the PD front end divides the sampling process into three stages: integration, hold and reset. In the integration stage, the differential input signal discharges the capacitors according to its amplitude and polarity, so that the output voltages separate. In the hold stage, the differential output voltage is kept constant for a period of time so that it is stable as the input of the following SR latch. When these stages are complete, the data decision is finished and the reset stage begins: the output nodes are pulled up to the supply voltage again and wait for the capacitor discharge of the next sampling period. Note that this process needs only the clock sampling point at the center phase of the data bit, and the edge-phase sampling point is eliminated.

In this paper, only symmetric code patterns are selected for clock alignment at the zero crossings of the signal. Specifically, the thermometer-coded data recovered by the phase detector is passed directly to the edge selector before conversion to binary, where A<2:0> and B<2:0> represent the previous bit and the current bit output by the phase detector, respectively. Taking the transition between logic ±3 as an example, the combinational logic of the edge selector outputs a logic high if and only if this code pattern is recognized. The result is retimed by a true single-phase clock (TSPC) register and AND-gate logic, and is used as a gating signal that controls the clock phase adjustment of the bang-bang (BB) logic. The phase adjustment is passed to the next stage only when the code pattern is valid. Otherwise the whole PLL loop takes no action.
Because the CDR in this paper uses quarter-rate time-interleaved data sampling, the initial data recovery rate of each channel is 6.25 Gbit/s. The 4 bits of data are serialized by two stages of 2:1 parallel-to-serial conversion under the control of the 6.25 GHz and 12.5 GHz clocks, and are finally output by a voltage-mode source-series-terminated (SST) driver. The on-resistance of the output inverter together with the series resistor at the output forms a 50 ohm matched termination that absorbs signal reflections from the load channel. A traditional 2:1 parallel-to-serial converter uses five latches to first align the two data channels and then shift them 180° apart. In this paper, the second 2:1 conversion uses a direct structure that exploits the timing difference between the multiple clock phases, saving the dynamic latch circuits.
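The gating behaviour of the edge selector can be sketched as a lookup on the previous and current thermometer codes. The tuple order (low, mid, high), the pattern table and the function names are assumptions for illustration and do not reproduce the actual gate-level logic.

```python
# Thermometer codes assumed here: -3 = (0,0,0), -1 = (1,0,0),
# +1 = (1,1,0), +3 = (1,1,1), ordered as (low, mid, high).
SYMMETRIC_PATTERNS = {
    ((0, 0, 0), (1, 1, 1)),  # -3 -> +3
    ((1, 1, 1), (0, 0, 0)),  # +3 -> -3
    ((1, 0, 0), (1, 1, 0)),  # -1 -> +1
    ((1, 1, 0), (1, 0, 0)),  # +1 -> -1
}


def edge_valid(prev_code, curr_code) -> bool:
    """True only for the symmetric transitions used for clock alignment."""
    return (tuple(prev_code), tuple(curr_code)) in SYMMETRIC_PATTERNS


def gated_phase_update(prev_code, curr_code, up: int, dn: int):
    """Pass UP/DN to the loop only when the code pattern is valid."""
    if edge_valid(prev_code, curr_code):
        return up, dn
    return 0, 0  # invalid pattern: the loop takes no action
```

A transition from +3 to -1, for instance, is rejected, so any UP or DN request produced during that bit is suppressed.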
5 Test results
The PAM4 clock and data recovery technology proposed in this paper was verified in 65 nm and 40 nm complementary metal oxide semiconductor (CMOS) processes, respectively. Figure 6 shows the 65 nm 50 Gbit/s PAM4 CDR chip using baud-rate sampling, which occupies a total area of 1.36 mm², including the pads and a 3 kV electrostatic discharge (ESD) protection ring. The function and performance of the chip were tested with the die gold-wire-bonded directly to the PCB. To measure the CDR bit error rate, a 25 GBaud PAM4 signal with PRBS-9 encoding was used as the input. It experienced the high-frequency loss of a coaxial cable and an on-board transmission line before reaching the chip under test. From the input PAM4 signal, the CDR chip recovered two 25 Gbit/s NRZ signals corresponding to the MSB and LSB after PAM4 demodulation, and output them to a 30 G bit error rate tester to measure the corresponding BER.

In the measurement of clock recovery performance, a dual-channel arbitrary waveform generator with a 64 GS/s sampling rate generated the 50 Gbit/s PAM4 input signal, and an 80 GS/s real-time sampling oscilloscope measured the 12.5 GHz clock recovered by the CDR. As shown in Figure 7, the clock has an RMS jitter of 1.08 ps and a peak-to-peak jitter of 8.4 ps. The closed-loop phase noise of the CDR phase-locked loop is -93 dBc/Hz, -112 dBc/Hz and -122 dBc/Hz at frequency offsets of 1 kHz, 8.1 MHz and 120 MHz, respectively.

Figure 8 shows the bit error rate (BER) of real-time data recovery by the CDR chip. During measurement, the chip output is fed to the bit error rate tester and compared bit by bit, in real time, with the original data from the signal source. The BER of the MSB data output is 0 (below 10⁻¹², no errors detected). The BER of the LSB data is about 3×10⁻⁹, which meets the requirements of PAM4 communication networks with forward error correction (common values are 10⁻⁴ to 10⁻⁶). The test results show that, because the CDR input of the PAM4 chip has no integrated channel equalization, the PAM4 signal has obvious inter-symbol interference when it reaches the CDR, which prevents the BER from reaching 0.

6 Conclusion
This paper introduces the challenges and potential solutions of circuit design for single-channel 50 Gbit/s optical communication with high-speed PAM4 modulation, and proposes a clock and data recovery technology based on an integrating front-end circuit, which can save nearly half of the hardware and power cost of the phase detector and clock network at high rates. The technology was verified in a 65 nm CMOS process: with a PAM4 input of up to 51 Gbit/s, the chip achieved a time-domain jitter of 1.08 ps RMS, a data error rate of 3×10⁻⁹ and an energy efficiency of 6.27 pJ/bit.
Authors: Liao Qiwen, Patrick Yin CHIANG, Qi Nan
Source: ZTE Communications Technology





