RF Fingerprinting for Defeating MAC Address Spoofing
802.11 wireless devices rely on Media Access Control (MAC) addresses to identify wireless stations. However, MAC addresses can be easily faked or "spoofed" by attackers attempting to impersonate other stations. This project applies the technique of RF fingerprinting to verify the identity of 802.11 wireless devices at the physical layer. While prior research into RF fingerprinting has looked at hand-crafted features, this project takes a deep-learning approach by training neural networks to classify raw I/Q samples captured by a software-defined radio. This approach will be validated by constructing a proof-of-concept system for automatic identification of spoofed beacon packets in a realistic wireless environment.
This project is being conducted as my senior thesis advised by Professor Paul Prucnal and Thomas Ferreira de Lima of the Princeton Lightwave Communications Research Lab.
Radio-frequency (RF) fingerprinting generally refers to techniques that leverage imperfections or features of devices' wireless transmissions to perform identification and classification tasks. Although this thesis looks at the specific task of identifying Wi-Fi devices, these RF fingerprinting methods are also applicable to other wireless communications systems. The first broad category of fingerprinting techniques looks at transient features occurring during the powering up of the device's transmitter and RF amplifiers. A second class of methods analyzes steady-state properties derived from the main body of a transmission. The primary advantage of this approach is its less stringent detection requirements compared to the analysis of temporally short transients. However, steady-state analysis depends on portions of the signal that are common to all transmissions and independent of any user-defined data.
The core of the system is a server running CentOS 7. Equipped with dual NVIDIA GTX 1080 GPUs, this server will also be used to train and run the neural networks used to perform RF fingerprinting and classification.
A USRP X310 software-defined radio is the primary device responsible for capturing raw RF waveforms. The USRP is initially equipped with an SBX-120 daughterboard, which provides 120 MHz of receive bandwidth at center frequencies between 400 MHz and 4.4 GHz. This enables the simultaneous sampling of the entire 2.4 GHz Wi-Fi band. Samples are streamed to the host computer over a 10 Gb/s Ethernet link.
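The claim that 120 MHz covers the whole 2.4 GHz band can be checked with a little arithmetic. The sketch below is only an illustration: it assumes 20 MHz wide channels, considers channels 1 through 13 (channel 14 is left out), and uses the standard 2.4 GHz channel plan in which channel n is centered at 2407 + 5n MHz. The function names are hypothetical and are not part of the capture software.

```python
# Illustrative check only; the channel width and channel range are assumptions.
CHANNEL_WIDTH_MHZ = 20.0   # nominal channel width assumed for this sketch
RX_BANDWIDTH_MHZ = 120.0   # SBX-120 receive bandwidth quoted in the text


def channel_center_mhz(channel):
    """Center frequency of a 2.4 GHz Wi-Fi channel (channels 1-13 only)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only handles channels 1 through 13")
    return 2407.0 + 5.0 * channel


def band_span_mhz(first=1, last=13):
    """Lower and upper band edges covering channels first..last."""
    low = channel_center_mhz(first) - CHANNEL_WIDTH_MHZ / 2
    high = channel_center_mhz(last) + CHANNEL_WIDTH_MHZ / 2
    return low, high


low, high = band_span_mhz()
span = high - low                 # total width that must be captured
tune_mhz = (low + high) / 2       # one possible center frequency for the SDR
fits = span <= RX_BANDWIDTH_MHZ   # True if a single capture covers the band
print(f"band {low}-{high} MHz, span {span} MHz, tune to {tune_mhz} MHz, fits: {fits}")
```

Under these assumptions the band occupies well under the available 120 MHz, which leaves some margin on either side of the outermost channels.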
For capturing higher-level information about Wi-Fi packets, an Alfa USB Wi-Fi dongle configured in monitor mode was used. This specific dongle was chosen primarily for its compatibility with Wireshark and the Aircrack-ng software suite needed to enable promiscuous monitoring of Wi-Fi packets.
Finally, a set of target Wi-Fi devices was chosen. To test the ability of the RF fingerprinting methodology to distinguish between different devices, a range of commercially available consumer and business access points was acquired. For the more difficult task of distinguishing between devices of the same model and chipset, four ESP32 Wi-Fi modules were used to provide a higher level of control over the transmitted packets and eliminate confounding factors.
The packet recording process was designed to be as automated as possible. The goal was to detect incoming Wi-Fi packets and dump the raw complex I/Q samples into a binary file for later processing.
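As a rough illustration of what such a binary dump might look like, the sketch below writes and reads complex samples as interleaved 32-bit floats (I, Q, I, Q, ...). This layout is an assumption: it matches what GNU Radio's File Sink produces for complex streams, but the exact format used in this project is not stated here. The helper names and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_iq(path, samples):
    """Write complex samples as interleaved 32-bit floats (I, Q, I, Q, ...)."""
    np.asarray(samples, dtype=np.complex64).tofile(path)


def load_iq(path):
    """Read a raw complex64 capture back into a NumPy array."""
    return np.fromfile(path, dtype=np.complex64)


# Round-trip a short synthetic burst through a temporary file.
burst = np.exp(2j * np.pi * 0.05 * np.arange(256)).astype(np.complex64)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "packet_0000.iq")  # hypothetical file name
    save_iq(path, burst)
    restored = load_iq(path)
    size_bytes = os.path.getsize(path)  # 8 bytes per complex sample
```

A headerless format like this carries no metadata, so the sample rate and center frequency would need to be recorded separately, for example in the file name or a sidecar file.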
Control of the USRP software-defined radio was managed through the GNU Radio open-source signal processing library. GNU Radio offers a Simulink-style, block-based graphical interface for designing complex signal processing "flowgraphs." These flowgraphs are then converted into Python modules which can be independently executed. To determine the beginning and end of a Wi-Fi packet in the complex sample stream, a basic RMS power threshold was applied.
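The thresholding idea can be sketched offline in NumPy. This is not the GNU Radio block used in the flowgraph, only a minimal approximation of the same technique: compute a moving RMS over the complex samples and report the index ranges where it exceeds a threshold. The function name, the window length, and the threshold value are all assumptions chosen for the example.

```python
import numpy as np


def find_packets(samples, threshold, window=64):
    """Return (start, end) index pairs where the moving RMS exceeds threshold.

    The end index is exclusive, so samples[start:end] is the detected burst.
    """
    power = np.abs(samples) ** 2
    kernel = np.ones(window) / window
    # Moving average of power, then square root, gives a windowed RMS.
    rms = np.sqrt(np.convolve(power, kernel, mode="same"))
    active = rms > threshold
    # Pad with False so every burst has both a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[0::2], edges[1::2])]


# Synthetic test signal: low-level noise with one unit-amplitude burst.
rng = np.random.default_rng(0)
num_samples = 4000
signal = 0.01 * (rng.standard_normal(num_samples)
                 + 1j * rng.standard_normal(num_samples))
signal[1000:2000] += np.exp(2j * np.pi * 0.1 * np.arange(1000))

packets = find_packets(signal, threshold=0.5)
print(packets)
```

Because the RMS is averaged over a window, the detected edges are only accurate to within roughly one window length. A shorter window locates edges more precisely but is more easily triggered by noise spikes.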
Beyond identifying packet start and end, the GNU Radio flowgraph also performed Wi-Fi channel selection by frequency translation and band-pass filtering. The full flowgraph is shown above.
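Channel selection of this kind can be approximated outside GNU Radio as follows. The sketch mixes the wanted channel down to 0 Hz with a complex exponential and then applies a windowed-sinc low-pass filter, which is equivalent to band-pass filtering around the original channel offset. The sample rate, channel offsets, tap count, and function name are assumptions for illustration and do not reflect the actual flowgraph parameters.

```python
import numpy as np


def select_channel(samples, sample_rate, offset_hz,
                   channel_bw_hz=20e6, num_taps=129, decimation=1):
    """Shift the channel at offset_hz to baseband and low-pass filter it."""
    n = np.arange(len(samples))
    # Frequency translation: move the wanted channel to 0 Hz.
    shifted = samples * np.exp(-2j * np.pi * offset_hz * n / sample_rate)
    # Windowed-sinc low-pass filter with cutoff at half the channel width.
    cutoff = (channel_bw_hz / 2) / sample_rate       # in cycles per sample
    k = np.arange(num_taps) - (num_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(num_taps)
    taps /= taps.sum()                               # unity gain at DC
    filtered = np.convolve(shifted, taps, mode="same")
    return filtered[::decimation]


# Two synthetic tones standing in for two channels in a wideband capture.
fs = 100e6                                  # assumed sample rate
idx = np.arange(4096)
wanted = np.exp(2j * np.pi * 20e6 * idx / fs)     # channel at +20 MHz
unwanted = np.exp(2j * np.pi * -30e6 * idx / fs)  # channel at -30 MHz

baseband = select_channel(wanted + unwanted, fs, offset_hz=20e6)
```

After filtering, the wanted tone sits at 0 Hz and the other tone is strongly attenuated. Setting `decimation` above 1 would also reduce the output sample rate, which is reasonable once the signal has been band-limited to a single channel.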
Preliminary testing has shown the viability of the presented automatic packet capture and labeling system. Once proper thresholds were set, the flowgraph consistently captured all beacon packets transmitted by the target access point during the recording period. Furthermore, the packet separation block worked well even for very closely spaced packets.
In addition, visual inspection of the captured training data set shows distinct differences in the packets transmitted from different devices. Even with the same packet contents, taking the average of the Fourier Transforms for the packets sent by two of the ESP32 devices shows a clear visual difference in the packets' spectral characteristics.
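A comparison of this kind might be computed along the following lines. The sketch averages the power spectra of the packets rather than their complex FFTs, since complex values can cancel when packets have different phases. This is one interpretation of "average of the Fourier Transforms" and may differ from the exact method used. The packets here are synthetic stand-ins, not captured data, and all names are hypothetical.

```python
import numpy as np


def average_spectrum(packets, nfft=1024):
    """Mean power spectrum in dB over a list of complex packets."""
    if len(packets) == 0:
        raise ValueError("need at least one packet")
    acc = np.zeros(nfft)
    for pkt in packets:
        # Use the first nfft samples (zero-padded if the packet is shorter).
        spec = np.fft.fftshift(np.fft.fft(pkt[:nfft], n=nfft))
        acc += np.abs(spec) ** 2
    mean_power = acc / len(packets)
    return 10 * np.log10(mean_power + 1e-12)  # small offset avoids log(0)


# Synthetic stand-ins for two devices: tones at different frequencies.
rng = np.random.default_rng(1)
t = np.arange(1024)


def synthetic_packets(freq, count=8):
    """Noisy tones at freq (cycles per sample), used only for this demo."""
    return [np.exp(2j * np.pi * freq * t)
            + 0.05 * (rng.standard_normal(t.size)
                      + 1j * rng.standard_normal(t.size))
            for _ in range(count)]


spec_a = average_spectrum(synthetic_packets(0.125))
spec_b = average_spectrum(synthetic_packets(-0.125))
difference_db = spec_a - spec_b  # per-bin difference between the two devices
```

With real captures, `difference_db` would highlight the frequency bins in which two devices differ most consistently, which is the effect described above.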
Now that a system for collecting Wi-Fi data has been established, work can begin on exploring deep learning architectures for fingerprinting. Certain elements of the packet collection process will likely be adjusted to better suit the needs of the machine learning algorithms.
Finally, while the current labeling scripts are run offline after the packet capture has completed, it would be interesting to create a version that runs online in parallel with data collection. This would ultimately permit live packet identification, which would significantly enhance the completeness and practicality of the proposed fingerprinting system.