Can LLMs Rewrite Python Code in C++ for Faster Network Analysis?

B

Software Security Engineer

I’ve been thinking about something lately: can Large Language Models (LLMs) like GPT-4 help developers rewrite Python code in a faster language like C++ to boost performance? I often write small scripts for my security work, so I know Python is great for scripting and trying out new ideas quickly. But when it comes to speed, especially for tasks that need real-time processing, Python has its drawbacks.

So, I decided to put this idea to the test. I'll be rewriting a simple network packet sniffer, a tool commonly used in network security to monitor traffic and detect threats, from Python to C++. The goal? To see whether the C++ rewrite actually runs faster and more efficiently, and whether LLMs can help with the process.

Step 1: The Original Python Script

First, I need a baseline to compare performance. I wrote a simple packet sniffer in Python using the scapy library, which is a powerful Python library for network traffic manipulation. This script captures network packets on a specific interface and prints out basic information such as source and destination IP addresses and port numbers.

packet_sniffer.py

from scapy.all import sniff, IP, TCP
import time

total_packets = 0  # Total packets captured
ip_tcp_packets = 0  # Packets with both IP and TCP layers

# Function to process captured packets
def process_packet(packet):
    global total_packets, ip_tcp_packets
    total_packets += 1  # Increment total packet count
    print(packet.summary())  # Print a summary of each captured packet (for debugging)

    # Check if packet has both IP and TCP layers
    if packet.haslayer(IP) and packet.haslayer(TCP):
        ip_tcp_packets += 1  # Increment count for packets with IP and TCP layers
        ip_layer = packet.getlayer(IP)
        tcp_layer = packet.getlayer(TCP)
        print(f"Packet: {ip_layer.src} -> {ip_layer.dst} | {tcp_layer.sport} -> {tcp_layer.dport}")

# Start time
start_time = time.time()

# Start sniffing packets on interface 'en0' 
sniff(iface='en0', prn=process_packet, count=150)  # Adjust 'count' as needed

# End time
end_time = time.time()

# Print statistics
print(f"Total packets captured: {total_packets}")
print(f"Packets with IP and TCP layers: {ip_tcp_packets}")
print(f"Time taken: {end_time - start_time:.2f} seconds")

This script is pretty straightforward: it captures 150 packets on en0 and processes each one in a callback function. To find your network interface, run ifconfig on macOS or Linux, or ipconfig on Windows, and replace en0 with the name of the active interface. Packet capture normally needs elevated privileges, so on macOS or Linux run the script with sudo.

What I'm Going to Do: I'll run this Python script and measure the time it takes to capture and process 150 packets. This will give me a baseline to compare against when I rewrite it in C++.
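One thing to keep in mind for the baseline: the wall-clock time measured around sniff includes waiting for packets to arrive, not only the work done in the callback. Below is a minimal sketch of one way to isolate the callback cost. The `timed` wrapper and the `fake_process_packet` stand-in are hypothetical helpers for illustration, not part of scapy, and plain byte strings take the place of real packets.

```python
import time

def timed(callback):
    """Wrap a per-packet callback and accumulate only the time spent inside it."""
    def wrapper(packet):
        start = time.perf_counter()
        try:
            return callback(packet)
        finally:
            wrapper.elapsed += time.perf_counter() - start
            wrapper.calls += 1
    wrapper.elapsed = 0.0
    wrapper.calls = 0
    return wrapper

# Stand-in for process_packet (assumption: byte strings replace real packets)
def fake_process_packet(packet):
    return len(packet)

timed_callback = timed(fake_process_packet)
for _ in range(150):
    timed_callback(b"\x00" * 64)

print(f"Calls: {timed_callback.calls}")
print(f"Time inside callback: {timed_callback.elapsed:.6f} seconds")
```

With the real script, the same idea would be to wrap process_packet once, pass the wrapper as prn, and read its elapsed attribute after sniff returns.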

Step 2: Rewriting the Code in C++ Using libpcap

Now comes the interesting part: rewriting the same functionality in C++ using libpcap, a well-known library for network packet capture. This is where LLMs like GPT-4 or Claude come in handy. They can help translate the logic from Python to C++ while accounting for the nuances of lower-level programming.

Here’s the equivalent C++ version: packet_sniffer.cpp

#include <iostream>
#include <pcap.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <chrono>

// Counters for total packets and packets with IP/TCP
int total_packets = 0;
int ip_tcp_packets = 0;

// Function to process each packet
void packet_handler(u_char *user_data, const struct pcap_pkthdr* pkthdr, const u_char* packet) {
    total_packets++;  // Increment total packet count

    // Print a summary of each packet (for debugging)
    std::cout << "Packet captured: Length = " << pkthdr->len << " bytes" << std::endl;

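    const unsigned int eth_len = 14;  // Ethernet header size is 14 bytes

    // Skip frames too short to hold an Ethernet header plus a minimal IP header
    if (pkthdr->caplen < eth_len + sizeof(struct ip)) return;

    // Parse IP header (skip Ethernet header)
    const struct ip* ip_header = reinterpret_cast<const struct ip*>(packet + eth_len);

    // Only handle IPv4 packets that carry TCP
    if (ip_header->ip_v != 4 || ip_header->ip_p != IPPROTO_TCP) return;

    // Parse TCP header (skip IP header, which is ip_hl * 4 bytes long)
    const unsigned int ip_len = ip_header->ip_hl * 4;
    if (ip_len < sizeof(struct ip) || pkthdr->caplen < eth_len + ip_len + sizeof(struct tcphdr)) return;
    const struct tcphdr* tcp_header = reinterpret_cast<const struct tcphdr*>(packet + eth_len + ip_len);

    ip_tcp_packets++;  // Increment IP/TCP packet count

    // inet_ntoa reuses one static buffer, so format each address into its own buffer
    char src[INET_ADDRSTRLEN];
    char dst[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &ip_header->ip_src, src, sizeof(src));
    inet_ntop(AF_INET, &ip_header->ip_dst, dst, sizeof(dst));

    // Print IP and TCP header information
    std::cout << "Packet: " << src << " -> " << dst
              << " | " << ntohs(tcp_header->th_sport) << " -> " << ntohs(tcp_header->th_dport) << std::endl;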
}

int main() {
    // Disable synchronization with C-style I/O for faster performance
    std::ios::sync_with_stdio(false);

    char error_buffer[PCAP_ERRBUF_SIZE];
    pcap_t* handle = pcap_open_live("en0", BUFSIZ, 1, 1000, error_buffer); // Open network device for packet capture

    if (handle == nullptr) {
        std::cerr << "Could not open device: " << error_buffer << std::endl;
        return 1;
    }

    // Start time
    auto start = std::chrono::high_resolution_clock::now();

    // Use pcap_loop to capture packets, which avoids manual for loops and conditions
    pcap_loop(handle, 150, packet_handler, nullptr);  // Capture 150 packets

    // End time
    auto end = std::chrono::high_resolution_clock::now();
    pcap_close(handle); // Close the handle

    // Calculate elapsed time
    std::chrono::duration<double> elapsed = end - start;

    // Print statistics
    std::cout << "Total packets captured: " << total_packets << std::endl;
    std::cout << "Packets with IP and TCP layers: " << ip_tcp_packets << std::endl;
    std::cout << "Time taken: " << elapsed.count() << " seconds" << std::endl;

    return 0;
}

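I'll compile this C++ code and run it in the same network environment as the Python script, measuring the time it takes to process the same number of packets. Compiling does not need root; only the capture itself does, so sudo belongs on the run step rather than on g++. To compile the C++ code, use the following command:

g++ -std=c++11 -O3 -o packet_sniffer packet_sniffer.cpp -lpcap

Then run the binary with elevated privileges:

sudo ./packet_sniffer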
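The byte offsets used in packet_handler (a 14-byte Ethernet header, then an IP header of ip_hl * 4 bytes) can be checked offline without capturing anything. The sketch below does the same arithmetic in Python with the standard library only. The `parse_ipv4_tcp` helper is hypothetical, and the frame is hand-built with made-up addresses, ports, and zeroed checksums purely for illustration.

```python
import socket
import struct

ETH_LEN = 14  # Ethernet header size in bytes

def parse_ipv4_tcp(frame):
    """Return (src, dst, sport, dport) for an IPv4/TCP frame, else None."""
    if len(frame) < ETH_LEN + 20:
        return None
    version_ihl = frame[ETH_LEN]
    version = version_ihl >> 4
    ip_len = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    protocol = frame[ETH_LEN + 9]
    if version != 4 or protocol != socket.IPPROTO_TCP or ip_len < 20:
        return None
    tcp_start = ETH_LEN + ip_len
    if len(frame) < tcp_start + 4:
        return None
    src = socket.inet_ntoa(frame[ETH_LEN + 12:ETH_LEN + 16])
    dst = socket.inet_ntoa(frame[ETH_LEN + 16:ETH_LEN + 20])
    # Ports are the first two 16-bit big-endian fields of the TCP header
    sport, dport = struct.unpack("!HH", frame[tcp_start:tcp_start + 4])
    return src, dst, sport, dport

# Hand-built test frame: zeroed MAC addresses, EtherType 0x0800 (IPv4)
eth = bytes(12) + b"\x08\x00"
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, socket.IPPROTO_TCP, 0,
                 socket.inet_aton("192.0.2.1"), socket.inet_aton("192.0.2.2"))
tcp = struct.pack("!HHIIBBHHH", 12345, 443, 0, 0, 0x50, 0x02, 8192, 0, 0)
frame = eth + ip + tcp

print(parse_ipv4_tcp(frame))  # ('192.0.2.1', '192.0.2.2', 12345, 443)
```

The length checks matter in both languages: a truncated or non-IP frame should be skipped rather than read past its end.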

Step 3: Comparing the Results

Now that both scripts are ready, the real fun begins: comparing the execution times. This is where the rubber meets the road. I expect the C++ version to be faster because it eliminates the overhead of Python's interpreter and makes more efficient use of system resources.

Expected Results:

  • Python Execution Time: Due to Python's interpreted nature and higher-level abstractions, the packet processing is expected to be slower. There’s overhead in every function call and object creation, which can add up quickly when processing a large number of packets.

  • C++ Execution Time: The C++ version, being compiled and closer to the hardware, should have lower latency in packet processing. It handles memory and data structures more efficiently, leading to faster execution, especially under high load.
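To get a feel for the per-call overhead mentioned for Python, a rough micro-benchmark with the standard timeit module can help. This is only a sketch: `handle` is a hypothetical stand-in for a packet callback, the run count is arbitrary, and the numbers will vary by machine and Python version.

```python
import timeit

def handle(packet):
    """Trivial stand-in for a per-packet callback."""
    return len(packet)

packet = bytes(64)   # dummy 64-byte payload
runs = 200_000       # arbitrary number of calls

call_time = timeit.timeit(lambda: handle(packet), number=runs)
print(f"{runs} callback calls: {call_time:.4f} s "
      f"({call_time / runs * 1e9:.0f} ns per call)")
```

This measures only the call overhead of a near-empty function, so it is a lower bound on what a real callback that builds and inspects packet objects would cost.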

Result Output:

  • C++ Execution Time: Time taken: 4.01112 seconds

  • Python Execution Time: Time taken: 8.84 seconds

The C++ version is indeed faster than the Python version in this test: it finished in about 54.6% less time, which makes it roughly 2.2 times as fast. Keep in mind that both timings are single runs and include the time spent waiting for 150 packets to arrive, so the gap reflects traffic conditions as well as processing cost. Still, the result is consistent with C++ being a strong choice for tasks that require low-level operations and minimal overhead, such as network packet processing, real-time data analysis, game development, or other scenarios where speed and resource efficiency are essential.
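The 54.6% figure corresponds to the reduction in elapsed time. Expressed as a speed ratio, the same two measurements come out at roughly 2.2x. The arithmetic, using the timings reported above:

```python
python_time = 8.84      # seconds, from the Python run
cpp_time = 4.01112      # seconds, from the C++ run

time_saved_pct = (python_time - cpp_time) / python_time * 100
speedup = python_time / cpp_time

print(f"Time reduction: {time_saved_pct:.1f}%")  # Time reduction: 54.6%
print(f"Speedup: {speedup:.2f}x")                # Speedup: 2.20x
```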

Leveraging LLMs for Code Optimization

Rewriting Python code in C++ and optimizing it can be complex and time-consuming, even for developers who are expert in both languages. LLMs like GPT-4 can help experienced developers translate high-level Python logic into low-level C++ while taking performance nuances into account. For instance, the C++ code above was translated from the Python script using GPT-4. GPT-4 can also suggest C++-specific optimizations, saving time by automating parts of the translation and optimization work so that developers can focus on more critical aspects of their projects.