Record Traffic and Store in Google Cloud Storage (GCS)

Nov 22, 2023 - 6 min read

Kubeshark Google Cloud Storage (GCS) Integration

We've recently introduced Google Cloud Storage (GCS) as a storage option for traffic recording in Kubeshark. This integration is especially valuable in Kubernetes, where a highly distributed architecture relies on network-based API calls to execute business logic.

Why It's Important

Deep network observability, facilitated by Kubeshark, is essential for pinpointing production incidents, bugs, and performance issues, because it lets you trace business-logic transactions across the network. Real-time visibility helps, but some issues only become apparent after they have occurred.

Traffic recording in Kubeshark addresses this challenge: you set a pattern that selects which traffic to record, then investigate the recorded traffic offline using the dashboard's advanced filtering, extending your diagnostic capabilities within Kubernetes.

How It's Done

1. Assign a GCS Bucket

First, create (or choose) a GCS bucket to store the recorded traffic.
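Bucket names must follow the GCS naming rules. The sketch below is plain JavaScript (runnable with Node.js, outside Kubeshark) that checks only the basic rules: 3 to 63 characters, only lowercase letters, digits, dashes, underscores, and dots, starting and ending with a letter or digit. The function name is hypothetical, and longer dotted names and other restrictions are deliberately not covered; the GCS documentation remains the authority.

```javascript
// Hypothetical helper: checks the basic GCS bucket-naming rules only.
// Not covered: dotted names longer than 63 characters, IP-address-like
// names, reserved prefixes, and other restrictions from the GCS docs.
function isValidBucketName(name) {
  // First and last character: lowercase letter or digit.
  // Middle: 1 to 61 lowercase letters, digits, dots, underscores, or dashes.
  return /^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$/.test(name);
}

console.log(isValidBucketName("test-11-12")); // true
console.log(isValidBucketName("Test-Bucket")); // false (uppercase)
```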

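2. Create a Service Account (or Use an Existing One)

Kubeshark authenticates to GCS with a service-account key, which you provide as part of the Kubeshark configuration. The service account needs permission to write objects to the bucket.

Create a JSON key for the service account and download it for the next steps.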

3. Traffic Recording Script

Recording is implemented as a Kubeshark script. Scripts extend Kubeshark's functionality and can be tailored to specific needs. Download the script from the Kubeshark scripts GitHub repository and customize it as needed:

curl https://raw.githubusercontent.com/kubeshark/scripts/master/dfir/forensics_gcs.js -O

4. Traffic Recording Pattern

Set the recording pattern as a KFL filter in the RECORDING_KFL environment variable. L4 streams that match the filter are stored temporarily until they are uploaded. Here are a few example KFL filters:

http or dns - L4 streams that include either HTTP or DNS traffic
src.namespace=="my-namespace" or dst.namespace=="my-namespace" - L4 streams that include egress or ingress traffic of a specific namespace
src.name==r"cata.*" - L4 streams with traffic that belongs to pods whose names match the regular expression
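To make the semantics of these examples concrete, here is a rough JavaScript equivalent of each filter, evaluated against a hypothetical, heavily simplified entry object (protocol, src, dst). This is not how KFL is implemented. One assumption worth flagging: JavaScript's RegExp.test finds a match anywhere in the string, and KFL's exact regex-matching rules may differ.

```javascript
// Hypothetical, simplified traffic entry; real Kubeshark entries have many more fields.
const entry = {
  protocol: "dns",
  src: { namespace: "my-namespace", name: "catalogue-6d4f9" },
  dst: { namespace: "kube-system", name: "coredns-5d78c" },
};

// http or dns
const httpOrDns = (e) => e.protocol === "http" || e.protocol === "dns";

// src.namespace=="my-namespace" or dst.namespace=="my-namespace"
const inNamespace = (ns) => (e) => e.src.namespace === ns || e.dst.namespace === ns;

// src.name==r"cata.*"  (RegExp.test matches anywhere in the name)
const srcNameMatches = (pattern) => (e) => new RegExp(pattern).test(e.src.name);

console.log(httpOrDns(entry)); // true
console.log(inNamespace("my-namespace")(entry)); // true
console.log(srcNameMatches("cata.*")(entry)); // true
```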

5. Kubeshark Configuration

The configuration includes the GCS bucket name, the service-account key, the recording pattern, and the location of the scripts folder:

scripting:
  env:
    GCS_BUCKET: test-11-12
    GCS_SA_KEY_JSON: |
      {
        "type": "service_account",
        "project_id": "cloudstorage-integration",
        "private_key_id": "14a7e4f4816892b82c28132acf5fbb1c62c85181",
        "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvQ..tDAYZxgNB0=\n-----END PRIVATE KEY-----\n",
        "client_email": "k..2@cl..on.iam.gserviceaccount.com",
        "client_id": "104953139650716443372",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/k..unt.com",
        "universe_domain": "googleapis.com"
      }
    RECORDING_KFL: http or dns
  source: "/path/to/scripts/folder"
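Because the script reads the key with JSON.parse(env.GCS_SA_KEY_JSON), a malformed key only surfaces as an error inside the upload job. A quick local check before deploying can save a round trip. The sketch below is a hypothetical helper, not part of Kubeshark, and the list of required fields is an assumption based on the key shown above; all sample values are made up.

```javascript
// Hypothetical pre-flight check for the GCS_SA_KEY_JSON value.
// The required-field list is an assumption based on the key shown above.
const REQUIRED_FIELDS = ["type", "project_id", "private_key_id", "private_key", "client_email"];

function checkServiceAccountKey(jsonText) {
  let key;
  try {
    key = JSON.parse(jsonText); // same call the recording script makes
  } catch (err) {
    return ["not valid JSON: " + err.message];
  }
  if (key === null || typeof key !== "object" || Array.isArray(key)) {
    return ["not a JSON object"];
  }
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof key[field] !== "string" || key[field] === "") {
      problems.push("missing field: " + field);
    }
  }
  if (typeof key.type === "string" && key.type !== "" && key.type !== "service_account") {
    problems.push('type is "' + key.type + '", expected "service_account"');
  }
  return problems; // an empty array means the basic checks passed
}

// Example: a made-up value as it would arrive in env.GCS_SA_KEY_JSON.
const sample = JSON.stringify({
  type: "service_account",
  project_id: "example-project",
  private_key_id: "0000",
  private_key: "-----BEGIN PRIVATE KEY-----\nPLACEHOLDER\n-----END PRIVATE KEY-----\n",
  client_email: "recorder@example-project.iam.gserviceaccount.com",
});
console.log(checkServiceAccountKey(sample).length); // 0
```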

6. Start Recording

Run Kubeshark (for example, with kubeshark tap) to start recording. Neither an open dashboard nor a Kubernetes proxy is required.

7. Deactivating Recording

Remove the RECORDING_KFL variable from the configuration file to stop recording.
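The L7 hook described later in this post begins with if (!ACTIVE) return;, so removing the variable only needs to turn that flag off. A minimal sketch of how such a flag could be derived, assuming the script treats a missing or blank RECORDING_KFL as "off"; the function name is hypothetical and the actual logic in forensics_gcs.js may differ.

```javascript
// Hypothetical: recording is on only when RECORDING_KFL holds a non-blank filter.
function isRecordingActive(env) {
  return typeof env.RECORDING_KFL === "string" && env.RECORDING_KFL.trim() !== "";
}

console.log(isRecordingActive({ RECORDING_KFL: "http or dns" })); // true
console.log(isRecordingActive({})); // false
```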

8. Viewing the PCAP files

Access and download PCAP files from the GCS console as needed.
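Before opening a downloaded file in a PCAP tool, you can sanity-check its format from the first four bytes. This sketch assumes the uploaded object is a classic libpcap or pcapng file; if the snapshot is packaged differently, it reports "unknown". The helper name and the file name in the usage comment are hypothetical.

```javascript
// Classifies a capture file by its first four bytes.
// Classic libpcap: magic 0xa1b2c3d4 (microsecond timestamps) or 0xa1b23c4d
// (nanosecond timestamps), written in either byte order.
// pcapng: starts with the Section Header Block type 0x0a0d0d0a.
function captureFormat(bytes) {
  if (bytes.length < 4) return "unknown";
  const be = ((bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3]) >>> 0;
  const le = ((bytes[3] << 24) | (bytes[2] << 16) | (bytes[1] << 8) | bytes[0]) >>> 0;
  if (be === 0x0a0d0d0a) return "pcapng";
  if (be === 0xa1b2c3d4 || le === 0xa1b2c3d4) return "pcap (microsecond)";
  if (be === 0xa1b23c4d || le === 0xa1b23c4d) return "pcap (nanosecond)";
  return "unknown";
}

// Usage with Node.js (file name is hypothetical):
//   const fs = require("fs");
//   console.log(captureFormat(fs.readFileSync("downloaded-snapshot.pcap")));
console.log(captureFormat(Uint8Array.of(0xd4, 0xc3, 0xb2, 0xa1))); // pcap (microsecond)
```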

9. Analyzing the Traffic Offline

Note: this functionality is deprecated.

For offline analysis, point Kubeshark's dashboard to the GCS bucket. The Kubeshark CLI in Docker mode lets you view the files locally:

kubeshark tap --pcap gs://bucket-name/

Remember to have gsutil installed and configured for GCS bucket access.

TL;DR - the Fine Details

Script Description

L7 Hook

function onItemCaptured(data) {
    if (!ACTIVE) return;

    // Check if data matches KFL and hasn't been captured before
    if (kfl.match(env.RECORDING_KFL, data) && pcapArr.indexOf(data.stream) === -1) {
        pcapArr.push(data.stream); // Add new stream to the array
        try {
            file.copy(pcap.path(data.stream), pcapFolder + "/" + data.stream); // Copy pcap file to the folder
        } catch (error) {
            console.error(currentDateTime() + "| Error copying file: " + pcap.path(data.stream) + "; Error: " + error + ";");
        }
    }
}

This hook is triggered every time a new request-response pair is successfully dissected. Each such pair belongs to an L4 stream that is stored in a local PCAP file. That file has a very short TTL (e.g., 10 seconds), just long enough for a script to copy it to an immutable folder. This hook does exactly that: it copies each matching L4 stream, once per stream, to that folder.

> Read more about this hook in the Hooks section.
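The hook's logic (filter, de-duplicate by stream, copy before the TTL expires) can be exercised outside Kubeshark by injecting stand-ins for kfl.match, pcap.path, and file.copy. The sketch below does that with hypothetical stubs and swaps the pcapArr.indexOf lookup for a Set, which keeps membership checks constant-time as the number of streams grows. As in the original, a stream is marked as seen before the copy, so a failed copy is recorded rather than retried.

```javascript
// Dedup-and-copy logic of onItemCaptured with Kubeshark's helpers injected.
// matches, pathOf and copy are hypothetical stand-ins for kfl.match,
// pcap.path and file.copy; the stream names below are made up.
function makeRecorder({ matches, pathOf, copy, destFolder }) {
  const seen = new Set(); // streams already copied (plays the role of pcapArr)
  const errors = [];

  function onItemCaptured(data) {
    // Skip entries that don't match the filter or whose stream was already copied.
    if (!matches(data) || seen.has(data.stream)) return;
    seen.add(data.stream); // mark first, as the original does
    try {
      // Copy the short-lived stream file to the folder that keeps it until upload.
      copy(pathOf(data.stream), destFolder + "/" + data.stream);
    } catch (err) {
      errors.push(String(err)); // the original logs with console.error instead
    }
  }

  return { onItemCaptured, seen, errors };
}

// Usage: two request-response pairs from the same HTTP stream, plus one DNS pair.
const copied = [];
const rec = makeRecorder({
  matches: (d) => d.protocol === "http",
  pathOf: (s) => "/tmp/streams/" + s,
  copy: (from, to) => copied.push([from, to]),
  destFolder: "dfir",
});
rec.onItemCaptured({ stream: "000001.pcap", protocol: "http" });
rec.onItemCaptured({ stream: "000001.pcap", protocol: "http" });
rec.onItemCaptured({ stream: "000002.pcap", protocol: "dns" });
console.log(copied.length); // 1
```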

Periodic Upload to GCS

L4 streams that match the KFL filter and have been copied to the local folder are periodically uploaded to the GCS bucket, together with the name-resolution history.

// Function to handle GCS job
function dfirJob_gcs() {
    if (pcapArr.length === 0) return; // Exit if no pcap files are captured

    var tmpPcapFolder = "dfir_tmp"; // Temporary folder for pcap files
    var serviceAccountKey = JSON.parse(env.GCS_SA_KEY_JSON); // Parse GCS service account key

    try {
        console.log(currentDateTime() + "| dfirJob_gcs");
        file.delete(tmpPcapFolder); // Delete existing temporary folder
        file.move(pcapFolder, tmpPcapFolder); // Move pcap files to temporary folder
        pcapArr = []; // Clear the array
        file.mkdir(pcapFolder); // Create a new empty pcap folder

        var snapshot = pcap.snapshot([], tmpPcapFolder); // Create a snapshot of pcap files
        file.delete(tmpPcapFolder); // Delete the temporary folder
        vendor.gcs.put(env.GCS_BUCKET, snapshot, serviceAccountKey); // Upload snapshot to GCS
        file.delete(snapshot); // Delete the local snapshot

        var nrh = "name_resolution_history.json";
        vendor.gcs.put(env.GCS_BUCKET, nrh, serviceAccountKey); // Upload name resolution history to GCS
    } catch (error) {
        console.error(currentDateTime() + "| " + "Caught an error!", error);
    }
}

// Schedule the GCS job every 5 minutes (six-field cron expression; the first field is seconds)
jobs.schedule("dfir_gcs", "0 */5 * * * *", dfirJob_gcs);
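The job rotates before it uploads: it moves the working folder aside and clears pcapArr first, then snapshots and uploads the moved batch. The in-memory model below illustrates that pattern, with arrays standing in for folders and a hypothetical upload callback standing in for pcap.snapshot plus vendor.gcs.put. It is a sketch of the idea, not Kubeshark code.

```javascript
// In-memory model of the rotate-then-upload pattern in dfirJob_gcs.
// Arrays stand in for folders; upload() is a hypothetical stub.
function makeUploader(upload) {
  let active = []; // streams copied since the last run (like pcapFolder + pcapArr)

  return {
    add(stream) {
      active.push(stream);
    },
    runJob() {
      if (active.length === 0) return false; // nothing captured: skip the upload
      const batch = active; // like file.move(pcapFolder, tmpPcapFolder)
      active = []; // like pcapArr = [] followed by file.mkdir(pcapFolder)
      try {
        upload(batch); // like pcap.snapshot(...) + vendor.gcs.put(...)
        return true;
      } catch (err) {
        // The original only logs the error; in this model the batch is dropped.
        return false;
      }
    },
    pending() {
      return active.length;
    },
  };
}

// Usage: the second run uploads only what arrived after the first rotation.
const uploaded = [];
const up = makeUploader((batch) => uploaded.push(batch));
up.add("a.pcap");
up.add("b.pcap");
up.runJob();
up.add("c.pcap");
up.runJob();
console.log(uploaded.length); // 2
```

Rotating first means each upload works on a fixed batch: anything copied after the rotation lands in the fresh folder and waits for the next scheduled run instead of being mixed into the snapshot being uploaded.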