API Documentation

Warning

The CLI is the official interface of this project. The API is documented here for the sake of completeness and is not explicitly designed to be one.


pcapgraph module

version file.

pcapgraph.get_tshark_status()[source]

Errors and quits if tshark is not installed.

On Windows, tshark may not be recognized by cmd even if Wireshark is installed. In that case, this function adds the Wireshark folder to PATH so that tshark can be called.

Changing os.environ will only affect the cmd shell this program is using (tested). Not using setx here as that could be potentially destructive.

Raises FileNotFoundError:
If wireshark/tshark is not found, raise an error as they are required.
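
The PATH handling described above can be sketched roughly as follows. This is not the project's code: the function name ensure_tshark and both of its parameters are hypothetical, and the only library calls used are shutil.which and plain dict access. Passing os.environ as env changes PATH for the current process only, which matches the behavior described above.

```python
import os
import shutil


def ensure_tshark(env, wireshark_dir):
    """Make tshark findable via env["PATH"], or raise FileNotFoundError.

    env is a mapping such as os.environ; wireshark_dir is a candidate
    install folder (both names are assumptions for this sketch).
    """
    current = env.get("PATH", "")
    if current and shutil.which("tshark", path=current):
        return  # tshark is already callable, nothing to change
    # Try again with the candidate folder appended to the search path.
    candidate = os.pathsep.join(p for p in (current, wireshark_dir) if p)
    if shutil.which("tshark", path=candidate) is None:
        raise FileNotFoundError("tshark was not found; is Wireshark installed?")
    # Only this mapping is modified; nothing is written system-wide.
    env["PATH"] = candidate
```

A call might look like ensure_tshark(os.environ, r"C:\Program Files\Wireshark"), where the folder is an assumed default install location.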

pcapgraph.draw_graph

Draw graph will draw a text or image graph.

pcapgraph.draw_graph.draw_graph(pcap_packets, input_files, output_fmts, exclude_empty, anonymize_names)[source]

Draw a graph using matplotlib and numpy.

Parameters:
  • pcap_packets (dict) – All packets, where key is pcap filename/operation.
  • input_files (list) – List of input files that shouldn’t be deleted.
  • output_fmts (list) – The save file type. Supported formats are dependent on the capabilities of the system: [png, pdf, ps, eps, and svg]. See https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.savefig for more information.
  • exclude_empty (bool) – Whether to exclude empty pcaps from graph.
  • anonymize_names (bool) – Whether to change filenames to random values.
pcapgraph.draw_graph.export_graph(pcap_names, save_fmt)[source]

Exports the graph to the screen or to a file.

Parameters:
  • pcap_names (list) – List of pcap_names
  • save_fmt (str) – File extension of output file
pcapgraph.draw_graph.generate_graph(pcap_vars, empty_files, anonymize_names)[source]

Generate the matplotlib graph.

Parameters:
  • pcap_vars (dict) – Contains all data required for the graph. {<pcap>: {‘pcap_start’: <timestamp>, ‘pcap_end’: <timestamp>}, …}
  • empty_files (list) – List of filenames of empty files.
  • anonymize_names (bool) – Whether to use pseudorandom names for files.
pcapgraph.draw_graph.get_graph_vars_from_file(filename)[source]

Set up graph variables.

This function exists to decrease the complexity of generate_graph. The order of the return vars start_times_array, end_times_array, and pcap_names should all match. In other words, start_times_array[5] is for the same pcap as end_times_array[5] and pcap_names[5].

Parameters:filename (str) – Name of file
Returns:File start/stop times if file has 1+ valid packets.
Return type:(dict)
pcapgraph.draw_graph.get_x_minmax(start_times, end_times)[source]

Determine the horizontal (x) min and max values.

This function adds 1% to either side for padding.

Parameters:
  • start_times (np.array) – First packet unix timestamps of all pcaps.
  • end_times (np.array) – Last packet unix timestamps of all pcaps.
Returns:

min_x, max_x to be used for graph

Return type:

(tuple)
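
A minimal sketch of the padding step, assuming that "1%" means 1% of the span between the earliest start and the latest end. That interpretation, and the name padded_x_range, are assumptions and not taken from the source. Plain min/max are used so the sketch works on lists as well as numpy arrays.

```python
def padded_x_range(start_times, end_times):
    """Return (min_x, max_x) with 1% of the total span added on each side."""
    first = min(start_times)   # earliest first-packet timestamp
    last = max(end_times)      # latest last-packet timestamp
    padding = 0.01 * (last - first)
    return first - padding, last + padding
```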

pcapgraph.draw_graph.make_text_not_war(pcap_times)[source]

Make useful text given pcap times.

Parameters:pcap_times (dict) – Packet capture names and start/stop timestamps.
Returns:Full text string to be written to file/stdout
Return type:(str)
pcapgraph.draw_graph.output_file(save_format, pcap_packets, exclude_empty, anonymize_names)[source]

Save the specified file with the specified format.

pcapgraph.draw_graph.remove_or_open_files(new_files, open_in_wireshark, delete_pcaps)[source]

Remove or open files.

delete_pcaps and open_in_wireshark should not both be true, because that would mean that wireshark would try to open deleted files.

Parameters:
  • new_files (set) – Set of new filenames to do something with
  • open_in_wireshark (bool) – Whether to open files in wireshark
  • delete_pcaps (bool) – Whether to delete generated pcaps
pcapgraph.draw_graph.set_horiz_bar_colors(barlist)[source]

Set the horizontal bar colors.

Color theme is Metro UI, with an emphasis on darker colors. If there are more horiz bars than in the color array, loop and continue to set colors.

Parameters:barlist (list) – List of the horizontal bars.
pcapgraph.draw_graph.set_xticks(first, last)[source]

Generate the x ticks and return a list of them.

Parameters:
  • first (float) – Earliest timestamp of pcaps.
  • last (float) – Latest timestamp of pcaps.
Returns:

x_ticks (list(float)): List of unix epoch time values as xticks. x_label (string): Text to be used to label X-axis.

Return type:

(tuple)


pcapgraph.generate_example_pcaps

Script to create three packet captures to demonstrate PcapGraph.

pcapgraph.generate_example_pcaps.generate_example_pcaps(interface=None)[source]

This script will create 3 packet captures, each lasting 60 seconds and starting at 0s, 20s, 40s. After 100s, this script will stop. Packet capture 0s should have 66% in common with pcap 20s and 33% in common with pcap 40s. Indeed, this is what we see in the graph.

Parameters:interface (string) – Optional interface to specify for wireshark.

pcapgraph.get_filenames

Parse CLI options and return a list of filenames.

pcapgraph.get_filenames.get_filenames(files)[source]

Return a validated list of filenames.

Parameters:files (list) – List of file params entered by user
Returns:List of files validated to be packet captures.
Return type:(list)
pcapgraph.get_filenames.get_filenames_from_directories(directories)[source]

Get all the files from all provided directories.

This function is not recursive and searches one deep.

Parameters:directories (list) – List of user-inputted directories.
Returns:Filenames of all packet captures in specified directories.
Return type:(list)
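
A one-level (non-recursive) directory search could look like the sketch below. The extension list and the name list_pcaps are assumptions for illustration; the source does not say how packet captures are recognized.

```python
import os

# Assumed set of capture extensions; the real validation may differ.
PCAP_EXTENSIONS = (".pcap", ".pcapng", ".cap")


def list_pcaps(directories):
    """Return capture files found directly inside each directory."""
    found = []
    for directory in directories:
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            # Subdirectories are skipped, so the search is one level deep.
            if os.path.isfile(path) and entry.lower().endswith(PCAP_EXTENSIONS):
                found.append(path)
    return found
```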
pcapgraph.get_filenames.parse_cli_args(args)[source]

Parse args with docopt. Return a list of filenames

Parameters:args (dict) – Dict of args that have been passed in via docopt.
Returns:List of filepaths
Return type:(list)

pcapgraph.manipulate_frames

Parse the frames from files based upon options.

Create the same JSON style with tshark -r examples/simul1.pcap -T json -x. Note that each <var>_raw key is due to the -x flag.

Frame JSON looks like this:
{
    '_index': 'packets-2018-11-03',
    '_type': 'pcap_file',
    '_score': None,
    '_source': {
        'layers': {
            'frame_raw': ['881544abbfdd2477035113440800450000380b5d0000...
            'frame': {'frame.encap_type': '1', 'frame.time': 'Sep 26, 2...
            'eth_raw': ['881544abbfdd2477035113440800', 0, 14, 0, 1],
            'eth': {'eth.dst_raw': ['881544abbfdd', 0, 6, 0, 29], 'eth...
            'ip_raw': ['450000380b5d00004011c7980a3012900a808080', 14, 2...
            'ip': {'ip.version_raw': ['4', 14, 1, 240, 4], 'ip.version'...
            'udp_raw': ['ea6200350024a492', 34, 8, 0, 1],
            'udp': {'udp.srcport_raw': ['ea62', 34, 2, 0, 5], 'udp.srcp...
            'dns_raw': ['9b130100000100000000000006616d617a6f6e03636f6d...
            'dns': {'dns.id_raw': ['9b13', 42, 2, 0, 5], 'dns.id': '0x00...
        }
    }
}

Many of these functions interact with this frame dict format or directly with the frame string (seen in ‘frame_raw’). The frame string is a string of the hex of a packet.
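
As an illustration of the structure above, the frame string can be read from the first element of 'frame_raw'. The helper name frame_hex is hypothetical, and the sample dict is a shortened stand-in that keeps only the keys needed here.

```python
def frame_hex(frame):
    """Return the hex string of one packet from a tshark -T json -x frame."""
    return frame["_source"]["layers"]["frame_raw"][0]


# Shortened, made-up frame: only the first 8 bytes of a packet are included.
sample_frame = {
    "_source": {"layers": {"frame_raw": ["881544abbfdd2477", 0, 8, 0, 1]}}
}

hex_string = frame_hex(sample_frame)
packet_bytes = bytes.fromhex(hex_string)  # raw bytes, if needed
```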

pcapgraph.manipulate_frames.anonymous_pcap_names(num_names)[source]

Anonymize pcap names.

Creation of funny pcap names like switch_wireless is intended behavior.

Parameters:num_names (int) – Number of names to be returned
Returns:Fake pcap name list
Return type:(list)
pcapgraph.manipulate_frames.decode_stdout(stdout)[source]

Given stdout, return the string.

pcapgraph.manipulate_frames.get_flat_frame_dict(pcap_json_list)[source]

Given the pcap json list, return the frame dict.

Parameters:pcap_json_list (list) – List of pcap dicts (see parse_pcaps for details)
Returns:{<frame>: <timestamp>, …}
Return type:frame_list (dict)
pcapgraph.manipulate_frames.get_frame_from_json(frame)[source]

Get/sanitize raw frame from JSON of frame from tshark -x -T json …

Parameters:frame (dict) – A dict of a single packet from tshark.
Returns:The ASCII hexdump value of a packet
Return type:(str)
pcapgraph.manipulate_frames.get_frame_list_by_pcap(pcap_json_dict)[source]

Like get_flat_frame_dict, but with pcapname as key to each frame list

Parameters:pcap_json_dict (dict) – List of Pcap JSONs.
Returns:[[<frame>, …], …]
Return type:(list)
pcapgraph.manipulate_frames.get_homogenized_packet(ip_raw)[source]

Change an IPv4 packet’s fields to the same, homogenized values.

Replace TTL, header checksum, and IP src/dst with generic values. This function is designed to replace all IP data that would change on a layer 3 boundary.

Note that these options are found only in IPv4. TTL is expected to change at every hop along with header checksum. IPs are expected to change for NAT.

Parameters:ip_raw (str) – ASCII hex of packet.
Returns:Packet with fields that would be altered by l3 boundary replaced
Return type:(str)
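
The field replacement can be sketched on the hex of an IPv4 header, where TTL is byte 8, the header checksum is bytes 10-11, and the source and destination addresses are bytes 12-19. The generic values and the name homogenize_ipv4_header are assumptions chosen for this example, and the sketch assumes the string starts at the IPv4 header.

```python
# Placeholder values, picked only for illustration.
GENERIC_TTL = "ff"
GENERIC_CHECKSUM = "0000"
GENERIC_SRC = "0a010101"  # 10.1.1.1
GENERIC_DST = "0a020202"  # 10.2.2.2


def homogenize_ipv4_header(ip_raw):
    """Replace TTL, header checksum, and src/dst IP in IPv4 header hex."""
    return (
        ip_raw[:16]          # version through fragment offset (bytes 0-7)
        + GENERIC_TTL        # byte 8
        + ip_raw[18:20]      # protocol (byte 9) is kept
        + GENERIC_CHECKSUM   # bytes 10-11
        + GENERIC_SRC        # bytes 12-15
        + GENERIC_DST        # bytes 16-19
        + ip_raw[40:]        # options or payload, if present
    )


# Header taken from the 'ip_raw' entry in the frame JSON shown earlier.
homogenized = homogenize_ipv4_header("450000380b5d00004011c7980a3012900a808080")
```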
pcapgraph.manipulate_frames.get_packet_count(filename)[source]

Given a file, get the packet count.

Parameters:filename (str) – Path of a file, including extension
Returns:How many packets were in that pcap
Return type:packet_count (int)
pcapgraph.manipulate_frames.get_pcap_as_json(pcap)[source]

Given a pcap, return a json with tshark -r <file> -x -T json.

tshark -r <pcap> -w -
Pipes packet capture one packet per line to stdout
tshark -r -
Read file from stdin
tshark -r <in.pcap> -x | text2pcap - <out.pcap>
Prints hex of pcap to stdout and then resaves it as a pcap. This WILL delete packet timestamps as that is not encoded in hex output.
Parameters:pcap (string) – File name.
Returns:List of the pcap json provided by tshark.
Return type:(list)
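
A hedged sketch of the tshark call using subprocess. The function names are hypothetical; the -r, -x, and -T json flags are the ones named above. Running read_pcap_as_json requires tshark to be installed and on PATH.

```python
import json
import subprocess


def tshark_json_command(pcap):
    """Build the argument list for: tshark -r <file> -x -T json."""
    return ["tshark", "-r", pcap, "-x", "-T", "json"]


def read_pcap_as_json(pcap):
    """Run tshark and parse its stdout as a list of frame dicts."""
    completed = subprocess.run(
        tshark_json_command(pcap), stdout=subprocess.PIPE, check=True
    )
    return json.loads(completed.stdout.decode("utf-8"))
```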
pcapgraph.manipulate_frames.get_pcap_frame_dict(pcaps)[source]

Like get_flat_frame_dict, but with pcapname as key to each frame list

Parameters:pcaps (list) – List of pcap file names.
Returns:{<pcap>: {<frame>:<timestamp>, …}, …}
Return type:(dict)
pcapgraph.manipulate_frames.parse_pcaps(pcaps)[source]

Given pcaps, return all frames and their timestamps.

Parameters:pcaps (list) – A list of pcap filenames
Returns:
All the packet data in json format.
[{<pcap>: {PCAP JSON}}, …]
Return type:pcap_json_list (list)
pcapgraph.manipulate_frames.strip_layers(filenames, options)[source]

Get the PCAP JSON dict stripped per options.

strip-l3:
Replace layer 3 fields src/dst IP, ttl, checksum with dummy values
strip-l2:
Remove all layer 2 fields like FCS, source/dest MAC, VLAN tag…
Parameters:
  • filenames (list) – List of filenames.
  • options (dict) – Whether to strip L2 and L3 headers.
Returns:

The modified packet dict

Return type:

(dict)


pcapgraph.pcap_math

Do algebraic operations on sets like union, intersect, difference.

class pcapgraph.pcap_math.PcapMath(filenames, options)[source]

Bases: object

Prepare PcapMath object for one or multiple operations.

Every PcapMath object should start with the data structures filled with the data that each operation needs to function.

Parameters:
  • filenames (list) – List of filenames.
  • options (dict) – Whether to strip L2 and L3 headers.
bounded_intersect_pcap()[source]

Create a packet capture intersection out of two files using ip.ids.

Create a packet capture by finding the earliest common packet and then the latest common packet in both pcaps by ip.id.

Returns:Filenames of generated pcaps.
Return type:(list(string))
difference_pcap(pivot_index=0)[source]

Given sets A = (1, 2, 3), B = (2, 3, 4), C = (3, 4, 5), A-B-C = (1).

Parameters:pivot_index (int) – Specify minuend by index of filename in list
Returns:Name of generated pcap.
Return type:(string)
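
The set arithmetic in the example can be sketched with Python sets. The name difference and the use of integers in place of frames are simplifications for illustration; the actual method works on packet captures.

```python
def difference(frame_sets, pivot_index=0):
    """Subtract every other set from the set at pivot_index (the minuend)."""
    result = set(frame_sets[pivot_index])
    for index, frames in enumerate(frame_sets):
        if index != pivot_index:
            result -= set(frames)
    return result


a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
a_minus_rest = difference([a, b, c])                 # A - B - C
c_minus_rest = difference([a, b, c], pivot_index=2)  # C - A - B
```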
get_bounded_pcaps()[source]

Get the pcap frame list for bounded_intersect_pcap

Create a bounding box around each packet capture where the bounds are the min and max packets in the intersection.

Returns:A list of frame_dicts
Return type:bounded_pcaps (list)
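
One way to picture the bounding box is the sketch below, which works on a list of {frame: timestamp} dicts, one per pcap. The name bounded_frames, the input shape, and the choice to rank common frames by their timestamps in the first pcap are assumptions; the real method may differ.

```python
def bounded_frames(pcap_frames):
    """Trim each {frame: timestamp} dict to the span of the common frames."""
    common = set(pcap_frames[0])
    for frames in pcap_frames[1:]:
        common &= set(frames)
    if not common:
        raise ValueError("intersection is empty")
    first = pcap_frames[0]
    # Earliest and latest common frames, ranked by the first pcap's timestamps.
    min_frame = min(common, key=lambda frame: first[frame])
    max_frame = max(common, key=lambda frame: first[frame])
    bounded = []
    for frames in pcap_frames:
        low, high = frames[min_frame], frames[max_frame]
        # Keep everything this pcap saw between the two boundary frames.
        bounded.append({f: ts for f, ts in frames.items() if low <= ts <= high})
    return bounded


# Made-up frames and timestamps for illustration.
pcap_a = {"aa": 1.0, "bb": 2.0, "cc": 3.0, "dd": 4.0}
pcap_b = {"bb": 2.1, "xx": 2.5, "cc": 3.1, "ee": 5.0}
bounded = bounded_frames([pcap_a, pcap_b])
```

Here the non-common frame "xx" is kept for the second pcap because it falls between the two boundary frames, while "aa", "dd", and "ee" fall outside the bounds and are dropped.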
get_minmax_common_frames()[source]

Get first, last frames of intersection pcap.

Returns:Packet strings of the packets that are at the beginning and end of the intersection pcap based on timestamps.
Return type:min_frame, max_frame (tuple(string))
Raises:assert – If intersection is empty.
intersect_pcap()[source]

Save pcap intersection. First filename is pivot packet capture.

Returns:Filename of generated pcap.
Return type:(str)
inverse_bounded_intersect_pcap(bounded_filelist=False, intersect_file=False)[source]

Inverse of bounded intersection = (bounded intersect) - (intersect)

Parameters:
  • bounded_filelist (list) – List of existing bounded pcaps generated by bounded_intersect_pcap()
  • intersect_file (string) – Location of intersect file.
Returns:

Filenames of generated pcaps.

Return type:

(list(string))

parse_set_args(args)[source]

Call the appropriate method per CLI flags.

difference, union, intersect consist of {<op>: {frame: timestamp, …}}; bounded_intersect consists of {pcap: {frame: timestamp, …}, …}

Parameters:args (dict) – Dict of all arguments (including set args).
Returns:
List of all files, including ones generated by set operations.
Return type:filenames (list)
static print_10_most_common_frames(raw_frame_list)[source]

After doing a packet union, find/print the 10 most common packets.

This is a work in progress and may eventually use this bash:

<packets> | text2pcap - - | tshark -r - -o 'gui.column.format:"No.","%m","VLAN","%q","Src MAC","%uhs","Dst MAC","%uhd","Src IP","%us","Dst IP","%ud","Protocol","%p","Src port","%uS","Dst port","%uD"'

Alternatively, just use the existing information in pcap_dict.

The goal is to print frame#, VLAN, src/dst MAC, src/dst IP, L4 src/dst ports, protocol

This should likely be its own CLI flag in future.

Parameters:raw_frame_list (list) – List of raw frames
symmetric_difference_pcap()[source]

For sets A = (1, 2, 3), B = (2, 3, 4), C = (3, 4, 5), A△B△C = (1, 5)

For all pcaps, the symmetric difference produces a pcap that has the packets that are unique to only that pcap (unlike above where only one set is the result).

Returns:Filenames of generated pcaps.
Return type:(list(str))
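
The per-pcap behavior described above (one result per input, holding only what is unique to it) can be sketched with Python sets. The name unique_per_set is hypothetical and integers stand in for frames. Note that this differs from chaining Python's ^ operator, which also keeps elements present in all three sets.

```python
def unique_per_set(frame_sets):
    """For each set, keep only the elements found in no other set."""
    unique = []
    for index, frames in enumerate(frame_sets):
        others = set().union(
            *(s for i, s in enumerate(frame_sets) if i != index)
        )
        unique.append(set(frames) - others)
    return unique


a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
per_pcap = unique_per_set([a, b, c])
chained = a ^ b ^ c  # Python's chained symmetric difference, for comparison
```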
union_pcap()[source]

Given sets A = (1, 2, 3), B = (2, 3, 4), A + B = (1, 2, 3, 4).

About:
This method uses tshark to get identifying information on pcaps and then mergecap to save the combined pcap.
Returns:Name of generated pcap.
Return type:(string)

pcapgraph.save_file

Save file.

pcapgraph.save_file.convert_to_pcaptext(raw_packet, timestamp='')[source]

Convert the raw pcap hex to a form that text2pcap can read from stdin.

hexdump and xxd can do this on unix-like platforms, but not on Windows.

tshark -r <file> -T json -x produces the “in” and text2pcap requires the “out” formats as shown below:

Per Text2pcap documentation: “Text2pcap understands a hexdump of the form generated by od -Ax -tx1 -v.”

In format:

247703511344881544abbfdd0800452000542bbc00007901e8fd080808080a301290000
082a563110001f930ab5b00000000a9e80d0000000000101112131415161718191a1b1c
1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637

Out format:

0000  24 77 03 51 13 44 88 15 44 ab bf dd 08 00 45 20
0010  00 54 2b bc 00 00 79 01 e8 fd 08 08 08 08 0a 30
0020  12 90 00 00 82 a5 63 11 00 01 f9 30 ab 5b 00 00
0030  00 00 a9 e8 0d 00 00 00 00 00 10 11 12 13 14 15
0040  16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25
0050  26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35
0060  36 37

NOTE: Output format doesn’t need an extra \n between packets. So in the above example, the next line could be 0000 00 … for the next packet.

Parameters:
  • raw_packet (str) – The ASCII hexdump seen above in ‘In’
  • timestamp (str) – An optional packet timestamp that will precede the 0000 line of the packet hex.
Returns:

Packet in ASCII hexdump format like Out above

Return type:

formatted_string (str)
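
The In-to-Out conversion can be sketched as below: split the hex string into bytes and print 16 per line, each line prefixed with a 4-digit hex offset and two spaces, as in the Out example. The name to_pcaptext is hypothetical, and the optional timestamp is left out of this sketch.

```python
def to_pcaptext(raw_packet):
    """Format a continuous hex string as offset-prefixed hexdump lines."""
    octets = [raw_packet[i:i + 2] for i in range(0, len(raw_packet), 2)]
    lines = []
    for offset in range(0, len(octets), 16):
        row = " ".join(octets[offset:offset + 16])
        lines.append("{:04x}  {}".format(offset, row))
    return "\n".join(lines) + "\n"


# First 20 bytes of the In example above.
formatted = to_pcaptext("247703511344881544abbfdd0800452000542bbc")
```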

pcapgraph.save_file.reorder_packets(pcap)[source]

Union causes packets to be ordered incorrectly, so reorder properly.

Reorder packets, save to 2nd file. After this is done, replace initial file with reordered one. Append temporary file with ‘_’.

Parameters:pcap (str) – Filename of packet capture. Should end with ‘_’, which can be stripped off so that we can reorder to a diff file.
pcapgraph.save_file.save_pcap(pcap_dict, name, options)[source]

Save a packet capture given ASCII hexdump using text2pcap

Parameters:
  • pcap_dict (dict) – Dict mapping each frame to its timestamp. Format: {<frame>: <timestamp>, …}
  • name (str) – Type of operation and name of savefile
  • options (dict) – Whether to encode with L2/L3 headers.