# libmerc usage and configuration options

## High-level Usage Overview

This document highlights the most important steps in using libmerc. A full test program
with proper error checking is available here:
[libmerc_test.c](https://wwwin-github.cisco.com/network-intelligence/mercury-transition/blob/trunk/src/libmerc_test.c).

To set up a mercury packet processor with the default configuration options, and later tear it down:

```C
struct libmerc_config config = libmerc_config_init();
int retval = mercury_init(config, verbosity);
mercury_packet_processor merc = mercury_packet_processor_construct();
/* ... analyze packets with merc ... */
mercury_packet_processor_destruct(merc);
retval = mercury_finalize();
```

To perform analysis on a packet:

```C
struct timespec time;  /* set to the capture time of the packet */
const struct analysis_context *analysis_ctx =
    mercury_packet_processor_get_analysis_context(merc, client_hello_eth,
                                                  sizeof(client_hello_eth), &time);
```

The `analysis_ctx` object can be passed into several helper functions to extract useful context about the TLS `client_hello` packet:

```C
const char *fingerprint_string = analysis_context_get_fingerprint_string(analysis_ctx);
const char *server_name = analysis_context_get_server_name(analysis_ctx);
enum fingerprint_status status = analysis_context_get_fingerprint_status(analysis_ctx);
```

The `fingerprint_status` enumeration defines three different states for the analysis object:
`fingerprint_status_labeled`, `fingerprint_status_randomized`, and `fingerprint_status_unlabled`.
`fingerprint_status_unlabled` means that we have observed the TLS fingerprint string in the wild, but we
were unable to associate that fingerprint with any ground truth. `fingerprint_status_randomized`
means that we have *not* observed that fingerprint in the wild, and the fingerprint string was most
likely generated by a TLS scanner, evasive application, or nonconformant client. `fingerprint_status_labeled`
is returned when the TLS fingerprinting subsystem in mercury was able to perform process identification because
there was associated ground truth.
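As a minimal sketch of how a caller might branch on these states, the C code below mirrors only the three enumerators described above in a local stand-in `enum fingerprint_status`. The real enum in `libmerc.h` may contain additional values and different numeric assignments, so real code should include that header instead of this definition. The helpers `has_process_identification` and `describe_fingerprint_status` are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Stand-in for the enum in libmerc.h (assumption: only the three states
 * described in this document are mirrored; the real enum may define more
 * values). In real code, include libmerc.h instead of this definition. */
enum fingerprint_status {
    fingerprint_status_labeled,
    fingerprint_status_randomized,
    fingerprint_status_unlabled,  /* spelled as in the text above */
};

/* Hypothetical helper: process identification is only performed when the
 * fingerprint has associated ground truth, i.e. when it is labeled. */
static bool has_process_identification(enum fingerprint_status status) {
    return status == fingerprint_status_labeled;
}

/* Hypothetical helper: short explanation of each state, e.g. for logging. */
static const char *describe_fingerprint_status(enum fingerprint_status status) {
    switch (status) {
    case fingerprint_status_labeled:
        return "labeled: fingerprint has ground truth; process identified";
    case fingerprint_status_randomized:
        return "randomized: never observed; likely a scanner, evasive app, or nonconformant client";
    case fingerprint_status_unlabled:
        return "unlabeled: observed in the wild, but no ground truth";
    }
    return "unknown fingerprint status";
}
```

A caller would typically check `has_process_identification(status)` before reading the process, malware, and OS results, since those depend on ground truth being available.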

To get the results of the process identification, malware detection, and operating system identification:

```C
const char *probable_process = NULL;
double probability_score = 0.0;
analysis_context_get_process_info(analysis_ctx, &probable_process, &probability_score);

bool probable_process_is_malware = false;
double probability_malware = 0.0;
analysis_context_get_malware_info(analysis_ctx, &probable_process_is_malware, &probability_malware);

const struct os_information *os_info = NULL;
size_t os_info_len = 0;
analysis_context_get_os_info(analysis_ctx, &os_info, &os_info_len);
```

## libmerc Configuration Object

The `libmerc_config` structure contains the following fields:

```C
struct libmerc_config {
    bool dns_json_output;         /* output DNS as JSON           */
    bool certs_json_output;       /* output certificates as JSON  */
    bool metadata_output;         /* output lots of metadata      */
    bool do_analysis;             /* write analysis{} JSON object */
    bool report_os;               /* report OSes in analysis JSON */
    bool output_tcp_initial_data; /* write TCP initial data field */
    bool output_udp_initial_data; /* write UDP initial data field */

    char *resources;         /* directory containing (analysis) resource files */
    char *packet_filter_cfg; /* packet filter configuration string             */

    float fp_proc_threshold;   /* remove processes with less than <var> weight    */
    float proc_dst_threshold;  /* remove destinations with less than <var> weight */
};
```

`fp_proc_threshold` and `proc_dst_threshold` control how much of the TLS fingerprint database
is read into memory, which can be advantageous when running libmerc on lower-end platforms.
The default values of `0.0` tell libmerc to read in the full database. The high-level idea behind
these parameters is to skip the data that is least likely to be observed on the network, so that
process identification performance stays as high as possible under a given memory constraint.

In the fingerprint database, each fingerprint string is associated with a list of processes.
`fp_proc_threshold` determines which of those processes are read into memory. First, we compute
the relative prevalence of each process in a fingerprint database entry by dividing the number of
times we observed that process (using that specific fingerprint string) by the number of times we
observed the fingerprint. This yields a real number between 0 and 1 for each process. Processes
whose relative prevalence is below `fp_proc_threshold` are not loaded into memory.
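The same rule can be written as a formula. Here the notation is ours, not libmerc's: $n(f)$ is the number of times fingerprint $f$ was observed, $n(p, f)$ is the number of times process $p$ was observed using $f$, and $\tau$ stands for `fp_proc_threshold`:

```math
\mathrm{prevalence}(p, f) = \frac{n(p, f)}{n(f)} \in [0, 1],
\qquad
p \text{ is loaded} \iff \mathrm{prevalence}(p, f) \ge \tau
```

For example, a process seen 5 times with a fingerprint that was observed 1000 times has a relative prevalence of $5 / 1000 = 0.005$, so it is skipped when `fp_proc_threshold` is 0.01 but loaded when it is 0.001.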

The fingerprint database also contains a list of destinations that each process visits using a
specific fingerprint string. Similar to the above, `proc_dst_threshold` controls how many destinations
per process are read into memory by skipping destinations whose relative prevalence is below
`proc_dst_threshold`.
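Both thresholds can be sketched together in C. This is a minimal sketch over a hypothetical in-memory layout (`struct proc_entry`, `struct dst_entry`, and `prune_fingerprint` are illustrative names, not libmerc's actual data structures). It also assumes that a destination's relative prevalence is its visit count divided by the count of its process for this fingerprint; the text above only says the rule is similar to the process rule, so the exact denominator is an assumption.

```c
#include <stddef.h>

/* Hypothetical layout of one destination visited by a process. */
struct dst_entry {
    const char *name;
    unsigned long count;     /* times the process visited this destination */
};

/* Hypothetical layout of one process associated with a fingerprint. */
struct proc_entry {
    const char *name;
    unsigned long count;     /* times the process used this fingerprint */
    struct dst_entry *dsts;
    size_t num_dsts;
};

/* Prune one fingerprint entry in place and return the number of processes kept.
 *  - A process is kept if count / fp_count >= fp_proc_threshold.
 *  - For each kept process, a destination is kept if
 *    dst.count / proc.count >= proc_dst_threshold (assumed denominator).
 * Kept entries are compacted to the front of their arrays. */
static size_t prune_fingerprint(struct proc_entry *procs, size_t num_procs,
                                unsigned long fp_count,
                                double fp_proc_threshold,
                                double proc_dst_threshold) {
    size_t kept_procs = 0;
    if (fp_count == 0) {
        return 0;  /* no observations: nothing meaningful to load */
    }
    for (size_t i = 0; i < num_procs; i++) {
        struct proc_entry p = procs[i];
        if ((double)p.count / (double)fp_count < fp_proc_threshold) {
            continue;  /* process too rare for this fingerprint: skip it */
        }
        size_t kept_dsts = 0;
        for (size_t j = 0; j < p.num_dsts; j++) {
            /* guard against a zero process count to avoid dividing by zero */
            double prevalence = p.count
                ? (double)p.dsts[j].count / (double)p.count
                : 0.0;
            if (prevalence >= proc_dst_threshold) {
                p.dsts[kept_dsts++] = p.dsts[j];
            }
        }
        p.num_dsts = kept_dsts;
        procs[kept_procs++] = p;
    }
    return kept_procs;
}
```

In this sketch, destinations are only examined for processes that survive the first threshold, so a process dropped by `fp_proc_threshold` takes all of its destinations with it.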

The following table illustrates the tradeoff between these parameters, database size, and process
identification performance:

| `fp_proc_threshold`      | `proc_dst_threshold` | Database Size | Process Identification Accuracy |
| ------------------------ | -------------------- | ------------- | ------------------------------- |
| 0.0                      | 0.0                  | 31MB          | 99.10%                          |
| 0.01                     | 0.0                  | 12MB          | 96.05%                          |
| 0.0                      | 0.01                 | 4.7MB         | 98.92%                          |
| 0.001                    | 0.01                 | 1.2MB         | 97.96%                          |
| 0.001                    | 0.05                 | 864KB         | 97.68%                          |
| 0.01                     | 0.01                 | 820KB         | 95.99%                          |
| 0.05                     | 0.01                 | 545KB         | 92.12%                          |
| 0.1                      | 0.01                 | 424KB         | 90.20%                          |
| 0.05                     | 0.05                 | 383KB         | 92.05%                          |
| 0.1                      | 0.1                  | 264KB         | 90.17%                          |
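One way to use this table is to pick, for a given memory budget, the row with the highest accuracy whose database still fits. The sketch below hard-codes the table's rows, converting sizes to KB under the assumption that 1 MB = 1024 KB. `struct threshold_option` and `pick_thresholds` are hypothetical names, and the sizes and accuracies are the measurements reported above, which may not carry over unchanged to other versions of the fingerprint database.

```c
#include <stddef.h>

/* One row of the tradeoff table (sizes in KB, assuming 1 MB = 1024 KB). */
struct threshold_option {
    double fp_proc_threshold;
    double proc_dst_threshold;
    double size_kb;
    double accuracy_pct;
};

static const struct threshold_option options[] = {
    {0.0,   0.0,  31.0 * 1024, 99.10},
    {0.01,  0.0,  12.0 * 1024, 96.05},
    {0.0,   0.01,  4.7 * 1024, 98.92},
    {0.001, 0.01,  1.2 * 1024, 97.96},
    {0.001, 0.05, 864.0,       97.68},
    {0.01,  0.01, 820.0,       95.99},
    {0.05,  0.01, 545.0,       92.12},
    {0.1,   0.01, 424.0,       90.20},
    {0.05,  0.05, 383.0,       92.05},
    {0.1,   0.1,  264.0,       90.17},
};

/* Return the most accurate row whose database fits in budget_kb,
 * or NULL if even the smallest configuration is too large. */
static const struct threshold_option *pick_thresholds(double budget_kb) {
    const struct threshold_option *best = NULL;
    for (size_t i = 0; i < sizeof(options) / sizeof(options[0]); i++) {
        if (options[i].size_kb <= budget_kb &&
            (best == NULL || options[i].accuracy_pct > best->accuracy_pct)) {
            best = &options[i];
        }
    }
    return best;
}
```

For example, a 1 MB (1024 KB) budget excludes the 1.2MB row and selects `fp_proc_threshold = 0.001` with `proc_dst_threshold = 0.05` (864KB, 97.68%); the chosen pair would then be copied into the corresponding `libmerc_config` fields.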
