SIFT: Secure Index Filtering Technology

Overview

SIFT is a fully functional, distributed solution for private keyword search over datasets, enabling secure searches on both encrypted and unencrypted data. SIFT can work with multiple servers and clients in a distributed environment, and with streaming data that is continuously updated. Keys never need to leave the owner’s custody, even when data processing occurs outside the owner’s environment.

Use Cases and Benefits

Stealth SIFT is relevant to any sensitive data, such as trade secrets, PII, financial secrets, and other compartmentalized information. SIFT’s underlying design guarantees that no information about the issuer’s private filtering keywords can ever be extracted from the encrypted filter or from the small encrypted buffer it accesses. Indeed, even though the encrypted filter runs in an unrestricted environment, the machine executing the filter cannot tell whether matches were found, even if its operator maliciously experiments on the filter with arbitrary data. Because the buffer (which contains the captured knowledge) is encrypted, no other participant or eavesdropper can know what information is being forwarded to the issuing agency.

SIFT works by scanning arbitrary objects (including documents, registry keys, executables, etc.) against filters written in a Yara-like syntax. SIFT’s streaming design also allows for more general methods of knowledge capture and filtering.

How It Works

A user takes a private filter or search signature (specified in a Yara-like syntax) and, using the SIFT toolkit, locally compiles it into an encrypted search filter with their private key. The encrypted search is rigorously and mathematically proven (in work published in the flagship, peer-reviewed Journal of Cryptology) to leak no information about the private filter values, so the encrypted filter can be distributed freely. Because the filter is distributed, the burden of computation and filtering is offloaded from the querying party, and the solution is designed for highly parallel architectures. On an entry-level consumer laptop, Stealth benchmarked SIFT at 100 KB/sec on unstructured data, and this computation is fully parallelizable.
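
As a rough illustration of what the benchmark figure implies, the sketch below estimates scan time. It assumes that 100 KB means 100,000 bytes and that throughput scales linearly with the number of machines, which is an idealization of "fully parallelizable". The function name and the 10 GB corpus size are hypothetical.

```python
def scan_time_seconds(corpus_bytes, workers=1, rate_bytes_per_sec=100_000):
    """Estimated wall-clock seconds to scan a corpus split evenly across workers.

    ASSUMPTION: every worker sustains the same rate and the split is perfect.
    """
    return corpus_bytes / (rate_bytes_per_sec * workers)


corpus = 10 * 10**9  # hypothetical 10 GB corpus
one_laptop = scan_time_seconds(corpus)
hundred = scan_time_seconds(corpus, workers=100)

print(f"1 machine:    {one_laptop / 3600:.1f} h")    # 27.8 h
print(f"100 machines: {hundred / 60:.1f} min")       # 16.7 min
```

Under these assumptions, a single entry-level laptop needs a little over a day for 10 GB, while 100 comparable machines need about a quarter of an hour.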

These are the specific steps performed by SIFT:

  • Step 1: An analyst creates the original filter within a “high” network (e.g. a classified system) and applies the SIFT Search Compiler to create the encrypted filter. This produces a keypair and an encrypted query. The secret decryption key is used solely by the issuer, while the public key and encrypted query can be distributed.
  • Step 2: Deploy the encrypted filter to the “low” network (e.g. an unclassified system) where the corpus of data to be captured resides.
  • Step 3: Each machine on the “low” side executes the encrypted filter on the corpus it holds. It applies the encrypted filter to its documents/objects one at a time, and each application updates a small encrypted buffer. Under the hood, the software retains only filtered data (in encrypted form) while making it impossible to tell which pieces of captured knowledge actually satisfied the filter. The encrypted buffer can be configured either to capture a list of all scanned files and whether or not they matched (a list that grows slowly as files are scanned) or, when matches are infrequent, to capture all matching documents as long as the buffer does not overflow.
  • Step 4: Periodically (once per minute, hour, day, week, month, etc.) send the small encrypted buffer, which contains the encrypted answers, back from the low network to the high network.
  • Step 5: On the high network, the issuer can now decrypt the buffer using the secret key generated in Step 1. The decrypted contents are the captured knowledge satisfying the private filter.
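
The five steps can be sketched with a toy example. This is not SIFT's implementation or API: the source does not name the underlying cryptosystem, so the sketch assumes a textbook Paillier scheme (additively homomorphic) with deliberately tiny, insecure keys, and it models only the buffer mode that lists every scanned file with an encrypted match indicator. All function names, the word list, and the file names are hypothetical.

```python
import math
import secrets

# Toy Paillier cryptosystem. ASSUMPTION: Paillier is used purely for
# illustration; these primes are far too small to offer any security.
P, Q = 2**31 - 1, 2**61 - 1  # two well-known Mersenne primes


def keygen():
    """Step 1 (high side): return (public modulus n, secret key)."""
    n = P * Q
    lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n (valid for generator n + 1)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m under public key n with fresh randomness."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """Decrypt ciphertext c with the secret key (lam, mu)."""
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def compile_filter(n, dictionary, private_keywords):
    """Step 1: encrypt 1 for each private keyword and 0 for every other
    dictionary word, so all entries look alike to the low side."""
    return {w: encrypt(n, int(w in private_keywords)) for w in dictionary}


def scan(n, enc_filter, words):
    """Step 3 (low side): homomorphically count keyword hits in one document.
    Multiplying Paillier ciphertexts adds the underlying plaintexts."""
    acc = encrypt(n, 0)
    for w in set(words):
        if w in enc_filter:
            acc = acc * enc_filter[w] % (n * n)
    return acc


# Step 1: the issuer compiles the filter on the high side.
n, sk = keygen()
dictionary = ["alpha", "bravo", "charlie", "delta"]  # hypothetical public word list
enc_filter = compile_filter(n, dictionary, {"bravo", "delta"})

# Steps 2-3: low-side machines scan their corpus and fill the buffer.
corpus = {
    "a.txt": "alpha charlie".split(),
    "b.txt": "bravo alpha".split(),
    "c.txt": "delta bravo echo".split(),
}
buffer = {name: scan(n, enc_filter, words) for name, words in corpus.items()}

# Steps 4-5: the buffer returns to the high side and is decrypted there.
matches = {name: decrypt(n, sk, c) > 0 for name, c in buffer.items()}
print(matches)  # {'a.txt': False, 'b.txt': True, 'c.txt': True}
```

In this sketch the low side only ever multiplies ciphertexts, so it sees neither the private keywords nor the per-file results; only the holder of the secret key learns that b.txt and c.txt matched.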

Environment and Design

SIFT is built as a C library and a command-line binary that run on most operating systems and desktop environments. We also developed a GUI for scanning files on Windows. The streaming nature of the design means that SIFT can practically be engineered as a private stream-filtering service. For example, the software can already scan saved PCAP files to securely filter on IP addresses, MAC addresses, and packet signatures, and it can be extended to plug into common packet-sniffing tools.

SIFT is designed to require no interactive communication between the high and low sides: data moves only when the encrypted query is deployed and when the encrypted results are extracted, and both transfers can be made by any means. If equipped with a persistent communication channel, SIFT can be enhanced to use more powerful cryptographic techniques to improve throughput.