Article - I am afraid of leakage, help me (DLP - Data Leakage Protection)
Author: Werner Schmidt, CISSP
Date: January 2009
DLP, or Data Leakage Prevention, is a complex subject, overly simplified and little understood. Unfortunately, the tools available are not yet up to the task of truly preventing data leakage, but as with all things in security we must strive to improve, and doing something is better than doing nothing.
Let’s be realistic: we face some really daunting challenges with real DLP, in spite of what the vendors may be selling and telling. Consider car theft, which still stymies us. There we are talking about a single car (without replication considerations), fixed roads, alarms on entry violations (access), the ability to geotrack (think LoJack), a large security force (police) protecting and attempting to recover the asset, somewhat limited destinations (usually a single continent), relatively slow escape speed (compared to the speed of light) and relatively low street value (compared to the value of some data, especially in bulk), and still we can’t prevent the theft. How are we really going to do this with data that can be replicated easily, has no realistic geotracking, has enormous value and seems to play out far better for the thief on the risk/reward curve?
The military of course has years of experience and has the basics down: limit physical access, log access, authenticate access, classify documents, and control as far as possible the replication and transmission of documents.
Data classification is an extremely important consideration, but it is beyond the scope of this article. There will also be limited focus on entry or access protection mechanisms; instead, the focus is on transmission methods and approaches after access has been granted or taken.
For the sake of simplicity, this article will focus on realistic commercial applications of DLP, where the goal is mainly to prevent authorized users from transmitting sensitive data via unauthorized channels, or to prevent unauthorized users from transmitting sensitive data at all. We can only hope the tools and techniques will continue to mature.
Essentially our tools today consist of:
- Controlling and logging access to data
- Marking or fingerprinting data for later detection of transmission
- Controlling and restricting transmission methods of data
I’m always for the basics, and adherence to best practices. If you have no visibility, either through management tools or logging, you have minimal ability to learn from breaches, let alone do any forensics. What is needed first is a solid infrastructure that gives visibility into usage of and access to data. This may be for structured data (typically data stored in a database), but these days it should also incorporate unstructured data (typically flat files such as Microsoft Excel, Word and PowerPoint documents). Logging is fairly easy to accomplish for both types of data. Logs should include usernames and source addresses, and should be irrefutable from a security and legal perspective.
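As a rough illustration of what such a log might look like, here is a minimal sketch of a hash-chained access log. The field names, the sample values and the choice of SHA-256 chaining are my assumptions for illustration, not a description of any particular product. Chaining makes after-the-fact edits detectable; on its own it does not make a log legally irrefutable, which would typically also require protected storage and trustworthy timestamps.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def append_entry(log, username, source_addr, resource, timestamp):
    """Append an access record whose hash also covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    record = {
        "username": username,
        "source_addr": source_addr,
        "resource": resource,
        "timestamp": timestamp,
        "prev_hash": prev_hash,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)
    return record


def verify_chain(log):
    """Return True only if no record has been altered, removed or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev_hash:
            return False
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True
```

Editing any earlier record changes its hash, which breaks the link stored in every later record, so verify_chain fails from that point on.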
Access control is the process of validating a user through some kind of authentication measure, preferably two or more factors where sensitive data is concerned. Those authentication factors should be a combination of what you know (e.g. passwords), what you have (e.g. token, card, etc.) and what you are (various biometrics).
Access control doesn’t address what happened to the data, merely who could have duplicated or transmitted the data without authorization. To detect what actually happened, we need to “fingerprint” the data so that we can detect the transmission not just of the data in its entirety but also of some portion of it, and we have to monitor all transmission vectors. Even with fingerprinting, it’s still possible to circumvent simple mechanisms. For instance, if the source data is text but a “copy” is made visually (e.g. a screenshot), it may no longer be possible to detect the data patterns except with more extensive controls in place. This is why access control is still essential: it is a record of who could have made a copy of the data, with timestamps of that access. These data fingerprints should be tagged with data classification levels.
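To make the idea of detecting “some portion of the data” concrete, here is a minimal sketch that fingerprints a text document as a set of hashed word shingles (overlapping runs of k words) and tags the fingerprint with a classification level. The function names, the shingle length of five words and the use of SHA-256 are my assumptions for illustration; commercial products use their own, usually more robust, schemes. As noted above, this kind of matching does nothing against a screenshot.

```python
import hashlib
import re


def shingle_hashes(text, k=5):
    """Hash every run of k consecutive words, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return set()
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


def build_fingerprint(document, classification, k=5):
    """Fingerprint a protected document and tag it with its classification level."""
    return {
        "classification": classification,
        "k": k,
        "hashes": shingle_hashes(document, k),
    }


def match_ratio(fingerprint, outbound_text):
    """Fraction of the outbound text's shingles that occur in the protected document."""
    outbound = shingle_hashes(outbound_text, fingerprint["k"])
    if not outbound:
        return 0.0
    return len(outbound & fingerprint["hashes"]) / len(outbound)
```

An outbound message that quotes only one paragraph of a protected document still scores a high ratio, because each shingle is matched independently of where it sat in the original.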
Today, transmission method detection is fairly simple. Many solutions still just look for Email. More robust ones detect patterns in major IM (Instant Messaging) applications (e.g. Google, MSN, AIM, Yahoo), but still can’t detect lesser IMs, let alone self-hosted or proxy anonymizer approaches. This is far too simple: hundreds of applications exist for the transmission of data, and it’s extremely easy to create your own custom service. The methods in place today for detection tend to look either at the target of the transmission (the destination address, e.g. myspace.com, as in the case of various URL filtering approaches) or at the application used to transfer the data (e.g. Email over port 25). Both approaches are woefully simplistic. Destination approaches are bound to fail for any kind of custom or unknown destination, and application approaches limited to a mere handful of ports or applications (e.g. Email, IM, ftp) are bound to fail against the literally hundreds of choices.
This problem is only compounded by the use of encrypted transports or, frankly, even simple obfuscation of the data (e.g. character shifting, A=B, B=C, etc.). Tools are just now arriving to deal with encryption; however, they really just look at the port (e.g. port 443). It’s still far too easy to use a legitimate clear (unencrypted) port to transport encrypted or obfuscated data.
Let’s limit our scope to that which is reasonably attainable, which is stopping or detecting the general user from using regular commercial applications to mask or hide their activities.
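A small sketch shows both sides of this problem. The first function measures the Shannon entropy of a payload, which tends to be high for encrypted (and also for compressed) data regardless of the port it travels on. The other two show that a simple character shift hides a keyword from a naive match, and that trying all 26 shifts recovers it. The function names and the keyword are hypothetical, and any entropy threshold you pick would be a tuning assumption with real false positives.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: near 8 for random-looking data, lower for plain text."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def shift_letters(text: str, offset: int) -> str:
    """Shift each ASCII letter forward by offset positions, wrapping Z back to A."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + offset) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + offset) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def find_shifted_keyword(payload: str, keyword: str):
    """Return the shift offset under which keyword appears in payload, or None."""
    lowered = payload.lower()
    for offset in range(26):
        if shift_letters(keyword.lower(), offset) in lowered:
            return offset
    return None
```

Brute-forcing shifts only covers this one trivial obfuscation. Anything slightly more inventive slips past, which is exactly the point made above.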
We have to get back to the basics:
- Don’t allow “any - any” rules on firewalls for outbound traffic, restrict the ports
- Don’t allow outbound port 25 (smtp) except for authorized mail servers
- Use firewalls to restrict the use of unauthorized applications that can be used to transport data (e.g. IM, P2P, etc.)
- Use next generation firewalls to inspect port 80 traffic for unauthorized applications that could be used to transport data
- Inspect the content of authorized transmission vectors (e.g. Email, perhaps IM, ftp) for matches to data fingerprints
- Decide in advance based upon data classification if data should be allowed, blocked or quarantined for further inspection
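The checklist above can be sketched as a single decision function: first the firewall-style egress check (restricted ports, smtp only from authorized mail servers), then the pre-decided action for whatever classification the content inspection matched. Every address, port and classification label here is a hypothetical placeholder, and treating an unknown classification as “quarantine” is my assumption, not a rule from any standard.

```python
# Hypothetical policy tables; real values depend on your own environment.
ALLOWED_OUTBOUND_PORTS = {53, 80, 443}
AUTHORIZED_MAIL_SERVERS = {"10.0.0.25"}
ACTION_BY_CLASSIFICATION = {
    "public": "allow",
    "internal": "quarantine",
    "confidential": "block",
}


def egress_permitted(source_addr, dest_port):
    """No any-any rule: only listed ports, and port 25 only from mail servers."""
    if dest_port == 25:
        return source_addr in AUTHORIZED_MAIL_SERVERS
    return dest_port in ALLOWED_OUTBOUND_PORTS


def decide(source_addr, dest_port, matched_classification=None):
    """Return 'allow', 'block' or 'quarantine' for one outbound transmission."""
    if not egress_permitted(source_addr, dest_port):
        return "block"
    if matched_classification is None:
        return "allow"  # content inspection found no fingerprint match
    # Unknown labels are held for review rather than silently allowed.
    return ACTION_BY_CLASSIFICATION.get(matched_classification, "quarantine")
```

The point of writing the table down in advance is that nobody has to decide under pressure, at the moment of a match, what should happen to a confidential document on its way out.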
If the data resides in a web application, make sure to use enterprise class web application firewalls to inspect the data and mask any inappropriate data (e.g. SSNs, account numbers, etc.) that might leak out due to an application error.
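As a simple illustration of the masking step, the sketch below rewrites anything shaped like a US SSN in an outgoing response body. The pattern, the function name and the choice to keep the last four digits visible are my assumptions; a real web application firewall has its own rule language and will cover many more data types.

```python
import re

# Matches the common NNN-NN-NNNN layout; the last four digits are captured.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssns(body: str) -> str:
    """Replace the first five digits of anything shaped like an SSN."""
    return SSN_PATTERN.sub(r"***-**-\1", body)
```

For example, an error page containing “record 123-45-6789 not found” would go out as “record ***-**-6789 not found”.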
I’m a big fan of using tripwire approaches, in the form of records in a database or files on file servers. Use bogus entries that would never legitimately be accessed, and then track via access control and transmission control any attempt to read the data and, of course, to transmit it. Except for backups, these records or files would never be accessed except in some bulk transfer attempt. Pattern matches are easy to detect for these special records.
DLP is in its infancy, and I’m sure over time we’ll discover that even with proper tools in place it is extremely easy to bypass most attempts to curtail transmission. So we have to focus on access control, and use DLP methods today more as a stopgap or as early detection of failed controls.
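Here is a minimal sketch of both halves of the tripwire idea: flagging any read of a bogus record in the access log, and spotting a bogus value in outbound content. The record IDs, marker strings, account name and event field names are all made up for illustration.

```python
# Hypothetical bogus records planted among real customer data.
TRIPWIRE_RECORD_IDS = {"CUST-TRAP-0001", "CUST-TRAP-0002"}
# Distinctive values stored inside those records, easy to pattern match.
TRIPWIRE_MARKERS = {"Zebediah Quillfeather", "zq-trap@example.com"}
# Accounts expected to touch every record, such as the backup job.
EXEMPT_ACCOUNTS = {"backup-svc"}


def tripwire_access_alerts(access_events):
    """Return every access-log event that read a tripwire record."""
    return [
        event
        for event in access_events
        if event["record_id"] in TRIPWIRE_RECORD_IDS
        and event["username"] not in EXEMPT_ACCOUNTS
    ]


def payload_has_tripwire(payload: str) -> bool:
    """True if outbound content contains any planted marker value."""
    return any(marker in payload for marker in TRIPWIRE_MARKERS)
```

Because no legitimate user has a reason to touch these records, even a single hit is worth investigating, which keeps the false positive rate low compared with most other DLP signals.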