Cybersecurity and big data aren’t going hand in hand, and that concerns me immensely.

Bruce Schneier has an excellent article about data as a toxic asset.

Up until now, the data we’ve lost or had leaked about us has been credit card numbers and basic identity information. While that is a nuisance and unacceptable, it pales in comparison to the potential damage of big data breaches. We don’t even have a grip on the amount and nature of the data stored about us, without our knowledge, in various big data repositories.

If you’re a data collector, the focus so far has been on the upside, but it’s time to focus on the downside of data collection. Stored data is an asset AND a liability.

Big data tends to be distributed across specialized databases with an emphasis on access performance. I have yet to see a meaningful discussion of architecting big data securely from the start. The focus is on accessing and mining the data, not on safe storage, access controls or audit controls.

We need to make sure that corporations start to focus on the liability side of collecting massive amounts of data “just because” or “in case of future need,” without tackling key considerations such as data retention.

Can you imagine if all our personal nuances and preferences were disclosed from a variety of data sources? The damage and lasting effects would be unimaginable if everyone’s digital footprints were released into the wild.

Everyone needs to ask questions like:

- Why store it?

- Can we really collect this?

- Can we do this without a release?

- To whom does this personal data belong?

- How do we expire data?

- How do we anonymize data?

- What kind of data should never be collected?

- What’s the potential liability of storing that much data?

We have all this data mining and collecting going on, but all I see are the potentially devastating toxic tailings. It’s not as though we can clean up digital toxic Superfund sites; data is different, more like a dirty bomb. Perhaps a good start would be to require anyone collecting massive amounts of data to tag the data records with their registered identity, to create a registry of “toxic data” types, and to require companies to disclose in their annual reports any data retention meeting those criteria, along with provisions for notification and compensation and financial set-asides. These firms are mining toxic data; we need to treat it as such.
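
To make that proposal a bit more concrete, here is a minimal Python sketch of what tagging and disclosure might look like. Everything in it is an assumption for illustration: the registry contents, the retention limits in days, the `TaggedRecord` fields, the collector ID format and the `disclosure_report` helper are all hypothetical, not an existing standard or anyone’s actual system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical registry of "toxic data" types, mapped to an assumed
# maximum retention period in days. The types and limits are made up.
TOXIC_DATA_REGISTRY = {
    "health": 365,
    "location_history": 90,
    "financial": 730,
}


@dataclass(frozen=True)
class TaggedRecord:
    collector_id: str   # assumed registered identity of the collecting firm
    data_type: str      # checked against the registry above
    collected_on: date
    payload: str

    def is_toxic(self) -> bool:
        """A record is "toxic" if its type appears in the registry."""
        return self.data_type in TOXIC_DATA_REGISTRY

    def expires_on(self) -> Optional[date]:
        """Retention deadline for toxic records, None for everything else."""
        if not self.is_toxic():
            return None
        return self.collected_on + timedelta(days=TOXIC_DATA_REGISTRY[self.data_type])


def disclosure_report(records, today):
    """Count retained toxic records per type and how many are past deadline."""
    report = {}
    for record in records:
        if not record.is_toxic():
            continue
        entry = report.setdefault(record.data_type, {"retained": 0, "overdue": 0})
        entry["retained"] += 1
        if today > record.expires_on():
            entry["overdue"] += 1
    return report


records = [
    TaggedRecord("ACME-001", "location_history", date(2024, 1, 1), "..."),
    TaggedRecord("ACME-001", "health", date(2024, 1, 1), "..."),
    TaggedRecord("ACME-001", "page_theme", date(2024, 1, 1), "dark"),
]
print(disclosure_report(records, today=date(2024, 6, 1)))
```

The point of the sketch is that once every record carries its collector’s identity and a registered type, both expiry and annual disclosure become simple queries rather than forensic exercises.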