In the old days of on-premises data warehouses and data lakes, data security was simpler. Organizations brought their data to a single platform, from which data access could be centrally managed.
Now, however, the cloud has turned this on its head. Teams can easily spin up database instances from multiple vendors across multiple clouds, and it is much harder to keep track of how data is shared and accessed across all of them. This leads to a pattern of constantly reacting to data access requests and issues, with risks lurking in "blind spots" until it is too late.
Let’s look more closely at the security blind spots associated with three common scenarios that are typical of cloud data environments.
How do organizations find themselves in a situation where they have no idea who has access to sensitive data?
This scenario is very common. It can result from too many administrators spread across many database instances, inactive users, role explosion (too many RBAC roles granted to individual users), and ineffective access control policies (such as too many over-privileged accounts). In addition, poorly designed privacy policies can leave sensitive data in the wrong place, leading to data breaches and regulatory compliance violations.
Within hours of our quick deployment of a TrustLogix proof of concept, a financial services customer detected numerous anomalies and risky issues including:
- Ineffective access grants with 35 overly privileged roles
- Role explosion and operational complexities with 150+ overlapping roles assigned to users
- Over 40,000 unused tables containing sensitive data
- More than 100 inactive users (or ghost user accounts)
All of these issues were completely unknown to this customer’s data owners and data security staff, and each issue was a potential threat vector that could have resulted in a breach. Their previous approach, in which each database instance’s administrator managed access controls for that instance with SQL and Python scripts, had worked well in their centralized on-premises environment but was completely ineffective at identifying these issues in the cloud.
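To make the kinds of checks described above concrete, here is a minimal, hypothetical sketch of what a centralized audit pass might look like. The user records, role names, and thresholds are illustrative assumptions; in practice this metadata would be exported from each platform's system views (for example, Snowflake's ACCOUNT_USAGE schema) rather than hard-coded.

```python
from datetime import date, timedelta

# Hypothetical exported grant/login metadata (illustrative values only).
users = [
    {"name": "alice",   "last_login": date(2024, 5, 1), "roles": ["ANALYST"]},
    {"name": "bob",     "last_login": date(2023, 1, 9), "roles": ["ANALYST", "ADMIN"]},
    {"name": "svc_etl", "last_login": None,             "roles": ["ADMIN", "SYSADMIN", "ANALYST"]},
]
today = date(2024, 6, 1)

def ghost_users(users, max_idle_days=90):
    """Flag accounts that never logged in or have been idle past the threshold."""
    cutoff = today - timedelta(days=max_idle_days)
    return [u["name"] for u in users
            if u["last_login"] is None or u["last_login"] < cutoff]

def over_privileged(users, max_roles=2):
    """Flag accounts holding more roles than the policy allows."""
    return [u["name"] for u in users if len(u["roles"]) > max_roles]

print(ghost_users(users))      # -> ['bob', 'svc_etl']
print(over_privileged(users))  # -> ['svc_etl']
```

Running checks like these continuously across every instance, instead of per-instance ad hoc scripts, is what surfaces ghost accounts and role explosion before they become breach vectors.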
What happens when data consumers can’t wait for their access requests to be granted?
As organizations become more data-driven, data analysts, data scientists, application development teams, and even third parties demand access to data with increasing frequency and urgency. Data owners and security staff struggle to keep up when data is distributed and access grants involve hand-coding or platform-specific tools. These data consumers can’t afford to wait; when they do, their business initiatives suffer.
“Shadow IT” is often the result, with consumers finding back-channel ways to get copies of data, often via the same role explosion and over-privileged accounts mentioned above. As sensitive data is accessed more broadly, the threat of leaks and breaches increases. The cloud makes this very easy, as data consumers can be at home or anywhere in the world and can readily access cloud-hosted data as long as their privileges allow it.
What happens when data is shared with other organizations without proper data protection and access control?
Across all industries, many organizations are seizing new business opportunities by sharing data with partner organizations, or even standing up data marketplaces to monetize their data. The cloud makes this very easy, and all the major cloud data platform vendors offer data sharing features that allow sharing with partners. Such datasets often include sensitive data attributes and should only be accessed by authorized users within partner organizations. But once data leaves your organization, it is no longer within your control. It is at the mercy of the partner’s data security and privacy policies (or lack thereof).
This is not simply a matter of data access, but also of de-identification: data should be de-identified so that no harm can come if it is misused, while still being of value to the third party. The right de-identification policies, leveraging protection methods like masking, ensure that only the right data attributes are shared with partners. When this doesn’t happen, any breaches and leaks that occur within the partner organization will affect you as well. It was your data, after all.
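As a rough illustration of masking and tokenization, the sketch below de-identifies a record before sharing. The helper names, formats, and the salt value are assumptions for this example; production platforms typically enforce this via built-in dynamic masking policies rather than application code.

```python
import hashlib

def mask_email(email: str) -> str:
    """Hide the local part but keep the domain, which may still be useful."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def mask_ssn(ssn: str) -> str:
    """Show only the last four digits."""
    return "***-**-" + ssn[-4:]

def tokenize(value: str, salt: str = "tenant-secret") -> str:
    """Replace a value with a stable token so joins across datasets still work."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

record = {"email": "jane.doe@example.com", "ssn": "123-45-6789"}
shared = {"email": mask_email(record["email"]), "ssn": mask_ssn(record["ssn"])}
print(shared)  # -> {'email': 'j***@example.com', 'ssn': '***-**-6789'}
```

The key design choice is that masked attributes remain analytically useful (domains, last-four digits, joinable tokens) while the identifying values themselves never leave your organization.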
Introducing the Shift-Left Approach to Data Security
A new approach is needed: moving from implementing data security in reaction to security incidents and requests, to implementing security controls before that data is even moved and made available to data consumers via the cloud. In short, data security needs to Shift Left.
The Shift-Left approach to data security is a proactive approach to monitoring and securing sensitive data in cloud data environments such as Snowflake, Redshift, and Databricks. Instead of reacting to blind spots as they are discovered, or reacting to incidents after they occur, one can identify risks and implement measures to protect against them before they lead to data leaks and compliance violations.
This means that you should have a solution in place that can continuously scan and audit your cloud data lakes and warehouses, looking for potential threats such as over-privileged accounts, unauthorized sharing of sensitive data, ghost accounts, and inadequate access controls.
Shift-Left approaches have been widely adopted by development and DevSecOps teams for applications, and the same principle can be applied to data security. Instead of focusing on securing data at the time of use, Shift-Left focuses on securing it before it is made available for consumption. By doing so, you can make sure your business’s and customers’ information remains safe throughout its lifetime.