AWS re:Invent was a terrific event as always, and once again there was a flood of announcements covering customers, acquisitions, partnerships, and, of course, products. Data was one of the core themes of the conference, with announcements spanning databases, data lakes, analytics, and data engineering. Data-Centric Security featured in many of the announcements we tracked, continuing a trend that began earlier this year with the announcement of Row-Level Security (RLS) for Amazon Redshift in July (for which we were proud to be a Design Partner).
Two aspects of these announcements stand out:
- The focus was on Fine-Grained Data Access Control and Data Entitlements, reflecting how data-centric security has matured into a key part of digital transformation initiatives.
- Data Access Governance is becoming much more important. Integration with Lake Formation featured in most of the announcements, underlining the importance of not just providing robust, granular data access control, but also managing those policies centrally.
Here are some of the highlights:
Customers use Redshift data sharing for access to live data across multiple Redshift data warehouses. The new Lake Formation integration lets customers centrally view, modify, and audit data sharing permissions via Lake Formation APIs and the AWS Console.
Data sharing has become essential as organizations of all sizes look to collaborate more effectively internally as well as with customers, partners, and suppliers. It also creates challenges, including accidental sharing of sensitive data, sharing more data than necessary, and the risk of data exfiltration if the sharing is not monitored and secured.
This announcement builds on existing Amazon Redshift capabilities for Role-Based Access Control and Row- and Column-Level Security by providing SQL-based controls for masking sensitive data as it is returned to the consumer at runtime (query time).
Data masking is a key control point for many privacy regulations, including HIPAA, GDPR, and CCPA. Data classified as privacy-sensitive has to be secured by default, and this capability should help with that.
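To make the idea of query-time masking concrete, here is a minimal conceptual sketch in Python. It is not Redshift's SQL syntax or API; the role names, the masking functions, and `query_customers` are all hypothetical and exist only to show that the stored value stays intact while each role sees a different view of it.

```python
from typing import Callable, Dict, List

# Hypothetical masking functions (illustrative names, not a Redshift API).
def full_mask(value: str) -> str:
    """Replace every character so nothing is revealed."""
    return "X" * len(value)

def partial_mask(value: str) -> str:
    """Keep only the last four characters visible."""
    return "X" * max(len(value) - 4, 0) + value[-4:]

def no_mask(value: str) -> str:
    """Return the value unchanged."""
    return value

# Assumed mapping of roles to masking behaviour for the sensitive column.
MASKING_POLICY: Dict[str, Callable[[str], str]] = {
    "analyst": full_mask,
    "support": partial_mask,
    "auditor": no_mask,
}

def query_customers(rows: List[dict], role: str) -> List[dict]:
    """Return copies of the rows with 'ssn' masked for the caller's role.

    Unknown roles fall back to full masking (secure by default).
    The stored rows are never modified; masking happens at read time.
    """
    mask = MASKING_POLICY.get(role, full_mask)
    return [{**row, "ssn": mask(row["ssn"])} for row in rows]
```

The default to `full_mask` for unknown roles mirrors the "secured by default" expectation described above: a missing policy entry never exposes the raw value.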
This capability helps enforce fine-grained access control policies in Athena queries for data stored in any supported file format using table formats such as Apache Iceberg, Apache Hudi and Apache Hive. You can now enforce least-privilege data access controls in Athena queries so that, as an example, data analysts residing in different countries get access to data only for customers located in their own country to meet regulatory requirements.
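As a rough illustration of the country example, the sketch below builds the kind of row-filter definition that Lake Formation uses for fine-grained access. The dictionary shape follows our understanding of the `TableData` argument to boto3's Lake Formation `create_data_cells_filter` call and should be checked against the current AWS documentation; the helper name, the `country` column, and the filter naming scheme are assumptions made for this example.

```python
import re
from typing import Any, Dict

def build_country_filter(
    catalog_id: str, database: str, table: str, country_code: str
) -> Dict[str, Any]:
    """Build a row filter that exposes only one country's customers.

    Hypothetical helper: assumes the table has a 'country' column holding
    two-letter uppercase codes. Nothing is sent to AWS here.
    """
    # Validate the code so it cannot alter the filter expression.
    if not re.fullmatch(r"[A-Z]{2}", country_code):
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": f"{table}_{country_code.lower()}_only",  # illustrative naming
        "RowFilter": {"FilterExpression": f"country = '{country_code}'"},
        "ColumnWildcard": {},  # all columns; only rows are restricted
    }
```

In practice, a definition like this would be passed as `TableData` when creating the filter, and `SELECT` on that filter would then be granted to the analysts for that country, so their Athena queries return only matching rows.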
You can now use AWS Lake Formation to apply table- and column-level permissions to Apache Spark and Apache Hive jobs submitted as EMR Steps. This further simplifies access control by giving each job access only to specific databases, tables, and columns.
This capability greatly reduces the operational complexity and overhead of fine-grained data access control in EMR clusters. Previously, data owners had to choose between over-granting access, by giving a cluster the union of every access level its users needed, and taking on the expense and burden of creating a separate cluster for each level and type of data access.
Customers can now leverage runtime roles in combination with table and column-level policies from Lake Formation to scope down the entitlements for different groups of users within the same cluster.
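The sketch below shows how the two pieces might fit together: a column-scoped Lake Formation grant for a runtime role, and an EMR step submitted under that role. The request shapes follow our understanding of boto3's `grant_permissions` (Lake Formation) and `add_job_flow_steps` (EMR, with `ExecutionRoleArn`) and should be verified against the AWS documentation; the helper names, step name, and all identifiers are placeholders. The functions only build request dictionaries and do not call AWS.

```python
from typing import Any, Dict, List

def build_column_grant(
    role_arn: str, database: str, table: str, columns: List[str]
) -> Dict[str, Any]:
    """Keyword arguments for a SELECT grant limited to specific columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }

def build_step_request(cluster_id: str, role_arn: str, script_uri: str) -> Dict[str, Any]:
    """Keyword arguments for submitting one Spark step under a runtime role."""
    return {
        "JobFlowId": cluster_id,
        "ExecutionRoleArn": role_arn,  # the runtime role the step runs as
        "Steps": [
            {
                "Name": "scoped-spark-job",  # hypothetical step name
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_uri],
                },
            }
        ],
    }
```

Two teams sharing one cluster would each get their own role and their own grant, so the same step definition reads different columns depending on which role submits it.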
Amazon DataZone, a new service, allows data owners and data stewards to set up their own data catalog by defining their data taxonomy and governance policies across a broad range of AWS and non-AWS data services, including on-prem data repositories. It uses machine learning to ease the burden of maintaining the catalog, and it supports collaboration by making all of this data available for discovery and appropriate use by analysts, data scientists, and other consumers.
- Data-Centric Security was a common theme across many of the data product announcements at re:Invent.
- Fine-Grained Data Access Control and Data Entitlements are coming to the forefront, highlighting the need for row-, column-, and cell-level security and advanced data masking capabilities.
- The need to manage these centrally (Data Access Governance) is evident in the prominence of Amazon DataZone for managing a business-centric data catalog and in Lake Formation being used to manage access across services including Athena, EMR, and Redshift.
TrustLogix provides Data Security Posture Management with monitoring, alerting, and recommendations for visibility into data misuse and data sprawl across your cloud data sources. We combine this with no-code data access control and policy management to give you a unified data access control plane across all your AWS Data Services and other cloud and on-prem data stores, including Snowflake, Databricks, MySQL, SQL Server, PostgreSQL, and many others.