Snowflake Summit 2023 Takeaways

This year’s Snowflake Summit was held in Las Vegas last week, and it was as engaging and educational as ever. Over 12,000 Snowflake customers, partners, and other data professionals filled Caesars Forum and Caesars Palace to share knowledge about Snowflake capabilities and best practices, partner offerings, and of course new topics such as Generative AI.

Data security received far more energy and attention at this year’s Summit than in prior years. At least ten exhibitors were offering data security products or services, the agenda included numerous sessions on the topic from customers, partners, and Snowflake, and Snowflake itself announced new functionality. For us at TrustLogix, it was gratifying to see so much attention on this topic, with recognition from all types of organizations that getting data security right is critical to accelerating one’s data innovation on Snowflake.

Partner Focus on Industry Solutions

As in prior years, the opening day included a separate track for partners. This year, its focus was on Industry Solutions, especially those that involve data sharing and collaboration across a network of organizations. Snowflake is soliciting help from industry-specialist ISVs and service providers to build, market and sell such solutions jointly. Existing Industry Solutions being promoted included financial services counterparty, healthcare value chain and price transparency, healthcare unstructured data analytics (such as DICOM medical images), life sciences collaborative research, logistics and transportation, and media and entertainment distribution.

All of these solutions have something in common: sensitive data being shared. “Designing Data Security In” to these packaged solutions is essential and shouldn’t be an afterthought for any enterprise implementing them. Snowflake presenters called out that solution providers should leverage Snowflake’s native features and engage with its security partners to ensure their packaged solutions include the right data security capabilities out of the box. This is a great opportunity for data security ISVs to be included, especially those with lightweight cloud-native architectures that don’t add unnecessary complexity and implementation cost to the overall solution. For solution providers delivering such solutions, this is a great reason to consider TrustLogix.

Generative AI

The main Summit kicked off Monday at 5pm with a joint keynote and Q&A with Snowflake CEO Frank Slootman and Nvidia CEO Jensen Huang, announcing a strategic partnership in which Snowflake workloads can leverage Nvidia’s Foundation AI models and run directly on Nvidia GPUs. Snowflake also announced other new features intended for running Large Language Model (LLM) workloads on Snowflake, including the availability of unstructured document intelligence (from the Applica acquisition), the ability to run LLMs and other compute-intensive workloads as containers directly within Snowflake, and streamlined usage of external LLMs such as OpenAI via external functions.

Altogether, Generative AI is a significant growth area for Snowflake, with such workloads resulting in ever more data being ingested and processed within the Snowflake ecosystem. And more data means more data security and privacy headaches if one doesn’t plan ahead. In particular, many of the Generative AI use cases that were discussed involved generating new insights from customer engagement data, which can lead to new privacy and regulatory risks. Should the LLM’s dataset include only de-identified data, at the risk of the AI’s responses being less accurate and insightful as a result? If the LLM’s dataset does include PII, how does one ensure that its answers don’t reveal those identities to unauthorized users, no matter what their prompts are? How does one ensure that the prompts themselves don’t include or refer to identifiable information from unauthorized users? These questions represent a significant opportunity for ISVs and solution providers who design security into their solutions from the get-go, as with the industry solutions highlighted above.
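To make the last of these questions concrete, here is a minimal sketch of one possible safeguard: stripping obvious identifiers from a prompt before it is sent to an LLM. This is an illustration only. The regex patterns, the placeholder labels, and the function name redact_prompt are all hypothetical and are not part of any Snowflake, OpenAI, or TrustLogix API; real PII detection needs far broader coverage than three patterns.

```python
import re

# Hypothetical, deliberately narrow patterns (assumptions for illustration).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_prompt(prompt: str) -> str:
    """Replace each matched identifier with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


if __name__ == "__main__":
    raw = "Summarize tickets from jane.doe@example.com, phone 555-123-4567."
    print(redact_prompt(raw))
    # prints: Summarize tickets from [EMAIL], phone [PHONE].
```

A filter like this addresses only the prompt side of the problem. Whether the model's answers can reveal identities still depends on what data it was given and on the access policies applied to that data.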

Data Applications

Last year, Snowflake acquired Streamlit, which provides a Snowflake-native environment for data application development. This year’s Summit showcased several success stories, with both ISVs and end customers demonstrating their apps and how their business benefited as a result.

And now this year, Snowflake acquired Applica, which provides the ability to analyze and build applications on top of unstructured data. This was positioned as particularly useful for Generative AI, but in principle can enable any workload on unstructured data.

In both cases, data security came up as an essential aspect of developing these applications. The ability to control access within Snowflake itself is preferable to each application team doing it their own way, especially if the same data is used or shared across applications. As with data analytics and data sharing, issues such as role explosion, too many overprivileged users, and data sprawl can arise when many application teams are working on the same datasets. It’s important to get ahead of this with a clear plan for scaling data security across many datasets and applications.

The Double-Edged Sword of Do-It-Yourself Data Security

There were several sessions from Snowflake customers that built their own access control solutions on top of Snowflake’s native functionality, including T-Mobile and Citi. While these homegrown solutions did solve the initial problem of provisioning many sensitive datasets to many users at scale, they came with costs:

They took several months to build with a team of five or more in-house data engineers, during which time several projects were on hold until enough access control functionality was in place.
They were specific to Snowflake, and thus could not be easily reused when the same data was shared with other environments or with third parties.
Maintaining and enhancing the homegrown solution was an ongoing burden for the data engineers involved, resulting in less of their time being available for other data projects.

While watching these sessions, we couldn’t help but wonder if these customers would have gone down a different path, had they been aware of the great products available from TrustLogix and other partners. With TrustLogix:

Instead of waiting months for a homegrown solution to be completed, TrustLogix can be deployed, with data security risks identified and recommendations generated, within minutes.
Policies can be defined centrally and deployed everywhere, not just Snowflake or any other single platform.
Managing change and responding to new requirements can be done within minutes through a single console, instead of cracking open and changing existing code.

LEARN MORE about how "Do It Yourself" Cloud Data Security can lead to inefficiencies and security blind spots: https://www.trustlogix.io/blog/securing-data-in-the-cloud-security-blind-spots-will-hurt-you

Snowflake Continued Investing in Native Security Features

It was great to see Snowflake continuing to invest in new native functionality, such as:

Query constraint policies, which allow joins, filters and aggregations on sensitive fields while preventing those fields from appearing in query results
Classifications for additional countries
Tagging enhancements, such as auto-applying tag-based policies against future same-tagged data objects
Ability to manage masking and access control policies on both the provider and consumer side of a data share
Ability to manage fine-grained policies on external tables
Audit histories of sensitive tags, data objects and policies
Exposing more data security functionality via Snowsight

All of these were in response to customer demand, particularly from customers doing extensive data sharing with partner organizations who need visibility into and control of their partners’ data access, and from customers rolling out enterprise-wide data meshes that need access control consistency and auditability across many data products.

However, all are available primarily as SQL commands or scripts within Snowflake, requiring data engineering effort to implement and deploy for any given project. Look for TrustLogix to leverage these features, providing the same capabilities without coding and deployable in minutes, as with our current functionality. As Snowflake delivers more such native features that customers want to use, the advantage of using TrustLogix becomes greater and greater in terms of data engineering time and effort saved, the speed of onboarding new users and projects safely, and the number of data use cases that can be protected.

Security Observability is a Game Changer

Finally, we heard about the importance of security observability from many end-customer and partner organizations. Knowing the current issues and gaps in one’s current data security posture can greatly inform what roles and access policies should be defined for a given data source. These issues and gaps can include:

“Dark data”: tables with sensitive data that are not being accessed.
“Role explosion”: many roles with overlapping privileges, leading to operational chaos.
“Ghost accounts”: accounts that have not been accessed in a long time.
Unusual usage behavior: spikes in data volume, access at unusual times, or access from unauthorized clients.
“Data sprawl”: tables or schemas being copied or transferred out of a given data store to many different targets, or to targets that don’t have the same access controls as the source.
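
Several of the checks in this list reduce to simple set and date comparisons once the relevant metadata has been collected. The following is a minimal sketch under stated assumptions: the input shapes (a list of access events, a user-to-last-login mapping, a role-to-privileges mapping), the function names, and the thresholds are all hypothetical, and this is not how TrustLogix or Snowflake implements these checks.

```python
from datetime import datetime, timedelta
from itertools import combinations


def find_dark_tables(sensitive_tables, access_log):
    """Sensitive tables that never appear in the access log."""
    accessed = {event["table"] for event in access_log}
    return sorted(set(sensitive_tables) - accessed)


def find_ghost_accounts(last_login, now, max_idle_days=90):
    """Accounts whose last login is older than the idle threshold."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(user for user, ts in last_login.items() if ts < cutoff)


def find_overlapping_roles(role_privileges, threshold=0.8):
    """Role pairs whose privilege sets have Jaccard similarity >= threshold."""
    pairs = []
    for a, b in combinations(sorted(role_privileges), 2):
        pa, pb = role_privileges[a], role_privileges[b]
        union = pa | pb
        if union and len(pa & pb) / len(union) >= threshold:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Made-up sample metadata, for illustration only.
    access_log = [{"user": "alice", "table": "SALES.ORDERS"}]
    sensitive = ["SALES.ORDERS", "HR.SALARIES"]
    logins = {"alice": datetime(2023, 6, 25), "bob": datetime(2022, 11, 1)}
    roles = {
        "ANALYST": {"SELECT:ORDERS", "SELECT:CUSTOMERS"},
        "ANALYST_V2": {"SELECT:ORDERS", "SELECT:CUSTOMERS"},
        "ADMIN": {"SELECT:ORDERS", "DROP:ORDERS"},
    }
    print(find_dark_tables(sensitive, access_log))           # ['HR.SALARIES']
    print(find_ghost_accounts(logins, datetime(2023, 7, 1)))  # ['bob']
    print(find_overlapping_roles(roles))                      # [('ANALYST', 'ANALYST_V2')]
```

The 90-day idle window and the 0.8 similarity threshold are arbitrary starting points. In practice they would be tuned to the organization's own login patterns and role design.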

One customer told us this was a “game changer”, in that their security and data teams would no longer be “flying blind” when defining their access control and de-identification policies. With security observability, one can define the right policies with confidence, knowing that existing issues and risks are being accounted for.

In addition to identifying these issues, TrustLogix also provides recommendations, and the ability to act on those recommendations directly within our product.

LEARN MORE about how Classification and Security Observability technologies can together provide a superior data security posture: https://www.trustlogix.io/blog/data-intelligence-data-access-governance-data-centric-security

In Conclusion

We left this year’s Snowflake Summit energized by the attention and importance placed on data security as an essential requirement for data innovation and collaboration. Whether it is new industry solutions, generative AI, or supporting enterprise-scale data collaboration projects, the need for data security is clear, and TrustLogix can help. To learn more, visit https://www.trustlogix.io/snowflake and try our free audit at https://www.trustlogix.io/free-trial.