Ally Hepp of Databricks joins the podcast to help marketers make sense of clean rooms, composable CDPs and how to wrangle big data for brand advantage. The team also discusses the critical coupling of data and AI and how AI comes into play for speeding up code migration, conversational interfaces and predicting the next best action.
Transcript
Kyle Hollaway:
Hello and welcome to Real Talk about Real Identity from Acxiom. This podcast is devoted to important identity trends in the convergence of AdTech and MarTech. I’m Kyle Hollaway, your podcast host, and I’m joined by our co-host, Dustin Raney.
Dustin Raney:
So today’s episode will be of particular interest for our listeners who are out there and are curious about some of the leading-edge capabilities as it relates to data-driven intelligence. We’re happy to introduce our guest today, Ally Hepp, who is a solutions architect for Databricks. So Acxiom considers Databricks one of its premier partners, and when you start to understand from Ally where they’re investing and innovating, in AI, deep analytics and unified cloud infrastructure, you’ll really begin to understand why. Ally, welcome to Real Talk.
Ally Hepp:
Thanks for having me. Excited to be here.
Dustin Raney:
Yeah. So if you don’t mind, start by giving our listeners a snapshot of your background and what brought you to the point you are in your career.
Ally Hepp:
Yeah, absolutely. So I actually started my career in the MarTech space, where I helped organizations really evaluate their system landscape across technologies and helped them design quality, scalable, and performant solutions. So prior to Databricks, I spent about eight years in the Salesforce ecosystem, the last four years at Salesforce on their Data Cloud specialist team, really driving CDP and personalization adoption for several enterprise companies. This past year I made the move over to Databricks, where I’m a solutions architect supporting our communications, media, and entertainment customers, and have had the absolute pleasure of working with Acxiom and the two of you for about eight months now.
Kyle Hollaway:
For our listeners, Databricks is obviously an up-and-coming brand. We’re starting to hear a lot more. How would you position Databricks within the market, just so that everyone has an understanding of where your sweet spot lies?
Ally Hepp:
Yeah, absolutely. So I like to say, in really the simplest of terms, Databricks combines the capabilities of data engineering, data science, and data analytics all under one unified platform or environment. And our platform really gives customers the ability to seamlessly build and manage data pipelines. Data scientists can conduct complex analysis using cutting-edge machine learning algorithms, and business analysts can glean actionable insights from massive data sets. And again, they’re able to accomplish all of that under one hood or one platform.
Dustin Raney:
Very cool. I know the term clean room is thrown out a lot. Ally, are clean rooms and Databricks synonymous?
Ally Hepp:
I wouldn’t say Databricks and clean rooms are synonymous, but I’d say that they are complementary. Databricks’s data intelligence platform is really built on this open source and open standard protocol, simplifying your data estate by eliminating the silos that historically we see complicate the data and AI space. On the other hand, clean rooms in the context of Databricks specifically refer to a specific feature or environment within the Databricks platform. And Databricks clean rooms really provide that secure environment where data can be accessed and processed while maintaining really the proper clean room standards. And this creates that protection layer preventing any other cloud admins or others from accessing that data. And it’s really a way to ensure that data privacy and security while allowing for more complex computations and workloads.
So I’d say at Databricks, at least when we think about clean rooms, we think about them in two different ways. You have your platforms, and you have your clean room orchestrators. So your platforms being your managed clean rooms, your traditional clean room solutions. Walled Gardens have really employed these data clean rooms to enable that data collaboration between advertisers and their platforms. These clean rooms have demonstrated extreme effectiveness in improving advertisers’ performance by addressing issues like frequency capping and audience suppression.
Now, on the other hand, when we think of orchestrators in the clean room space, we think of your neutral third parties that span clouds and regions and data platforms. And these neutral clean rooms not only help you organize and analyze and measure data, but also facilitate activations across the entire advertising ecosystem, enabling strong planning and activation and measurement of your marketing spend. So I’d say from Databricks’s perspective, our clean room specifically really excels in that interoperability space, ensuring that smooth collaboration across diverse environments. Collaborators can really seamlessly work together across different cloud providers and regions and even data platforms without that need to move data between platforms or between systems.
Dustin Raney:
Very cool. And I’m sure that opens up maybe some different types of data because of the security and collaboration that maybe wouldn’t otherwise have made it into the brand’s ecosystem for analysis and things like that for data scientists to use.
Ally Hepp:
Yeah, absolutely. Not only the need for structured data, but unstructured data, and also brands and customers starting to look into, “How can we share more than just data?” So sharing notebooks or models that they’re building out with the partners, agencies and customers they’re working with. So it really expands beyond just the concept of sharing data to other assets as well.
Kyle Hollaway:
So as you’ve seen the growth of Databricks and certainly being, like you said, across clouds or ecosystems, what’s the driving use case or value proposition that you’re seeing from your prospective customers coming in to be customers? What’s driving that?
Ally Hepp:
Yeah, I would say that the biggest trend that we’re seeing that is driving customers to come to Databricks right now is of course this trend around not wanting to move their data. So the trend and the requirement for, “How can we securely share data in a way that isn’t actually duplicating it or moving it from one system to the next?” So I think with that, we see a lot of customers who are interested in our Delta Sharing capabilities, where our customers can maintain full ownership of that data. Our platform is really designed to work with their existing cloud storage, giving them the ability to control and manage that data while leveraging Databricks capabilities for any of those data analytics or AI workloads, not needing to move their data to proprietary databases or data warehouses or use a proprietary data format that ultimately locks them in.
I think the other area where we see a lot of our customers really looking to us to drive innovation in their teams is AI. Databricks has really a long-standing history of being a data and AI platform, and our CEO and founder Ali has said from the early days of Databricks, “There is no AI without data.” And Databricks has really always been on a mission to democratize data and AI and make it simple and accessible, enabling really every company to capitalize on data and AI and not just that 1%.
Dustin Raney:
Yeah, I had a chance to speak at a Databricks symposium last year and was talking to some of the existing clients of Databricks, and I remember just being blown away. I think a guy I was sitting next to during the symposium, they were leveraging the data in professional soccer, literally all these data points of movements of people on the field, analyzing everything from the player’s gait and how they run to their movement of the ball, and then bringing that back to the whiteboard for the coach to show players exactly what’s the most efficient play, what’s the most efficient move. I was really taken aback by the actual application, by what’s actually happening with vast amounts of data today in the context of these data clouds with AI on top. Do you see brands like banks and insurance companies, are they leveraging AI in this way inside of your environment?
Ally Hepp:
Yeah, absolutely. I think there are so many use cases for AI. They can be more tactical or operational use cases, where we see customers who really just want to use AI to speed up code migrations. They have code that’s written on proprietary systems, and they utilize models that they’ve built in Databricks to essentially modernize that code, migrate that code, and they don’t have to tie up all of their engineering resources to do it. You can also utilize it for operational efficiencies. So any labor-intensive functions of pulling data from multiple locations or understanding unstructured personalized situations and unstructured compliance laws, making all of that more efficient utilizing models. And then I think on the business side, the use case that we see a lot of customers using AI and the platform to accomplish is of course your conversational interfaces: creating interfaces for both customer-facing interactions as well as internal co-pilots.
So how can you create a chatbot that provides personalized automated support or sales experience to your consumers? Or from a call center perspective, how can you enable a call center staff to ask the right questions on support tickets or gain better insight into customer service interactions or really just provide a more personalized customer support experience and resolve that more quickly. And then I think from your point of view, there’s a lot of those fun use cases, personalization, recommendations, whether it’s tracking the players on a field and what’s their next move and using the models to predict that. Or we also have use cases in the gaming industry.
In a game, you have a player map and where players tend to gravitate on that map, and how can we monitor and predict that next action that they’re going to take and maybe put an offer there for them to buy something in the game? So there’s a lot of focus right now from our customers on those real-time personalized experiences, whether it’s on the web, in a game or in a sports league. How can we identify what those are and make better decisions because of them?
Kyle Hollaway:
Yeah, that’s the thing. You just blew my mind there for a second, so I got to regroup here. But it’s really this massive shift of perspective to the true efficacy of data and being able to apply it really across use cases. As you well know, those were some very different use cases: a player on a field, a digital player, a shopper walking through your store. Contextually they’re very different, but effectively the intelligence you’re trying to gather and assess is still predictive in nature.
You’re trying to look at past behavior, analyze where it may lead to, and then how to best monetize or make use of that knowledge in that engagement with that consumer, that player, whoever it may be. So I think it’s really cool how you can almost separate out the use cases from the mechanisms that you’re leveraging, which I think is where Databricks comes in so strong again. There’s data and then there’s this ability to leverage the data in these kinds of AI-driven ways to inform a wide variety of use cases.
Ally Hepp:
Yeah. It’s all about helping customers be more proactive than reactive. We’ve always done a lot of reactive analytics and using BI to report on and predict lookalikes, but now being more proactive and what are other customers like them doing and what are we going to assume that they might do as well and being more proactive about how we communicate with them or whatever it might be.
Dustin Raney:
So I think this is where… Sorry, my dog just barked. Let him get over his… All right. So I think this is where our listeners typically think of things through the context of data-driven marketing. Most of our listeners are sitting here going like, “How do I practically apply what I’m hearing today on Real Talk to maybe my business?” And I know a lot of the businesses we’re talking to are exploring CDPs, customer data platforms. They’re ingesting all the data and that’s where their marketers are actually putting their use cases to work. But in the back end, and I know you guys just recently made an announcement with Salesforce about lakehouse, data sharing, shared AI models, integrations, ETLs, can you share a little bit with our listeners what you guys are doing there and where you see that going?
Ally Hepp:
Absolutely. So I think it’s twofold. So there’s a question out there, and we get it from customers a lot, of just, “How does Databricks play in even the MarTech space or the CDP market?” It’s been just booming over the last couple of years. Every enterprise wants a CDP. And I think that now more than ever, we are seeing that CDP interest not just from marketing teams, but really from all teams who work with data across an organization who see the value in having that unified view of the customer. Whether you’re on the data science team or analytics, you need a single view of the customer to really precisely understand who they are and make sure all of the teams across the organization are talking about Ally, who she is with the brand, and not in these silos. So I think in regards to how Databricks approaches the CDP market, we center ourselves around flexibility and extensibility and really helping our customers leverage the power of the lakehouse architecture as really that foundation of a CDP.
So we recognize that there’s tons of value in CDPs, but we also acknowledge that there can also be challenges that organizations face with an off-the-shelf CDP solution or even just trying to implement a CDP solution as typically your marketing team. And these challenges often stem from organizations sometimes treating CDPs as a standalone application separate from an organization’s data stack. And this ultimately leads to more data silos internally and frustration among data engineers and analysts and data scientists, and of course from the business as well.
So to really address these challenges, we like to use the concept of a composable CDP, and all this really means is a composable CDP will consist of different components of off the shelf CDPs. So you still have your data collection, your data storage, your data modeling, of course identity resolution and data activation, but in our case, we’re looking at it as, “How could we implement the best-in-class product for each of these components or capabilities of a CDP?” So that the organization can achieve really a more extensible CDP solution that can solve their specific problems well beyond maybe the common use cases of an off the shelf CDP. And Databricks in that kind of architecture can be used as the storage and modeling layer of that composable CDP, and then we rely on our partners like Acxiom for that identity resolution or for that activation.
And with that you mentioned our Salesforce partnership, and just like they’re a great partner to you, they’re a great partner to us at Databricks as well. And of course, coming from Salesforce, this is my sweet spot. When I heard of this partnership, I was so excited, and when I was at Salesforce, I remember when they were just starting to explore this whole bring your own model capability to really meet the needs of customers who use Databricks today and don’t have really a streamlined way to access that data from Data Cloud. So it’s been really exciting to see Databricks and Salesforce and this partnership really take off over the last few months. And we are only really just in pilot with a few customers on these capabilities now, but we’re starting to see really great results and feedback from those customers. And when it comes to the benefits of this partnership and what it provides to the business, and I’ll even say organizations as a whole, Salesforce is of course incredibly good when it comes to surfacing big data for business users in a really consumable way.
Databricks, of course, historically has been focused on really providing a platform that can handle the variety and veracity and volume of data that those data engineers or data scientists or analysts need. And when it comes to the power of the lakehouse for big data and unifying that or combining it with the power of Salesforce when it comes to surfacing that data, it really brings us to, I’d say, this catalyst of finally really having all of an organization’s data in a central place, being the lakehouse, and opening it up to bring your own model capability so that the business team also has the ability to collaborate on that same set of data. So gone are the days of MarTech teams or CRM admins setting up their own integrations for data ingestion from different sources. Now, they can already use what their data teams have put together in the lakehouse and really just get to work faster. They can start to bring more business value to the data and those experiences to customers, rather than working on putting together a lot of integrations or connecting two different data sources together when their data teams already have that done.
Kyle Hollaway:
It seems certainly the concept around composability and this kind of best-of-breed integration founded on certain specific layers is really transforming the systems integration industry or marketplace, because now you’ve got these horizontal capabilities that are really helping drive towards that. You’ve got the clouds, which are more of the infrastructure play. Now you’ve got Databricks with this data layer that’s pre-integrated and creates that holistic view, which then certain application layers can lay on top of and now really expose capabilities to the end user and obfuscate the complexity that resides within each of those. And you’re really driving towards more business value. And so I think that’s a really interesting dynamic within the marketplace today.
And certainly, Databricks, as you’re describing, has a key role in that. A question, since we always like to have maybe some questions about things: in market, it feels like Databricks is carrying forward more of an analyst’s tool or platform persona. That’s where people have maybe pigeonholed you versus other platforms which may be more marketer oriented or even more just on the customer journey side. Where do you see Databricks? Is there a broader play? Have you grown up into a new persona?
Ally Hepp:
Yeah. I think it’s a good question. And coming from a MarTech background into Databricks, I think I get asked this question all the time. “What the heck are you doing at Databricks? This is a tech focused platform built for the engineers, built for the analysts, what are you doing?” And I think that like, “You’re not wrong.” Databricks has always been perceived in the market as built for technical teams, built for your data engineers and your data scientists. And although those teams do make up a large majority of our users today, we are increasingly finding that the data intelligence platform unlocks even more value for our customers and for organizations when we expand into those MarTech and advertising teams and use cases.
As you both know, the MarTech landscape is incredibly vast, and where we see a lot of customers start to utilize Databricks as part of that is really as glue, meaning you have volumes of data coming in, you can ingest it, clean it, you can add any business context that you need and do all of that in an extremely scalable way. And then you can utilize that data or share that data with the Acxioms, the Adobes, the Salesforces, really your entire MarTech stack. And when I used to work in campaign implementations, one of the primary challenges that I had and I saw was the lack of visibility between tech teams or data teams and your marketing teams, meaning we often didn’t know the vast amount of data that the organization actually had, which could unlock so many additional customer journeys, more personalized experiences.
And this is really where we see Databricks’ sweet spot and what we can solve for: the glue between the data and the business teams, built for collaboration and, when used correctly, really the foundation for all of your data needs across the organization. So really, I like to think of it as making your MarTech stack an extension of your data strategy rather than just a silo for your marketers. And I’d say with that, we’re seeing more and more of the trend around interoperability. How can you have it such that you have bidirectional flow of data across the full stack? And with that in mind, because Databricks is open source, it’s really easier than ever for our customers to integrate with tools that they have today or new technologies that they might adopt tomorrow. Again, including the Acxioms and your identity solution, your third-party data solution, making it as easy as possible for our customers to use Databricks as that foundation and continue to use the benefits that an Acxiom might provide as well.
Dustin Raney:
So staying down that same thread, interoperability and access for certain users, especially the marketers out there who aren’t necessarily the engineering geeks that understand the deep code, those types of things. We’re seeing a lot more platforms shift to native app environments. I know with my phone, I love that I can call people and text people, but what makes it super valuable is the app store. I can quickly go and download an app and all of a sudden, I’ve got some new cool game or functionality or access to different things, features that I wouldn’t have previously. Is Databricks thinking down that path as well as far as an architecture? Are you guys maybe going down that native app platform path, allowing companies like Acxiom or data companies to build native apps inside of your environment?
Ally Hepp:
Yeah, so I definitely think that there’s been lots of conversations about this internally. I wouldn’t be maybe surprised if eventually we have capabilities like that. I think ultimately, with Databricks being an open-source platform, a lot of our customers actually build on top of Databricks already today. And one thing that we find in talking to our customers is that a lot of the time their consumers, or even people internally if they’re using that kind of app that was built on top of Databricks, don’t even know they’re actually using Databricks under the hood, because you’re able to so seamlessly build on top of us.
I think with that being said in regards to just some of the additional capabilities that we try to bring our customers is of course we have our marketplace, which Acxiom is a part of where Acxiom customers can go in and easily get Acxiom third party data off the marketplace as a very seamless way to integrate some of those solutions or needs as part of their build or their implementation of Databricks. Again, just a more seamless way to access that versus needing to build separate integrations with their partners like Acxiom.
Dustin Raney:
No, that makes sense. And companies can still build, like you said, applications, it just might not be native to the code set required by Databricks to do so. You guys are still taking more of the agnostic approach to application. Very cool. And with large language models now in the major upper part of the hype cycle, are you seeing brands start to build their own ChatGPTs, their own large language models? And then maybe second question behind that, what privacy and security things are you seeing people put in place to ensure that the model doesn’t go rogue?
Ally Hepp:
Yeah, so I’d say of course in the world of gen AI, we’ve seen this uptick of concerns around governance. And I think it really brought back up a lot of those conversations around data privacy and control. And I will say at Databricks, governance and data privacy has always been one of our key value levers and really integral to the company’s approach to data and AI. I think our Unity Catalog, which is really our governance layer, is a prime example of this. In a world without the lakehouse, organizations are having to govern different policies in their data warehouse for BI, in their data lake for data management, and in their tools for data science and machine learning, all in this siloed approach, whereas with Databricks, we try to make it easy for our customers. How can we uplevel this governance so that you’re only having to manage it once across all of your data and AI assets, and you still have all of the fine-grained access controls, but you also get lineage, end to end?
So you have lineage from the second that data is ingested through a pipeline all the way through transformation, building out your gold-level tables, and now even into when you start to build your models against it. So you have full end-to-end visibility of everywhere that table is used, and even row-level insights of where that data is being utilized throughout your organization. So I’d say with that, governance and data privacy and control is really a huge topic for us. And I think on the LLM front, we definitely have customers who are using OpenAI-type models in Databricks, but we also have customers who want to build LLMs with only their data. And we’re seeing them start to explore these custom approaches where they fully own and manage and have security over those models end to end. So it’s a really exciting time to be part of these conversations with customers when they’re really starting to get into, “Okay, how could we actually own everything about this model versus using something like OpenAI to build something like that?”
Kyle Hollaway:
Yeah, it’s awesome. And it is an exciting time, just to see how it’s rapidly evolving in that space. Just to deviate a little bit, ’cause I know we’re getting shorter on time, but you talk about ownership, governance and really brands wanting to do their own, and now they’ve got this great platform that’s really trying to enable that through this composability and stuff. And then you have the Walled Gardens. And we’re interacting with all the Walled Gardens, especially in the marketing and advertising space, being able to reach, do all the analytics like you mentioned before, really understand your customers, “What’s the next best action?” and all this kind of stuff. And then it’s like, “Oh, I need to reach them in the Walled Garden setting.” Yet there’s a lack of data transparency there. How is Databricks working with the Googles and Facebooks of the world, the Metas, and do you see that evolving any… Or how’s that playing out for you guys?
Ally Hepp:
Yeah. I think with the Walled Gardens specifically, we have a close partnership with them. And I think with that, it brings us back to the clean room type conversations where we would utilize our clean room solution with potentially a Walled Garden to help our customers build out different analytics that they might need or any information that they’d need to gather off of that. So I’d say we have a close partnership with them. Absolutely, especially in the CME industry. We are talking about advertising all the time. How can we measure ROI based off of advertising? A lot of insights that we want to gather from effectiveness there. So it’s definitely an area that we focus on and make sure that our customers can glean the insights that they need from that.
Dustin Raney:
One more question down that path. We know that Apple already deprecated cookies, third party cookies for years now, Google’s next. And that obviously had an effect on the Walled Gardens. Companies like Facebook where their pixel isn’t showing the actual results or results of advertising performance because there’s breakage, they’ve shifted to conversion APIs and things like that. Is Databricks playing? You guys are huge in the analytics. Are you being used to offset some of the things that brands are having the challenges they’re facing with cookie deprecation?
Ally Hepp:
Yeah. And I think that’s a big area where clean rooms come in. Like you said, we’ve all been anticipating this cookie deprecation and Google has finally announced that this is the year it’s going to happen. So we’re definitely seeing an influx of interest from customers around clean rooms and how they can start to use them to replace what cookies were giving them before, giving marketers and advertisers new tools that they can use to target their consumers and engage with them in this cookieless future. And I think collaboration is going to be a huge piece of that. Again, we’ve seen this massive shift in the market over the last 24 months where clean rooms and data sharing have become really fundamental for organizations to really invest in.
And we’re seeing a lot of asks from customers, and they’re looking at data not just as an internal asset anymore, but at how they can maybe commercialize it for external purposes as well, whether it’s a weather organization or a geolocation-type use case. We’re seeing a lot more creativity, I’d say, with publishers that are looking to make abstractions of their own data and make it available for monetization opportunities. And we’re seeing more and more asks around AI and ML use cases as well when it comes to sharing data to get better real-time personalization or ad use cases. Again, and I know we started with this, but looking beyond just the sharing of data and using clean rooms or data sharing or, I’ll say, sharing capabilities for the ML-type use cases as well.
Dustin Raney:
Awesome. Well, Ally, we are definitely coming to unfortunately an end of our time today in this episode. We always like to ask one standard wrap-up question, what has you the most excited about the next 12 months?
Ally Hepp:
I would say I’ve only been at Databricks a little less than a year now, and we just churn out features and iterate so incredibly fast. But I would say what I’m most excited about is the data intelligence platform. And I think it’s really going to fundamentally change how our customers use the lakehouse and use Databricks. And the data intelligence platform is rooted in the lakehouse. So you still have everything that the lakehouse provides when it comes to governance and scalability, but now you have this intelligence layer on top of it. Customers can securely find data and understand tables with natural language, and the intelligence engine will automatically take some of that admin work out for a lot of our users. Whether it’s optimizing the data or needing to index or partition the data, they don’t have to do these tasks themselves. This intelligence engine will take care of that.
Your queries will be optimized for you, they’ll be more performant, and that really lets users focus on business value rather than on tuning. And I think the other aspect that brings us back to the very beginning of this conversation is that those less technically savvy users who maybe aren’t typically used to working with Databricks can now use the platform, and we even have capabilities like Text2SQL where you can write queries using natural language and Databricks will generate the necessary code for you and run that for you to give you the results that you’re looking for. So I mentioned a few times today, Databricks is really founded on this concept of democratizing data and AI, and I really think that this data intelligence platform is allowing us to do this at a scale that Databricks has just never realized before, and I can’t wait to see what our customers are going to do with it.
Dustin Raney:
Very cool. It’s very exciting times indeed when AI starts generating code on your behalf. That’s very cool. So I can’t wait to see what transpires inside of Databricks over the next 12 months as well. And we certainly look forward to helping brands make use of such incredible technology. Ally, thank you so much for joining us today. I know Kyle and I, we always learn so much, and we learned so much today from you. We’re excited about what you guys are doing and we certainly would hope to have you back in maybe six months to a year to see where things are. For our listeners, thank you for sticking with us for another episode of Real Talk. You can go check out our other episodes at acxiom.com/realtalk or find us on your favorite podcast platform. Thank you, and we’ll talk to you next time.