How Databricks won the battle, for now - Conference Recap - Part I
The battle between Product-Led Growth (PLG) and Sales-Led Growth (SLG)
“Databricks is a $38 billion dollar mistake,” wrote Benn Stancil in his post, The end of Big Data, back in April of 2022. Benn opened it by recalling his early experience with Databricks:
I first heard about Databricks in 2014…. When I looked it up, my first reaction was that it was built for people smarter than me. Instead of a demo or product screenshots, I found technical papers explaining something I didn’t understand.
Benn goes on to contrast his experience with Databricks with his experience with Snowflake, and to describe how much more positive the latter was:
The pitch we heard from Snowflake was both the dumbest and most effective sales pitch I’ve ever heard. We were told that it was the same as Redshift—and really, the same as Postgres—but big, fast, and stable.
And maybe that is where it could have ended if it were not for one small detail: Databricks actually had a strong product, built around a foundationally important way to query the data lake with the industry’s favorite and most accessible language, SQL (i.e. Spark SQL).
And for those who had any remaining doubt, we saw all that culminate at the recent Data & AI Summit.
Coincidentally, Snowflake hosted its own event that same week. The two events could not have been more different.
The Future vs The Party
Databricks’ event took place in San Francisco, the city that has supposedly been collapsing for, like, 2, 4, 20, oh wait… 40 years (or, as some books would say: forever). It is also the city that has been taken over by the AI hype and has managed to attract a critical mass of AI developers. Arriving in SF these days is like looking into the future with all of its downsides: expensive self-driving cars passing every 5 minutes, all while countless homeless people and drug addicts are spread along the city pavement.
Snowflake’s event, on the other hand, just skipped the future part altogether and threw a big party. Snowflake’s event took place in.. **drum roll**… Las Vegas. The choice of locations could not be more telling.
If you’re into a good party, the Databricks event was not the place for it. The big party Databricks threw at the end of the event was what you would expect of a party in San Francisco… The most exciting party of the week was a private one for Europeans held at a nearby InterContinental hotel, where one could overhear funny politically incorrect French jokes and boring Swiss-German jokes all within 6ft of each other.
People clearly gathered at the Databricks summit to soak up raw knowledge. The entire first day of the conference was devoted to technical certification on Databricks, and engineers lined up for certifications all through the week.
When it came to individual talks, people were standing in lines for 30+ min to get into top AI talks.
But this is tech - no one said it was supposed to be a big party. And maybe this is what Benn missed in his analysis. Not every part of tech needs to have a pitch-ready delivery for a business buyer on day 1.
Snowflake’s Sales Led Growth vs Databricks’ PLG
Increasingly, many voices (Lauren B 👀) predict a downfall for Snowflake and its associated vendors (e.g. Fivetran). The premise is that the pricing structure of these vendors is a house of cards.
I don’t personally subscribe to this Utopian world view. If there is anything that Benn’s forecasting mistake teaches us, it is that in tech it is never a good idea to predict someone’s downfall. Next time you decide to predict someone’s downfall, I highly encourage you to look into the stories, spanning two decades, that predicted the downfall of Dell.
What’s really different about the two vendors, and what all previous analysis seems to have missed, is that Snowflake grew via Sales-Led Growth, while Databricks took its time to build sales via Product-Led Growth.
Naturally, the two companies look nothing like each other. Databricks is highly distributed, with a lot of strong core engineering working across internal products. Snowflake is a company with one of the best sales forces in tech, but with a product strategy that largely bets on innovation through acquisitions (Databricks is starting to act very similarly here, but Snowflake has been at it longer).
And in each case, the strategy makes perfect sense. If you’re Snowflake, and you’re good at capturing value upfront through pricing and a first-tier sales team, it makes perfect sense to bet on external innovation. Snowflake knows how to generate revenue from user adoption, so it can afford to wait until it can acquire a product such as Streamlit, one with all the users but no revenue.
If you’re Databricks, and your product sells itself via bottom-up developer adoption (i.e. PLG), it makes perfect sense to double down on a less structured approach to sales, which may not be as good at capturing all the value upfront, but leads to major one-off contracts with the likes of the Department of Defense. Those contracts are not immediately lucrative, but they eventually lead to major strategic expansions over time.
Coming back to Benn’s initial prediction… It wasn’t that Databricks was bad (and was going to die along with Big Data) and Snowflake was good. Rather, we’re at the end of a decade with two very different cycles: Databricks’ PLG, which is only now fully capturing all of its value, vs Snowflake’s SLG, which captured all of its value some 3-5 years ago and is now trying to hold on to it for dear life.
But don’t take my word for it
I chatted with a lot of people at the event. And while I am still assembling all the conversations and interviews, here are a few short takes from some of the attendees.
Rui, CTO of Expedock, an AI company serving global supply chain companies, says:
Generative AI was the real standout of this event. Building large language models natively in the organizational data lakehouse seems like a clear way Databricks will capitalize on this new technology wave. I particularly enjoyed the talks that showed applications of these models and how to construct them from the ground up.
Sharon from Citco Group, a 10,000-employee corporation providing financial services for hedge funds, says:
The keynote talks on LakehouseIQ, powered by an AI engine, sound exciting and promising, because I think it will give business users a platform to interact directly with a data warehouse environment and generate queries and charts for themselves, without prior knowledge of querying or scripting. I like how generative AI capabilities are being planned for incorporation into the data engineering/BI/science area by Databricks.
Lakehouse monitoring and Vectorless indexing were other topics that caught my interest, as those capabilities will certainly make data engineers’ lives easier on a day-to-day basis.
I also found the session by Bradesco on open banking interesting. It displayed the power of Delta Lake, Unity Catalog, and PII features on Databricks.
Head of AI from RBC Capital Markets:
It was great learning about every new product, like Lakehouse AI, the OSS model library, MLflow AI Gateway, Lakehouse Monitoring, and LakehouseIQ.
So there you have it. Databricks - from Big Data to AI. Not sure about the right $ figure here, but definitely not a version of Benn Stancil’s “$38 billion dollar mistake”.