State of the Data Lakehouse, 2024: Businesses Are Leaving Cloud Data Warehouses For Data Lakehouses


New Dremio survey of enterprise IT professionals finds data lakehouses are the primary architecture for delivering analytics, with 65% running a majority of analytics on lakehouses

Over half are saving more than 50% on analytics, and 81% are using a data lakehouse to support work on AI models and applications

SANTA CLARA, Calif.–(BUSINESS WIRE)–Dremio, the easy and open data lakehouse, today announced the release of its survey findings and full report, The State of the Data Lakehouse, 2024. The report offers fresh insights from 500 full-time enterprise IT and data professionals on data lakehouse adoption, open table format trends, data mesh implementation for self-service analytics, and AI’s impact on the lakehouse and beyond.


Data lakehouse adoption is on the rise and cost savings are key

The data lakehouse is fast becoming the primary architecture for delivering analytics: 65% of respondents now run a majority of their analytics on lakehouses, and they cited cost efficiency and ease of use as the top reasons for the move.

  • 70% of respondents say more than half of all analytics will be on the data lakehouse within three years, and 86% said their organization plans to unify analytics data.
  • Over half (56%) estimate they are saving more than 50% on analytics by moving to the data lakehouse; almost 30% of respondents from large enterprises (more than 10,000 employees) estimate savings greater than 75%.
  • 42% moved from a cloud data warehouse to the data lakehouse—more than from any other environment. Top reasons for the shift were cost efficiency and ease of use.

Open table formats are transformative and Apache Iceberg is quickly gaining momentum

Amid the generative AI frenzy, a quieter revolution has been taking place: Open table formats—a foundational component of data lakehouses—are bringing full SQL functionality directly to the data lake. This enables organizations to move away from decades-old data warehouse architectures and their associated inefficiencies.
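For readers unfamiliar with the pattern, the sketch below illustrates what SQL directly on the data lake can look like, using PySpark with an Apache Iceberg catalog. The catalog name, storage path, and table are placeholders chosen for illustration, the Iceberg Spark runtime JAR is assumed to be on the classpath, and any SQL engine with Iceberg support could play the same role.

    from pyspark.sql import SparkSession

    # Assumes the Iceberg Spark runtime JAR is available; the catalog name
    # ("lake"), warehouse path, and table names are illustrative only.
    spark = (
        SparkSession.builder
        .appName("iceberg-lakehouse-sketch")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.type", "hadoop")
        .config("spark.sql.catalog.lake.warehouse", "s3a://example-bucket/warehouse")
        .getOrCreate()
    )

    # Standard SQL DDL and DML run directly against files in object storage --
    # no separate load step into a warehouse.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS lake.sales.orders (
            id BIGINT, amount DOUBLE, ts TIMESTAMP
        ) USING iceberg
    """)
    spark.sql("INSERT INTO lake.sales.orders VALUES (1, 19.99, current_timestamp())")
    spark.sql("SELECT count(*) FROM lake.sales.orders").show()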

The survey found that Apache Iceberg and Delta Lake are clearly the leading open table formats, and it confirmed Iceberg’s growing momentum. While 39% of respondents currently use Delta Lake, compared to 31% using Iceberg, 29% of those adopting an open table format in the next three years plan to choose Iceberg, versus 23% for Delta Lake.

Respondents cited multiple factors that influenced their choice of table format, including performance (77%), compatibility with specific tools or platforms (72%), specific features (62%), and an open ecosystem (59%).

Data mesh is at the heart of digital transformation and is driven more by business units

According to the survey, full or partial data mesh implementations are underway at most enterprises, and nearly all respondents expect them to expand. As a key technology enabling the success of data mesh strategies, the data lakehouse is making self-service analytics, domain-driven data ownership, data as a product, and federated governance a reality for teams on the ground.

  • 84% of respondents have fully or partially implemented data mesh, and 97% expect data mesh implementation to continue to expand in the next year.
  • Top objectives of implementing a data mesh are improved data quality (64%) and data governance (58%); almost half or just over half of respondents also named agility, scalability, improved data access, and improved decision-making.
  • Data mesh initiatives are driven more by business leaders and business units (52%) than by centralized IT teams.

The data lakehouse is critical in the AI era

The data lakehouse is already improving AI-driven data management, governance, and compliance, as well as the work lives of IT professionals. Data self-service, which data lakehouses enable, is fundamental for AI development: the vast majority of respondents say their enterprise uses a lakehouse to support data scientists building and improving AI models and applications. On job-related issues, a majority cited repetitive manual processes and manual data merging and reconciliation as problems, pointing to the need for more automation and AI-assisted data management and governance.

  • 81% of respondents are using a data lakehouse to support data scientists building and improving AI models and applications.
  • 62% said they dislike manually merging and reconciling data from multiple sources, repetitive manual processes, and cleaning up raw data.
  • Technical professionals overwhelmingly agree AI is a national security priority (84%), noteworthy in light of the recent U.S. executive order on AI.

For the full survey report, download the State of the Data Lakehouse, 2024 here.

About the Survey

A nationwide survey of 500 full-time IT and data technology professionals distributed across industries was conducted by Propeller Insights, sponsored by Dremio, between August and September 2023. The survey fielded questions looking toward 2024. All respondents worked at enterprises with 1,000 or more employees. Industries represented included: banking, financial services and insurance; health technology; manufacturing; science and technology; high tech; telecommunications and media; retail; education; construction; and other industries. Roles included IT directors (64%), as well as data and analytics managers and directors, data scientists, software engineers, data analysts, and data engineers.

Propeller Insights is a full-service market research firm based in Los Angeles. Using quantitative and qualitative methodologies to measure and analyze marketplace and consumer opinions, they work extensively across industries such as travel, brand intelligence, entertainment/media, retail, and consumer packaged goods.

About Dremio

Dremio is the easy and open data lakehouse, providing self-service analytics with data warehouse functionality and data lake flexibility across all of your data. Use Dremio’s lightning-fast SQL query service and any other processing engine on the same data. Dremio increases agility with a revolutionary data-as-code approach that enables Git-like data experimentation, version control, and governance. In addition, Dremio eliminates data silos by enabling queries across data lakes, databases, and data warehouses, and by simplifying ingestion into the lakehouse. Dremio’s fully managed service helps organizations get started with analytics in minutes, and automatically optimizes data for every workload. As the original creator of Apache Arrow and committed to Arrow and Iceberg’s community-driven standards, Dremio is on a mission to reinvent SQL for data lakes and meet customers where they are on their lakehouse journey.

Hundreds of global enterprises like JPMorgan Chase, Microsoft, Regeneron, and Allianz Global Investors use Dremio to deliver self-service analytics on the data lakehouse. Founded in 2015, Dremio is headquartered in Santa Clara. CNBC recognized Dremio as a Top Startup for the Enterprise and Deloitte named Dremio to its 2022 Technology Fast 500. To learn more, follow the company on GitHub, LinkedIn, Twitter, and Facebook, or visit www.dremio.com.

Contacts

Elise Woodard

elise.woodard@dremio.com