Google data lakehouse

12/18/2023

Both data warehouses and data lakes can hold large amounts of data for analysis. As you may recall, data warehouses contain curated, structured data, have a predesigned schema that is applied when the data is written, call on large amounts of CPU, SSDs, and RAM for speed, and are intended for use by business analysts. Data lakes hold even more data that can be unstructured or structured, initially stored raw and in its native format, typically use cheap spinning disks, apply schemas when the data is read, filter and transform the raw data for analysis, and are intended for use by data engineers and data scientists initially, with business analysts able to use the data once it has been curated.

Data lakehouses, such as the subject of this review, Dremio, bridge the gap between data warehouses and data lakes. They start with a data lake and add fast SQL, a more efficient columnar storage format, a data catalog, and analytics.

Dremio describes its product as a data lakehouse platform for teams that know and love SQL. It lists four selling points:

SQL for everyone, from business user to data engineer.
Fully managed, with minimal software and data maintenance.
Support for any data, with the ability to ingest data into the lakehouse or query in place.
No lock-in, with the flexibility to use any engine today and tomorrow.

According to Dremio, cloud data warehouses such as Snowflake, Azure Synapse, and Amazon Redshift generate lock-in because the data is inside the warehouse. I don’t completely agree with this, but I do agree that it’s really hard to move large amounts of data from one cloud system to another.

Also according to Dremio, cloud data lakes such as Dremio and Spark offer more flexibility since the data is stored where multiple engines can use it. Dremio claims three advantages that derive from this:

Flexibility to use multiple best-of-breed engines on the same data and use cases.
Easy to adopt additional engines today.
Easy to adopt new engines in the future: simply point them at the data.

Competitors to Dremio include the Databricks Lakehouse Platform, Ahana Presto, Trino (formerly Presto SQL), Amazon Athena, and open-source Apache Spark. Less direct competitors are data warehouses that support external tables, such as Snowflake and Azure Synapse.

Dremio has painted all enterprise data warehouses as its competitors, but I dismiss that as marketing, if not actual hype. After all, data lakes and data warehouses fulfill different use cases and serve different users, even though data lakehouses at least partially span the two categories.

Dremio Cloud overview

Dremio server software is a Java data lakehouse application for Linux that can be deployed on Kubernetes clusters, AWS, and Azure. Dremio Cloud is basically the Dremio server software running as a fully managed service on AWS.

Dremio Cloud’s functions are divided between two virtual private clouds (VPCs), Dremio’s and yours. If you use multiple cloud accounts with Dremio Cloud, each VPC acts as an execution plane. The execution plane holds multiple clusters, called compute engines. The control plane processes SQL queries with the Sonar query engine and sends them through an engine manager, which dispatches them to an appropriate compute engine based on your rules.

Figure (IDG): Dremio Cloud is split into two Amazon virtual private clouds (VPCs). Dremio’s VPC hosts the control plane, including the SQL processing. Your VPC hosts the execution plane, which contains the compute engines.

Dremio claims sub-second response times with “reflections,” which are optimized materializations of source data or queries, similar to materialized views. Dremio claims raw speed that’s 3x faster than Trino (an implementation of the Presto SQL engine) thanks to Apache Arrow, a standardized column-oriented memory format. Dremio also claims, without specifying a point of comparison, that data engineers can ingest, transform, and provision data in a fraction of the time thanks to SQL DML, dbt, and Dremio’s semantic layer.

Dremio has no business intelligence, machine learning, or deep learning capabilities of its own, but it has drivers and connectors that support BI, ML, and DL software, such as Tableau, Power BI, and Jupyter Notebooks. It can also connect to data sources in tables in lakehouse storage and in external relational databases.

Dremio Arctic overview

Dremio Arctic is an intelligent metastore for Apache Iceberg, an open table format for huge analytic datasets, powered by Nessie, a native Apache Iceberg catalog.
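The difference this review draws between a warehouse schema "applied when the data is written" and a lake schema applied "when the data is read" can be made concrete with a small sketch. This is plain Python for illustration only, not Dremio code: the SCHEMA mapping and the write_with_schema and read_with_schema helpers are hypothetical names invented here.

```python
import json

# Hypothetical schema for illustration: column name -> required Python type.
SCHEMA = {"order_id": int, "amount": float}


def write_with_schema(table, row):
    """Schema-on-write (warehouse style): validate the row before storing it."""
    for column, expected in SCHEMA.items():
        if not isinstance(row.get(column), expected):
            raise ValueError(f"bad value for {column!r}: {row.get(column)!r}")
    table.append({column: row[column] for column in SCHEMA})


def read_with_schema(raw_lines):
    """Schema-on-read (lake style): keep raw text, coerce it only when queried."""
    for line in raw_lines:
        record = json.loads(line)
        try:
            yield {column: cast(record[column]) for column, cast in SCHEMA.items()}
        except (KeyError, TypeError, ValueError):
            continue  # records that do not fit the schema are skipped at read time


# Warehouse: the row is checked on the way in.
warehouse = []
write_with_schema(warehouse, {"order_id": 1, "amount": 9.5})

# Lake: anything can be stored; the schema is applied when the data is read.
lake = ['{"order_id": "2", "amount": "3.25"}', '{"note": "free text"}']
lake_rows = list(read_with_schema(lake))
```

In this sketch the warehouse rejects a malformed row at write time, while the lake accepts any raw record and only drops or coerces it when a query applies the schema.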
