Google data lakehouse
12/18/2023

No lock-in, with the flexibility to use any engine today and tomorrow.

According to Dremio, cloud data warehouses such as Snowflake, Azure Synapse, and Amazon Redshift generate lock-in because the data is inside the warehouse. I don’t completely agree with this, but I do agree that it’s really hard to move large amounts of data from one cloud system to another.

Also according to Dremio, cloud data lakes such as Dremio and Spark offer more flexibility since the data is stored where multiple engines can use it. Dremio claims three advantages that derive from this:

• Flexibility to use multiple best-of-breed engines on the same data and use cases.
• Easy to adopt additional engines today.
• Easy to adopt new engines in the future; simply point them at the data.

Competitors to Dremio include the Databricks Lakehouse Platform, Ahana Presto, Trino (formerly Presto SQL), Amazon Athena, and open-source Apache Spark. Less direct competitors are data warehouses that support external tables, such as Snowflake and Azure Synapse.

Dremio has painted all enterprise data warehouses as its competitors, but I dismiss that as marketing, if not actual hype. After all, data lakes and data warehouses fulfill different use cases and serve different users, even though data lakehouses at least partially span the two categories.

Dremio Cloud overview

Dremio server software is a Java data lakehouse application for Linux that can be deployed on Kubernetes clusters, AWS, and Azure. Dremio Cloud is basically the Dremio server software running as a fully managed service on AWS.

Dremio Cloud’s functions are divided between virtual private clouds (VPCs), Dremio’s and yours, as shown in the diagram below. If you use multiple cloud accounts with Dremio Cloud, each VPC acts as an execution plane. The execution plane holds multiple clusters, called compute engines. The control plane processes SQL queries with the Sonar query engine and sends them through an engine manager, which dispatches them to an appropriate compute engine based on your rules.

Dremio claims sub-second response times with “reflections,” which are optimized materializations of source data or queries, similar to materialized views. Dremio claims raw speed that’s 3x faster than Trino (an implementation of the Presto SQL engine) thanks to Apache Arrow, a standardized column-oriented memory format. Dremio also claims, without specifying a point of comparison, that data engineers can ingest, transform, and provision data in a fraction of the time thanks to SQL DML, dbt, and Dremio’s semantic layer.

Dremio has no business intelligence, machine learning, or deep learning capabilities of its own, but it has drivers and connectors that support BI, ML, and DL software, such as Tableau, Power BI, and Jupyter Notebooks. It can also connect to data sources in tables in lakehouse storage and in external relational databases.

Diagram (IDG): Dremio Cloud is split into two Amazon virtual private clouds (VPCs). Dremio’s VPC hosts the control plane, including the SQL processing. Your VPC hosts the execution plane, which contains the compute engines.

Dremio Arctic overview

Dremio Arctic is an intelligent metastore for Apache Iceberg, an open table format for huge analytic datasets, powered by Nessie, a native Apache Iceberg catalog.
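To make the engine manager idea from the Dremio Cloud overview more concrete, here is a minimal Python sketch of rule-based routing: each query is checked against an ordered list of rules, and the first match decides which compute engine runs it. This is not Dremio’s API or its actual implementation. The class names (Query, Rule, EngineManager), the engine names, and the rule conditions are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Query:
    """A submitted SQL query plus the metadata the rules inspect (hypothetical)."""
    sql: str
    user: str
    tag: str = ""


@dataclass(frozen=True)
class Rule:
    """Maps a condition on a query to the name of a compute engine (hypothetical)."""
    name: str
    matches: Callable[[Query], bool]
    engine: str


class EngineManager:
    """Illustrative router: the first matching rule wins, otherwise use the default."""

    def __init__(self, rules: List[Rule], default_engine: str) -> None:
        self.rules = list(rules)
        self.default_engine = default_engine

    def route(self, query: Query) -> str:
        for rule in self.rules:
            if rule.matches(query):
                return rule.engine
        return self.default_engine


# Example rules; the engine names are invented for this sketch.
manager = EngineManager(
    rules=[
        Rule("bi-dashboards", lambda q: q.user == "bi_service", "dashboards-engine"),
        Rule("etl-jobs", lambda q: q.tag == "etl", "etl-engine"),
    ],
    default_engine="general-engine",
)

if __name__ == "__main__":
    print(manager.route(Query(sql="SELECT 1", user="bi_service")))
    print(manager.route(Query(sql="INSERT INTO t SELECT * FROM s", user="ana", tag="etl")))
    print(manager.route(Query(sql="SELECT 2", user="ana")))
```

Evaluating rules in order keeps the behavior easy to predict: a query that satisfies several rules always goes to the engine named by the earliest one, and anything unmatched falls through to the default engine.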