Today’s DBA typically manages tens, hundreds, or even thousands of databases, often from multiple vendors, both on-premises and in the cloud. These may include RDBMS, NoSQL DBMS, and/or Hadoop clusters.
While management automation has made substantial strides in enabling DBAs to handle larger workloads, the care and feeding of these databases is still all too frequently a burdensome task.
Data warehouses, data marts, and data lakes usually require the most attention. Let’s discuss how using Snowflake RDBMS can dramatically reduce the DBA’s workload!
Snowflake’s architecture consists of three distinct layers: centralized storage, multi-cluster compute (virtual warehouses), and cloud services.
Snowflake’s revolutionary architecture brings new opportunities and approaches to managing the cost of running its platform. To paraphrase Voltaire and Peter Parker’s Uncle Ben: “with great innovation comes great power; with great power come new responsibilities.” The overall workload architecture is key to managing spend, which in turn requires new modes of thinking and new administration responsibilities.
Historically, workload management was focused on improving performance and throughput; it was seldom directly involved in cost control.
In “legacy” RDBMS, both in the cloud and on-premises, workload management is focused on allocating scarce resources among users running different types of queries…
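Snowflake instead ties compute cost directly to warehouse size and running time: credit consumption per hour doubles with each size step. A back-of-the-envelope sketch (the per-credit dollar price below is an assumed placeholder, not a quoted Snowflake rate):

```python
# Hypothetical sketch of Snowflake compute spend. Credits per hour double
# with each warehouse size (XS = 1, S = 2, M = 4, ...); the USD price per
# credit is an illustrative assumption, not an actual Snowflake quote.

CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def monthly_compute_cost(size: str, hours_per_day: float,
                         days: int = 30, usd_per_credit: float = 3.0) -> float:
    """Return the estimated monthly cost in USD for one warehouse."""
    return CREDITS_PER_HOUR[size] * hours_per_day * days * usd_per_credit

# A Medium warehouse running 8 hours/day for a 30-day month:
print(monthly_compute_cost("M", 8))  # 4 * 8 * 30 * 3.0 = 2880.0
```

The point of the sketch: spend scales linearly with run time, so auto-suspend settings and right-sizing are where the new administration responsibilities live.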
In Agile Cost Management in Snowflake — Part 1, Compute, we discussed managing compute costs. Managing storage costs is much simpler, but it is still very important, as poorly managed storage will result in unexpected expenses.
As with Compute, Snowflake’s revolutionary architecture requires a different approach and mindset to managing storage. In legacy DW RDBMS, storage is a limited resource. The need to accurately estimate current and future requirements is a key driver in determining cost. Adding storage is expensive and often requires downtime. Reducing physical storage is unheard of. Instead, DBAs typically focus on reclaiming and reusing storage as…
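One storage-cost surprise worth sketching: Snowflake bills not just for active data but for changed data retained by Time Travel and Fail-safe. The retention windows are real Snowflake features (permanent tables default to 1 day of Time Travel plus 7 days of Fail-safe); the churn model below is an illustrative assumption:

```python
# Hypothetical sketch of how retention inflates billed storage. A table
# that rewrites data daily keeps each churned copy for the Time Travel
# window plus the 7-day Fail-safe window.

def billed_tb(active_tb: float, daily_churn_tb: float,
              time_travel_days: int = 1, fail_safe_days: int = 7) -> float:
    """Active storage plus churned data retained for Time Travel + Fail-safe."""
    return active_tb + daily_churn_tb * (time_travel_days + fail_safe_days)

# A 10 TB table rewriting 2 TB/day with Time Travel cranked up to 90 days:
print(billed_tb(10, 2, time_travel_days=90))  # 10 + 2 * 97 = 204.0 TB billed
```

A carelessly applied long retention setting on a high-churn table can thus multiply the storage bill many times over, which is exactly the “unexpected expenses” failure mode.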
In this article, we discuss building an architecture for performing user-friendly historical analysis of Industrial Internet of Things (IIoT) data.
IIoT data is a sub-category of Internet of Things (IoT) data. IoT data is gathered from a large number of similar or identical devices. For our purposes, IIoT data is collected from Programmable Logic Controllers (PLCs) used to control and monitor equipment in “industrial” plants. Each PLC typically manages and monitors a well-defined set of steps in the overall process. Multiple data samples are gathered from multiple PLC sensors. …
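The shape of the problem can be sketched with a stdlib-only toy (the field names and sample values are invented for illustration): many PLC/sensor streams rolled up into time buckets for historical analysis.

```python
# Illustrative sketch: per-minute averages of raw PLC sensor samples.
# Field names (plc_id, sensor, timestamp, value) are assumptions, not a
# real IIoT schema.
from collections import defaultdict
from statistics import mean

# (plc_id, sensor, unix_seconds, value) samples from multiple controllers
samples = [
    ("plc-1", "temp_c", 0,  71.0),
    ("plc-1", "temp_c", 20, 73.0),
    ("plc-1", "temp_c", 65, 70.0),
    ("plc-2", "psi",    10, 30.0),
]

buckets = defaultdict(list)
for plc, sensor, ts, value in samples:
    buckets[(plc, sensor, ts // 60)].append(value)  # 60-second buckets

rollup = {key: mean(vals) for key, vals in buckets.items()}
print(rollup[("plc-1", "temp_c", 0)])  # (71.0 + 73.0) / 2 = 72.0
```

At plant scale the same bucketing runs over billions of rows, which is where a columnar cloud warehouse earns its keep.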
Hadoop (symbolized by an elephant) was created to handle massive amounts of raw data that were beyond the capabilities of existing database technologies. At its core, Hadoop is simply a distributed file system. There are no restrictions on the types of data files that can be stored, but the primary file contents are structured and semi-structured text. “Data lake” and Hadoop have been largely synonymous, but, as we’ll discuss, it’s time to break that connection with Snowflake’s cloud data warehouse technology.
Hadoop’s infrastructure requires a great deal of system administration, even in cloud managed systems. Administration tasks include: replication, adding…
This is the promised follow-up to my last article, Snowflake and ELVT vs. [ETL, ELT], discussing the advantages of Extract, Load and Virtual Transform (ELVT) using VIEWs instead of physical [ETL, ELT].
This article expands ELVT into a layered data model I call the Data Cake model. The ELVT style of using VIEWs is not new; Snowflake’s unique architecture and performance enable a more comprehensive and robust use of this technique.
The Data Cake model’s purpose is “freeing the data” from BI vendor lock-in, providing all the ELVT advantages previously described in Snowflake and ELVT vs. [ETL, ELT].
Over several generations of RDBMS technologies, I have learned that common practices, knowledge, and attitudes have become obsolete. Query planning and optimization are continually evolving. This evolution, combined with Snowflake’s revolutionary performance capabilities and data model agnosticism, leads many database practitioners to a new architectural pattern. A key element of this new pattern is what I call Extract, Load, Virtual Transform (ELVT).
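The core ELVT idea can be shown in miniature with SQLite standing in for Snowflake (table and column names are invented for illustration): load the raw data as-is, and make the “transform” a VIEW evaluated at query time rather than a physically materialized [ETL, ELT] table.

```python
# Minimal ELVT sketch using SQLite in place of Snowflake: raw data is
# loaded untransformed, and the transform lives in a VIEW, not in a
# separate physical table maintained by an ETL job.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                [(1, 1250), (2, 3499)])

# Virtual Transform: consumers query the view; no copy of the data exists.
con.execute("""
    CREATE VIEW orders AS
    SELECT id, amount_cents / 100.0 AS amount_usd
    FROM raw_orders
""")

print(con.execute("SELECT amount_usd FROM orders WHERE id = 2").fetchone()[0])
# 34.99
```

Because the transform is virtual, a logic change is a `CREATE OR REPLACE VIEW`, not a backfill; Snowflake’s performance is what makes this practical at warehouse scale.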
A little historical perspective is in order. Running a SQL query involves three fundamental steps:
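Those steps are commonly described as parsing, planning/optimization, and execution. The middle step can be observed in miniature with SQLite’s `EXPLAIN QUERY PLAN` (illustrative stand-in; Snowflake exposes the same idea through its query profile):

```python
# Illustration with SQLite: the engine parses the SQL, produces a plan
# (inspectable via EXPLAIN QUERY PLAN), and only then executes it.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
con.execute("INSERT INTO t VALUES (1, 'a')")

# Planning: ask the optimizer how it would run the query.
plan = con.execute("EXPLAIN QUERY PLAN SELECT v FROM t WHERE id = 1").fetchall()
print(plan[0][-1])  # e.g. "SEARCH t USING INTEGER PRIMARY KEY (rowid=?)"

# Execution: actually run it.
print(con.execute("SELECT v FROM t WHERE id = 1").fetchone()[0])  # a
```

Each RDBMS generation has poured its engineering into that planning step, which is why so much received wisdom about query tuning goes stale.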
I have been in the data business through several RDBMS generations and have seen many attempts at comparing performance between competing vendors.
To say those comparisons should be taken with a grain of salt is an understatement. The resulting salt consumption would not be good for anybody’s health.
The Transaction Processing Performance Council (TPC) benchmarks provide the standard. TPC provides datasets and specifications for various benchmarks.
Historically, RDBMS vendors ran (or avoided running) TPC benchmarks themselves and boasted about the results.
This process came with the caveat: “there are lies, damn lies, and (vendor) benchmarks”. …