IBM Netezza Data Warehouse Appliance

What Exactly Is Netezza?

The IBM Netezza data warehouse appliances are purpose-built to process massive volumes of data quickly and efficiently. They ship with IBM Netezza Analytics, which is fully integrated into the IBM Netezza asymmetric massively parallel processing (AMPP) architecture and supports data exploration, model building, model diagnostics, and scoring at high speed inside the appliance.

IBM Netezza data warehouse appliances eliminate the need for administrative tasks such as indexing, storage management, buffer pool tuning, memory allocation, and schema optimization.

The combination of IBM's patented AMPP architecture and Field Programmable Gate Arrays (FPGAs) delivers fast query performance and modular scalability on highly complex workloads, while still supporting business intelligence and data warehouse users.

Asymmetric Massively Parallel Processing (AMPP) Architecture

AMPP is designed first and foremost for performance. It meets scalability goals by combining elements of symmetric multiprocessing (SMP) and massively parallel processing (MPP), applying each where it is best suited to the specific needs of BI applications operating on terabytes of data. This is implemented with a two-tiered architecture:

Tier 1: SMP Host:

  • Compiles the queries received from all users
  • Generates the query execution plan
  • Divides the query into sub-queries, or snippets, that can be executed in parallel, and distributes the snippets to the SPUs
  • Finally returns the results to the users

Tier 2: SPU (Snippet Processing Unit):

  • This tier contains many SPUs which operate in parallel
  • Each SPU:
    1. Is an intelligent query processing unit
    2. Is a storage node
    3. Consists of a powerful commodity processor, dedicated memory, a disk drive, and a field-programmable disk controller with hard-wired logic to manage data flows and process queries at the disk level
  • The massively parallel, shared-nothing SPU blades provide the performance advantage of an MPP system
  • The SPUs respond to requests from the host but are otherwise highly autonomous, performing their own scheduling, storage management, transaction management, concurrency control, and replication
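The host/SPU division of labor described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Netezza code: `Spu`, `distribute`, and `host_execute` are hypothetical names, rows are plain dicts hash-distributed on a key, threads stand in for SPU blades, and the "snippet" is a filtered SUM/COUNT whose partial results the host combines into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


# Hypothetical model of one SPU: it owns a local slice of the table
# (shared-nothing) and runs snippets only against that slice.
@dataclass
class Spu:
    rows: list = field(default_factory=list)

    def run_snippet(self, predicate, column):
        # Partial aggregate computed locally: (sum, count) of `column`
        # over the rows that satisfy `predicate`.
        total, count = 0, 0
        for row in self.rows:
            if predicate(row):
                total += row[column]
                count += 1
        return total, count


def distribute(rows, n_spus, key):
    # Hash-distribute rows across SPUs on a distribution key.
    spus = [Spu() for _ in range(n_spus)]
    for row in rows:
        spus[hash(row[key]) % n_spus].rows.append(row)
    return spus


def host_execute(spus, predicate, column):
    # Tier 1 (host): send the same snippet to every SPU in parallel,
    # then combine the partial results into the final answer.
    with ThreadPoolExecutor(max_workers=len(spus)) as pool:
        partials = list(pool.map(lambda s: s.run_snippet(predicate, column), spus))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return {"sum": total, "count": count, "avg": total / count if count else None}


# Usage: ten rows spread over four simulated SPUs; ask for EU totals.
rows = [{"id": i, "region": "EU" if i % 2 else "US", "amount": i} for i in range(1, 11)]
spus = distribute(rows, 4, "id")
result = host_execute(spus, lambda r: r["region"] == "EU", "amount")
```

Only the small `(sum, count)` pairs travel back to the host; the rows themselves never leave their SPU, which is the point of the shared-nothing design.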

Intelligent query streaming technology greatly reduces data traffic among SPUs and between the SPUs and the SMP host. It filters records as they stream off the disk and delivers only the data relevant to each query, instead of moving all the data into memory or across the network for processing. On each SPU, this streaming is performed by a Field-Programmable Gate Array (FPGA) chip that serves as the disk controller and can also perform basic processing as data is read off the disk. As a result, the system can run critical database query functions such as parsing, filtering, and projecting at full disk read speed while maintaining full ACID (Atomicity, Consistency, Isolation, and Durability) transactional operations.
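Intelligent query streaming applies parse, filter, and project steps to each record as it is read, so only matching, projected data leaves the scan. The FPGA does this in hardware; the Python generator below is only a software analogy, assuming comma-separated text records and a hypothetical `stream_scan` function.

```python
from typing import Callable, Iterable, Iterator


def stream_scan(lines: Iterable[str], columns: list[str],
                where: Callable[[dict], bool], select: list[str]) -> Iterator[tuple]:
    # Hypothetical software analogy of the FPGA pipeline: each record is
    # parsed, filtered, and projected one at a time as it "streams off the disk".
    for line in lines:
        record = dict(zip(columns, line.rstrip("\n").split(",")))  # parse
        if where(record):                                           # filter (restrict)
            yield tuple(record[c] for c in select)                  # project


# Simulated disk contents: one raw record per line.
disk = iter([
    "1,alice,EU,120\n",
    "2,bob,US,80\n",
    "3,carol,EU,45\n",
])
cols = ["id", "name", "region", "amount"]
out = list(stream_scan(disk, cols,
                       lambda r: r["region"] == "EU" and int(r["amount"]) > 50,
                       ["name"]))
```

Because `stream_scan` is a generator, no intermediate table of parsed rows is ever materialized; non-matching records and unselected columns are discarded immediately.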

Netezza Analytics

IBM Netezza Analytics is designed to accelerate analytic queries, providing faster answers to complex business questions.

This can be used for:

  • Data exploration and discovery
  • Data transformation
  • Model building
  • Model diagnostics
  • Model scoring

The IBM Netezza 100 series, the IBM Netezza 1000 series, and the IBM Netezza High Capacity Appliance series are members of the IBM Netezza data warehouse appliance family.

Netezza 100 Series

The IBM Netezza 100 delivers fast performance for entry-level data warehouses. It is a powerful solution for small to mid-sized data warehouses and can also serve as a development and test system for high-performance BI applications.

This easy-to-use appliance delivers high performance out of the box, with no indexing or tuning required. It arrives ready for immediate data loading and query execution, and it integrates with leading ETL, BI, and analytic applications through standard ODBC, JDBC, and OLE DB interfaces.

The Netezza 100 is an affordable analytical option, delivering up to 10 TB of user data capacity in a compact physical and environmental footprint.

Netezza 1000 Series

IBM Netezza 1000 is a purpose-built, standards-based data warehouse appliance that architecturally integrates database, server, storage, and advanced analytic capabilities into a single, easy-to-manage system. The IBM Netezza 1000 appliance is designed for rapid, deep analysis of data volumes scaling into the petabytes.

This appliance lets modelers work with data directly inside the appliance instead of offloading it to separate infrastructure and dealing with the associated data preprocessing, transformation, and movement. Once a model is built, prediction and scoring can be done where the data resides, in line with other processing, on an as-needed basis. Prediction scores are available in near real time, which helps operationalize advanced analytics and make them available throughout the enterprise.
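Scoring "where the data resides" means the model travels to the data rather than the reverse. A minimal sketch, assuming a hypothetical logistic-regression model with made-up coefficients (`WEIGHTS`, `BIAS`) and a hypothetical `score_partition` function standing in for work done on each processing node: full rows, including a wide `notes` column, stay local, and only `(id, score)` pairs are returned.

```python
import math

# Hypothetical model coefficients, as if produced by an in-appliance
# model-building step. They are illustrative values only.
WEIGHTS = {"age": 0.05, "balance": 0.001}
BIAS = -3.0


def score(row):
    # Logistic regression: sigmoid of a weighted sum of the features.
    z = BIAS + sum(w * row[f] for f, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_partition(partition):
    # Runs "next to the data": the full rows stay on the node and
    # only compact (id, score) pairs are sent back.
    return [(row["id"], score(row)) for row in partition]


# Two simulated node-local partitions with a wide column that never moves.
partitions = [
    [{"id": 1, "age": 40, "balance": 1000, "notes": "x" * 500}],
    [{"id": 2, "age": 20, "balance": 0, "notes": "y" * 500}],
]
scores = [pair for p in partitions for pair in score_partition(p)]
```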

IBM Netezza 1000 adheres to IBM’s basic principle of moving processing close to the data.

Each IBM Netezza 1000 appliance contains multiple snippet blades or S-Blades, where SQL query code segments (or ‘snippets’) and complex analytic processes are executed. The S-Blades are intelligent processing nodes that make up the massively parallel processing engine of the appliance. Each S-Blade is an independent server that contains powerful multi-core Intel CPUs, IBM Netezza’s unique multi-engine FPGAs and gigabytes of RAM – all balanced and working concurrently to deliver peak performance.

IBM Netezza High Capacity Appliance

The IBM Netezza High Capacity Appliance extends IBM Netezza's family of data warehouse appliances to new extremes of data capacity, scaling to multiple petabytes of user data. This enables organizations to meet a variety of analytical and historical data storage requirements with a single cost-effective appliance.

The IBM Netezza High Capacity Appliance series extends IBM Netezza's massively parallel data warehouse architecture to multi-petabyte scale, creating a “queryable archive” that can store, query, and analyze thousands of terabytes of data quickly and cost-effectively.

The IBM Netezza High Capacity Appliance series arrives preconfigured and is typically ready to load data.

As databases scale to tens or hundreds of terabytes and petabytes, the increased data movement becomes unworkable, resulting in “data inertia”. The IBM Netezza High Capacity Appliance runs analytic computations directly in the appliance – without moving data – to ensure maximum analytics performance.

The IBM Netezza High Capacity Appliance reduces the cost of disaster recovery for IBM Netezza users and expands the options available for it. Because it offers capacities several times larger than those of the IBM Netezza 1000 (formerly known as TwinFin), a single IBM Netezza High Capacity Appliance can serve as a consolidated hot-standby platform for one or more IBM Netezza 1000 appliances. This option is a good fit for users with multiple systems who need to redirect critical workloads to hot-standby systems during an outage.

The IBM Netezza High Capacity Appliance processes queries using IBM Netezza’s proven Asymmetric Massively Parallel Processing (AMPP) architecture. With AMPP, load, query, and analytic work is split into many pieces and run in parallel to accelerate results. IBM Netezza High Capacity Appliances further shorten query times and raise throughput using software innovations such as ZoneMap acceleration, Clustered Base Tables, and automatic data compression designed to streamline data movement and minimize I/O.
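ZoneMap acceleration relies on a simple idea: keep the minimum and maximum value of a column for each storage extent, and skip any extent whose range cannot satisfy the query's predicate. The sketch below illustrates that general technique, not IBM's implementation; `Extent`, `build_extents`, and `range_scan` are hypothetical names, and extents are modeled as small Python lists.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    values: list[int]
    lo: int = 0
    hi: int = 0

    def __post_init__(self):
        # Zone map entry: min and max of the column within this extent.
        self.lo, self.hi = min(self.values), max(self.values)


def build_extents(column, extent_size):
    # Split a column into fixed-size extents, each with its own min/max.
    return [Extent(column[i:i + extent_size]) for i in range(0, len(column), extent_size)]


def range_scan(extents, low, high):
    """Return values in [low, high] and how many extents were actually read."""
    hits, extents_read = [], 0
    for ext in extents:
        if ext.hi < low or ext.lo > high:
            continue  # zone map proves no value here can match: skip the I/O
        extents_read += 1
        hits.extend(v for v in ext.values if low <= v <= high)
    return hits, extents_read


# Values loaded in order (e.g. days 1..100), so each extent covers a narrow range.
extents = build_extents(list(range(1, 101)), extent_size=10)
hits, read = range_scan(extents, 42, 47)
```

With values loaded in order, a query for 42 to 47 reads one extent out of ten. On randomly ordered data, nearly every extent's min/max range would overlap the predicate and little I/O would be saved, which is why physical ordering of the data matters for zone maps.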