Introduction To Netezza
Netezza is a data warehouseing appliance and an advanced analytics engine used for business intelligence, predictive analysis, and business planning.
A data warehouse appliance is one which has a specific functionality, but performs better than other multipurpose tools. As a data warehouse appliance, Netezza integrates the server, database, storage, software, and hardware within a single unit.
Netezza has asymmetric massively parallel processing architecture which enables it to have more functionality (such as filtering, aggregating, etc…) closer to the data. This enables it to reduce the burden of data warehousing tools such as DataStage, AbInitio, Informatica etc.
Key Components of Netezza
- NPS Host
- Snippet Processing Units
- Snippet Processing Arrays
- Storage Arrays
The NPS Host consists of 2 Linux servers configured as Active-Passive technology. If the first server fails, the second will bootup automatically. The host server sends the query to the snippet processing units (SPU) for the query execution. A combination of SPU’s are called a Snippet Processing Array (SPA). Each SPU has its own memory and CPU which is made of the Field Programmable Gate Array (FPGA). Data is stored in the storage array. The storage array is a collection of disks. Netezza’s replication of data across the storage array enables it to have 99.99% availability of data and results in no loss of data.
The Netezza architecture will allow any component failure to not result in a system shutdown, and in most of the cases recovery is automatic. If the SPU fails for any reason, you can hot swap the SPU.
Advantages of Netezza
Why it is better than other databases?
- No index concept
- Less administrative tasks
- Less maintenance
- High throughput
- Complex queries can be performed within database. This will reduce data flow between the application server and the database server
- Automatic failure recovery
- All units are hot swappable, making the system highly available
- The zone map concept will enable faster processing of data
- Compressed data storage and transfer will increase the transfer rate between the storage area and processing units
- Dual power supply will make the system stable during power outages
Common Administration Tasks
- Creating projects
- Creating users and groups
- Loading and unloading data
- Monitoring SPU failures
- Notifying the application developers on poorly distributed tables
- Taking incremental and full backups
- Cleaning up resources
- Prioritizing resources
Identify long running queries using the following SQL:
SELECT (extract( hour from age(qh_tend,qh_tsubmit)) * 3600 )
+ (extract( minute from age(qh_tend,qh_tsubmit)) * 60)
+ extract( second from age(qh_tend,qh_tsubmit)) as runsecond,
qh_sessionid, qh_planid, qh_clientid, qh_cliipaddr, qh_database, qh_user, qh_tsubmit, qh_tstart, qh_tend, qh_priority, qh_pritxt, qh_estcost,qh_sql
qh_tend between ’12/31/2012 05:30:00′ and ’12/31/2012 08:30:00′ — Modify to your recent date range
and runsecond > 100
order by runsecond;
Procedure for tuning the query:
- Stage the data before loading. Otherwise the Netezza connection will be open as long as the data is being prepared.
- Distribute the data using keys which have a wide range of values.
- Possibly distribute the data on numeric columns.
- Limit the maximum number of distribution keys.
- Try to make sure your join keys are used in the distribution list.
- Try not to delete a large amount of data in the database. Rather, try to truncate the tables.