Teradata Load and Unload Utilities

What is Teradata?

Today’s growing data warehouses demand fast, reliable tools to help acquire and manage data and flexibility to load large volumes of data from database perspective with any source at any time. Challenges come from everywhere: more data sources, growing data volumes, dynamically changing business requirements, and user demands for newer data. Teradata data warehousing systems offer the following:

  • Quick access to data for more timely decision making
  • Solutions for the entire spectrum of load requirements from batch to near-real time
  • Unmatched scalability for large-volume loads
  • Fail-proof loads with checkpoint restart capabilities
  • Proven technology from the data warehouse technology leader
  • Integration with industry-leading ETL and ELT tools (IBM DataStage)

These ETL tools provide a wide array of features that are efficient, scalable, and flexible – allowing you to centralize data from disparate sources seamlessly into the Teradata system. They are fully parallel to provide optimal and scalable performance for getting data in and out of your Teradata database. In addition, these utilities allow you to import and export data to and from host-based and client-resident data sources, including mainframe host databases, enterprise server databases, or departmental data marts.

Primary Components of Loading Teradata

  1. TeraGSS (Teradata Global Security Services) is a security interface – for user authentication in particular.
  2. TDICU is a Teradata shared ICU library which deals with various aspects of ‘national language’ formatting and character set support.
  3. Teradata Call Level Interface (CLIv2) provides an API for sending SQL requests to the Teradata Database.
  4. Teradata Data Connector also known as PIOM reads and writes flat files to provide the input/output interface for access modules.
  5. BTEQ is a general-purpose, command-based tool that enables you to communicate with one or more Teradata Databases. BTEQ provides an interactive or batch interface that allows you to submit SQL statements, import and export data, and generate reports. BTEQ provides report-formatting options where queried data are formatted and returned to the screen, file or printer. Scripts are sent from the client to the Teradata Database where the actual process occurs. You can also use BTEQ to import and export data between your client and database tables. BTEQ’s streamed package format design ensures fast processing and effective resource utilization.
  6. ODBC driver delivers open standards-based access to the Teradata Database from many popular client applications and tools. By developing applications using the standard ODBC API, users can maximize the application portability across platforms and databases and reduce development and maintenance costs.
  7. FastExport is a high-speed utility that quickly exports large data sets from Teradata tables or views to a client system for processing, generating large reports or loading data into a smaller database. Like all Teradata load utilities, FastExport can export data to both channel- or network-attached client systems. FastExport also provides session control and data-handling specifications for the data transfer operations.
  8. FastLoad is a highly reliable, parallel-load utility designed to move large volumes of data — collected from data sources on channel and network-attached clients — into empty tables in the Teradata Database. Execute Teradata FastLoad from any client platform, mainframe or load server — for automatic parallel data allocation, conversion, movement and loading that achieves higher performance levels.
  9. Multiload is a time-tested, highly reliable, parallel-load utility used to create and maintain Teradata Databases. MultiLoad optimizes operations that rapidly acquire, process and apply data to tables in a Teradata Database. For data maintenance, MultiLoad updates, inserts, upserts and deletes large volumes of data in empty or populated tables and works at the data block level for higher performance. MultiLoad runs on a variety of client platforms, operates in a fail-safe mode and is fully recoverable.
  10. Tpump is a highly parallel utility designed to continuously move data from data sources into Teradata tables without locking the affected table. TPump provides near-real-time data into your data warehouse, allowing you to maintain fresh, accurate data for up-to-the-moment decision making. Use TPump to insert, update, upsert and delete data in the Teradata Database, particularly for environments where batch windows are shrinking and warehouse maintenance overlaps normal working hours. And because TPump uses row hash locks, users can run queries even while it’s updating your Teradata system.
  11. SQLPP is a Teradata C Preprocessor2 package. Teradata C Preprocessor2 provides an application programmer with a simplified method of reading and changing information stored in a Teradata Database by interpreting and expanding SQL statements that have been incorporated in an application program.
  12. Teradata Parallel Transporter was designed for increased functionality and customer ease of use for faster, easier and deeper integration. The new Teradata Connector Stage in DataStage invokes this Teradata Utility for the database operations. An enterprise with multiple business silos works better when the units function harmoniously as a whole. Likewise, separate data warehouse tools are more productive when they are combined within the same infrastructure — when they speak the same language and communicate as one. Teradata Parallel Transporter is one example of products working together within an active data warehouse. This new-generation product simplifies the data loading process by running the protocols used by each of the Teradata Load and Unload Utilities as modules or operators: load, update, export and stream. Unlike conventional utilities and products in which multiple data sources are usually processed in a serial manner, Teradata Parallel Transporter can access multiple data sources in parallel. This ability can lead to increased throughput. Teradata Parallel Transporter also allows different specifications for different data sources and, if their data is UNION-compatible, merges them together. Teradata Parallel Transporter was designed for increased functionality and customer ease of use for faster, easier and deeper integration. The capabilities include:
  • Simplified data transfer between one Teradata Database and another; only one script is required to export from the production-and-load test system.
  • Ability to load dozens of files using a single script makes development and maintenance of the data warehouse easier.
  • Distribution of workloads across CPUs on the load server eliminates bottlenecks in the data load process.
  • Data flows through multiple instances of UPDATE OPERATOR and in-memory data streams to update tables.
  • Option is available to export data to in-memory data stream instead of landing data.
  • The open database connectivity (ODBC) operator reads from the ODBC driver, which could pull data from any database; for example, DB2 or Oracle.
  • Accessibility to myriad data sources via open standards is possible.
  • Multiple operators can scan directories for files to load and can combine the data in the in-memory data stream with UNION ALL operation and stream operator loads.
  • Script-building wizard is available to aid first-time users.

Primary Components of Teradata Parallel Transporter

  • Load Operator, using the FastLoad protocol, is a parallel load utility designed to move large volumes of data collected from data sources on channel- and network-attached clients into empty tables in the Teradata Database.
  • Update Operator optimizes batch operations that rapidly acquire, process and apply data to tables in a Teradata Database by using the MultiLoad protocol. For data maintenance, MultiLoad updates, inserts and upserts large volumes of data into empty or populated tables and/or deletes data from the tables. MultiLoad works at the data block level, providing an alternative to insert/select operations that touch a significant portion of the target table. A single MultiLoad job can maintain up to five Teradata tables by extracting large volumes of data, locking the destination tables and then loading data using block level updates.
  • Export Operator exports large data sets from Teradata tables or views and imports the data to a client’s system for processing or generating large reports or for loading data into a smaller database. It uses the FastExport protocol. Like all Teradata load utilities, FastExport can export data to channel- or network-attached client systems. FastExport also provides session control and data handling specifications for the data transfer operations.
  • Stream uses the SQL-based TPump protocol, not block-based, for continuous loading that is designed to move data from data sources into Teradata tables without locking the affected table. TPump loads near- real-time data into the data warehouse, allowing users to maintain fresh, accurate data for up-to-the-minute decision making. TPump can be used to insert, update, upsert and delete data in the Teradata Database, particularly for environments where batch windows are shrinking and data warehouse maintenance overlaps normal working hours. Because the TPump protocol uses row hash locks, users can run queries even while they are updating the Teradata system.

References Used:

 Link 1, Link 2, Link 3, Link 4, Link 5, Link 6, and Link 7