2014/08/08

about hash index

http://forums.teradata.com/forum/database/hash-index

http://forums.teradata.com/forum/general/difference-between-single-table-join-index-nusi-and-hash-index

http://teradatardbms.blogspot.jp/2011/03/hash-index-examples.html

17 Nov 2006

A hash index is simply a restricted form of join index. It was introduced mainly to satisfy rules in the TPC-H benchmark specification that did not allow regular join indexes. The syntax for creating a hash index is simpler than for creating an equivalent join index; as far as I know, that is the only reason one might prefer a hash index over a join index. The underlying technology is the same, so there should be no difference in performance between a hash index and an equivalent join index.
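As a sketch of the "simpler syntax" point (table and column names are hypothetical), the two forms below should build roughly equivalent index structures:

```sql
-- Hash index: compact syntax
CREATE HASH INDEX hi_orders (o_custkey, o_totalprice)
ON orders
BY (o_custkey)
ORDER BY HASH (o_custkey);

-- Roughly equivalent single-table join index, spelled out by hand
CREATE JOIN INDEX ji_orders AS
SELECT o_custkey, o_totalprice, ROWID
FROM orders
PRIMARY INDEX (o_custkey);
```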

Teradata Diagnostic Commands

DIAGNOSTIC HELPSTATS ON FOR SESSION;
DIAGNOSTIC COSTPRINT ON FOR SESSION;
DIAGNOSTIC OLDREWRITES ON FOR SESSION;
DIAGNOSTIC DUMP COSTS ON FOR SESSION;
DIAGNOSTIC HELP COSTS ON FOR SESSION;
DIAGNOSTIC SET COSTS ON FOR SESSION;
DIAGNOSTIC HELP PROFILE ON FOR SESSION;
DIAGNOSTIC SET PROFILE ON FOR SESSION;
DIAGNOSTIC DUMP SAMPLES ON FOR SESSION;
DIAGNOSTIC HELP SAMPLES ON FOR SESSION;
DIAGNOSTIC SET SAMPLES ON FOR SESSION;
DIAGNOSTIC DUMPCACHE ON FOR SESSION;
DIAGNOSTIC JOINPLAN ON FOR SESSION;
DIAGNOSTIC NOJIND ON FOR SESSION;
DIAGNOSTIC PSTEPS ON FOR SESSION;
DIAGNOSTIC REQUEST ON FOR SESSION;
DIAGNOSTIC SPOIL ON FOR SESSION;
DIAGNOSTIC STEPS ON FOR SESSION;
DIAGNOSTIC USESTATS ON FOR SESSION;
DIAGNOSTIC VALIDATEINDEX ON FOR SESSION;
DIAGNOSTIC WHITE ON FOR SESSION;

2014/06/04

count(*) performance improved since Teradata 14

since Teradata 14, count(*) without a WHERE condition is computed at the cylinder level (a cylinder index scan) instead of reading every data block

Explain sel count(*)  from SV_RY_CAMP_KIN
  1) First, we lock a distinct DEV_TRVL_RAW."pseudo table" for read on
     a RowHash to prevent global deadlock for
     DEV_TRVL_RAW.SV_RY_CAMP_KIN.
  2) Next, we lock DEV_TRVL_RAW.SV_RY_CAMP_KIN for read.
  3) We do an all-AMPs SUM step to aggregate from
     DEV_TRVL_RAW.SV_RY_CAMP_KIN by way of a cylinder index scan with
     no residual conditions.  Aggregate Intermediate Results are
     computed globally, then placed in Spool 3.  The size of Spool 3 is
     estimated with high confidence to be 1 row (23 bytes).  The
     estimated time for this step is 2.83 seconds.
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with high
     confidence to be 1 row (25 bytes).  The estimated time for this
     step is 0.01 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 2.83 seconds.

Explain sel count(*)  from SV_RY_CAMP_KIN   where HIZUKE >= cast(date'2014-05-01'   as timestamp) ;
  1) First, we lock a distinct DEV_TRVL_RAW."pseudo table" for read on
     a RowHash to prevent global deadlock for
     DEV_TRVL_RAW.SV_RY_CAMP_KIN.
  2) Next, we lock DEV_TRVL_RAW.SV_RY_CAMP_KIN for read.
  3) We do an all-AMPs SUM step to aggregate from
     DEV_TRVL_RAW.SV_RY_CAMP_KIN by way of an all-rows scan with a
     condition of ("DEV_TRVL_RAW.SV_RY_CAMP_KIN.HIZUKE >= TIMESTAMP
     '2014-05-01 00:00:00.000000'").  Aggregate Intermediate Results
     are computed globally, then placed in Spool 3.  The size of Spool
     3 is estimated with high confidence to be 1 row (23 bytes).  The
     estimated time for this step is 16.23 seconds.
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with high
     confidence to be 1 row (25 bytes).  The estimated time for this
     step is 0.01 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 16.24 seconds.

in older versions, it was done with an all-rows scan

Explain sel count(*)  from SV_RY_CAMP_KIN
  1) First, we lock a distinct DWHRUN."pseudo table" for read on a
     RowHash to prevent global deadlock for DWHRUN.SV_RY_CAMP_KIN.
  2) Next, we lock DWHRUN.SV_RY_CAMP_KIN for read.
  3) We do an all-AMPs SUM step to aggregate from DWHRUN.SV_RY_CAMP_KIN
     by way of an all-rows scan with no residual conditions.  Aggregate
     Intermediate Results are computed globally, then placed in Spool 3.
     The input table will not be cached in memory, but it is eligible
     for synchronized scanning.  The size of Spool 3 is estimated with
     high confidence to be 1 row (23 bytes).  The estimated time for
     this step is 3 minutes and 57 seconds.
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with high
     confidence to be 1 row (25 bytes).  The estimated time for this
     step is 0.00 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 3 minutes and 57 seconds.

2014/05/19

table name length limit changed since Teradata 14.10

since Teradata 14.10, the table name limit was raised from 30 bytes to 128 bytes.
that's good news, but DO NOT FORGET that when we reference the DBC.Tables view,
the TableName column still shows the table name truncated to 30 bytes. you should use DBC.TablesV to see the "real" 128-byte table name.

so when you reference dbc.tables and find two or more tables with apparently the same name in the same database, don't be confused: use HELP DATABASE or DBC.TablesV and you will find they were simply cut off at 30 bytes and only look like "the same name"
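A quick way to spot names affected by the 30-byte truncation (database name is hypothetical):

```sql
-- Names longer than 30 characters appear truncated in DBC.Tables;
-- DBC.TablesV shows the full 128-byte name
SELECT TableName
FROM DBC.TablesV
WHERE DatabaseName = 'mydb'
  AND CHAR_LENGTH(TRIM(TableName)) > 30;
```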

2013/05/02

SQL UDF support since Teradata 13.10

here is good news for those of us who are poor at C or Java, like me

in the past, when talking about UDFs in Teradata, the only choices were C or Java.

since Teradata 13.10, a new feature is supported: the SQL UDF

sample:


CREATE FUNCTION test.MyUDF2 (d INT, e INT, f INT)
RETURNS INT
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC
COLLATION INVOKER
INLINE TYPE 1
RETURN d + e + f;
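Once created, the function can be invoked like any built-in:

```sql
SELECT test.MyUDF2(1, 2, 3);  -- returns 6
```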


2013/03/26

Conversion of Oracle function to its equivalent Teradata function

original link here:
http://www.javaorator.com/oracle/tutorial/conversion-of-oracle-function-to-its-equivalent-teradata-function-152.code

http://blog.csdn.net/fanxiaoliang83/article/details/4402378

Oracle Function              Teradata Equivalent
DECODE(A,NULL,B,C)           CASE WHEN A IS NULL THEN B ELSE C END
DECODE(A,B,C,D)              CASE A WHEN B THEN C ELSE D END
GREATEST(A,B)                CASE WHEN A >= B THEN A ELSE B END
INSTR(A,B)                   INDEX(A,B)
LEAST(A,B)                   CASE WHEN A <= B THEN A ELSE B END
LENGTH(A)                    CHAR_LENGTH(A)
LPAD(A,B)                    CAST(A AS CHAR(B))
LTRIM(A,B)                   TRIM(LEADING B FROM A)
MONTHS_BETWEEN(A,B)          ABS(CAST((A - B MONTH) AS INTEGER))
RTRIM(A,B)                   TRIM(TRAILING B FROM A)
NVL(A,B)                     COALESCE(A,B)
TO_CHAR(ANUMBER,B)           CAST((ANUMBER (FORMAT B)) AS CHAR(X))
TO_CHAR(ADATE,B)             CAST((ADATE (FORMAT B)) AS CHAR(X))
TO_CHAR(ADATE,'MM_YY')       (ADATE (FORMAT 'MM')) || '_' || (ADATE (FORMAT 'YY'))
TO_CHAR(X[,F])               CAST(X AS CHAR [FORMAT F])
TRUNC(T[,F])                 CAST(T AS DATE [FORMAT F])
TO_DATE(X[,F])               CAST(X AS DATE [FORMAT F])
TO_NUMBER(X[,F])             CAST(X AS NUMERIC [FORMAT F])
SIGN(A)                      CASE WHEN A > 0 THEN 1 WHEN A < 0 THEN -1 ELSE 0 END
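As a worked example of the DECODE and NVL rows above (table and column names are hypothetical):

```sql
-- Oracle: SELECT DECODE(status, NULL, 'unknown', status), NVL(qty, 0) FROM orders;
-- Teradata equivalent:
SELECT CASE WHEN status IS NULL THEN 'unknown' ELSE status END,
       COALESCE(qty, 0)
FROM orders;
```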

2013/02/27

call a stored procedure from a stored procedure

CALL DBC.SysExecSQL('CALL PROC_A(''VALUE'',OUTPUT)');

http://blog.emergingbytes.com/2012/02/09/teradata-calling-a-procedure-within-a-procedure/
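A minimal sketch of the pattern from the linked post (the outer procedure and PROC_A are hypothetical names; note the doubled quotes needed inside the dynamic SQL string):

```sql
REPLACE PROCEDURE PROC_OUTER ()
BEGIN
  -- the inner CALL is passed as dynamic SQL and executed at run time
  CALL DBC.SysExecSQL('CALL PROC_A(''VALUE'', OUTPUT)');
END;
```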

2013/02/22

utility to view Table Lock

it seems that the only way to view table locks in Teradata is the Lock Display utility (lokdisp); we cannot view lock info via SQL.

Geospatial Data in Teradata

Teradata began to support geospatial data in Teradata 13.
here are some interesting ways to use it for BI analysis: https://docs.google.com/file/d/0B7i7b3i-AZGoUnd0N09ISGxhYnM/preview

2013/02/21

USI on PI in a PPI table

normally, creating a SI on the same columns as the PI causes an error:


create table tbl1
( col1 char(10),
  col2 char(10),
  col3 char(10)
  )
  primary index(col1, col2)
  index(col1,col2)

CREATE TABLE Failed. 3519:  Two indexes with the same columns. 

here is a special circumstance:


create table tbl1
( col1 char(10),
  col2 DATE
  )
primary index(col1)
PARTITION BY RANGE_N(col2 BETWEEN DATE '2000-01-01' AND DATE '2020-12-31' EACH INTERVAL '1' DAY)
unique index(col1)

the reason this is allowed is performance: in a PPI table, access by the PI value alone must probe every partition (unless the partitioning column is also in the predicate), so a USI on the PI columns both enforces uniqueness and provides direct single-row access.
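A sketch of the access this enables, using the table from the example above:

```sql
-- Without the USI, this must probe every daily partition for the row hash;
-- with the USI, it is a single unique-index access
SELECT * FROM tbl1
WHERE col1 = 'A001';
```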

Teradata restart

shut down the Teradata database: tpareset -x
check state: pdestate -a
power off the node: shutdown
after the node is shut off, power off all disk arrays
to start Teradata, check the disk arrays: psh verify_pdisks
start the database: /etc/init.d/tpa start
check state: pdestate -a

2013/01/30

Teradata Archive and Recovery Utility

yes, finally, the archive strategy for our Teradata machine has been decided.
we will perform a full database archive once a week and an incremental archive every day.
with this strategy, in case of emergency all DBs can be restored to last week's state, and specified key DBs can be restored to last night's or the current state.

https://docs.google.com/file/d/0B7i7b3i-AZGoemotSGNFbmY3T28/preview

2013/01/28

install, replace, and remove a JAR on Teradata


call sqlj.install_jar('CJ!/home2/tableudf/hdfsudf.jar', 'hdfsudf', 0); 
call sqlj.replace_jar('CJ!/home2/tableudf/hdfsudf.jar', 'hdfsudf'); 
call sqlj.remove_jar('hdfsudf',0); 

of course, appropriate access rights are needed to run the above

an installed JAR is shown in dbc.tables with TableKind 'D'

teradata's procedure really sucks


Abort Sessions in Teradata

I used to kill sessions in Teradata with PMON.

there is another way: the function SYSLIB.AbortSessions()


from

http://dadawarehousingwithtd.blogspot.com/2012/10/kill-session-in-teradata.html
http://www.teradataforum.com/teradata/20111122_082726.htm



1. Find the session info:

select * from dbc.SessionInfo;

2. Execute the AbortSessions function:

SELECT SYSLIB.AbortSessions(1,'MyUser',0,'Y','Y');

Param1: HostNo
Param2: UserName
Param3: SessionNo
Param4: LogoffSessions
Param5: UserOverride

A 0 for Param3 means all sessions of the user will be aborted; specify a session number to kill a specific session.
Param4 = 'Y' means log off the sessions first.

drop database in teradata


we can clear a database in Teradata easily with a single DELETE DATABASE; we do not have to drop all the tables and views one by one.

eg.
delete database db01;
drop database db01;


as we know, we can't drop a database while any objects (tables, views, triggers and so on) remain under it, so we need to empty the database before dropping it.

alternatively, we can drop every object under a database by running the result of the following SQL


sel 'drop trigger ' || trim(databasename) || '.' || trim(tablename) || ';' from dbc.tables  where databasename = 'hst_db' and tablekind = 'G'
union all
sel 'drop view ' || trim(databasename) || '.' || trim(tablename) || ';' from dbc.tables  where databasename = 'hst_db' and tablekind = 'V'
union all
sel 'drop procedure ' || trim(databasename) || '.' || trim(tablename) || ';' from dbc.tables  where databasename = 'hst_db' and tablekind = 'P'
union all
sel 'drop table ' || trim(databasename) || '.' || trim(tablename) || ';' from dbc.tables  where databasename = 'hst_db' and tablekind = 'T'