2011/12/30

NULLIF Function

As we know, Teradata provides the well-known NULLIFZERO function.
Teradata also provides another function named NULLIF, which is more powerful and more flexible;
it can be used to replace a query like
CASE WHEN COL1 = VALUE1 THEN NULL ELSE COL1 END
with simply
NULLIF(COL1, VALUE1)
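
For illustration, a minimal sketch (the table and column names are hypothetical) showing that NULLIFZERO is just a special case of NULLIF:

SELECT
    NULLIFZERO(SALES_AMT)    AS AMT_A   -- NULL when SALES_AMT = 0
   ,NULLIF(SALES_AMT, 0)     AS AMT_B   -- equivalent, written with NULLIF
   ,NULLIF(REGION_CD, 'XX')  AS REGION  -- NULLIF works with any value, not just zero
FROM SALES_TBL;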

http://forums.teradata.com/forum/database/coalesce


Compared with the widely known NULLIFZERO, the NULLIF function that Teradata provides is less commonly used. In fact NULLIF is more powerful and more flexible, and in some cases can be used to simplify and replace CASE expressions, as shown above.

CREATING A TABLE WITH A TIMESTAMP COLUMN IN SQL ASSISTANT, BUT THE COLUMN WAS CONVERTED TO CHAR(23)

We found that when we create a table using SQL Assistant and define a column as

HNKO_TIME TIMESTAMP(0)

the CREATE TABLE statement succeeds, but when we SHOW the created table, we find that the column was defined as

HNKO_TIME CHAR(23) CHARACTER SET UNICODE NOT CASESPECIFIC,

Thanks to the Teradata forums:
http://forums.teradata.com/forum/database/cast-string-to-time
http://forums.teradata.com/forum/enterprise/timestamp-interval-not-working-in-batch-jcl-but-works-in-sql-assistant

we now know that it was because of the "Allow Use of ODBC Extensions" option in SQL Assistant.
Uncheck it, and the problem is resolved.

It seems that if the option is checked, the ODBC driver will parse (and rewrite) the SQL query.
Anyway, the most important lesson is: after you create a table, never forget to check the definition of the table that was actually created. Make sure what you created is what you wanted.
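
A minimal sketch of that create-then-verify habit (the table name is hypothetical):

CREATE TABLE cvdtst.HNKO_TEST
(
    HNKO_ID   INTEGER NOT NULL
   ,HNKO_TIME TIMESTAMP(0)
)
PRIMARY INDEX (HNKO_ID);

SHOW TABLE cvdtst.HNKO_TEST;   -- confirm HNKO_TIME is still TIMESTAMP(0), not CHAR(23)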


Recently we found that, on one particular machine, DDL submitted through SQL Assistant had columns defined as TIMESTAMP replaced with CHAR(23), which is obviously incorrect.
The culprit was a setting in an old version of SQL Assistant: "Allow Use of ODBC Extensions".
When this option is checked, the ODBC driver appears to parse away certain parts of the query, such as TIMESTAMP definitions in DDL.

2011/12/26

DUPLICATE ROW PROBLEM WHEN DOING AN INSERT SELECT INTO A SET TABLE WITH A UPI

It seems that when dealing with duplicate rows inserted into a SET table with a UPI,
Teradata's actual behavior differs somewhat from the manual.

We know that when a SET table is defined with a UPI, Teradata performs the UPI
uniqueness check and skips the SET table check (the duplicate row check).


Teradata indeed does this when we INSERT into a SET-UPI table.

But things go differently when we do an INSERT SELECT:


Teradata will not show any error message but simply reports success.
Of course, the duplicate rows will not appear in the target table.

That really confused me.

I searched the internet and found a similar problem:

http://carlosal.wordpress.com/2009/02/16/tablas-set-y-multiset-en-teradata/



According to the textbook (see Teradata Factory), when handling a SET table defined with a UPI, Teradata replaces the SET table's duplicate-row check with the UPI uniqueness check.
In reality that is not entirely true, at least in some cases, such as INSERT SELECT.
With a direct INSERT into a SET-UPI table, inserting a duplicate row raises a duplicate-UPI error.
But with an INSERT SELECT, sometimes no error occurs at all: Teradata silently drops the duplicate rows, inserts only the non-duplicate part,
and returns no error.
Even with a textbook at hand, this can be quite confusing.


UPDATE 2012/6/26-------------------------------------------------------
I found the answer in the Teradata Application Design and Development manual:
an INSERT ... SELECT automatically discards duplicate rows without any notification, even if the target table has a UPI on it.
To trap this, we create a MULTISET table with a UPI on it.
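
Here is a minimal sketch (hypothetical table names and values) of the behaviors described above:

CREATE SET TABLE t_set (id INTEGER, val INTEGER) UNIQUE PRIMARY INDEX (id);

INSERT INTO t_set VALUES (1, 10);
INSERT INTO t_set VALUES (1, 10);   -- plain INSERT: fails with a duplicate-key error

INSERT INTO t_set SELECT 1, 10;     -- INSERT ... SELECT of the same row: reports
                                    -- success; the duplicate is silently discarded

-- to trap the duplicates instead, make the target MULTISET with a UPI:
CREATE MULTISET TABLE t_mset (id INTEGER, val INTEGER) UNIQUE PRIMARY INDEX (id);

INSERT INTO t_mset SELECT id, val FROM t_set;   -- first load succeeds
INSERT INTO t_mset SELECT id, val FROM t_set;   -- second load raises the UPI error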

PERFORMANCE IMPACT WHEN TERADATA DEALS WITH A "NOT IN" CLAUSE

There is a notable performance impact when Teradata deals with a "NOT IN" clause
if the columns involved do not have the NOT NULL property specified.


Here is the SQL script and its EXPLAIN in Teradata 13:

explain SELECT
       BOT_CD
      ,KAISYA_CD
      ,TKS_CD
      ,LOCA_CD
      ,COL_NO
      ,NHIN_BSHO_CD
      ,MAC_CD
      ,SHO_CD
      ,JSC_SHO_CD
      ,JSC_SHO_CD_EDA
      ,HEN_ZAIKO_SU
      ,KINAI_ZAIKO_SU
      ,HOTCOLD_FLG
FROM
    cvdtst.S_KINAIZAIKO_COL1221OU /* S_KINAIZAIKO_COL */
WHERE
    (
        BOT_CD
        ,KAISYA_CD
        ,TKS_CD
        ,LOCA_CD
        ,NHIN_BSHO_CD
        ,SHO_CD
        ,HOTCOLD_FLG
    )
    NOT IN
    (
        SELECT
            BOT_CD
            ,KAISYA_CD
            ,TKS_CD
            ,LOCA_CD
            ,NHIN_BSHO_CD
            ,SHO_CD
            ,BUNRUI_CD
        FROM
            cvdtst.S_SLIP1_SHO1221OU /* S_SLIP1_SHO */
        WHERE
            SURYOU > 0
    )
;

  1) First, we lock a distinct cvdtst."pseudo table" for read on a
     RowHash to prevent global deadlock for cvdtst.S_SLIP1_SHO1221OU.
  2) Next, we lock a distinct cvdtst."pseudo table" for read on a
     RowHash to prevent global deadlock for
     cvdtst.S_KINAIZAIKO_COL1221OU.
  3) We lock cvdtst.S_SLIP1_SHO1221OU for read, and we lock
     cvdtst.S_KINAIZAIKO_COL1221OU for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from cvdtst.S_SLIP1_SHO1221OU
          by way of an all-rows scan with a condition of (
          "cvdtst.S_SLIP1_SHO1221OU.SURYOU > 0.0") into Spool 7
          (all_amps), which is redistributed by the hash code of (
          cvdtst.S_SLIP1_SHO1221OU.BUNRUI_CD,
          cvdtst.S_SLIP1_SHO1221OU.SHO_CD,
          cvdtst.S_SLIP1_SHO1221OU.NHIN_BSHO_CD,
          cvdtst.S_SLIP1_SHO1221OU.LOCA_CD,
          cvdtst.S_SLIP1_SHO1221OU.TKS_CD,
          cvdtst.S_SLIP1_SHO1221OU.KAISYA_CD,
          cvdtst.S_SLIP1_SHO1221OU.BOT_CD) to all AMPs.  Then we do a
          SORT to order Spool 7 by row hash and the sort key in spool
          field1 eliminating duplicate rows.  The size of Spool 7 is
          estimated with no confidence to be 11,078,203 rows (
          1,074,585,691 bytes).  The estimated time for this step is
          7.68 seconds.
       2) We do an all-AMPs SUM step to aggregate from
          cvdtst.S_KINAIZAIKO_COL1221OU by way of an all-rows scan with
          no residual conditions.  Aggregate Intermediate Results are
          computed globally, then placed in Spool 4.
  5) We do an all-AMPs SUM step to aggregate from Spool 7 by way of an
     all-rows scan.  Aggregate Intermediate Results are computed
     globally, then placed in Spool 8.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by
          way of an all-rows scan into Spool 2 (all_amps), which is
          duplicated on all AMPs.
       2) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by
          way of an all-rows scan into Spool 3 (all_amps), which is
          duplicated on all AMPs.
  7) We do an all-AMPs RETRIEVE step from cvdtst.S_KINAIZAIKO_COL1221OU
     by way of an all-rows scan with no residual conditions into Spool
     6 (all_amps), which is redistributed by the hash code of (
     cvdtst.S_KINAIZAIKO_COL1221OU.BOT_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.KAISYA_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.TKS_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.LOCA_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.NHIN_BSHO_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.SHO_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.HOTCOLD_FLG) to all AMPs.  Then we
     do a SORT to order Spool 6 by row hash, and null value information
     in Spool 3 and Spool 2.  Skip this retrieve step if null exists.
     The size of Spool 6 is estimated with low confidence to be
     3,257,496 rows (218,252,232 bytes).  The estimated time for this
     step is 1.91 seconds.
  8) We execute the following steps in parallel.
       1) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of
          an all-rows scan, which is joined to Spool 7 by way of an
          all-rows scan.  Spool 6 and Spool 7 are joined using an
          exclusion merge join, with a join condition of ("(BOT_CD =
          BOT_CD) AND ((KAISYA_CD = KAISYA_CD) AND ((TKS_CD = TKS_CD)
          AND ((LOCA_CD = LOCA_CD) AND ((NHIN_BSHO_CD = NHIN_BSHO_CD)
          AND ((SHO_CD = SHO_CD) AND (HOTCOLD_FLG = BUNRUI_CD ))))))"),
          and null value information in Spool 3 and Spool 2.  Skip this
          join step if null exists.  The result goes into Spool 1
          (group_amps), which is built locally on the AMPs.  The size
          of Spool 1 is estimated with no confidence to be 3,257,496
          rows (244,312,200 bytes).  The estimated time for this step
          is 1.02 seconds.
       2) We do an all-AMPs RETRIEVE step from
          cvdtst.S_KINAIZAIKO_COL1221OU by way of an all-rows scan with
          no residual conditions into Spool 10 (all_amps), which is
          redistributed by the hash code of (
          cvdtst.S_KINAIZAIKO_COL1221OU.BOT_CD) to all AMPs.  Then we
          do a SORT to order Spool 10 by row hash, and null value
          information in Spool 3 and Spool 2.  Skip this retrieve step
          if there is no null.  The size of Spool 10 is estimated with
          low confidence to be 3,257,496 rows (218,252,232 bytes).  The
          estimated time for this step is 1.91 seconds.
  9) We do an all-AMPs RETRIEVE step from Spool 7 (Last Use) by way of
     an all-rows scan into Spool 11 (all_amps), which is redistributed
     by the hash code of (cvdtst.S_SLIP1_SHO1221OU.BOT_CD) to all AMPs.
     Then we do a SORT to order Spool 11 by row hash, and null value
     information in Spool 3 and Spool 2.  Skip this retrieve step if
     there is no null.  The size of Spool 11 is estimated with no
     confidence to be 11,078,203 rows (1,074,585,691 bytes).  The
     estimated time for this step is 7.68 seconds.
 10) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of an
     all-rows scan, which is joined to Spool 11 (Last Use) by way of an
     all-rows scan.  Spool 10 and Spool 11 are joined using an
     exclusion merge join, with a join condition of ("(BOT_CD = BOT_CD)
     AND ((KAISYA_CD = KAISYA_CD) AND ((TKS_CD = TKS_CD) AND ((LOCA_CD
     = LOCA_CD) AND ((NHIN_BSHO_CD = NHIN_BSHO_CD) AND ((SHO_CD =
     SHO_CD) AND (HOTCOLD_FLG = BUNRUI_CD ))))))"), and null value
     information in Spool 3 (Last Use) and Spool 2 (Last Use).  Skip
     this join step if there is no null.  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with no confidence to be 3,257,496 rows (
     244,312,200 bytes).  The estimated time for this step is 1.02
     seconds.
 11) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.

Teradata has to check whether a NULL value exists in every "NOT IN" column,
which is very costly and can greatly impact the system's performance.
We can avoid it by specifying the NOT NULL property on the "NOT IN" columns.

Here is another example in Japanese:
http://d.hatena.ne.jp/tgk/20100913




-------------------------------------------------------------------------------
The following is the SQL and its EXPLAIN in Teradata V2R6:


explain SELECT
       BOT_CD
      ,KAISYA_CD
      ,LOCA_CD
      ,COL_NO
      ,MAC_CD
      ,SHO_CD
      ,HEN_ZAIKO_SU
      ,KINAI_ZAIKO_SU
      ,HOTCOLD_FLG
FROM
    S_KINAIZAIKO_COL /* S_KINAIZAIKO_COL */
WHERE
    (
        BOT_CD
        ,KAISYA_CD
        ,LOCA_CD
        ,MAC_CD
        ,SHO_CD
        ,HOTCOLD_FLG
    )
    NOT IN
    (
        SELECT
            BOT_CD
            ,KAISYA_CD
            ,LOCA_CD
            ,MAC_CD
            ,SHO_CD
            ,BUNRUI_CD
        FROM
            S_SLIP1_SHO /* S_SLIP1_SHO */
        WHERE
            SURYOU > 0
    )



  1) First, we lock a distinct HWKRUN34."pseudo table" for read on a
     RowHash to prevent global deadlock for HWKRUN34.S_SLIP1_SHO.
  2) Next, we lock a distinct HWKRUN34."pseudo table" for read on a
     RowHash to prevent global deadlock for HWKRUN34.S_KINAIZAIKO_COL.
  3) We lock HWKRUN34.S_SLIP1_SHO for read, and we lock
     HWKRUN34.S_KINAIZAIKO_COL for read.
  4) We do an all-AMPs SUM step to aggregate from
     HWKRUN34.S_KINAIZAIKO_COL by way of an all-rows scan with no
     residual conditions.  Aggregate Intermediate Results are computed
     globally, then placed in Spool 3.
  5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 2 (all_amps) (compressed columns
     allowed), which is duplicated on all AMPs.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from
          HWKRUN34.S_KINAIZAIKO_COL by way of an all-rows scan with no
          residual conditions into Spool 5 (all_amps) (compressed
          columns allowed), which is redistributed by hash code to all
          AMPs.  Then we do a SORT to order Spool 5 by row hash, and
          null value information in Spool 2.  Skip this retrieve step
          if null exists.  The size of Spool 5 is estimated with low
          confidence to be 3,663,600 rows.  The estimated time for this
          step is 4.55 seconds.
       2) We do an all-AMPs RETRIEVE step from HWKRUN34.S_SLIP1_SHO by
          way of an all-rows scan with a condition of (
          "HWKRUN34.S_SLIP1_SHO.SURYOU > 0.0") into Spool 6 (all_amps),
          which is redistributed by hash code to all AMPs.  Then we do
          a SORT to order Spool 6 by row hash and the sort key in spool
          field1 eliminating duplicate rows.  The input table will not
          be cached in memory, but it is eligible for synchronized
          scanning.  The size of Spool 6 is estimated with no
          confidence to be 11,532,920 rows.  The estimated time for
          this step is 18.35 seconds.
  7) We execute the following steps in parallel.
       1) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of
          an all-rows scan, which is joined to Spool 6 by way of an
          all-rows scan.  Spool 5 and Spool 6 are joined using an
          exclusion merge join, with a join condition of ("(BOT_CD =
          BOT_CD) AND ((KAISYA_CD = KAISYA_CD) AND ((LOCA_CD = LOCA_CD)
          AND ((MAC_CD = MAC_CD) AND ((SHO_CD = SHO_CD) AND
          (HOTCOLD_FLG = BUNRUI_CD )))))"), and null value information
          in Spool 2.  Skip this join step if null exists.  The result
          goes into Spool 1 (group_amps), which is built locally on the
          AMPs.  The size of Spool 1 is estimated with index join
          confidence to be 3,663,600 rows.  The estimated time for this
          step is 1.47 seconds.
       2) We do an all-AMPs RETRIEVE step from
          HWKRUN34.S_KINAIZAIKO_COL by way of an all-rows scan with no
          residual conditions into Spool 7 (all_amps) (compressed
          columns allowed), which is redistributed by hash code to all
          AMPs.  Then we do a SORT to order Spool 7 by row hash, and
          null value information in Spool 2.  Skip this retrieve step
          if there is no null.  The size of Spool 7 is estimated with
          low confidence to be 3,663,600 rows.  The estimated time for
          this step is 4.55 seconds.
  8) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of
     an all-rows scan into Spool 8 (all_amps) (compressed columns
     allowed), which is redistributed by hash code to all AMPs.  Then
     we do a SORT to order Spool 8 by row hash, and null value
     information in Spool 2.  Skip this retrieve step if there is no
     null.  The size of Spool 8 is estimated with no confidence to be
     11,532,920 rows.  The estimated time for this step is 18.35
     seconds.
  9) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an
     all-rows scan, which is joined to Spool 8 (Last Use) by way of an
     all-rows scan.  Spool 7 and Spool 8 are joined using an exclusion
     merge join, with a join condition of ("(BOT_CD = BOT_CD) AND
     ((KAISYA_CD = KAISYA_CD) AND ((LOCA_CD = LOCA_CD) AND ((MAC_CD =
     MAC_CD) AND ((SHO_CD = SHO_CD) AND (HOTCOLD_FLG = BUNRUI_CD )))))"),
     and null value information in Spool 2 (Last Use).  Skip this join
     step if there is no null.  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with index join confidence to be 3,663,600
     rows.  The estimated time for this step is 1.47 seconds.
 10) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.



When Teradata 13 processes a NOT IN query, if the NOT IN columns are not defined as NOT NULL, Teradata has to check whether each column contains a NULL value,
which leads to several full-table scans and greatly hurts performance. The fix is to define the NOT IN columns as NOT NULL, or to rewrite the NOT IN as a left outer join.

However, older versions of Teradata (V2R6, for example) do not seem to have this problem. I cannot fully confirm this yet; it is left for future study.
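
A minimal sketch of the left-join rewrite, simplified to three of the columns; note this matches NOT IN semantics only when the join columns contain no NULLs:

SELECT a.BOT_CD, a.KAISYA_CD, a.SHO_CD
FROM cvdtst.S_KINAIZAIKO_COL1221OU a
LEFT OUTER JOIN
(
    SELECT BOT_CD, KAISYA_CD, SHO_CD
    FROM cvdtst.S_SLIP1_SHO1221OU
    WHERE SURYOU > 0
) b
ON  a.BOT_CD    = b.BOT_CD
AND a.KAISYA_CD = b.KAISYA_CD
AND a.SHO_CD    = b.SHO_CD
WHERE b.BOT_CD IS NULL;   -- keep only the rows with no match on the right side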

A WAY TO GET THE ROW COUNT OF EVERY TABLE IN A SPECIFIED DATABASE

Run the following SQL (it reads DBC.TABLES); the SQL script it generates can then be used to get every table's row count:


SELECT ' SELECT '' ' || TRIM(BOTH FROM T.TABLENAME) || ' '' AS TABLENAME , COUNT(*) AS ROWSNUM FROM '
       || TRIM(BOTH FROM T.DATABASENAME) || '.' || TRIM(BOTH FROM T.TABLENAME) || ' UNION ' AS X
FROM DBC.TABLES T
WHERE T.TABLEKIND = 'T' AND T.DATABASENAME = 'CVDTST'
ORDER BY T.TABLENAME;

FROM
https://downloads.teradata.com/forum/analytics/row-counts-for-every-table-in-a-given-database



Sometimes we need the row counts of every table under a given database; I saw this method on the Teradata forums.
First, fetch the list of table names from DBC and assemble them into a multi-statement SQL query.
Then execute the generated SQL to get the row counts of all the tables. (Note that every generated line ends with UNION, so remove the trailing UNION from the last line before running the script.)
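
For illustration, assuming CVDTST contains just two tables T1 and T2 (hypothetical names), the generator would emit something like:

SELECT ' T1 ' AS TABLENAME , COUNT(*) AS ROWSNUM FROM CVDTST.T1 UNION
SELECT ' T2 ' AS TABLENAME , COUNT(*) AS ROWSNUM FROM CVDTST.T2 UNION

-- drop the final UNION and run it, yielding one row per table:
SELECT ' T1 ' AS TABLENAME , COUNT(*) AS ROWSNUM FROM CVDTST.T1 UNION
SELECT ' T2 ' AS TABLENAME , COUNT(*) AS ROWSNUM FROM CVDTST.T2;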