2011/12/26

PERFORMANCE IMPACT WHEN TERADATA DEAL WITH "NOT IN" CLAUSE

THERE IS A NOTABLE PERFORMANCE IMPACT WHEN TERADATA DEAL WITH "NOT IN" CLAUSE 
WITHOUT NOT NULL PROPERTY SPECIFIED ON THE COLUMN


HERE IS THE  SQL SCRIPT AND explain IN TERDATA 13:

explain SELECT
       BOT_CD
      ,KAISYA_CD
      ,TKS_CD
      ,LOCA_CD
      ,COL_NO
      ,NHIN_BSHO_CD
      ,MAC_CD
      ,SHO_CD
      ,JSC_SHO_CD
      ,JSC_SHO_CD_EDA
      ,HEN_ZAIKO_SU
      ,KINAI_ZAIKO_SU
      ,HOTCOLD_FLG
FROM
    cvdtst.S_KINAIZAIKO_COL1221OU/* S_KINAIZAIKO_COL */
WHERE
    (
        BOT_CD
        ,KAISYA_CD
        ,TKS_CD
        ,LOCA_CD
        ,NHIN_BSHO_CD
        ,SHO_CD
        ,HOTCOLD_FLG
    )
    NOT IN
    (
        SELECT
            BOT_CD
            ,KAISYA_CD
            ,TKS_CD
            ,LOCA_CD
            ,NHIN_BSHO_CD
            ,SHO_CD
            ,BUNRUI_CD
        FROM
            cvdtst.S_SLIP1_SHO1221OU /* S_SLIP1_SHO */
        WHERE
            SURYOU > 0
    )
;

  1) First, we lock a distinct cvdtst."pseudo table" for read on a
     RowHash to prevent global deadlock for cvdtst.S_SLIP1_SHO1221OU.
  2) Next, we lock a distinct cvdtst."pseudo table" for read on a
     RowHash to prevent global deadlock for
     cvdtst.S_KINAIZAIKO_COL1221OU.
  3) We lock cvdtst.S_SLIP1_SHO1221OU for read, and we lock
     cvdtst.S_KINAIZAIKO_COL1221OU for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from cvdtst.S_SLIP1_SHO1221OU
          by way of an all-rows scan with a condition of (
          "cvdtst.S_SLIP1_SHO1221OU.SURYOU > 0.0") into Spool 7
          (all_amps), which is redistributed by the hash code of (
          cvdtst.S_SLIP1_SHO1221OU.BUNRUI_CD,
          cvdtst.S_SLIP1_SHO1221OU.SHO_CD,
          cvdtst.S_SLIP1_SHO1221OU.NHIN_BSHO_CD,
          cvdtst.S_SLIP1_SHO1221OU.LOCA_CD,
          cvdtst.S_SLIP1_SHO1221OU.TKS_CD,
          cvdtst.S_SLIP1_SHO1221OU.KAISYA_CD,
          cvdtst.S_SLIP1_SHO1221OU.BOT_CD) to all AMPs.  Then we do a
          SORT to order Spool 7 by row hash and the sort key in spool
          field1 eliminating duplicate rows.  The size of Spool 7 is
          estimated with no confidence to be 11,078,203 rows (
          1,074,585,691 bytes).  The estimated time for this step is
          7.68 seconds.
       2) We do an all-AMPs SUM step to aggregate from
          cvdtst.S_KINAIZAIKO_COL1221OU by way of an all-rows scan with
          no residual conditions.  Aggregate Intermediate Results are
          computed globally, then placed in Spool 4.
  5) We do an all-AMPs SUM step to aggregate from Spool 7 by way of an
     all-rows scan.  Aggregate Intermediate Results are computed
     globally, then placed in Spool 8.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by
          way of an all-rows scan into Spool 2 (all_amps), which is
          duplicated on all AMPs.
       2) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by
          way of an all-rows scan into Spool 3 (all_amps), which is
          duplicated on all AMPs.
  7) We do an all-AMPs RETRIEVE step from cvdtst.S_KINAIZAIKO_COL1221OU
     by way of an all-rows scan with no residual conditions into Spool
     6 (all_amps), which is redistributed by the hash code of (
     cvdtst.S_KINAIZAIKO_COL1221OU.BOT_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.KAISYA_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.TKS_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.LOCA_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.NHIN_BSHO_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.SHO_CD,
     cvdtst.S_KINAIZAIKO_COL1221OU.HOTCOLD_FLG) to all AMPs.  Then we
     do a SORT to order Spool 6 by row hash, and null value information
     in Spool 3 and Spool 2.  Skip this retrieve step if null exists.
     The size of Spool 6 is estimated with low confidence to be
     3,257,496 rows (218,252,232 bytes).  The estimated time for this
     step is 1.91 seconds.
  8) We execute the following steps in parallel.
       1) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of
          an all-rows scan, which is joined to Spool 7 by way of an
          all-rows scan.  Spool 6 and Spool 7 are joined using an
          exclusion merge join, with a join condition of ("(BOT_CD =
          BOT_CD) AND ((KAISYA_CD = KAISYA_CD) AND ((TKS_CD = TKS_CD)
          AND ((LOCA_CD = LOCA_CD) AND ((NHIN_BSHO_CD = NHIN_BSHO_CD)
          AND ((SHO_CD = SHO_CD) AND (HOTCOLD_FLG = BUNRUI_CD ))))))"),
          and null value information in Spool 3 and Spool 2.  Skip this
          join step if null exists.  The result goes into Spool 1
          (group_amps), which is built locally on the AMPs.  The size
          of Spool 1 is estimated with no confidence to be 3,257,496
          rows (244,312,200 bytes).  The estimated time for this step
          is 1.02 seconds.
       2) We do an all-AMPs RETRIEVE step from
          cvdtst.S_KINAIZAIKO_COL1221OU by way of an all-rows scan with
          no residual conditions into Spool 10 (all_amps), which is
          redistributed by the hash code of (
          cvdtst.S_KINAIZAIKO_COL1221OU.BOT_CD) to all AMPs.  Then we
          do a SORT to order Spool 10 by row hash, and null value
          information in Spool 3 and Spool 2.  Skip this retrieve step
          if there is no null.  The size of Spool 10 is estimated with
          low confidence to be 3,257,496 rows (218,252,232 bytes).  The
          estimated time for this step is 1.91 seconds.
  9) We do an all-AMPs RETRIEVE step from Spool 7 (Last Use) by way of
     an all-rows scan into Spool 11 (all_amps), which is redistributed
     by the hash code of (cvdtst.S_SLIP1_SHO1221OU.BOT_CD) to all AMPs.
     Then we do a SORT to order Spool 11 by row hash, and null value
     information in Spool 3 and Spool 2.  Skip this retrieve step if
     there is no null.  The size of Spool 11 is estimated with no
     confidence to be 11,078,203 rows (1,074,585,691 bytes).  The
     estimated time for this step is 7.68 seconds.
 10) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of an
     all-rows scan, which is joined to Spool 11 (Last Use) by way of an
     all-rows scan.  Spool 10 and Spool 11 are joined using an
     exclusion merge join, with a join condition of ("(BOT_CD = BOT_CD)
     AND ((KAISYA_CD = KAISYA_CD) AND ((TKS_CD = TKS_CD) AND ((LOCA_CD
     = LOCA_CD) AND ((NHIN_BSHO_CD = NHIN_BSHO_CD) AND ((SHO_CD =
     SHO_CD) AND (HOTCOLD_FLG = BUNRUI_CD ))))))"), and null value
     information in Spool 3 (Last Use) and Spool 2 (Last Use).  Skip
     this join step if there is no null.  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with no confidence to be 3,257,496 rows (
     244,312,200 bytes).  The estimated time for this step is 1.02
     seconds.
 11) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.

TERADATA HAVE TO CONFIRM IF NULL VALUE EXISTS ON EVERY "NOT IN" COLUMN,
AND IT IS VERY COSTLY AND WILL GREATLY IMPACT THE SYSTEM'S PERFORMANCE.
WE CAN AVOID IT BY SPECIFIING "NOT NULL" PROPERTY ON THE "NOT IN" COLUMN


HERE IS A ANOTHER SAMPLE IN JAPANESE
http://d.hatena.ne.jp/tgk/20100913




-------------------------------------------------------------------------------
THE FOLLOWING IS SQL AND EXPLAIN IN TERADATA V2R6


explain SELECT
       BOT_CD
      ,KAISYA_CD
      ,LOCA_CD
      ,COL_NO
      ,MAC_CD
      ,SHO_CD
      ,HEN_ZAIKO_SU
      ,KINAI_ZAIKO_SU
      ,HOTCOLD_FLG
FROM
S_KINAIZAIKO_COL/* S_KINAIZAIKO_COL */
WHERE
    (
        BOT_CD
        ,KAISYA_CD
        ,LOCA_CD
        ,MAC_CD
        ,SHO_CD
        ,HOTCOLD_FLG
    )
    NOT IN
    (
        SELECT
            BOT_CD
            ,KAISYA_CD
            ,LOCA_CD
            ,MAC_CD
            ,SHO_CD
            ,BUNRUI_CD
        FROM
       S_SLIP1_SHO/* S_SLIP1_SHO */
        WHERE
            SURYOU > 0
    )



  1) First, we lock a distinct HWKRUN34."pseudo table" for read on a
     RowHash to prevent global deadlock for HWKRUN34.S_SLIP1_SHO.
  2) Next, we lock a distinct HWKRUN34."pseudo table" for read on a
     RowHash to prevent global deadlock for HWKRUN34.S_KINAIZAIKO_COL.
  3) We lock HWKRUN34.S_SLIP1_SHO for read, and we lock
     HWKRUN34.S_KINAIZAIKO_COL for read.
  4) We do an all-AMPs SUM step to aggregate from
     HWKRUN34.S_KINAIZAIKO_COL by way of an all-rows scan with no
     residual conditions.  Aggregate Intermediate Results are computed
     globally, then placed in Spool 3.
  5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 2 (all_amps) (compressed columns
     allowed), which is duplicated on all AMPs.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from
          HWKRUN34.S_KINAIZAIKO_COL by way of an all-rows scan with no
          residual conditions into Spool 5 (all_amps) (compressed
          columns allowed), which is redistributed by hash code to all
          AMPs.  Then we do a SORT to order Spool 5 by row hash, and
          null value information in Spool 2.  Skip this retrieve step
          if null exists.  The size of Spool 5 is estimated with low
          confidence to be 3,663,600 rows.  The estimated time for this
          step is 4.55 seconds.
       2) We do an all-AMPs RETRIEVE step from HWKRUN34.S_SLIP1_SHO by
          way of an all-rows scan with a condition of (
          "HWKRUN34.S_SLIP1_SHO.SURYOU > 0.0") into Spool 6 (all_amps),
          which is redistributed by hash code to all AMPs.  Then we do
          a SORT to order Spool 6 by row hash and the sort key in spool
          field1 eliminating duplicate rows.  The input table will not
          be cached in memory, but it is eligible for synchronized
          scanning.  The size of Spool 6 is estimated with no
          confidence to be 11,532,920 rows.  The estimated time for
          this step is 18.35 seconds.
  7) We execute the following steps in parallel.
       1) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of
          an all-rows scan, which is joined to Spool 6 by way of an
          all-rows scan.  Spool 5 and Spool 6 are joined using an
          exclusion merge join, with a join condition of ("(BOT_CD =
          BOT_CD) AND ((KAISYA_CD = KAISYA_CD) AND ((LOCA_CD = LOCA_CD)
          AND ((MAC_CD = MAC_CD) AND ((SHO_CD = SHO_CD) AND
          (HOTCOLD_FLG = BUNRUI_CD )))))"), and null value information
          in Spool 2.  Skip this join step if null exists.  The result
          goes into Spool 1 (group_amps), which is built locally on the
          AMPs.  The size of Spool 1 is estimated with index join
          confidence to be 3,663,600 rows.  The estimated time for this
          step is 1.47 seconds.
       2) We do an all-AMPs RETRIEVE step from
          HWKRUN34.S_KINAIZAIKO_COL by way of an all-rows scan with no
          residual conditions into Spool 7 (all_amps) (compressed
          columns allowed), which is redistributed by hash code to all
          AMPs.  Then we do a SORT to order Spool 7 by row hash, and
          null value information in Spool 2.  Skip this retrieve step
          if there is no null.  The size of Spool 7 is estimated with
          low confidence to be 3,663,600 rows.  The estimated time for
          this step is 4.55 seconds.
  8) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of
     an all-rows scan into Spool 8 (all_amps) (compressed columns
     allowed), which is redistributed by hash code to all AMPs.  Then
     we do a SORT to order Spool 8 by row hash, and null value
     information in Spool 2.  Skip this retrieve step if there is no
     null.  The size of Spool 8 is estimated with no confidence to be
     11,532,920 rows.  The estimated time for this step is 18.35
     seconds.
  9) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an
     all-rows scan, which is joined to Spool 8 (Last Use) by way of an
     all-rows scan.  Spool 7 and Spool 8 are joined using an exclusion
     merge join, with a join condition of ("(BOT_CD = BOT_CD) AND
     ((KAISYA_CD = KAISYA_CD) AND ((LOCA_CD = LOCA_CD) AND ((MAC_CD =
     MAC_CD) AND ((SHO_CD = SHO_CD) AND (HOTCOLD_FLG = BUNRUI_CD )))))"),
     and null value information in Spool 2 (Last Use).  Skip this join
     step if there is no null.  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with index join confidence to be 3,663,600
     rows.  The estimated time for this step is 1.47 seconds.
 10) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.



Teradata13在处理NOT IN查询的时候,如果NOT IN字段没有NOT NULL定义的话,TERADATA会确认每一个字段是否为NULL值,
这会导致数次FTS,极大的影响性能。解决方式是,给NOT IN项目定义NOT IN,或者把NOT IN改为左连接

不过在老版本的Teradata中(比如说V2R6),似乎并不存在这样的问题,目前无法完全确定,留待以后研究

No comments:

Post a Comment