澡堂子: January 2012

2012/01/18

SOME DIFFERENCE BETWEEN BTEQ AND FASTEXPORT WHEN EXPORTING DATA

BTEQ will truncates the trailing space, but Fastexport WILL NOT

http://www.coffingdw.com/utilities/tdutil/using_bteq_to_export_data.htm

2012/01/05

DEAL WITH THE PADDING DOT WHEN CAST A DECIMAL TO CHAR

in Teradata, when cast a decimal to char directly,
the result will show out with a padding dot at the end of the number

eg. the following query will results '123.'
select cast(123. as varchar(10))

the annoying dot can be dealed by cast to Z(I) format

select cast(cast(123. as format 'Z(I)') as varchar(10))

that will bring us the desired results '123'

cast decimal to integer can also do the same thing,
select cast(cast(123. as integer) as varchar(10))

but must take care of the lenth of the decimal number
because the integer can only accommodate numbers from -2,147,483,648 through +2,147,483,647. and when cast to Z(I) the max number can be 9999999999.

USING FASTEXPORT TO EXPORT TAB-DELIMITED TEXT FILE

we must remember one thing that fastexport is used for export flat-file from Teradata and the output file is used for load to tables.
so do not try to open the exported file, that's not the purpose of fastexport.

but, sometimes, we indid have to use the exported file as a text file, eg. tab-delimited text file

if all output columns are fixed length char then we just need to add tab as dummy column between columns like

.EXPORT
OUTFILE E:/DAT3/CNV/DATA23/SPND040060.DAT
MODE RECORD
FORMAT TEXT;

SELECT
ADDRESS1(CHAR(20))
,' '||ADDRESS2(CHAR(20))
,' '||ADDRESS3(CHAR(20))
,' '||ADDRESS4(CHAR(20))
FROM
M_VM_KNRYO

.END EXPORT;

but, if any column is varchar, instead of fixed length char, we will got some trouble here
because teradata will place some indicators includes the infomation of the following varchar column, at each head of varchar column . eg. length of the varchar following

in this case, we can use some tricks
like, combine all output all columns as one, separated by tab

.EXPORT
OUTFILE E:/DAT3/CNV/DATA23/SPND040060.DAT
MODE RECORD
FORMAT TEXT;

SELECT
ADDRESS1
||' '||ADDRESS2
||' '||ADDRESS3
||' '||ADDRESS4
FROM
M_VM_KNRYO

.END EXPORT;

Although, as a solo varchar column, on the head of each line still exist the 2bytes indicators, we can easily delete it by external tools

FASTEXPORT的目的，原本便是为了输出LOAD所需的数据，并非用来导出可编辑的文本文件。
但是有时候我们不得不这么做。比如说，输出一个TAB间隔的文本文件，提供给其他系统。

如果每个字段都是定长，只需要在每两个字段间，手工添加TAB
但是如果有VARCHAR,那么事情就变得麻烦起来，Teradata在Fastexport可变长字段的时候，会在每个变长字段前添加某些标记信息，比如说变长字段的长度等
这些标记会被文本编辑器识别为乱码，这时候，我们可以把所有的字段和tab拼接成一个单独的字段，整体输出
这样只有在每行开头输出标记信息，可以容易的用外部工具删除

当然了，数据量小的时候，还是用定长，或者干脆用BTEQ来EXPORT容易些

2012/01/03

TERADATA SQL TO VIEW SKEW DATA

usally we can use the dbc.TABLESIZE table to see perm size of each AMP on a table to find that if there is any significant data skew exist

SELECT
VPROC, SUM(CURRENTPERM)
FROM
DBC.TABLESIZE
WHERE
DATABASENAME = 'CVDTST'
AND TABLENAME = 'K_CALENDER'
GROUP BY
VPROC

Vproc Sum(CurrentPerm)
0 5,632.00
1 5,632.00
2 4,608.00
3 4,608.00
4 5,120.00
5 5,632.00
.
.
.
.

81 5,120.00
82 6,144.00
83 5,632.00
84 5,120.00
85 5,120.00
86 5,120.00
87 5,120.00

More accurate method is, count table's row count on each amp but not perm size.
it is more accurate, but much more costly

SELECT
HASHAMP (HASHBUCKET(HASHROW ( BOT_CD ,KAISYA_CD ,REKI_YMD))) AS "AMP NO", COUNT(*)
FROM
CVDTST.K_CALENDER
GROUP BY
1

AMP NO Count(*)
0 57
1 60
2 45
3 43
4 51
5 55
.
.
.
.
.
80 50
81 49
82 62
83 56
84 48
85 52
86 54
87 52

一般来说，我们可以通过查看DBC.Tablesize中某个表在各个AMP上占用的PERM空间来判定是否有显著的SKEW。
更精确的方法是利用PI字段依次计算出row hash,bucket,amp，通过统计每个amp上分配的行数来精确判定SKEW。
这个方法更精确，当然也需要更大的代价