Archive for category ETL/ELT

Simple PERL script to remove the extra new line character from delimited flat files

When you use BTEQ, SQL*Plus, or Informatica to generate delimited flat file from RDBMS, the new line character embedded within a string/varchar column is quite annoying. You can add REPLACE() function to each string column in your SQL to remove these offending \n, but that’s still painful.

Let’s pipe the output of SQL into the following PERL script, and it will try its best to reconstruct the column by removing the extra new line characters. The assumption is – either the 1st row is in good shape which does not contain extra new line character, or the 1st row is the column header.

#!/usr/bin/perl -w

my $file_name = shift @ARGV;

my ($line_count, $line_fixed, $expected_delimiter_count, $count, $n, $del) = (0,0,0,0,0, '');

my @possible_delimiter = (',',"\t",';','\|', '\cA', '\cZ');

# open STDIN or file
if (defined $file_name) { open INPUTPIPE, "<" , $file_name or die $!; } 
else { open INPUTPIPE, "<&STDIN"; }

# count delimiter in the 1st line
my $line_buffer = <INPUTPIPE>;
$n++;
print $line_buffer; $line_count++; # the 1st line must be in good shape

foreach $del (@possible_delimiter){
	chomp($line_buffer);
	$count = () = ($line_buffer =~ /$del/g);
	print STDERR $count . " ($del) delimiters found in 1st row.\n";
	if ( $count > $expected_delimiter_count ){
		$expected_delimiter_count = $count;
		$delimiter = $del;
	}
}

die "Can't detect delimiter!\n" if $expected_delimiter_count <= 0;

print STDERR $expected_delimiter_count . " ($delimiter) delimiters expected.\n\n";

# read through the rest 
while( $line_buffer = <INPUTPIPE> ){
	$n++;
	$count = () = ($line_buffer =~ /$delimiter/g);
	if ( $count >= $expected_delimiter_count ){
		print $line_buffer;
		$line_count++;
		print STDERR "$n : $count : output \n";
		next;
	} else {
		while( $next_line = <INPUTPIPE> ){
			$n++;
			chomp($line_buffer);
			$line_buffer .= $next_line;
			$count = () = ($line_buffer =~ /$delimiter/g);
			# print STDERR "$n : $count : $line_buffer";
			if ( $count >= $expected_delimiter_count ){
				print $line_buffer;
				$line_count++;
				$line_fixed++;
				
				last;
			}
		} # read the next line
	}
}

# finish up
print STDERR $line_count . " rows output into STDOUT and " . $line_fixed . " rows fixed.\n";
close INPUTPIPE;

, , ,

Leave a comment

Informatica Repository Query (part 2)

Continue to post a few more queries for Informatica PowerCenter Repository. Part 1 can be found at http://it.toolbox.com/blogs/golden-orbit-blog/dig-into-informatica-powercenter-repository-29027

How to find Teradata MLoad session against a particular table

select U.SUBJ_NAME, F.TASK_NAME WORKFLOW_NAME, T.Instance_name session_name, S.INSTANCE_NAME target_name,
F.subject_id, F.TASK_ID WORKFLOW_ID, s.session_id, s.MAPPING_ID
from OPB_TASK_INST T, OPB_TASK F, OPB_SUBJECT U,
(
  select session_id, sess_widg_inst_id, version_number,
    instance_name, mapping_id,
    rank() over (partition by session_id, sess_widg_inst_id order by version_number desc) sort_id
  from OPB_SWIDGET_INST
  where widget_type = 2
    and UPPER(INSTANCE_NAME) LIKE '%MY_TABLE_NAME%'
) S
where (s.session_id, s.sess_widg_inst_id, s.version_number) in
(
  select session_id, sess_widg_inst_id, max(VERSION_NUMBER) version_number
  from OPB_EXTN_ATTR e
  where object_type = 79 and object_subtype = 315000 -- MLoad Extension
  group by SESSION_ID, sess_widg_inst_id
)
 and s.SORT_ID = 1 -- only the current version
 and s.SESSION_ID = T.task_id
 and s.VERSION_NUMBER = T.VERSION_NUMBER
 and T.WORKFLOW_ID = F.TASK_ID
 and T.VERSION_NUMBER = F.VERSION_NUMBER
 and F.subject_id = U.SUBJ_ID
order by 1,2,3;

-- It will be nice to see workflow name in View Point
update OPB_EXTN_ATTR
set ATTR_VALUE = 'SubjectArea=$PMFolderName; Workflow=$PMWorkflowName; Session=$PMSessionName; WorkflowRunId=$PMWorkflowRunId;'
where attr_id = 26 -- Query Band Expression
and object_type = 79  -- Target
and OBJECT_SUBTYPE = 315000 -- Teradata Plug-in
and (SESSION_ID, SESS_WIDG_INST_ID, VERSION_NUMBER) in 
(
    select session_id, sess_widg_inst_id, max(VERSION_NUMBER) version_number
    from OPB_EXTN_ATTR e
    where object_type = 79 and object_subtype = 315000
    group by SESSION_ID, sess_widg_inst_id
);

update OPB_EXTN_ATTR
set ATTR_VALUE = '1' 
where attr_id = 10 -- Check "Drop Log/Error Tables" option
and object_type = 79  -- Target
and OBJECT_SUBTYPE = 315000 -- Teradata Plug-in
and ATTR_VALUE = '0';  -- Option is not set yet

How to view all the expressions in a transformation

select wf.WIDGET_ID, wf.FIELD_ID, wf.FIELD_NAME, e.EXPR_ID, e.LINE_NO, e.EXPRESSION
from OPB_EXPRESSION e, OPB_WIDGET_FIELD wf, OPB_WIDGET_EXPR we
where wf.WIDGET_ID = e.WIDGET_ID
  and wf.WIDGET_ID = we.WIDGET_ID
  and wf.FIELD_ID = we.OUTPUT_FIELD_ID
  and we.EXPR_ID = e.EXPR_ID
  and wf.VERSION_NUMBER = we.VERSION_NUMBER
  and we.VERSION_NUMBER = e.VERSION_NUMBER 
  and (wf.WIDGET_ID, wf.VERSION_NUMBER) =  (
    select w.widget_id, max(w.version_number) version_number --, w.widget_type, m.mapping_id
    from OPB_SUBJECT s, OPB_MAPPING m, OPB_WIDGET w, OPB_WIDGET_INST wi
    where m.subject_id = s.subj_id
      and m.mapping_id = wi.mapping_id
      and w.widget_id = wi.WIDGET_ID
      and w.subject_id = m.subject_id
      and s.SUBJ_NAME = 'folder_name1' 
      and m.MAPPING_NAME = 'm_mapping_name1'
      and w.WIDGET_NAME = 'transformation_name1'
    group by w.widget_id
  )

Make sure “Insert Else Update” and “Update Else Insert” are checked for Dynamic Lookup

update OPB_WIDGET_ATTR
set ATTR_VALUE = '1'
where (WIDGET_ID, WIDGET_TYPE, VERSION_NUMBER) in
( select w.WIDGET_ID, w.WIDGET_TYPE, max(w.VERSION_NUMBER) VERSION_NUMBER
  from opb_widget w, opb_widget_attr wa
  where wa.ATTR_ID = 15
    and wa.ATTR_VALUE = '1' -- Dynamic Lookup Cache
    and w.WIDGET_ID = wa.WIDGET_ID
    and w.WIDGET_TYPE = wa.WIDGET_TYPE
    and w.VERSION_NUMBER = wa.VERSION_NUMBER
    and w.WIDGET_TYPE = 11 -- Lookup
  group by w.WIDGET_ID, w.WIDGET_TYPE
)
and ATTR_ID in (18,19) -- Insert Else Update, Update Else Insert
and LINE_NO = 1
and ATTR_VALUE = '0';

select w.WIDGET_NAME, w.VERSION_ID, w.WIDGET_ID,
  wa.ATTR_ID, a.ATTR_NAME, wa.LINE_NO, wa.ATTR_VALUE 
from opb_widget w, opb_widget_attr wa, opb_attr a
where w.WIDGET_NAME like ??? -- put your filter here
  and w.WIDGET_ID = wa.WIDGET_ID
  and w.WIDGET_TYPE = wa.WIDGET_TYPE
  and w.VERSION_NUMBER = wa.VERSION_NUMBER
  and wa.ATTR_ID = a.attr_id
  and wa.WIDGET_TYPE = a.OBJECT_TYPE_ID 
order by w.WIDGET_NAME, w.VERSION_NUMBER desc, wa.ATTR_ID, wa.LINE_NO;

, ,

Leave a comment