Wednesday, March 28, 2012

Problems importing text files with double-quotes as text qualifier

I have text data files from a third party and they use comma as field delimiters and enclose the text for each column in double-quotes. Not a problem for most of the data files until they start sending files where there is " within the column values. SSIS package fails with the error:

The column delimiter for column "Column 1" was not found.

Any ideas on how to resolve this issue will be greatly appreciated.

Thanks
pcp

Import the data as a flat file... don't try to parse the columns apart yet. Save the data as one giant column per row. In your data flow, use a custom script to parse it apart. Here's a modified example of mine:

' Microsoft SQL Server Integration Services user script component

' This is your new script component in Microsoft Visual Basic .NET

' ScriptMain is the entrypoint class for script components

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain

Inherits UserComponent

Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)

Dim unparsedSubString As String

Dim commaPos As Integer = 0

Dim firstQuotePos As Integer = 0

Dim nextQuotePos As Integer = 0

Dim colArray(0) As String

unparsedSubString = Row.RawColumns

commaPos = unparsedSubString.IndexOf(",", 0)

' while the comma position exists within the string, loop

While commaPos <> -1

' get the position of the first two quotes within the string

firstQuotePos = unparsedSubString.IndexOf("""", 0)

nextQuotePos = unparsedSubString.IndexOf("""", firstQuotePos + 1)

' if the comma position is between a set of quotes, grab everything within the quotes

If commaPos > firstQuotePos AndAlso commaPos < nextQuotePos Then

commaPos = unparsedSubString.IndexOf(",", nextQuotePos)

End If

' Resize the array to hold a new element.Element 0 may exist and be unfilled, which

' is required to make sure there are no null references on first byref call to Resize.

If colArray(0) <> Nothing Then

Array.Resize(colArray, colArray.Length + 1)

End If

If commaPos >= 0 Then

colArray(colArray.Length - 1) = unparsedSubString.Substring(0, commaPos).Replace("""", "")

unparsedSubString = unparsedSubString.Substring(commaPos + 1, unparsedSubString.Length - (commaPos + 1))

Else

' This clause covers the last field being qouted with a comma

colArray(colArray.Length - 1) = unparsedSubString.Replace("""", "")

unparsedSubString = String.Empty

End If

' values for next iteration of loop

commaPos = unparsedSubString.IndexOf(",", 0)

End While

' Add whatever contents exist after the last comma

' This clause covers the last field being unquoted or quoted without a comma.

If unparsedSubString.Length > 0 Then

If colArray(0) <> Nothing Then

Array.Resize(colArray, colArray.Length + 1)

End If

colArray(colArray.Length - 1) = unparsedSubString.Replace("""", "")

End If

If colArray.Length >= 2 AndAlso colArray(1) = "FIL" Then

Me.OutputBuffer.AddRow()

Me.OutputBuffer.Column1 = colArray(0)

Me.OutputBuffer.Column2 = colArray(1)

Me.OutputBuffer.Column3 = colArray(2)

Me.OutputBuffer.Column4 = colArray(3)

Me.OutputBuffer.Column5 = colArray(4)

Me.OutputBuffer.Column6 = colArray(5)

Me.OutputBuffer.Column7 = colArray(6)

Me.OutputBuffer.Column8 = colArray(7)

Me.OutputBuffer.Column9 = colArray(8)

Me.OutputBuffer.Column10 = colArray(9)

Me.OutputBuffer.Column11 = colArray(10)

End If

End Sub

End Class

I'm guessing you can figure the rest of the column mappings and such from here. Good luck.

|||Thanks Robert. This was very helpful. I just tried a variation with the Split function and it worked too. Any thoughts on the shortcomings of using the Split?|||

Hi Robert,

I tried to use a transformation Script Component to do it, but all I am getting is

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain

Inherits UserComponent

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

.....

I cannot get it to be created with Row as InputBuffer and use the RawColumns properties. InputBuffer class seems to be unrecognized somehow.

I am using SQL 2005 Developer edition with SP1. What am I missing here?

Thanks.

|||

Another question :

In 2005 SSIS, Is there an easy way to import a flat file like

"ABC", "EDFG", "123", "10/1/2006" to a SQL database table, with the last two columns as int and date fields ?

Always errored out for data type mismatch. I used SQL 2000 DTS for same data file, no problem at all, but can't get the import work in SSIS.

Thanks.

|||I tried pasting the script, but I get the following errors:

type 'UserComponent' is not defined
type 'InputBuffer' is not defined
'OutputBuffer' is not a member of 'ScriptTask_7a67..........

The dtspipeline doesn't show up as an available reference in the editer.However, it does show up when I launch a regular instance of Visual Studio.I tried copying DTSPipeline.dll from the sql server directory to the .net framework directory, but it doesn't seem to register automatically.

Any ideas?
Thanks|||i seem to be having the same problems. let me know if you have a solution to this.

No comments:

Post a Comment