SQL Server: Remove non-printable Unicode characters

When you receive data from various sources like excel, text, csv formats, frequently non-printable characters will exist. This junk should be removed first to do further steps. Because these Non-Printable Characters are not good for matching and analysis processes. To remove these Non-Printable characters I found few methods as shown in script below.
USE [MyDatabase]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
------------------------------------------------------------------------------------------------
-- Description    : Removes Non-Printable Characters from a string
-- Usage        : select N'Stringğ withħ įņvalidđ charactersŝ',dbo.fn_RemoveNonPrintableChars('Stringğ withħ įņvalidđ charactersŝ')
-------------------------------------------------------------------------------------------------
CREATE function [dbo].[fn_RemoveNonPrintableChars]
(
 @strIn as varchar(1000)
)
returns varchar(1000)
as
begin
 declare @iPtr as int
 set @iPtr = patindex('%[^ -~0-9A-Z]%', @strIn COLLATE LATIN1_GENERAL_BIN)
 while @iPtr > 0 begin
  set @strIn = replace(@strIn COLLATE LATIN1_GENERAL_BIN, substring(@strIn, @iPtr, 1), '')
  set @iPtr = patindex('%[^ -~0-9A-Z]%', @strIn COLLATE LATIN1_GENERAL_BIN)
 end
 return @strIn
end

Below is the result


No comments:

Post a Comment