Recently I was cleaning up some data as part of a data analysis project, and ran into a frustrating problem with stray backslashes in text fields. Here’s how I ended up solving it.
Overview of the problem
The data I was working on was stored in a MariaDB database. During testing some variants from expected values had crept in, which was showing up as noise in the analysis. Essentially, I was trying to group data by a few expected values, for example test
, google
, facebook
, and so on. However, some of the expected values had extra characters, for example google1
, google\\'123
. To compound the problem, these values were embedded in a longer text string, they were not standalone fields, meaning that I needed to match them using LIKE
. Most of them were simple to change to the expected values by means of the REPLACE
function. The problem came when cleaning up the backslash character, because it’s the standard escape character.
Non-working version
The obvious query to fix them didn’t work:
UPDATE `table_a` SET `field_a` = REPLACE(`field_a`, 'google\\\\\'123', 'google') WHERE `field_a` LIKE '%google\\\\\'123%';
I tried all kinds of variants of the backslashes, but couldn’t get them to match. So I decided to break down the problem into a SELECT
, and then add the UPDATE...REPLACE
later.
Wildcard SELECT WHERE LIKE
I managed to SELECT
the correct rows by means of a wildcard like so:
SELECT `field_a` FROM `table_a` WHERE `field_a` LIKE '%google%123%';
This did get me the correct matches, but I couldn’t use the wildcard in the REPLACE
, and since it’s not an exact match, in theory it could pick up the wrong rows. It seemed a good idea to fix the LIKE
string in order to get an exact match.
LIKE
with non-default ESCAPE
The correct solution to the SELECT
turned out to be to change the wildcard from the default \
. I ended up with this expression:
SELECT `field_a` FROM `table_a` WHERE `field_a` LIKE '%google\\_\'123%' ESCAPE '_';
While the syntax is still a bit of a mystery to me (why is the backslash before the single quote still needed?), this worked. Unfortunately I still couldn't use it with the REPLACE
function. That led me to the second part of the solution.
REPLACE
with character code
Since I couldn’t get the escaping to work with the backslash character, I ended up using its character code (92) instead of the character itself, and then joining the string together with CONCAT
. The end result looked like this:
UPDATE `table_a` SET `field_a` = REPLACE(`field_a`, CONCAT('google', CHAR(92), CHAR(92), '\'123'), 'google') WHERE `field_a` LIKE '%google\\_\'123%' ESCAPE '_';