How to use MariaDB REPLACE with LIKE on data containing backslashes

Recently I was cleaning up some data as part of a data analysis project, and ran into a frustrating problem with stray backslashes in text fields. Here’s how I ended up solving it.

Overview of the problem

The data I was working on was stored in a MariaDB database. During testing some variants from expected values had crept in, which was showing up as noise in the analysis. Essentially, I was trying to group data by a few expected values, for example test, google, facebook, and so on. However, some of the expected values had extra characters, for example google1, google\\'123. To compound the problem, these values were embedded in a longer text string, they were not standalone fields, meaning that I needed to match them using LIKE. Most of them were simple to change to the expected values by means of the REPLACE function. The problem came when cleaning up the backslash character, because it’s the standard escape character.

Non-working version

The obvious query to fix them didn’t work:

UPDATE `table_a` SET `field_a` = REPLACE(`field_a`, 'google\\\\\'123', 'google') WHERE `field_a` LIKE '%google\\\\\'123%';

I tried all kinds of variants of the backslashes, but couldn’t get them to match. So I decided to break down the problem into a SELECT, and then add the UPDATE...REPLACE later.

Wildcard SELECT WHERE LIKE

I managed to SELECT the correct rows by means of a wildcard like so:

SELECT `field_a` FROM `table_a` WHERE `field_a` LIKE '%google%123%';

This did get me the correct matches, but I couldn’t use the wildcard in the REPLACE, and since it’s not an exact match, in theory it could pick up the wrong rows. It seemed a good idea to fix the LIKE string in order to get an exact match.

LIKE with non-default ESCAPE

The correct solution to the SELECT turned out to be to change the wildcard from the default \. I ended up with this expression:

SELECT `field_a` FROM `table_a` WHERE `field_a` LIKE '%google\\_\'123%' ESCAPE '_';

While the syntax is still a bit of a mystery to me (why is the backslash before the single quote still needed?), this worked. Unfortunately I still couldn't use it with the REPLACE function. That led me to the second part of the solution.

REPLACE with character code

Since I couldn’t get the escaping to work with the backslash character, I ended up using its character code (92) instead of the character itself, and then joining the string together with CONCAT. The end result looked like this:

UPDATE `table_a` SET `field_a` = REPLACE(`field_a`, CONCAT('google', CHAR(92), CHAR(92), '\'123'), 'google') WHERE `field_a` LIKE '%google\\_\'123%' ESCAPE '_';

Add new comment

CAPTCHA