Preventing cross-site scripting in mixed PHP/JavaScript

In my security work, I spend a large proportion of my time making sure that user input is properly escaped. This is essential to prevent SQL injection and cross-site scripting (XSS) attacks. Thanks to prepared statements, SQL injection is easy to avoid*, even in old code bases. Unfortunately, XSS can be more difficult to catch when dealing with PHP, because a single line can mix single quotes, double quotes, backticks, PHP syntax, JavaScript syntax, and HTML syntax.

There’s a strong argument to be made that user input should not be allowed in inline JavaScript. In practice, when dealing with legacy code bases, it’s not always feasible to refactor things to completely disallow it. Consider this simple example:

<a onclick="alert('<?= $_GET['a'] ?? 'default'; ?>');">Click</a>

Since there is no sanitisation or escaping of the GET parameter, it’s trivial to put XSS in there. We need to filter it, but PHP has no single function that will escape all possible values that might break the JavaScript or HTML. It’s further complicated by the fact that we could switch the ' and " quotes, and it would still be valid HTML, JavaScript and PHP. We could even use ` backticks in the JavaScript instead of quotes and it would still be valid, while opening up even more vulnerabilities, since JavaScript evaluates any ${...} expression it finds inside a template literal.

Of course, it is possible to sanitise the input, but there is no one-size-fits-all solution. We need to be aware of the context. Here are PHP functions to escape user input for use in each of the possible cases:

<?php
/**
 * Filter a string for output as a JavaScript string enclosed in single quotes
 *
 * @param ?string $input String to be sanitised
 * @return string Input with single quotes escaped, other risky characters converted to their HTML code equivalent, wrapped in single quotes
 */
function escape_js_single(?string $input): string
{
    return "'" . addslashes(htmlentities($input ?? '', ENT_COMPAT)) . "'";
}

/**
 * Filter a string for output as a JavaScript string enclosed in double quotes
 *
 * @param ?string $input String to be sanitised
 * @return string Input with all risky characters converted to their HTML code equivalent, single and double quotes escaped, wrapped in double quotes
 */
function escape_js_double(?string $input): string
{
    return '"' . htmlentities(addslashes($input ?? ''), ENT_QUOTES) . '"';
}

/**
 * Filter a string for output as a JavaScript string enclosed in backticks
 *
 * @param ?string $input String to be sanitised
 * @return string Input with template literal characters escaped, other risky characters converted to their HTML code equivalent, wrapped in backticks
 */
function escape_js_backtick(?string $input): string
{
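    // Escape backslashes first, so that input such as \` cannot cancel out the backslash added before the backtick
    return '`' . htmlentities(str_replace(array('\\', '`', '$'), array('\\\\', '\\`', '\\$'), $input ?? ''), ENT_QUOTES) . '`';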
}
?>
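
As a rough illustration of how the first helper might be used, the sketch below repeats escape_js_single() so that it runs on its own and feeds it a made-up attack string in place of $_GET['a']. The payload is hypothetical. Note that the helper adds the surrounding quotes itself, so the template must not add its own.

```php
<?php
// Copy of escape_js_single() from above, repeated so this sketch runs standalone.
function escape_js_single(?string $input): string
{
    return "'" . addslashes(htmlentities($input ?? '', ENT_COMPAT)) . "'";
}

// Hypothetical attack value, standing in for $_GET['a'].
$payload = "');alert(document.cookie);//";

// The helper supplies the single quotes, so none appear in the template.
$html = '<a onclick="alert(' . escape_js_single($payload) . ');">Click</a>';

echo $html, PHP_EOL;
```

With this input, the single quote that would have closed the string is preceded by a backslash, so the whole payload stays inside the JavaScript string passed to alert().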

It’s worth mentioning here that if the expected data type is not a string, then it makes more sense to cast it to the expected type, for example with intval().
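
A minimal sketch of the casting approach, assuming a page number is expected. The request value and the goToPage() JavaScript function are invented for the example. filter_var() is shown alongside as a stricter option that rejects the value rather than truncating it.

```php
<?php
// Hypothetical request value; in a real page this would come from $_GET.
$raw = "3);alert(1);//";

// intval() keeps the leading digits and discards everything after them.
$page = intval($raw);

// Stricter alternative: returns false unless the whole string is an integer.
$strict = filter_var($raw, FILTER_VALIDATE_INT);

// goToPage() is a hypothetical JavaScript function used for illustration.
echo '<a onclick="goToPage(' . $page . ');">Next</a>', PHP_EOL;
```

Because $page is an integer by the time it is printed, no quoting or escaping is needed for it.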

* On a side note, there’s a parallel between this problem of inline JavaScript and SQL injection. Prepared statements are a simple way to prevent injection into data values. However, they don’t prevent injection into other parts of SQL, such as commands, table names, and field names. Much as with inline JavaScript, it is best to avoid all user input in SQL statements except for values. Once again though, it’s not always practical to completely prevent this in a legacy code base. One legitimate reason to allow user input in field names is to permit the user to sort by columns. Some possible ways to mitigate the risk of SQL injection in this case, in order of most to least preferable, are:

  1. Validate against an allow list of acceptable values.
  2. Filter with preg_replace('/\W/', '', $field) (this only allows alphanumerics and _ underscores).
  3. Filter to allow only alphanumerics, _ underscores, . separators, , commas, and spaces.
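
The first two options could look something like the sketch below. The safe_sort_column() helper, the column names, and the products table are all hypothetical and would need to match the real schema.

```php
<?php
/**
 * Return a column name that is safe to interpolate into ORDER BY.
 * Hypothetical helper: falls back to $default for anything not on the list.
 */
function safe_sort_column(?string $requested, array $allowed, string $default): string
{
    // Strict comparison, so only exact string matches are accepted.
    return in_array($requested, $allowed, true) ? $requested : $default;
}

// Option 1: allow list. These column names are assumptions for the example.
$allowed = ['name', 'created_at', 'price'];
$column = safe_sort_column("name; DROP TABLE users", $allowed, 'name');
$sql = "SELECT * FROM products ORDER BY {$column}";

// Option 2: strip every character that is not a letter, digit or underscore.
$filtered = preg_replace('/\W/', '', "price; DROP TABLE users");

echo $sql, PHP_EOL;
echo $filtered, PHP_EOL;
```

The allow list falls back to a known column, whereas the regular expression filter only guarantees that the result is harmless, not that it names a column that exists.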
