
Path traversal vulnerabilities appear straightforward: an application uses user input to build a file path, the user submits ../../etc/passwd, and the application reads a file outside its intended directory. Filter for ../ sequences and the problem is solved. That mental model is the reason path traversal remains in the OWASP Top 10 after twenty-five years of awareness. The actual attack surface is far larger than the literal string ../.
The file system sees paths after all decoding operations complete. The input filter sees paths before decoding. The gap between these two processing points is where every bypass technique lives.
URL Encoding Bypasses
URL encoding represents special characters as percent-encoded sequences. The forward slash / becomes %2F, the dot . becomes %2E. A path traversal sequence ../ can be represented as %2E%2E%2F, or ..%2F, or %2E.%2F.
Input filters that check for the literal string ../ miss these encoded variants. The web server or application framework decodes the URL before passing the path parameter to application code — so the application receives the decoded value — but if the filter checked the raw encoded input, it never saw the traversal sequence. This is one of the most common path traversal bypass patterns in bug bounty reports and has been documented in Apache, nginx, Tomcat, and dozens of application frameworks.
Double encoding adds another layer: %252E%252E%252F is the URL encoding of the already-URL-encoded %2E%2E%2F. Systems that decode twice — some middleware chains decode once, some application code decodes again — see the fully resolved traversal sequence only after both decoding passes. Filters applied after the first decode miss it.
Unicode and UTF-8 Normalization
Unicode normalization equivalences create bypass opportunities on Windows and some Linux configurations. The Unicode codepoint U+FF0E (FULLWIDTH FULL STOP) is the full-width equivalent of the ASCII period. Some file system APIs, particularly on Windows NTFS via .NET, normalize Unicode inputs before path resolution. U+FF0E U+FF0E U+2215 (DIVISION SLASH) — visually similar to ../ but not the literal ASCII characters — traverses the directory on affected systems while bypassing filters that check for ASCII path traversal sequences.
Unicode overlong encoding in non-strict UTF-8 decoders is another historical bypass: the two-byte sequence 0xC0 0xAE is an overlong encoding of the ASCII period 0x2E. Strict UTF-8 parsers reject overlong encodings, but legacy systems and some C-based parsers accept them, creating the same filter bypass gap.
Null Byte Truncation
In C and C-based languages, strings are null-terminated — the string ends at the first \0 byte. Applications written in languages with C underpinnings (PHP, older versions of C extensions for Python and Ruby) pass file paths to operating system calls that also use C string handling. Injecting a null byte truncates the string at the OS level.
Classic exploit: an application expects file downloads to have a .pdf extension and validates that the filename ends in .pdf. The attacker submits ../../etc/passwd%00.pdf. The application checks for .pdf — present. It passes the full string to the file open call. The C runtime truncates at the null byte, opening ../../etc/passwd.
Null byte vulnerabilities are largely absent from modern Python 3, Java, and .NET applications (which use length-prefixed strings internally), but they remain in PHP code and in any application that passes paths to native C extensions or system calls.
Windows Path Syntax Variations
Windows path syntax provides additional bypass surface that Linux-focused developers miss. The backslash \ is a directory separator equivalent to forward slash on Windows. Applications that filter ../ but not ..\ are bypassed on Windows deployments. UNC paths (\\server\share) can redirect file access to network locations on Windows. Drive letter prefixes (C:\) can escape relative path restrictions entirely.
Windows short file names (8.3 format) create another entry point: PROGRA~1 resolves to Program Files. Applications that check display names but not short names can be bypassed on systems where short name generation is enabled (the default on many Windows installations).
Runtime Detection at the File System Layer
All of these bypass techniques share a single characteristic: after all decoding and normalization, the operating system receives a request to open a file path that exits the application's intended directory. Raven.io's agent instruments file system calls at the level closest to the OS — after all application-layer decoding, URL normalization, and path resolution have occurred.
The agent checks every file open or read operation against the application's baseline file access patterns. If the resolved file path falls outside the directories the application has ever read from during the baseline period — and /etc/passwd is not in any web application's normal read baseline — the operation is blocked and logged. The encoding technique that bypassed the input filter is irrelevant. The file system operation is blocked at the OS call level.
This is why runtime detection is the correct layer for path traversal defense. Input validation is a valuable first layer that reduces the volume of attempts that reach the application, but encoding bypasses are too varied and too consistent to rely on input validation alone. The definitive check happens at the point where the file system operation is actually requested, not at the point where user input is first received.
The Path Canonicalization Requirement
If you are implementing input validation for file paths, the essential step is canonicalization before checking. Convert the input to its canonical form — resolved to an absolute path, with all ../ sequences, encoding sequences, and symlinks resolved — and then verify that the canonical path begins with the allowed directory prefix. In Java, File.getCanonicalPath() does this. In Python, os.path.realpath() does it. In Node.js, path.resolve() followed by a prefix check.
The check must happen after canonicalization, not before. Checking before canonicalization is what every bypass technique exploits. This is a coding pattern that is easy to get wrong under time pressure, which is why runtime detection as a backstop remains important even for applications that implement input validation correctly.
Block Path Traversal at the OS Call Layer
Raven.io monitors every file system operation against the application's normal access baseline. Traversal attempts that bypass input filters are caught at the OS call level, before the file is opened.
See It in Action