A few weeks ago, I published the first episode in this “Serverless Security” blog series, which discussed why WAFs are a terrific technology yet completely irrelevant when attempting to secure serverless applications. Today, I’d like to discuss a different technology: SAST (Static Application Security Testing).
According to Gartner’s definition:
“Static application security testing (SAST) is a set of technologies designed to analyze application source code, byte code and binaries for coding and design conditions that are indicative of security vulnerabilities. SAST solutions analyze an application from the “inside out” in a non-running state.”
So how does static analysis for security actually work? The main concept behind static code analysis is data flow analysis.
The static analyzer builds a flow map that represents how data moves through your program’s source code, from the point where it enters the program all the way to the point where it is used. From a security point of view, this allows the analyzer to detect an uninterrupted flow of potentially malicious input from the entry point, where the user supplies the input, to a sensitive and potentially risky use of the data.
Let’s look at a simple SQL Injection example:
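txtUserId = request.getParameter("UserId");                 // source: untrusted HTTP input
txtSQL = "SELECT * FROM tbl WHERE UserId = " + txtUserId;   // input concatenated into the query
db.Execute(txtSQL);                                         // sink: query executed as built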
In the example above, the “source” of the input is an HTTP request parameter called UserId. The sensitive call (referred to as the “sink”) is the call to db.Execute(). If we trace the flow of data in this program, we see:
- [Step a] Input is taken from the HTTP parameter UserId and assigned to the variable txtUserId
- [Step b] The input is then concatenated to a static string and assigned to the variable txtSQL (making txtSQL potentially tainted)
- [Step c] A sensitive database function call, db.Execute(), uses that potentially hazardous input
At this point, the static analyzer reaches the conclusion that potentially malicious data can flow uninterrupted from outside of the program, and all the way to the sensitive function call - and so it will report that a SQL Injection vulnerability exists in the code.
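The trace above can be sketched as a toy taint analysis. This is only a minimal illustration in Python, not how any particular SAST product is implemented: the tuple-based statement format and the names `SOURCES`, `SINKS` and `find_tainted_sinks` are assumptions made up for this example, and real analyzers work on syntax trees or an intermediate representation.

```python
# Toy taint analysis over a straight-line program (illustrative only).
# Each statement is (target_variable, called_function_or_None, [variables read]).

SOURCES = {"request.getParameter"}   # calls that return untrusted input
SINKS = {"db.Execute"}               # calls that must not receive tainted data

def find_tainted_sinks(program):
    tainted = set()      # variables currently holding untrusted data
    findings = []
    for line_no, (target, call, operands) in enumerate(program, start=1):
        reads_taint = any(name in tainted for name in operands)
        if call in SINKS and reads_taint:
            findings.append((line_no, call))
        if target is not None:
            if call in SOURCES or reads_taint:
                tainted.add(target)
            else:
                tainted.discard(target)   # overwritten with clean data
    return findings

program = [
    ("txtUserId", "request.getParameter", []),   # step a: source
    ("txtSQL", None, ["txtUserId"]),             # step b: concatenation
    (None, "db.Execute", ["txtSQL"]),            # step c: sink
]

print(find_tainted_sinks(program))
```

Here the only finding is the sink on the third statement, because taint propagates from `txtUserId` to `txtSQL` without anything clearing it along the way.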
As you can imagine, the program presented above is overly simplistic, and things become more complex as data traverses larger code bases, external libraries and so forth. However, advances in this field in recent years have made analyzers considerably more accurate at detecting application layer vulnerabilities in most popular programming languages.
But let’s take a look at how serverless architectures are built. In general, I can think of two main types of serverless applications:
- A single function application, which performs a distinct action upon invocation
- A compound application, made up of multiple serverless functions, each representing a nano-service responsible for a distinct task. These functions are glued together through cloud services, and are loosely coupled through the different event triggers that drive their logic
When the entire logic is contained within a single function, it’s safe to say that SAST tools can locate injection-based vulnerabilities accurately. After all, input enters the serverless function through the event data and may reach a sensitive function call inside the same function, so the whole flow is visible to the analyzer.
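As a hedged sketch of the single-function case, the handler below has its source and sink in the same function body. The `handler(event, context)` shape follows the common serverless convention, but the extra `db` argument, the event layout, the table contents and the in-memory SQLite database are all assumptions chosen so the example runs on its own.

```python
import sqlite3

def make_db():
    # In-memory stand-in for the application's database (assumption).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tbl (UserId INTEGER, name TEXT)")
    db.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return db

def handler(event, context, db):
    # Source: untrusted input arrives in the event data.
    user_id = event["queryStringParameters"]["UserId"]
    # Propagation: the input is concatenated into the query string.
    sql = "SELECT name FROM tbl WHERE UserId = " + user_id
    # Sink: the tainted string reaches the database call in the same function.
    return [row[0] for row in db.execute(sql)]

db = make_db()
benign = {"queryStringParameters": {"UserId": "1"}}
malicious = {"queryStringParameters": {"UserId": "1 OR 1=1"}}

print(handler(benign, None, db))      # one matching row
print(handler(malicious, None, db))   # the injected condition matches every row
```

Because the event lookup, the concatenation and the `db.execute` call all sit in one function, a data flow analyzer has everything it needs to connect source to sink.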
But what happens when our application is made up of multiple such functions? How can a SAST tool model data flow in the application, without making potentially inaccurate assumptions?
Consider the following scenario:
- A user creates a file on a cloud storage bucket
- As a result of the file creation, a serverless function (A) is invoked
- Function (A) takes the file name as input, and performs some action on the file, storing the results inside a NoSQL database table
- A second serverless function (B) is invoked by the addition of a record to the table, and performs an action on the data inside the table
- User data reaches a sensitive function call inside function (B)
Even in traditional software (i.e. not serverless), the flow above might pose problems and inaccuracies for static analyzers, since many key parts of the “puzzle” are still missing while the code is static. For example:
- It’s quite possible that prior to execution, we don’t know the name of the cloud storage bucket, or the filename, or even in which region it is located
- After the first function performs its task, it places the data inside a NoSQL database table. Prior to execution, we don’t know the actual location of the data. We might not have the “key” which was used to store the data inside the table
- Since we don’t know the actual “key” under which the data is stored in the NoSQL database table, we don’t know whether the second function is using potentially hazardous data, or perhaps just some static data in the same database
The problem is actually exacerbated in serverless environments, in which the coupling (or stitching) of functions is done through cloud-based services and serverless event data, which adds even more unknowns while the code is still static.
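The two-function scenario can be sketched as follows. This is a simplified model rather than real cloud code: the dictionary `TABLE` stands in for the NoSQL table, `sensitive_call` is a hypothetical sink, and the event shapes and the hash-derived key are assumptions for illustration.

```python
import hashlib

TABLE = {}          # in-memory stand-in for the NoSQL table (assumption)
SINK_CALLS = []     # records what reaches the hypothetical sensitive call

def sensitive_call(value):
    # Placeholder for a sink such as a shell command or query execution.
    SINK_CALLS.append(value)

def function_a(event):
    # Triggered by file creation; the file name is user-controlled.
    file_name = event["file_name"]
    # The key is derived at runtime, so its value is not visible in static code.
    key = hashlib.sha256(file_name.encode()).hexdigest()[:8]
    TABLE[key] = {"result": "processed:" + file_name}
    return key

def function_b(event):
    # Triggered by the table insert; receives only the key of the new record.
    item = TABLE[event["key"]]
    sensitive_call(item["result"])

# Static configuration data that lives in the same table.
TABLE["config"] = {"result": "static-value"}

# In a real deployment this glue is performed by the cloud platform,
# not by application code, so no call from A to B appears in the source.
key = function_a({"file_name": "report.txt; whoami"})
function_b({"key": key})        # user-controlled data reaches the sink
function_b({"key": "config"})   # same code path, harmless static data

print(SINK_CALLS)
```

Nothing in the source of `function_b` says whether the record it reads was written by `function_a` or is static data, which is exactly the information an analyzer would need to decide whether the sink is reachable by tainted input.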
As you can see, while analyzing data flow across different functions or classes within a single piece of software is quite accurate, the analysis becomes an approximation when you try to follow data across nano-services or serverless functions. This approximation eventually leads to inaccurate results: usually false positives as a result of over-approximation or, even worse, false negatives as a result of under-approximation.
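To make the two failure modes concrete, here is a small sketch in which an analyzer that cannot resolve table keys statically must apply a single assumption to every table read. The `READS` ground truth and the functions `analyze` and `classify` are hypothetical and exist only for this illustration.

```python
# Two hypothetical analyzer policies for data read from a shared table.
# Ground truth: which table reads actually return user-controlled data.
READS = [
    {"function": "B", "key": "runtime-derived", "actually_tainted": True},
    {"function": "B", "key": "config", "actually_tainted": False},
]

def analyze(reads, assume_table_is_tainted):
    # The analyzer cannot tell the reads apart, so it applies one
    # assumption to every read that reaches a sink.
    return [assume_table_is_tainted for _ in reads]

def classify(reads, reported):
    # Compare what the analyzer reported against the ground truth.
    false_pos = sum(1 for r, rep in zip(reads, reported)
                    if rep and not r["actually_tainted"])
    false_neg = sum(1 for r, rep in zip(reads, reported)
                    if not rep and r["actually_tainted"])
    return false_pos, false_neg

over = classify(READS, analyze(READS, assume_table_is_tainted=True))
under = classify(READS, analyze(READS, assume_table_is_tainted=False))
print("over-approximation  (false positives, false negatives):", over)
print("under-approximation (false positives, false negatives):", under)
```

Treating every table read as tainted flags the harmless configuration read, while treating none as tainted misses the real flow; neither blanket assumption matches the ground truth.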
In order to support serverless architectures and provide serverless developers with proper tools for accurately detecting software vulnerabilities, SAST vendors will need to invest in research and improve the way static analysis is done for serverless functions, taking a more holistic, application-wide view of serverless software rather than concentrating on single-function analysis.
At the moment, it seems like SAST technologies are still doing things traditionally and leave us all yearning for a serverless-centric analysis.