I’ve been waiting for a while now for the AWS folks to open the AWS Serverless Application Repository to the general public. What could be more fun than getting access to even more serverless applications, seeing what people are up to, and finding even more ideas on what you can build using serverless architectures?
While I was window-shopping the repository a few days ago, I stumbled upon a trivial serverless application named ‘aws-serverless-twitter-event-source’. You can probably imagine what this little application does, but here’s the official description:
“This serverless app turns a twitter search query into an AWS Lambda event source by invoking a given lambda function to process tweets found by search. It works by periodically polling the freely available public Twitter Standard Search API and invoking a lambda function you provide to process tweets found.”
As I was looking at the architecture of this application, it reminded me of a similar serverless application I recently built, which performs daily searches on the Shodan search engine, looking for, ehhh, ehhh, never mind…;-)
Both applications are triggered periodically by a CloudWatch Events Rule (scheduler), which in turn means they don’t really get any kind of input from the event itself. If you’re like me, and you’re worried about application security (which you should be), this already reduces the attack surface by at least half, right? No need to worry about event input validation, or injection attacks.
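To make that concrete, here is a minimal sketch of a handler invoked by a schedule. The function name and return value are illustrative assumptions, not taken from either application; the fields checked (`source`, `detail-type`, `detail`) are the ones a CloudWatch Events scheduled rule delivers, with an empty `detail` object.

```python
def handler(event, context):
    """Hypothetical Lambda entry point invoked by a CloudWatch Events schedule."""
    # A scheduled event carries only metadata (id, time, rule ARN, ...).
    # Its "detail" object is empty, so no caller-supplied input arrives here.
    if (event.get("source") != "aws.events"
            or event.get("detail-type") != "Scheduled Event"):
        raise ValueError("unexpected event source")
    if event.get("detail"):
        raise ValueError("scheduled events should carry no detail payload")
    return {"triggered_at": event.get("time")}
```

Nothing in this event is attacker-controlled, which is why it is tempting to stop thinking about input at this point.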
A closer look at both functions reveals that although the event doesn’t serve as an entry point to app logic, there’s something else that introduces untrusted data into our code. An API call.
Here’s a snippet from my own Lambda function. Pay attention to the Shodan API calls in lines 13 and 20:
In essence, these Shodan API calls are the real entry points to my application - that’s the weak link in my chain, and I shouldn’t trust any data coming in from these API calls.
Let’s stop and think about what the risks are and how we can protect ourselves against them. From a risk perspective, it all boils down to how we use the data that comes back in the API response. It goes without saying that we must first validate the data, and never pass it directly to an interpreter or shell process. However, we don’t always control imported third-party libraries or how they use this data, so we should always be prepared for the worst and make sure that we only accept legitimate input.
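As a rough illustration of validating API response data before using it, here is a sketch that keeps only strictly typed fields from a search response. The response shape (a `matches` list whose items carry `ip_str` and `port`) is an assumption made for the example, and `validate_match` and `safe_matches` are hypothetical helper names.

```python
import ipaddress


def validate_match(match):
    """Return a sanitized (ip, port) tuple, or raise ValueError."""
    if not isinstance(match, dict):
        raise ValueError("match must be an object")
    ip_raw = match.get("ip_str")
    if not isinstance(ip_raw, str):
        raise ValueError("ip_str must be a string")
    # ip_address() accepts only a literal IPv4/IPv6 address and raises
    # ValueError for anything else (e.g. "1.2.3.4; rm -rf /").
    ip = ipaddress.ip_address(ip_raw)
    port = match.get("port")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise ValueError("port must be an integer")
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    return str(ip), port


def safe_matches(response):
    """Keep only the matches that pass validation; drop everything else."""
    matches = response.get("matches") if isinstance(response, dict) else None
    if not isinstance(matches, list):
        return []
    clean = []
    for match in matches:
        try:
            clean.append(validate_match(match))
        except ValueError:
            continue  # discard records that do not look legitimate
    return clean
```

Only the validated tuples are passed on; anything else is dropped before it can reach an interpreter or a shell. This covers well-structured fields only, not free-form text.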
But what if we can’t always validate the input effectively?
What if the input is so free-form (tweets, Shodan results, or data from other untrusted sources) that any kind of pattern matching is unlikely to work as well as expected?
Behavioral Protection to the Rescue
Traditionally, application firewalls scanned input and attempted to detect malicious payloads sent to application entry points. However, they were never meant to scan data that comes in as the response to an outbound request made by the application - which is also why they never provided good enough protection against Remote File Inclusion.
Since we can’t trust the API’s response data, it’s critical to be able to “contain” and monitor that data while our application is using it, and to make sure it does not cause any unauthorized or malicious actions.
The third principle in the “6 Principles of a Good Serverless Security Solution” guide states that a good serverless security solution should be future-proof and provide behavioral-based protection that learns and adapts over time. Such behavioral-based protection is exactly what should help us contain untrusted API response data.
It goes without saying that developers should always perform input validation and sanitization to ensure that any data entering the application is “sterile”. However, it’s unreasonable to assume that developers will write code to monitor and contain potentially hazardous incoming data. Moreover, developers are not always in control of everything that’s going on in the application - e.g. they can’t control what the third-party libraries they import do.
In order to hermetically secure serverless applications beyond input validation, PureSec developed a unique, proprietary behavioral protection engine for serverless applications, which is part of the PureSec Tesseract SSRE solution.
The behavioral protection engine provides the kind of protection and assurance that your function will always work the way it was designed to, even when it consumes untrusted data. It is a second layer of protection, which goes way beyond input validation by safeguarding your application from within, to prevent malicious data from subverting application logic.
From time to time I get to meet folks who are 100% certain that their serverless functions are secure since they don’t use any data arriving through the event invocation (see reaction #5 in my previous blog post ‘The Six Most Common Reactions to The Words “Serverless Security”’).
Keep in mind that event data is not the only way to consume potentially malicious data. Make sure you always sanitize and validate data coming in through API calls, and if you’d like further assurance and confidence in the security posture of your serverless applications, contact PureSec for a demo of our PureSec Tesseract serverless security solution.
This reminds me, on March 27th we are hosting a live webinar titled “Protecting Serverless Functions from Application Layer Attacks”, in which I will discuss serverless application security and demonstrate the PureSec Tesseract solution in action. Make sure you sign up, since we have limited capacity on this webinar.