September 17, 2020
AWS Lambda: Overcoming the Cold Start
AWS Lambda lets you build an app without thinking about servers. Amazon manages them for you - you just upload your code.
In many ways, it makes things pretty simple. But not everything comes easier with Lambda. In a recent project, it presented challenges that took quite some effort to overcome.
I’ll start this post by briefly listing out reasons why Lambda is great. Then I’ll focus on the main challenge it presented us: the cold start. I’ll define what a “cold start” is and discuss what you can do about it.
The Advantages of Lambda
When developing an app, Lambda’s appeal is pretty clear. You don’t have to manage servers, and you only pay for what you use. It’s basically the easiest way to get your code up and running in the cloud.
Lambda has many awesome features. A few of them:
1. It scales automatically: if you have a sudden spike in traffic, Lambda adds instances to handle it.
2. It natively supports most major programming languages: Java, Python, C#, Node.js, Ruby and Go.
3. It is event-driven, so you can trigger your function in a ton of different ways.
4. You can monitor requests to detect when anything goes wrong.
Really, Lambda just lets you focus on what you do best: writing code. That’s what’s great about it. But before you make Lambda the base of your architecture, you should prepare for any complication it may present.
For our project, this complication was the “cold start.” With an upfront understanding of what this is, what causes it, and what you can do about it, you will be well-equipped for success with AWS Lambda.
The Challenge of Cold Starts
Our main challenge with Lambda centered around latency. More specifically, our lambdas got really slow on initial invocations (the cold starts). Let’s go into detail.
Experiencing the Cold
Cold starts happen when a request arrives and no instance of the lambda is ready to handle it. The request spins up a new instance, which then stays alive for a short period after it finishes (it varies, but usually around 10 minutes).
So, if you haven’t called your lambda in more than 10 minutes, then the next call spins up a new instance. That’s when you get a “cold start.” You can also get a cold start if you need additional instances upon scaling. The cold start makes the request take longer. And sometimes, it could be a lot longer.
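If you want to see how often this happens to you, one common trick is to use a module-level flag. Code outside the handler runs once per instance, so the flag is only true on the first call. Here is a minimal sketch of that idea; the names `_COLD`, `_INIT_TIME` and the shape of the returned dictionary are made up for illustration.

```python
import time

# Module-level code runs once per instance, i.e. once per cold start.
_COLD = True
_INIT_TIME = time.time()


def handler(event, context):
    """Report whether this invocation landed on a fresh (cold) instance."""
    global _COLD
    was_cold = _COLD
    _COLD = False  # every later call on this same instance is warm
    return {
        "cold_start": was_cold,
        "instance_age_seconds": round(time.time() - _INIT_TIME, 3),
    }
```

Logging `cold_start` on every request gives you a rough cold start rate without any extra tooling.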
We saw many cold starts take half a minute or more. That was a real problem, because API Gateway times out after just 29 seconds. So for certain lambdas, the first request from the UI would always time out. For us, this was incredibly frustrating.
But this isn’t to say your cold start will take 30 seconds. Many variables played into our issues: the decision to use Java, the complexity of our functions, and the size of our deployment packages. I’ll get into those later in the article. For now, let’s talk about what we did to counteract the cold start.
Bringing in the Heaters
Thankfully, there are ways to reduce a cold start’s impact. The band-aid fix is to simply have a timer periodically hit the lambda and “warm” it. This was our first approach.
Turns out that when we tried this technique, it just didn’t warm the functions enough. It got the time down a little bit, but not as much as we wanted. Some requests still took 30 seconds. Manual warming is also a bit hacky and costly. When you “warm” the functions, you don’t want to write to a database or do anything else substantial, so you have to add new logic. This means you pay for additional effort, as well as for additional requests.
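The “new logic” for warming usually amounts to an early return. The sketch below shows one way it might look, assuming the timer sends an event containing a marker key. The `warmer` key, the `process` function and the `order_id` field are all hypothetical; this is not the code from our project.

```python
def is_warmup(event):
    """True for the scheduled ping; "warmer" is a made-up marker key."""
    return isinstance(event, dict) and event.get("warmer") is True


def process(event):
    """Placeholder for the real work (database writes and so on)."""
    return {"status": "processed", "order_id": event.get("order_id")}


def handler(event, context):
    if is_warmup(event):
        # Return early so the ping never touches the database.
        return {"status": "warmed"}
    return process(event)
```

Note that a single ping only keeps one instance warm, which is part of why this approach doesn’t help much when traffic scales up.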
Thankfully, Amazon is well aware of the cold start issue and has been working hard to alleviate the pain. In September 2019, AWS announced improvements to VPC networking for Lambda functions. This improved performance and fixed a portion of the cold start issues. It didn’t help our lambdas too much though.
At re:Invent 2019, AWS announced a new feature called Provisioned Concurrency to tackle cold starts. It basically keeps instances warm for you by setting up the execution environment and initializing your functions before they are called.
Provisioned Concurrency is a great feature. It’s cleaner, easier and more scalable than manually warming your functions, and it has helped us reduce cold start times pretty significantly.
Using Provisioned Concurrency, we were able to eliminate the API Gateway timeouts. The lambdas no longer took 30 seconds, but they weren’t fast by any standard: most still ran for a few seconds.
Even with Provisioned Concurrency, how long initialization takes is up to you as the developer. Things like heavy frameworks or SSM requests can really slow your lambda. You have to structure your code so that as much as possible happens during initialization rather than inside the handler, and you have to know how many instances to provision. If you need more than you provision, you will run into a cold start again.
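To make “do the work in initialization” concrete, here is a minimal sketch in Python. Anything at module level runs once per instance, before the handler is ever called. The `load_config` function is a stand-in for a slow call such as an SSM request, and `CONFIG`, `_LOAD_COUNT` and the `orders` table name are made up for the example.

```python
import time

_LOAD_COUNT = 0  # only here to show how often the slow call runs


def load_config():
    """Stand-in for a slow call such as an SSM parameter fetch (hypothetical)."""
    global _LOAD_COUNT
    _LOAD_COUNT += 1
    time.sleep(0.05)  # simulate network latency
    return {"table_name": "orders"}


# Runs once per instance, during initialization.
# With Provisioned Concurrency this happens before any request arrives.
CONFIG = load_config()


def handler(event, context):
    # The handler only reads the value that was loaded up front.
    return {"table": CONFIG["table_name"], "loads": _LOAD_COUNT}
```

However many times the handler runs on the same instance, the slow call happens once. If `load_config()` were called inside the handler instead, every request would pay for it.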
Provisioned Concurrency required significant effort and can be quite expensive, but together these mitigation techniques reduced our cold start times a good deal.
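As for how many instances to provision, a rough starting point is the usual concurrency estimate: requests per second multiplied by the average request duration in seconds. The sketch below just wraps that arithmetic. The function name and the optional `headroom` parameter are assumptions for illustration, and real traffic is bursty, so treat the result as a first guess to check against your own metrics.

```python
import math


def estimate_provisioned_concurrency(requests_per_second, avg_duration_seconds, headroom=0.0):
    """Estimate concurrent instances as rate x duration, plus optional headroom.

    headroom is a fraction, e.g. 0.25 adds 25% on top of the estimate.
    """
    if requests_per_second < 0 or avg_duration_seconds < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    needed = requests_per_second * avg_duration_seconds * (1 + headroom)
    return math.ceil(needed)  # you can't provision a fraction of an instance


# 100 requests per second at 0.5 seconds each keeps about 50 instances busy.
print(estimate_provisioned_concurrency(100, 0.5))  # 50
```

This also shows why slow functions cost more to keep warm: halving the duration halves the number of instances you need.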
Designing to Lessen the Effect
I’ve discussed the ways we reduced the severity of cold starts, but they have taken a lot of time and effort, and they haven’t fully eliminated the issue. That’s why the best methods come with decisions made upfront – before you start coding. If cold starts are a concern, I suggest considering the following before beginning development work.
1. Think hard before choosing Java or C#.
Favor Python or Node.js instead.
We wrote most of our code in Java, which was a big reason for cold start issues. For Java and C#, Lambda has to do a lot of work in initialization. It has to bootstrap a large VM and language runtime, as well as load classes into memory and initialize them. We had some heavy classes that took a long time to load.
Python and Node are interpreted languages with a light runtime. As such, the cold start doesn’t affect them nearly as much. We eventually switched from Java to Python and saw drastic improvements in cold start performance. We didn’t need to use Provisioned Concurrency to “warm” the Lambdas anymore.
2. Structure your functions intentionally.
Don’t have your Lambda invoke five other Lambdas, put a message on an SQS queue and then poll it, and read and write to a database. Each of these can take a lot of time and will make your cold start colder.
If you need to invoke another Lambda, you should do it asynchronously. If you read frequently from a database, consider using ElastiCache to improve performance. Also, minimize your deployment package size. We had some Java packages that were over 20 MB in size, which caused functions to be extra slow.
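For the asynchronous invocation mentioned above, boto3’s Lambda client takes an `InvocationType` of `"Event"`, which queues the call and returns right away instead of waiting for the result. Here is a minimal sketch. The helper name `invoke_async`, the function name `send-receipt` and the payload are hypothetical, and the client is passed in so the helper can be tried without AWS credentials.

```python
import json


def invoke_async(client, function_name, payload):
    """Queue an invocation of another Lambda and return without waiting for it."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous; the default is "RequestResponse"
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Lambda answers 202 when it has accepted an asynchronous invocation.
    return response["StatusCode"] == 202


# Usage inside a real function (hypothetical names):
#   import boto3
#   lambda_client = boto3.client("lambda")  # create once, at module level
#   invoke_async(lambda_client, "send-receipt", {"order_id": 7})
```

The trade-off is that the caller never sees the other function’s return value, so this only fits work that can finish on its own.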
You can read more about Lambda best practices in the AWS documentation. I suggest planning to adhere to all of them. Without them, your cold start will be a frustrating beast.
Using Lambda for the Front-end
Lambda is great for backend processes for all of the reasons I listed at the beginning of this article. It is simple to use and deploy, and it is highly scalable.
Apps without a UI in front of them are the perfect fit for the service. You get all of the benefits, and you hardly have to care about speed. With a UI, it’s a different story: you don’t want your user sitting in front of a screen, watching a loader spin for 15 seconds, or even three seconds. You can reduce latency with some workarounds or upfront design decisions, as I’ve discussed, but it does take careful planning and may involve significant effort.
If you can’t commit to following best practices, then don’t use Lambda for front-end requests. And if you’re set on coding with Java, prepare for major cold start challenges.
Lambda itself doesn’t have to be slow, but there are a number of careful considerations needed to use it effectively, which are particularly important for UI-facing apps.
AWS releases new features constantly, and many of them have improved the longstanding issue of cold starts. Someday, cold starts will likely be a thing of the past, but for now, they’re still here to a noticeable and somewhat frustrating extent.
When looking at all requests – not just cold starts – serverless is reportedly a little slower than simply using Elastic Beanstalk. “Warming” techniques can be expensive and complicated.
If consistently low latency is a priority, then Lambda will present its fair share of challenges. If you do use Lambda, then know what to expect, and adhere to best practices.