Zero to Launch Ready with Performance Testing
By Sai Yerraguntla
How exactly do you prepare for a huge spike of new users from a partnership launch?
When we received news that we would be launching two major partnerships with Kaiser Permanente and American Express in just a few days, the first thought that crossed my mind was “HOW EXCITING! THIS IS SO HUGE!” The second thought: “Can we support it?!”
We had to find out quickly.
Our Performance Testing Tool
We introduced Locust, a performance testing tool, to load test our own systems and find out whether we could support these massive launches and what we could scale and improve to better prepare. With Locust, we generated a high load to find out exactly how many users we could currently support, how many requests per second our systems could handle, and what the latency statistics were for our different API calls.
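To make those metrics concrete, here is a minimal sketch of how requests per second and latency statistics can be derived from raw response times. Locust reports these numbers on its own; the function names and sample values below are made up for illustration and are not measurements from our tests.

```python
import math
import statistics


def percentile(sorted_ms, pct):
    """Nearest-rank percentile of an ascending list of latencies (ms)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def summarize(latencies_ms, duration_s):
    """Summarize one endpoint's samples from a single load test run."""
    ordered = sorted(latencies_ms)
    return {
        "requests_per_second": len(ordered) / duration_s,
        "median_ms": statistics.median(ordered),
        "p95_ms": percentile(ordered, 95),
        "max_ms": ordered[-1],
    }


# Illustrative samples only: 10 requests observed over 2 seconds.
samples = [120, 95, 110, 130, 105, 98, 2200, 115, 101, 125]
summary = summarize(samples, duration_s=2.0)
print(summary)
```

The single slow request barely moves the median but dominates the 95th percentile, which is why tail latency is worth watching alongside the average when a system is under load.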
Using the data from Locust, our “API Scalability Task Force” (around five engineers from various engineering teams) set out to make sure we were able to handle such massive launches. Spoiler alert: we pulled it off. Bonus: it only took a few days to get Locust running to debug performance and validate our improvements.
Massive Wins
With the latency data for the API calls made in the onboarding flows for these launches, we set out to optimize the endpoints being called. We prioritized the changes that would give us the most bang for our buck, starting with a major refactor of the endpoints that return content.
We improved how we were caching, introduced PgBouncer to help with database scaling, and removed unnecessary calls to third-party services. One of the most drastic improvements we saw was in the latency of our API call to fetch user information, which dropped from 22,000 ms to 110 ms. Yup, a 200x improvement. Craziness. (Why was that API call so slow under load? It turned out an external API call had snuck in on certain edge cases.)
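As an illustration of why caching helps when a slow external call sits behind an endpoint, here is a minimal sketch of a time-based cache. It is not our actual implementation: `TTLCache` and `fetch_profile_from_partner` are hypothetical names, and the 60-second lifetime is an arbitrary assumption.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the slow call entirely
        value = loader(key)  # miss or expired: pay for the call once
        self._entries[key] = (now + self._ttl, value)
        return value


external_calls = []


def fetch_profile_from_partner(user_id):
    """Hypothetical stand-in for a slow third-party request."""
    external_calls.append(user_id)
    return {"user_id": user_id, "plan": "example"}


cache = TTLCache(ttl_seconds=60)  # assumed lifetime, for illustration
first = cache.get_or_load("user-1", fetch_profile_from_partner)
second = cache.get_or_load("user-1", fetch_profile_from_partner)
print(len(external_calls))
```

Only the first lookup reaches the external service; the second is served from memory until the entry expires. A production setup would more likely use a shared cache so that every container benefits from the same entries.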
Additionally, having data from load testing helped us determine exactly how to scale up containers and database resources in anticipation of the incoming load. Anticipating new user volume, testing our changes, and scaling ahead of the incoming load allowed us to be well prepared for launch day.
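The kind of sizing arithmetic that load test data makes possible can be sketched roughly as follows. The formula, the 30% headroom, and all of the numbers are assumptions for illustration, not the figures we used.

```python
import math


def containers_needed(peak_rps, rps_per_container, headroom=0.3):
    """Estimate the container count for an expected peak load.

    peak_rps: anticipated peak requests per second (an estimate).
    rps_per_container: throughput one container sustained under load test.
    headroom: extra capacity fraction kept in reserve (assumed 30%).
    """
    if rps_per_container <= 0:
        raise ValueError("rps_per_container must be positive")
    return math.ceil(peak_rps * (1 + headroom) / rps_per_container)


# Hypothetical inputs: 900 req/s at peak, 150 req/s per container.
print(containers_needed(peak_rps=900, rps_per_container=150))
```

The per-container throughput is the piece that only a load test can supply; the expected peak comes from anticipated signup volume.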
Launch Day and Beyond
I am happy to say that with this project, we were able to make sure the launches went smoothly with zero issues while we saw major spikes in signups and new users.
As we continue to grow and scale, we are keeping the latency and performance of our systems front of mind. As the next step, we have integrated load testing with Locust into Jenkins CI and will continue to use it to monitor the performance of our systems.
We are hiring - Come join us!