100x scaling with Gevent

Ipinder Singh
Nov 26, 2022

A case study on how we were able to scale an insurance aggregator framework up to 100 times.


Insurance is a tricky field, especially motor insurance, where the quotation for a particular vehicle may change frequently. At a very high level, when a user searches for a quote, we need to hit all insurers’ APIs, fetch the quotes, store them in the DB, and then show them on the UI.

For example: if there are 15 insurers, then a single quote request fans out into 15 API calls. Traditionally, there is a queuing system in place to distribute the load among workers, such that each worker only works with a single insurer. These are Python Celery workers (a minimal sketch of this fan-out follows).
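A minimal sketch of that fan-out, assuming a Redis broker; the names here (app, fetch_quote, the queue-per-insurer routing) are illustrative rather than our production code:

```python
from celery import Celery, group

# Hypothetical setup: broker URL and insurer list are placeholders.
app = Celery("quotes", broker="redis://localhost:6379/0")

INSURERS = ["insurer_a", "insurer_b", "insurer_c"]  # in our case, 15

@app.task
def fetch_quote(insurer, vehicle):
    ...  # fleshed out in the next section

def request_quotes(vehicle):
    # One quote request fans out into one task per insurer; routing each
    # task to a per-insurer queue keeps a worker dedicated to one insurer.
    job = group(
        fetch_quote.s(insurer, vehicle).set(queue=insurer)
        for insurer in INSURERS
    )
    return job.apply_async()
```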

The worker picks up the task from the queue, hits the insurer’s API, and then saves the result to a database. If we break down each worker process, the following steps take place on each worker for a task (fleshed out in the sketch after this list):

Steps in a worker
  • Pre-process/massage the data into the insurer’s format
  • Call the insurer’s API
  • Wait for the response
  • Store the result
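Fleshing out the hypothetical fetch_quote stub from the earlier sketch, with the four steps marked; the endpoint URL and the helpers are again illustrative:

```python
import requests

def to_insurer_format(insurer, vehicle):
    # Step 1: pre-process/massage the data into the insurer's format.
    # (Hypothetical helper; the real mapping is insurer-specific.)
    return {"source": "aggregator", **vehicle}

def save_quote(insurer, quote):
    ...  # Step 4: hypothetical DB write

@app.task
def fetch_quote(insurer, vehicle):
    payload = to_insurer_format(insurer, vehicle)
    # Steps 2 and 3: call the insurer's API and wait for the response.
    # This blocking wait is where the worker spends most of its time.
    response = requests.post(
        f"https://api.{insurer}.example/quote", json=payload, timeout=30
    )
    response.raise_for_status()  # surface insurer-side failures
    save_quote(insurer, response.json())
    return response.json()
```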

The bulk of the time is consumed by API calls to the insurers. Some insurers reply quickly, and some take their time. Apart from these I/O calls, there is not much CPU work happening here.

In a synchronous worker mode (the default prefork pool), the number of worker processes on a machine is usually n+1, where n is the number of cores. For example, on an 8-core machine we would have 9 workers, meaning we could handle 9 concurrent requests. Even if we doubled the number of workers, which would further strain the CPU with context switching between processes, we would still only be able to handle 18 concurrent requests.
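To make that arithmetic concrete, a tiny sketch (the quotes app name carries over from the earlier snippets):

```python
import os

# Rule of thumb from above: worker processes = number of cores + 1.
cores = os.cpu_count() or 1
print(f"{cores} cores -> {cores + 1} prefork worker processes")
# Launched as, e.g. (prefork is Celery's default pool):
#   celery -A quotes worker --concurrency=9
```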

With 15 insurer calls per quote, this is not enough even for 2 concurrent quote requests!

In order to serve more concurrent requests, we would need to keep increasing the number of workers, which is a costly way to scale.

There is another way out: spawning lightweight green threads is a much more efficient way to increase capacity for I/O-bound tasks. We can use either the gevent or eventlet execution pool. In these execution pools, tasks run in the same process as the Celery worker itself. To be precise, both eventlet and gevent use greenlets, not OS threads.

Greenlets are thread-like units of execution that are not managed by the kernel and live in application space. They voluntarily or explicitly give up control to one another at specified points in the code, and they excel at running a huge number of non-blocking tasks. Our application can now schedule work much more efficiently.

Multiple concurrent tasks (denoted by different-colored arrows) can be executed using gevent.
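Here is a standalone sketch of the idea outside Celery: gevent monkey-patches the standard library so that ordinary blocking calls become cooperative yield points, letting a single process juggle 100 requests at once:

```python
from gevent import monkey
monkey.patch_all()  # must run before the socket/ssl modules are imported

import gevent
import requests

def fetch(url):
    # With the standard library patched, this blocking call yields to
    # other greenlets while it waits on network I/O.
    return requests.get(url, timeout=10).status_code

# 100 greenlets making requests concurrently inside a single process.
# (example.com is a placeholder URL.)
jobs = [gevent.spawn(fetch, "https://example.com") for _ in range(100)]
gevent.joinall(jobs)
print([job.value for job in jobs])
```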

For us, the benefit of using a gevent or eventlet pool is that our Celery worker can do more work than it could before. This means we do not need as much RAM or CPU to scale up. This optimizes the utilization of our workers.

We can now run a Celery worker with a concurrency of 100, which can handle 100 concurrent requests with ease. Typically, while using a gevent worker pool, we keep the concurrency high so that we can handle scale!
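In Celery terms, that might look like this (a sketch; the right number depends on your workload):

```python
# The usual approach is the command line, since Celery applies gevent's
# monkey patching early when it sees the pool flag:
#   celery -A quotes worker --pool=gevent --concurrency=100
# The equivalent settings, assuming the app object from earlier:
app.conf.worker_pool = "gevent"
app.conf.worker_concurrency = 100
```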

There are some things to take care of as well —

  • If the concurrency is set too high, the worker keeps picking up the next task from the queue without getting to finish the previous ones, which can make the program appear stuck or unresponsive. The right concurrency value takes some trial and error (one mitigation is sketched after this list).
  • Even after using gevent, it might seem that we are not getting the expected scale. In that case, check for bottlenecks elsewhere. In our case, our database also needed to be scaled to support the higher load.
  • There are some Python libraries that can’t be monkey-patched by gevent, like mysql-python, because their I/O happens in C code that bypasses the patched socket module; calls through these libraries remain synchronous. The alternative is to use packages written in pure Python that do not rely on underlying C code (see the sketch after this list).
  • Running multiple workers with lower concurrency, instead of a single worker with huge concurrency, might work out better in some cases.
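Two of these tweaks sketched in code, assuming the same app object as before; PyMySQL is one pure-Python alternative to the C-based mysql-python driver:

```python
# Keep the worker from reserving more tasks than it can finish
# (relates to the first point above).
app.conf.worker_prefetch_multiplier = 1

# Swap in pure-Python PyMySQL so gevent's monkey patching can make
# the MySQL socket I/O cooperative (third point above).
import pymysql
pymysql.install_as_MySQLdb()  # code importing MySQLdb now gets PyMySQL
```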
