API calls optimization report

Preliminary profiling (see appendix) confirmed that most of the time was being spent on API calls. The other significant time-consuming tasks, such as importing libraries and exporting data to Excel, run only once per execution.
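As an illustration of this kind of profiling, the sketch below uses the standard library cProfile and pstats modules to rank functions by cumulative time. It is not the script used for the appendix: slow_call and run are hypothetical stand-ins for the real API-calling code.

```python
import cProfile
import io
import pstats
import time


def slow_call():
    """Hypothetical stand-in for a single API request."""
    time.sleep(0.01)


def run():
    """Hypothetical workload: several API-like calls in a row."""
    for _ in range(5):
        slow_call()


def profile_report(func, limit=10):
    """Profile func() and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(run))
```

Sorting by cumulative time puts functions that spend most of their time waiting on callees (such as network requests) at the top of the report.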

The following measures were tested:

Local API without authentication

The profiler revealed SSL sockets to be more expensive than regular sockets. Since the API was already being consumed via localhost, authentication was removed so that regular sockets could be used instead.

Sessions

The Python requests module supports HTTP sessions, which reuse sockets across requests and so remove the overhead of opening and closing a connection for each request. A direct comparison with the current approach shows a significant improvement.
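A minimal sketch of such a comparison follows, assuming the API is reachable at some URL passed in by the caller. The helper names timed_requests and compare are hypothetical and are not taken from the original test scripts. requests.get opens a new connection on every call, while requests.Session keeps the connection alive between calls.

```python
import time

import requests


def timed_requests(get, url, n):
    """Call get(url) n times and return the elapsed time in seconds."""
    start = time.perf_counter()
    for _ in range(n):
        get(url).raise_for_status()
    return time.perf_counter() - start


def compare(url, n=100):
    """Return (seconds without a session, seconds with a session)."""
    # Module-level requests.get: a fresh connection for each request.
    without_session = timed_requests(requests.get, url, n)

    # One Session: the underlying connection is pooled and reused.
    with requests.Session() as session:
        with_session = timed_requests(session.get, url, n)

    return without_session, with_session
```

For example, compare("http://localhost:8000/points") would time 100 requests each way. The URL here is only a placeholder for the local API endpoint.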

Querying multiple points at once

For this experiment, a set of 1662 points known to be valid was used. Different chunk sizes were tried, spaced logarithmically: 1, 6, 40, 260 and 1662.

In the end, there was no significant improvement from this approach. Although the medium-sized chunks performed slightly better, the trend seems linear enough.
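The chunk sizes used in this experiment can be reproduced by taking integer powers of the fourth root of the total number of points. The sketch below shows this, together with a simple way of splitting the points into chunks. This is an assumed reconstruction rather than the original script, and the helper names log_chunk_sizes and chunked are hypothetical.

```python
def log_chunk_sizes(total, steps=4):
    """Return steps + 1 chunk sizes spaced logarithmically from 1 to total."""
    return [int(total ** (i / steps)) for i in range(steps + 1)]


def chunked(points, size):
    """Yield consecutive slices of points with at most size items each."""
    for start in range(0, len(points), size):
        yield points[start:start + size]


if __name__ == "__main__":
    points = list(range(1662))  # placeholder for the 1662 valid points
    for size in log_chunk_sizes(len(points)):
        n_requests = sum(1 for _ in chunked(points, size))
        print(f"chunk size {size}: {n_requests} requests")
```

Each chunk would then be sent to the API as a single request, so larger chunks mean fewer requests but more work per request.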

Multi-process

While running these experiments, it was noticed that running tests in parallel affected the performance of the requests, so all the tests were run sequentially. To gain performance from a multi-process approach, it might be wise to increase the resources of the API server before testing (e.g. increasing the number of workers, given that local requests bypass the Nginx proxy).