Why are my tests slower when I run more parallel CI nodes (jobs)?
Let's say you run 10 parallel jobs (parallel CI nodes) on your CI server. Your slowest test file, spec/my_slow_spec.rb, takes 2 minutes to execute on the CI server.
You decide to run more parallel nodes, so you change the configuration on your CI server from 10 parallel jobs to 20. Now you notice that the slowest test file takes more than 2 minutes. Why could this happen? There are a few possible reasons.
low performance of CI server
If you run all parallel jobs (parallel CI nodes) on a single powerful server, adding more parallel jobs will at some point overload the server's resources (CPU, RAM, disk). Simply put, each extra parallel job on the same machine leaves less capacity for the others, so every job slows down.
Some CI providers run all parallel jobs on a single machine. For instance, Jenkins could run all parallel steps on a single machine unless you configure it to spin up parallel steps in separate containers on AWS.
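As a rough illustration of the single-machine case, the sketch below compares the number of parallel jobs with the number of CPU cores. The helper names `oversubscribed?` and `load_factor` are hypothetical, and CPU is only one of the resources mentioned above (RAM and disk can be the bottleneck too), so treat it as a first sanity check rather than a full diagnosis.

```ruby
require 'etc'

# Hypothetical helper: true when more jobs are requested than the machine
# has CPU cores. Defaults to the core count of the machine running this code.
def oversubscribed?(parallel_jobs, cpu_cores = Etc.nprocessors)
  parallel_jobs > cpu_cores
end

# Hypothetical helper: jobs per core. Values above 1.0 mean the jobs
# have to compete for CPU time.
def load_factor(parallel_jobs, cpu_cores = Etc.nprocessors)
  parallel_jobs.to_f / cpu_cores
end

# Example numbers (assumed): 20 parallel jobs on a 16-core machine.
puts oversubscribed?(20, 16) # true
puts load_factor(20, 16)     # 1.25
```

Running it without the second argument uses the core count of the current machine, which is handy when executed as a step on the CI server itself.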
tests written in a way that prevents parallelism
Verify that your tests do not slow down when you run them in parallel. Let's say your tests connect to a resource that is shared across parallel CI nodes. The more tests you run in parallel, the more likely they are to access the shared resource at the same time and wait on each other.
For instance, you may have an Elasticsearch database that's shared across parallel jobs. Or maybe your tests connect to an external API (for instance, you have tests that use the Stripe API for payments, and more parallel tests try to access the same Stripe sandbox).
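To see why a shared resource makes tests wait on each other, here is a deliberately simplified model. It assumes the worst case: the shared resource serves one job at a time, and every parallel job asks for it at the same moment. Real services usually handle some concurrency, so the numbers are illustrative only, and the helper names are hypothetical.

```ruby
# Worst-case time a job spends waiting: all other jobs go first,
# and each of them holds the shared resource for hold_seconds.
def worst_case_wait(parallel_jobs, hold_seconds)
  (parallel_jobs - 1) * hold_seconds
end

# Duration of a test file: its own work plus the worst-case wait.
def worst_case_duration(own_seconds, parallel_jobs, hold_seconds)
  own_seconds + worst_case_wait(parallel_jobs, hold_seconds)
end

# Assumed numbers: a 120-second test file, resource held for 2 seconds.
puts worst_case_duration(120, 10, 2) # 138
puts worst_case_duration(120, 20, 2) # 158
```

In this model, doubling the number of parallel jobs from 10 to 20 adds 20 seconds to the slowest test file even though the test itself did not change.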
how to verify if our CI nodes slowed down
You can verify whether the tests run slower when you have more parallel CI nodes. To do so, go to the user dashboard in Knapsack Pro and check how much time the slowest test files took in each CI build with different parallel CI node settings. If the test files take much more time than they do on average, you have a CI performance problem.
Take into account that the execution time of some of your slowest test files may vary naturally, because they use a browser (E2E tests) and sometimes take more or less time. But if you see a really big difference from what you would normally get, then something is wrong with the CI machine's performance (the CI machine is overloaded).
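The comparison can be sketched in a few lines. The timings below are made-up values that you would copy by hand from the dashboard (this is not a Knapsack Pro API call), and the 25% tolerance for natural variation is an assumption you should tune for your own test suite.

```ruby
# Average of a list of timings, in seconds.
def average(values)
  values.sum.to_f / values.size
end

# True when new_time exceeds the baseline average by more than the
# tolerance (0.25 means 25%). The tolerance absorbs natural variation,
# for example from browser-based E2E tests.
def significantly_slower?(baseline_times, new_time, tolerance: 0.25)
  new_time > average(baseline_times) * (1 + tolerance)
end

# Hypothetical timings of the slowest test file with 10 parallel nodes.
baseline = [118, 121, 125, 116]

puts average(baseline)                    # 120.0
puts significantly_slower?(baseline, 130) # false, within normal variation
puts significantly_slower?(baseline, 170) # true, worth investigating
```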
verify if the CI node boots fast enough
Another thing to check is whether the steps in your CI build got slower. For instance, you may have steps that install npm packages, create the DB, and install Ruby gems. Check how much time they took.
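If you note down the step durations from two builds, a small script can point out which steps got noticeably slower. This is a minimal sketch: the step names and timings are invented, `slower_steps` is a hypothetical helper, and the 1.5x threshold is an arbitrary assumption.

```ruby
# Returns the names of the steps whose duration grew by more than
# the given factor between two builds. Steps missing from the
# "before" build are ignored.
def slower_steps(before, after, threshold: 1.5)
  slowed = after.select do |step, seconds|
    before.key?(step) && seconds > before[step] * threshold
  end
  slowed.keys
end

# Hypothetical step durations in seconds.
with_10_nodes = { "install gems" => 40, "install npm packages" => 60, "create DB" => 15 }
with_20_nodes = { "install gems" => 45, "install npm packages" => 130, "create DB" => 16 }

p slower_steps(with_10_nodes, with_20_nodes) # ["install npm packages"]
```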
limited number of parallel CI nodes
You can also verify that you have a large enough pool of available parallel CI nodes. Some CI providers have plans with a fixed number of parallel nodes. If you try to run more parallel jobs than you paid for, your CI build will start with only some of the parallel CI nodes instead of all of them at the same time. This could increase your CI build time.
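The effect of a limited pool can be estimated with simple arithmetic. The sketch below assumes that every job takes the same time and that queued jobs start only when a whole batch of running jobs has finished. Real CI schedulers are usually more fine-grained, so read the result as a rough estimate; the helper names are hypothetical.

```ruby
# Number of batches ("waves") needed to run all requested jobs
# when only available_nodes of them can run at the same time.
def waves(requested_jobs, available_nodes)
  (requested_jobs.to_f / available_nodes).ceil
end

# Rough build time: each wave takes seconds_per_job.
def estimated_build_seconds(requested_jobs, available_nodes, seconds_per_job)
  waves(requested_jobs, available_nodes) * seconds_per_job
end

# Assumed numbers: 20 jobs requested, 300 seconds per job.
puts estimated_build_seconds(20, 20, 300) # 300, all jobs start at once
puts estimated_build_seconds(20, 10, 300) # 600, the plan allows only 10 nodes
```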