- Elixirator

Published by Vitalii Kozitskiy • Jun 30, 2017

In our project, we are using Sidekiq as a background processing and PostgreSQL as a database. Our scheduler worker runs some task which sends emails for users and repeats each 15 seconds. Sometimes, in a case when we have many big tasks, that spend more than 15 seconds there are some conflicts between different threads in the Sequel. READMORE

Imagine that our sidekiq is busy for 5 minutes, during this time 5*4=20 tasks of mails were created. At some moment 5 big tasks got finished and we have empty 5 threads concurrency. This means at that moment we started 5 mails tasks. In our code of worker, we have some select from the database. And after we work with that data and run personal worker for each mail. But PostgreSQL allows to use many selects in one moment and lock our table when we change some data. Imagine, that we have one mail which is ready to deliver. So we got 1 mail in one moment 5 times, for 5 parallel threads concurrent and are going working with them. Certainly, we update our data but we do this after select, and, actually, we update one entry 5 times. As a result, we got a mail from the database and create 5 the same workers for them. After that we started running personal worker for mail, if it was successful we’ll change the state of mail. But usually each mail sending time is nearly 500 milliseconds, and this is too much time for the server. So in one moment, we have in order 5 same mails with same data that are pending to be delivered. As a result, we will send 5 times one mail and our user will get 5 clone-mails.

Fix with unique jobs:

Firstly we thought: ‘Why do we create so many workers?”. We can create only one worker and wait until it will be finished. This is a good solution and we are using this way to work. After some time we’ve found some gem that does that instead of us. It's called sidekiq-unique-jobs. That's a good gem which in sidekiq_options has different attributes and checks the unique index of the worker where this attribute was added. Actually, we are using:

sidekiq_options unique: :until_executed

This is a normal worker and can be tested but imagine that we have many servers and each one of them runs mails schedule. And we don’t have the same workers at the same moment of time on the one server but we can have the same worker on other server and do the same actions concurrently. So the problem with reading the data is remaining. And we’ve started looking for a decision to fix this trouble by PostgreSQL again.

Lock in PostgreSQL

In PostgreSQL, we have some great method called “lock”. So what this method is doing? This method locks all table for actions until they would be finished. The basic syntax looks like:

LOCK TABLE mails IN EXCLUSIVE MODE;

But, I must emphasize, it does not enough. Because PostgreSQL locks the table only in some transaction. So every lock must be in the transaction, and we can write the next base:

BEGIN WORK;
  LOCK TABLE mails IN EXCLUSIVE MODE;
  ***doing some actions with mails***
COMMIT WORK;

When we give an attention to method lock we can notice some mode. What does it mean? Actually, it’s our limitation. Because PostgreSQL has 8 different modes such as: ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE. And each of them locks the table for some actions. For example, the ACCESS SHARE command acquires a lock of this mode on referenced tables. In general, any query that only reads a table and does not modify it will acquire this lock mode. EXCLUSIVE allows only concurrent ACCESS SHARE locks, i.e., it only reads from the table that can proceed in parallel with a transaction holding this lock mode. So you can choose the mode which is better for you. When we speak about locking we must remember about pg_locks. It’s a special view in PostgreSQL which shows what action is locking at the moment. And when you use your background worker which saves data of scheduling in PostgreSQL you can use this view to synchronize worker running. This view has many functions which can get different information about the order at some moment. But this functions only par information about some special lock type with a relation or something else which is included in that view. Our sidekiq works with redis and this is the main reason why we cannot use that functionality for synchronization.

So we have described the good opportunity for the locking, but what about the downside. I can show the code in ruby using Sequel:

DB.transactions do
  Models::Mail.lock(“EXCLUSIVE”) { sleep(120) }
end

Imagine that we are doing some actions for 120 seconds. And it doesn’t matter how many different parallel workers run concurrently. They will wait in line. And only one process will work with data in the mails table. Each user who creates new email has to wait for the finish of this worker. That's because we blocked all actions with this table for 120 seconds. This is the bad side of the locking. Certainly, you can choose a different mode that only creates or updates data. And it's better to take the mode which has less locking for action that is used by other processes.

Have we fixed this problem?

In our code, we have some select with the limit method and after we update some data in the table. This is why we use two queries. Yes! Two queries it’s a bad solution, but in many times this works quicker rather than one query with many joins and distinct. Because when you have millions of data in some tables and to join them with similar data it takes more time rather than when you use 2 queries. And sometimes, this solution optimizes your code tenfold. But this is a topic for another article. So in our code, we returned some id after actions and then we create new jobs. We have decided to create one big query to update all data which includes another query. Because it locks our table for less time.

Conclusion

You can use PostgreSQL as synchronizer between the concurrent jobs. If you are using many queries in background worker, you must be sure that your worker is unique. In other way lock the table or create one big query to be ensured that your jobs are working with correct data.

Like the article? Spread the world