{"data":{"allMarkdownRemark":{"edges":[{"node":{"id":"cd170267-3e38-5f47-8461-d6f6583c5bed","frontmatter":{"category":"Coding","title":"Our love story with Deadlocks (PostgreSQL Edition)","date":"2019-03-15","summary":"How to get rid of deadlocks","thumbnail":{"relativePath":"pages/event-store-deadlock/thumbnail.png","childImageSharp":{"resolutions":{"base64":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAVCAYAAABG1c6oAAAACXBIWXMAAAsSAAALEgHS3X78AAAFFklEQVQ4y5WUeUzTZxjHi+AWnTuYc1mAKaiAIgoI4tR5henqgSBEoOOmCINxKAIiRPFCpK304CyUUikUtIisHC0UhHoy8ZoaY2Lcodv8w3+WxcyMtu937/ubOLNliXuSN2/yvu/zeb/P8b483t8WvjlkjTopTmCMiQzrXBm4uIyu8T94d0ZGdERYf3Ji/JWIUL4hThB1QZiW8dvuvD0kJzv7aSh/Qy89F8p71QKW+Krr1VrUdF3A4RYzxB0jqNYPQ9Oqg+a0AeJTFhxoNqNCN4zx+49x+f4TCKU9yKruQ233JVTVNcBz7hwlB5s21SFd096FhMo+69xY0YR3gsTmFlVuDS9tmzDd+IGs3aO2zYsVWb3ixXbPeIltUXKlLb+uh6h6x8iCBDFxF1RYhXKjtUVvwEezZpbxYmN2XKnoOI/Z0cdsyzOqEJxRRXyTT6DZOE6iDmqJd7yILM+sQlC6nCz7UkEC0+RwjylHUUM/aeq/ShanVOLjqDL7oVYLkhLifuRlZ2f/mlXdD+YYlK7gDmzIb4SyZwzugnJQZfClawwUsFPGzUvp+HD7YdR0X4ZQpCdUDMlXGiEUpjzh5eTkPs/kgBUkiCpYmCjBF0d1OHP+Dg5qzJxTaHEzfJMrqUoF/IRSrN1VD/mZixi99RBKwxiZtf2wvVw3gpgdEdd5aanC7/aqBuAZJ7KzkGhekCruxND1B8iQnsUIdWofvgX/VBnohVSxCKKOUbQN3YS08wIkp0bJbIHI2thtQbD/okZe2OaN3dLTFtCEW4MzFPBmQEknDp00w21HGVbn1EHVdxUbC1TwS5WyiznVm4vUEIr1pETVTz7JVdlqmlpAayzgec1xKVGeNiIwo9YamCb7SyEFitpHOThTxPK5rUQDll9vmtNd1QbsVw8g8fgpUqTss2dW9WFXTtZTCpzFOmedRF6DrQfaaUFOEAohu2t6UKwygcFZqE3945xCBmSFYd3QbLpGMqVdEOkvWRXtJvj7eKom+/qt3OysR4UqM1Nj90mSEJZDRddFzNxWiq371FAbx7GEFoNVlxWGqf5K3g3T1fvEdPMR8gqLnlOO+8uXsio4oLHhrAU+KTIrKwztQ9I+cgdFzaM43m5Bfl0vPATHQfc4aGC6nKoXEcvdx7bbt29j2htOJorZREfQJHN9pbwa0cfO2OZEHSHLMmuJXPs1KS0uxIOH3xPd8C0SkqdkKrnG9t8pY00O7fC3RNs9aC8+Ino2ZLmIrVu2WHkAHDji6pXmjqFrWLe37Q/d0A20tWhsRaVHJ3bX9k60nrtrL240sgpTUBVYrwqO6IiFttQh7Qj6xx+i6WQrPOfPa2KsKS9UekVGRv6k1XcjYnv47/UNKlju/QJJSw8GBs3IrTawgnFhe9Nw02UGyGWyZ1s+/+zKquVBndQ/5tUPZxLqSkciHd5sDvBdoJk5480TwpSkewUNA1zRgmmOPePFthK1GSlJiZfpOYdJCI2Wx/Px8fkn9F+2v6S4a59qEF4JYittGcJ+nQrdOURHhvW+AE1539nZ8aW
Dq6srN0+fPt3Bzc3N0cnJycHFxcWxp880la0X5OW0HmgepK9JbF1Bf5758ZKJ+i4LVgQuVrB9Pp/vyHsdCwsPd+LmTSF7taZvsCBFYaXtQ2LFPVBqdOyp8dm+h4fH6wGdnZ0n8zM7v6Dw55PGMSj0o7bGlg6sX/Np/YtwHXj/x9575+3J21P9Fi1E8FI/pkzxX+f/BBhq9luuZDT4AAAAAElFTkSuQmCC","width":315,"height":325,"src":"/static/fff962ac6bfdb5c2156f04c8855b288b/b3029/thumbnail.png","srcSet":"/static/fff962ac6bfdb5c2156f04c8855b288b/b3029/thumbnail.png 1x,\n/static/fff962ac6bfdb5c2156f04c8855b288b/8d141/thumbnail.png 1.5x"}}},"authorName":"Core Services Team","authorDescription":"Core Services team is responsible for infrastructure and platform development at AUTO1","authorAvatar":null,"headerImage":{"relativePath":"pages/event-store-deadlock/header.png","childImageSharp":{"resolutions":{"base64":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAHCAYAAAAIy204AAAACXBIWXMAAAsSAAALEgHS3X78AAABoUlEQVQoz2MIjM+pDy/sfhxb3HkyNL/zXlbD9O+h2Y0XPD3c1RmAwM07gElKSooBhiUlxBlklDQYsAGDhIkMDP45Xf+t8hb9L5y66792/NT/wbUr/hTOOPDfN76whIEAMIjv5zRMmsJkmDiJH8QH0gwMftmdD21zZv8vmrT5l0v+7L/pXev/5XSv+hMUneLFzMCgr6ikrCApKckGdJ0QEAtKyynx8jEwcOiG1csZJk62B7rKDojVgIa5gm2JKWyf5V29+n9G19r/zkADg2pX/p+2bNtNDSVZM1ExcRNpaWkzoEFuQBwKxL5SUtJBEqLCrspmni7G6XMcDeInhAMNlDJMnJhpkDhRlCE6Jcc4pWrio/krNr4Pqlzw1zZz2r8J0+dddffwdAJZaGJqxg51HTcQqwANlBfhYmBTNvcWNEqdJQX0tiDQQFXDhEnyQFfKwoKDvTLJ28A9u/+jR/6M7xmemtIgwczMLCYGUkFxTSsjjB2bXbk3pahhE4i98/oXsLiysgoDcixLScswiAtwMQBdyGCUOpMR6EJQZDACXQnGAOs+imBvfQhTAAAAAElFTkSuQmCC","width":500,"height":168,"src":"/static/2d78359581a2f1861a243566e527e27a/7d852/header.png","srcSet":"/static/2d78359581a2f1861a243566e527e27a/7d852/header.png 1x"}}}},"html":"<h2>TL;DR</h2>\n<p>Always try to keep your transactions short and write your queries to deal with records in a deterministic order with respect to other transactions.</p>\n<h2>Full Story</h2>\n<p>Almost every non-trivial system needs some kind of task (job, event, whatever you call it) scheduling functionality at some point. 
Having more than 250 micro-services, ours was no exception.\nAt some point, it made sense to us to provide such functionality as a service for our developers. With the <code class=\"language-text\">event-store</code> service, one can simply schedule a job to be executed at a specific time in the future. There are already several ready-to-use tools for this specific purpose (e.g. BigBen, Quartz). However, for several reasons (which are beyond the scope of this post), we decided to implement our own from scratch.</p>\n<p>Since the subject of this post is <code class=\"language-text\">Deadlocks</code>, we need a brief introduction to <code class=\"language-text\">event-store</code>'s implementation.</p>\n<p>An event on the server side is simply a record in the database. A simplified view of the <code class=\"language-text\">events</code> table is as follows:</p>\n<table>\n<thead>\n<tr>\n<th>id</th>\n<th>event_type</th>\n<th>due_time</th>\n<th>start_process_time</th>\n<th>end_process_time</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>'5c55ea79-4774-49ae-96e2-975af7a41ce6'</td>\n<td>'user-cache-flush'</td>\n<td>'2019-01-01 00:00:00'</td>\n<td>null</td>\n<td>null</td>\n</tr>\n</tbody>\n</table>\n<p>The following is a bird's-eye view of what happens in the service:</p>\n<p><strong>Thread #1</strong><br>\na. The service periodically fetches a batch of unprocessed events (those that are due and have no process time, neither start nor end) from the database (and marks them as <em>progressing</em> by filling the <code class=\"language-text\">start_process_time</code> column with the current timestamp). Let's call this the <em>fetch-unprocessed</em> query.</p>\n<p>b. The service processes the events and marks them as <em>processed</em> (by filling the <code class=\"language-text\">end_process_time</code> column with the current timestamp). Let's call this the <em>save-processed</em> query.</p>\n<p><strong>Thread #2</strong><br>\na. 
The service periodically checks the table and looks for stale events (those that are due and have <code class=\"language-text\">start_process_time</code> but no <code class=\"language-text\">end_process_time</code>). Let's call this the <em>fetch-stale</em> query.</p>\n<p>b. The service marks them back as <em>unprocessed</em> (by setting <code class=\"language-text\">start_process_time</code> to <code class=\"language-text\">null</code>). Let's call this the <em>release-stale</em> query.</p>\n<p>The very first MVP was a simple read-modify-write (anti-)pattern (with the help of Spring, JPA, and Hibernate). It is not hard to guess which issue this implementation is vulnerable to: deadlocks.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">ERROR: deadlock detected\nDetail: Process 5234 waits for ShareLock on transaction 3465; blocked by process 467845.\n        Process 467845 waits for ShareLock on transaction 96575; blocked by process 5234.\nHint: See server log for query details.\nWhere: while updating tuple (14954,4) in relation \"events\"</code></pre></div>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">ERROR: deadlock detected\nDetail: Process 10438 waits for ExclusiveLock on tuple (14954,4) of relation 19118 of database 19113; blocked by process 31501.\n        Process 31501 waits for ShareLock on transaction 763124271; blocked by process 28450.\n        Process 28450 waits for ShareLock on transaction 763124277; blocked by process 28873.\n        Process 28873 waits for ExclusiveLock on tuple (14954,4) of relation 19118 of database 19113; blocked by process 10438.\nHint: See server log for query details.\nWhere: while locking tuple (6984,19) in relation \"events\"</code></pre></div>\n<p>In the beginning it was not a big deal, because deadlock errors were not frequent and we could live with them. 
However, very soon, as we gained more clients, the deadlock issue (among other problems, e.g. lost updates) started hurting the quality of the service (e.g. firing events twice).</p>\n<p>As a first attempt, we refactored the code to get rid of the read-modify-write anti-pattern:</p>\n<div class=\"gatsby-highlight\" data-language=\"sql\"><pre class=\"language-sql\"><code class=\"language-sql\"><span class=\"token comment\">-- fetch-unprocessed query</span>\n<span class=\"token keyword\">UPDATE</span> events event_to_update\n<span class=\"token keyword\">SET</span> start_process_time <span class=\"token operator\">=</span> <span class=\"token function\">now</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span>\n<span class=\"token keyword\">FROM</span> <span class=\"token punctuation\">(</span>\n    <span class=\"token keyword\">SELECT</span> id\n    <span class=\"token keyword\">FROM</span> events      \n    <span class=\"token keyword\">WHERE</span> end_process_time <span class=\"token operator\">IS</span> <span class=\"token boolean\">NULL</span>\n            <span class=\"token operator\">AND</span> start_process_time <span class=\"token operator\">IS</span> <span class=\"token boolean\">NULL</span>\n            <span class=\"token operator\">AND</span> <span class=\"token punctuation\">(</span>due_time <span class=\"token operator\">BETWEEN</span> ? 
<span class=\"token operator\">AND</span> ?<span class=\"token punctuation\">)</span>\n<span class=\"token punctuation\">)</span> matched_event_for_update\n<span class=\"token keyword\">WHERE</span> event_to_update<span class=\"token punctuation\">.</span>id <span class=\"token operator\">=</span> matched_event_for_update<span class=\"token punctuation\">.</span>id\n<span class=\"token keyword\">RETURNING</span> <span class=\"token operator\">*</span><span class=\"token punctuation\">;</span>\n\n<span class=\"token comment\">-- release-stale query (including fetch-stale query)</span>\n<span class=\"token keyword\">UPDATE</span> events event_to_release\n<span class=\"token keyword\">SET</span> start_process_time <span class=\"token operator\">=</span> <span class=\"token boolean\">NULL</span>\n<span class=\"token keyword\">FROM</span> <span class=\"token punctuation\">(</span>\n    <span class=\"token keyword\">SELECT</span> id\n    <span class=\"token keyword\">FROM</span> events\n    <span class=\"token keyword\">WHERE</span> end_process_time <span class=\"token operator\">IS</span> <span class=\"token boolean\">NULL</span> <span class=\"token operator\">AND</span> start_process_time <span class=\"token operator\">&lt;</span> ?\n<span class=\"token punctuation\">)</span> stuck_event\n<span class=\"token keyword\">WHERE</span> event_to_release<span class=\"token punctuation\">.</span>id <span class=\"token operator\">=</span> stuck_event<span class=\"token punctuation\">.</span>id\n<span class=\"token keyword\">RETURNING</span> <span class=\"token operator\">*</span><span class=\"token punctuation\">;</span></code></pre></div>\n<p>This was an improvement: it reduced the number of deadlocks, but it didn't solve the issue completely.\nThe quickest fix we considered was to change the isolation level to <code class=\"language-text\">Serializable</code>. 
However, this was the last solution we wanted, since it would hurt the performance and quality of the service (think what would happen if one service instance starts a transaction, fetches the events to process, but then for some reason becomes stale).</p>\n<p>A better way to deal with deadlocks is to analyze transactions and queries to prevent transaction interference by:</p>\n<ol>\n<li>making transactions short in terms of time</li>\n<li>writing queries to deal with records in a deterministic order</li>\n</ol>\n<p>With the first refactoring, we made transactions a bit shorter, but there was still room for improvement. To apply the second practice from the list above, we refactored the queries as follows (essentially ordering and acquiring a lock):</p>\n<div class=\"gatsby-highlight\" data-language=\"sql\"><pre class=\"language-sql\"><code class=\"language-sql\"><span class=\"token comment\">-- fetch-unprocessed query</span>\n<span class=\"token keyword\">UPDATE</span> events event_to_update\n<span class=\"token keyword\">SET</span> start_process_time <span class=\"token operator\">=</span> <span class=\"token function\">now</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span>\n<span class=\"token keyword\">FROM</span> <span class=\"token punctuation\">(</span>\n    <span class=\"token keyword\">SELECT</span> id\n    <span class=\"token keyword\">FROM</span> events      \n    <span class=\"token keyword\">WHERE</span> <span class=\"token punctuation\">.</span><span class=\"token punctuation\">.</span><span class=\"token punctuation\">.</span> <span class=\"token comment\">-- as before</span>\n    <span class=\"token keyword\">ORDER</span> <span class=\"token keyword\">BY</span> ID\n    <span class=\"token keyword\">FOR</span> <span class=\"token keyword\">NO</span> <span class=\"token keyword\">KEY</span> <span class=\"token keyword\">UPDATE</span>\n<span class=\"token punctuation\">)</span> matched_event_for_update\n<span class=\"token 
keyword\">WHERE</span> <span class=\"token punctuation\">.</span><span class=\"token punctuation\">.</span><span class=\"token punctuation\">.</span> <span class=\"token comment\">-- as before</span>\n<span class=\"token keyword\">RETURNING</span> <span class=\"token operator\">*</span><span class=\"token punctuation\">;</span>\n\n<span class=\"token comment\">-- release-stale query (including fetch-stale query)</span>\n<span class=\"token keyword\">UPDATE</span> events event_to_release\n<span class=\"token keyword\">SET</span> start_process_time <span class=\"token operator\">=</span> <span class=\"token boolean\">NULL</span>\n<span class=\"token keyword\">FROM</span> <span class=\"token punctuation\">(</span>\n    <span class=\"token keyword\">SELECT</span> id\n    <span class=\"token keyword\">FROM</span> events\n    <span class=\"token keyword\">WHERE</span> <span class=\"token punctuation\">.</span><span class=\"token punctuation\">.</span><span class=\"token punctuation\">.</span> <span class=\"token comment\">-- as before</span>\n    <span class=\"token keyword\">ORDER</span> <span class=\"token keyword\">BY</span> ID\n    <span class=\"token keyword\">FOR</span> <span class=\"token keyword\">NO</span> <span class=\"token keyword\">KEY</span> <span class=\"token keyword\">UPDATE</span>\n<span class=\"token punctuation\">)</span> stuck_event\n<span class=\"token keyword\">WHERE</span> <span class=\"token punctuation\">.</span><span class=\"token punctuation\">.</span><span class=\"token punctuation\">.</span> <span class=\"token comment\">-- as before</span>\n<span class=\"token keyword\">RETURNING</span> <span class=\"token operator\">*</span><span class=\"token punctuation\">;</span></code></pre></div>\n<p>However, the deadlocks were still appearing intermittently:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">ERROR: deadlock detected\nDetail: Process 7534 waits for ShareLock on 
transaction 93897000; blocked by process 13376.\n        Process 13376 waits for ShareLock on transaction 93896994; blocked by process 7534.\nHint: See server log for query details.\nWhere: while rechecking updated tuple (134131,3) in relation \"events\"</code></pre></div>\n<p>Of course! We still had one transaction/query dealing with records in a non-deterministic order: <code class=\"language-text\">JpaRepository.save(Iterable&lt;S> entities)</code>. In the code, after fetching the events to be processed, we flag (<code class=\"language-text\">UPDATE</code>) the processed events as <code class=\"language-text\">processed</code>, and also save (<code class=\"language-text\">INSERT</code>) the new events spawned by them. Here's the catch: the event processing is done in <em>parallel</em>, so the resulting list of events (both processed and newly created ones) is not in a deterministic order with respect to other transactions. The third refactoring was therefore to restore the order of the processed events:</p>\n<div class=\"gatsby-highlight\" data-language=\"java\"><pre class=\"language-java\"><code class=\"language-java\"><span class=\"token comment\">/**\n * Processes the given events.\n * The result list will be ordered with respect to the given list.\n * All new events will be added at the end of the result list.\n * @param events events to be processed\n * @return list of processed and created events\n */</span>\n<span class=\"token keyword\">private</span> <span class=\"token class-name\">List</span><span class=\"token generics\"><span class=\"token punctuation\">&lt;</span><span class=\"token class-name\">Event</span><span class=\"token punctuation\">></span></span> <span class=\"token function\">doProcessEvents</span><span class=\"token punctuation\">(</span><span class=\"token class-name\">List</span><span class=\"token generics\"><span class=\"token punctuation\">&lt;</span><span class=\"token class-name\">Event</span><span class=\"token punctuation\">></span></span> events<span 
class=\"token punctuation\">)</span> <span class=\"token punctuation\">{</span>\n    <span class=\"token class-name\">ProcessingResult</span> result <span class=\"token operator\">=</span> eventProcessor<span class=\"token punctuation\">.</span><span class=\"token function\">processInParallel</span><span class=\"token punctuation\">(</span>events<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n\n    <span class=\"token class-name\">List</span><span class=\"token generics\"><span class=\"token punctuation\">&lt;</span><span class=\"token class-name\">Event</span><span class=\"token punctuation\">></span></span> successfulEvents <span class=\"token operator\">=</span> result<span class=\"token punctuation\">.</span><span class=\"token function\">getSucceeded</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span> <span class=\"token comment\">// may contain new events;</span>\n    <span class=\"token class-name\">List</span><span class=\"token generics\"><span class=\"token punctuation\">&lt;</span><span class=\"token class-name\">Event</span><span class=\"token punctuation\">></span></span> failedEvents <span class=\"token operator\">=</span> result<span class=\"token punctuation\">.</span><span class=\"token function\">getFailed</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n    <span class=\"token class-name\">List</span><span class=\"token generics\"><span class=\"token punctuation\">&lt;</span><span class=\"token class-name\">Event</span><span class=\"token punctuation\">></span></span> allEvents <span class=\"token operator\">=</span> <span class=\"token class-name\">ListUtils</span><span class=\"token punctuation\">.</span><span class=\"token function\">union</span><span class=\"token punctuation\">(</span>successfulEvents<span class=\"token 
punctuation\">,</span> failedEvents<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n\n    <span class=\"token class-name\">Map</span><span class=\"token generics\"><span class=\"token punctuation\">&lt;</span>UUID<span class=\"token punctuation\">,</span> <span class=\"token class-name\">Integer</span><span class=\"token punctuation\">></span></span> eventIndices <span class=\"token operator\">=</span> <span class=\"token function\">range</span><span class=\"token punctuation\">(</span><span class=\"token number\">0</span><span class=\"token punctuation\">,</span> events<span class=\"token punctuation\">.</span><span class=\"token function\">size</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">.</span><span class=\"token function\">boxed</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span>\n            <span class=\"token punctuation\">.</span><span class=\"token function\">collect</span><span class=\"token punctuation\">(</span><span class=\"token class-name\">CollectorUtils</span><span class=\"token punctuation\">.</span><span class=\"token function\">toMap</span><span class=\"token punctuation\">(</span>i <span class=\"token operator\">-></span> events<span class=\"token punctuation\">.</span><span class=\"token function\">get</span><span class=\"token punctuation\">(</span>i<span class=\"token punctuation\">)</span><span class=\"token punctuation\">.</span><span class=\"token function\">getId</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> i <span class=\"token operator\">-></span> i<span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n\n    <span class=\"token comment\">// This should sort according to the original 
order and put all new events at the end.</span>\n    allEvents<span class=\"token punctuation\">.</span><span class=\"token function\">sort</span><span class=\"token punctuation\">(</span><span class=\"token function\">comparingInt</span><span class=\"token punctuation\">(</span>event <span class=\"token operator\">-></span> eventIndices<span class=\"token punctuation\">.</span><span class=\"token function\">getOrDefault</span><span class=\"token punctuation\">(</span>event<span class=\"token punctuation\">.</span><span class=\"token function\">getId</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> <span class=\"token class-name\">Integer</span><span class=\"token punctuation\">.</span>MAX_VALUE<span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n\n    <span class=\"token keyword\">return</span> allEvents<span class=\"token punctuation\">;</span>\n<span class=\"token punctuation\">}</span></code></pre></div>\n<p>After this last change, we saw the game for deadlocks was over: <em>\"Goodbye Deadlock!\"</em></p>","fields":{"slug":"/event-store-deadlock/","tags":["deadlock","sql","transaction","postgresql"]}}}]}},"pageContext":{"slug":"/tags/deadlock","tag":"deadlock","categories":["Architecture","Coding","DevOps","Engineering","ProjectManagement","QA","Social","TechRadar"]}}