Bug #10133

closed

Massive db deadlocks in postgres from hosts_counter updates with counter_cache_fix.rb

Added by Chuck Schweizer almost 9 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Database
Target version:
Difficulty:
Triaged:
Fixed in Releases:
Found in Releases:

Description

https://gist.github.com/csschwe/4cc4d9be58e1cb96ec6c

After updating to Foreman 1.8 rc3, I am seeing a massive number of DB deadlocks. This issue was not present in 1.7.


Related issues 5 (0 open, 5 closed)

Related to Foreman - Bug #5692: Puppet environment counters not updated (Closed, Tomer Brisker, 05/13/2014)
Related to Foreman - Bug #12241: Counter cache update didn't pick up changes from after_commit callback (Closed, Tomer Brisker, 10/21/2015)
Related to Foreman - Bug #7246: Remove counter workaround for #5692 on upgrade to rails 4.x (Rejected, 08/25/2014)
Has duplicate Chef - Bug #11232: Occassional error in tasks when importing facts from foreman-chef (Duplicate, Tomer Brisker, 07/28/2015)
Has duplicate Foreman - Bug #5990: multiple calls to create or update domain throws deadlock error (Duplicate, 05/29/2014)
Actions #1

Updated by Chuck Schweizer almost 9 years ago

This is in a 40K node environment.

Actions #3

Updated by Tomer Brisker almost 9 years ago

  • Related to Bug #5692: Puppet environment counters not updated added
Actions #4

Updated by Tomer Brisker almost 9 years ago

  • Category set to Database

Which PostgreSQL version are you using?
This sounds like it might be related to a problem that was fixed in 9.3: http://mina.naguib.ca/blog/2010/11/22/postgresql-foreign-key-deadlocks.html

Actions #5

Updated by Lukas Zapletal almost 9 years ago

Three users in the comments complain that version 9.3 is even worse and that it was not fixed for them :-(

Alvaro Herrera describes the solution as introducing a new keyword, SELECT ... FOR KEY. Would that mean you need both PostgreSQL 9.3 and a newer Rails that takes advantage of that approach? Or, I assume, some change in Foreman would be required.

Actions #6

Updated by Tomer Brisker almost 9 years ago

This specific deadlock should be prevented when we upgrade to Rails 4, as it is caused by a workaround for a bug in cached counters that existed only in Rails 3.

Actions #7

Updated by Lukas Zapletal almost 9 years ago

Oh, I see. Maybe we could make this workaround optional so users under heavy load can turn it off?

Actions #8

Updated by Tomer Brisker almost 9 years ago

  • Status changed from New to Assigned
  • Assignee set to Tomer Brisker

The counter_cache fix was already in 1.7, so I'm trying to understand what caused this.
Chuck, what operation causes the deadlocks? Did you upgrade anything other than Foreman?

Actions #9

Updated by Tomer Brisker almost 9 years ago

Digging into the log, it would seem the deadlock is caused by a race between the counter_cache_fix and Rails' original update_counters trying to update the same counter at the same time. Will continue investigating.
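
For readers unfamiliar with how two writers to a single counter row can deadlock, here is a minimal sketch in Ruby. It is a toy model, not Foreman or PostgreSQL code: `LockTable`, `upgrade_deadlock_demo`, the `:t1`/`:t2` transaction names and the `:environment_row` key are all hypothetical. It assumes the foreign-key behavior discussed in the post linked in comment #4, where a write to a child row takes a shared lock on the parent row, so two transactions that each hold the shared lock and then try to update the parent's counter end up waiting on each other. Whether this is the exact sequence in the attached log is not confirmed here.

```ruby
require "set"

# Toy row-lock table (hypothetical, for illustration only).
# Each row can have many shared holders or one exclusive holder.
class LockTable
  def initialize
    @shared = Hash.new { |h, k| h[k] = Set.new }
    @exclusive = {}
    @waits_for = Hash.new { |h, k| h[k] = Set.new }
  end

  # Returns :granted, or :waiting after recording which transactions block it.
  def request(txn, row, mode)
    blockers = blockers_for(txn, row, mode)
    if blockers.empty?
      if mode == :exclusive
        @exclusive[row] = txn
      else
        @shared[row] << txn
      end
      :granted
    else
      @waits_for[txn].merge(blockers)
      :waiting
    end
  end

  # True when following wait-for edges from txn leads back to txn (a cycle).
  def deadlocked?(txn)
    seen = Set.new
    stack = @waits_for[txn].to_a
    until stack.empty?
      current = stack.pop
      return true if current == txn
      next unless seen.add?(current)
      stack.concat(@waits_for[current].to_a)
    end
    false
  end

  private

  # Other transactions whose locks conflict with the requested mode.
  def blockers_for(txn, row, mode)
    holders = Set.new
    holders << @exclusive[row] if @exclusive.key?(row)
    holders.merge(@shared[row]) if mode == :exclusive
    holders.delete(txn)
    holders
  end
end

# Two transactions take a shared lock on the same parent row (assumed to
# happen when each writes a child row), then both try to update its counter.
def upgrade_deadlock_demo
  locks = LockTable.new
  locks.request(:t1, :environment_row, :shared)
  locks.request(:t2, :environment_row, :shared)
  t1_upgrade = locks.request(:t1, :environment_row, :exclusive)
  t2_upgrade = locks.request(:t2, :environment_row, :exclusive)
  { t1_upgrade: t1_upgrade, t2_upgrade: t2_upgrade,
    deadlocked: locks.deadlocked?(:t1) }
end
```

Calling `upgrade_deadlock_demo` reports both upgrade requests as waiting and a cycle between the two transactions. In a real database one of the two would be aborted by the deadlock detector, which is the kind of error shown in the description.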

Actions #10

Updated by Chuck Schweizer almost 9 years ago

My environment is a fully updated RHEL 6 install using the foreman installer.

foreman 1.8 rc3
postgres 8.4

The Foreman server is only set up to receive reports and facts from the puppet masters; it is not acting as a puppet server or external node classifier.

From what I can tell, the uploading of reports and facts from the 40K nodes, through the puppet masters, is causing the deadlocks. Commenting out the logic that updates the DB in counter_cache_fix.rb made the deadlocks stop.

The only thing that changed going from Foreman 1.7.1 to 1.8 rc3 was Foreman itself. Nothing else on the system was updated or changed. The Foreman install was run after installing the 1.8 rc3 RPMs.

Actions #11

Updated by The Foreman Bot almost 9 years ago

  • Status changed from Assigned to Ready For Testing
  • Pull request https://github.com/theforeman/foreman/pull/2362 added
  • Pull request deleted ()
Actions #12

Updated by Daniel Lobato Garcia almost 9 years ago

Hi Chuck,

Tomer has prepared a proposed fix for this issue - https://github.com/theforeman/foreman/pull/2362
Could you report if it works for your case?

Thanks!

Actions #13

Updated by Chuck Schweizer almost 9 years ago

After reducing the number of puppet masters in my environment I have been unable to reproduce the issue.

Daniel Lobato Garcia wrote:

Hi Chuck,

Tomer has prepared a proposed fix for this issue - https://github.com/theforeman/foreman/pull/2362
Could you report if it works for your case?

Thanks!

Actions #14

Updated by Dominic Cleal almost 9 years ago

  • Status changed from Ready For Testing to New
  • Assignee deleted (Tomer Brisker)

If anybody reproduces this, we'll retry the patch.

Actions #15

Updated by Tomer Brisker over 8 years ago

  • Has duplicate Bug #11232: Occassional error in tasks when importing facts from foreman-chef added
Actions #16

Updated by The Foreman Bot over 8 years ago

  • Status changed from New to Ready For Testing
Actions #17

Updated by Marek Hulán over 8 years ago

  • Assignee set to Tomer Brisker
  • Release set to 72
Actions #18

Updated by Anonymous over 8 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100
Actions #19

Updated by Tomer Brisker over 8 years ago

  • Has duplicate Bug #5990: multiple calls to create or update domain throws deadlock error added
Actions #20

Updated by Tomer Brisker over 8 years ago

  • Related to Bug #12241: Counter cache update didn't pick up changes from after_commit callback added
Actions #21

Updated by Tomer Brisker almost 8 years ago

  • Related to Bug #7246: Remove counter workaround for #5692 on upgrade to rails 4.x added