Bug #15675

reports:expire is slow and affects the performance of the application

Added by Nacho Barrientos almost 8 years ago. Updated almost 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Performance
Target version:
Difficulty:
Triaged:
Fixed in Releases:
Found in Releases:

Description

Hi,

We're currently running Foreman 1.11.2 and it's impossible for us to expire Puppet reports the standard way (via the Rake task 'reports:expire'), as the SQL queries it issues are slow and take locks on the reports table. This slows down our whole Puppet infrastructure: the masters spend so long waiting for Foreman to process reports that they don't have time to digest new requests. That grows the backlog, increasing catalog compilation time and the latency of basically every request the agents make.

We have a query killer in place in the database (a stored procedure, basically) that kills queries running longer than 5 minutes, so every night the Rake task is killed and no reports are cleaned up. We do this essentially to protect our Puppet infrastructure.

The query that gets killed is the following:

DELETE FROM `logs` WHERE `logs`.`id` IN (SELECT id FROM (SELECT `logs`.`id` FROM `logs` INNER JOIN `reports` ON `reports`.`id` = `logs`.`report_id` WHERE `logs`.`report_id` IN (SELECT `reports`.`id` FROM `reports`  WHERE `reports`.`type` IN ('ConfigReport') AND (reports.created_at < '2016-07-04 03:30:08'))  ORDER BY logs.id) __active_record_temp)

which I guess comes from:

    Log.joins(:report).where(:report_id => where(cond)).delete_all (app/models/report.rb:L82)

Our current workaround is the following:

1) Disable the query killer
2) Initiate a transaction and use a simplified query:
2.1) START TRANSACTION
2.2) DELETE FROM `logs` WHERE `logs`.`report_id` IN (SELECT `reports`.`id` FROM `reports` WHERE `reports`.`type` IN ('ConfigReport') AND (reports.created_at < '2016-07-04 03:30:08'));
2.3) COMMIT the transaction (no locks are generated until the transaction can be committed, and performance is not degraded during the 'preparing' stage)
3) Now that all the logs of expired reports are deleted, run reports:expire again so it has less work to do. This normally succeeds and has no impact.
4) Enable the query killer
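The gain in step 2 comes from replacing one huge multi-join DELETE with a single flat subquery; the same idea can be pushed further by deleting in fixed-size batches so each statement holds its locks only briefly. Here is a minimal sketch of that batching logic in plain Ruby (a toy model, not Foreman code: a hash and an array stand in for the `reports` and `logs` tables, and each slice would correspond to one short DELETE statement):

```ruby
require 'set'

BATCH_SIZE = 1000

# Toy stand-ins: reports is {id => created_at}, logs is an array of
# {id:, report_id:} hashes. In the real task each slice would be one short
# "DELETE FROM logs WHERE report_id IN (...)" statement, so locks are held
# briefly instead of for a single multi-million-row delete.
def expire_logs_in_batches(reports, logs, cutoff, batch_size: BATCH_SIZE)
  # ids of reports older than the cutoff, i.e. the expired ones
  expired_ids = reports.select { |_id, created_at| created_at < cutoff }.keys
  deleted = 0
  expired_ids.each_slice(batch_size) do |slice|
    slice_set = slice.to_set
    before = logs.size
    # stands in for: DELETE FROM logs WHERE report_id IN (<slice>)
    logs.reject! { |row| slice_set.include?(row[:report_id]) }
    deleted += before - logs.size
  end
  deleted
end
```

Committing per batch trades the all-or-nothing semantics of one big transaction for much shorter lock windows, which is usually the right trade-off for a cleanup job that can simply be re-run.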

With this strategy we can more or less work around the problem, but can you think of any way to make expiring reports via the Rake task more efficient (and less harmful to the application)?

Another option we have is to declare a downtime every time we expire reports, but we'd rather not go down that route :)

The rake task is triggered by a nightly cron in our case:

30 5 * * * (/usr/sbin/foreman-rake reports:expire ) >> /var/log/foreman/reports-expire.log 2>&1

We're adding ~10 million entries per day to 'logs' and ~340k to 'reports'. We're currently unable to expire 24 hours' worth of reports without putting the infrastructure at risk.

Thanks for your help and time!
