Feature #22876

closed

Change message digest from SHA1/VARCHAR(40) to XXHASH/BIGINT

Added by Lukas Zapletal about 6 years ago. Updated almost 6 years ago.

Status:
Rejected
Priority:
Normal
Category:
Performance
Target version:
-
Difficulty:
Triaged:
No
Fixed in Releases:
Found in Releases:

Description

Using a 160-bit SHA1 for detecting duplicates is overkill. A simple 64-bit number from a hash function like CRC64 (https://github.com/postmodern/digest-crc) can do the trick: collisions are very unlikely even with tens or hundreds of millions of imported strings. An index on a 64-bit number on a 64-bit system should be much faster than an index on a VARCHAR, and this would also save a lot of memory and disk space on the SQL server.
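To put a rough number on the collision risk, here is a small sketch using the standard birthday-bound approximation. It assumes the 64-bit hash output is uniformly distributed, which is an idealization for CRC64; the helper name `collision_probability` is made up for this illustration.

```ruby
# Birthday-bound approximation: probability of at least one collision
# among n hashes drawn uniformly from a space of 2**bits values.
# Assumption: the hash behaves like a uniform random function.
def collision_probability(n, bits)
  space = 2.0**bits
  1.0 - Math.exp(-(n.to_f * (n - 1)) / (2.0 * space))
end

# Print the estimate for a few import sizes with a 64-bit digest.
[10_000_000, 100_000_000, 500_000_000].each do |n|
  printf("%11d strings: p = %.2e\n", n, collision_probability(n, 64))
end
```

Under that assumption the chance of any collision stays well below one percent for hundreds of millions of strings, so a collision is unlikely but not impossible, and the code should still tolerate one.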

This would need:

  • changing the digest from string to int64
  • rehashing all entries
  • code changes
  • benchmark to verify it performs better (I will set up a production instance with real data to get real numbers)
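As a rough sketch of the first step, the snippet below maps a message to an integer that fits a signed BIGINT column. It is not the code from the pull request: the helper `digest64` is hypothetical, and it takes the first 8 bytes of SHA1 purely as a stdlib-only stand-in for CRC64 or XXHASH, which would need an extra gem.

```ruby
require "digest"

# Current form: 40 hexadecimal characters, stored as VARCHAR(40).
def digest_hex(message)
  Digest::SHA1.hexdigest(message)
end

# Hypothetical replacement: a signed 64-bit integer for a BIGINT column.
# Assumption: the first 8 bytes of SHA1 stand in for a real 64-bit hash
# such as CRC64 or XXHASH. "q>" reads them as a big-endian signed int64,
# which keeps the value inside the signed BIGINT range.
def digest64(message)
  Digest::SHA1.digest(message).unpack1("q>")
end

puts digest_hex("Applied catalog in 1.23 seconds").length
puts digest64("Applied catalog in 1.23 seconds")
```

The signed conversion matters because SQL BIGINT is signed, while most 64-bit hash functions return an unsigned value that can exceed its maximum.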

Related issues 1 (0 open, 1 closed)

Related to Foreman - Refactor #22875: Limit digest fields to 40 characters (Closed, Marek Hulán, 03/13/2018)
Actions #1

Updated by Lukas Zapletal about 6 years ago

Actions #2

Updated by The Foreman Bot about 6 years ago

  • Status changed from New to Ready For Testing
  • Assignee set to Lukas Zapletal
  • Pull request https://github.com/theforeman/foreman/pull/5319 added
Actions #3

Updated by Lukas Zapletal about 6 years ago

  • Subject changed from Change message digest from SHA1/VARCHAR(40) to CRC64/BIGINT to Change message digest from SHA1/VARCHAR(40) to XXHASH/BIGINT

Changed from CRC64 to XXHASH, which serves the same purpose but is faster and has many more Ruby implementations than CRC64 (for CRC64 I only found a pure Ruby one, which was super slow).

Actions #4

Updated by Lukas Zapletal almost 6 years ago

  • Status changed from Ready For Testing to Rejected
  • Triaged set to No

Unable to prove it improves the performance.
