Feature #22876

closed

Change message digest from SHA1/VARCHAR(40) to XXHASH/BIGINT

Added by Lukas Zapletal about 6 years ago. Updated almost 6 years ago.

Status: Rejected
Priority: Normal
Category: Performance
Target version: -
Difficulty:
Triaged: No
Fixed in Releases:
Found in Releases:

Description

Using a 160-bit SHA1 (stored as a 40-character hex string) to detect duplicates is overkill. A simple 64-bit number from a hash function like CRC64 (https://github.com/postmodern/digest-crc) can do the trick: collisions are very unlikely even for tens or hundreds of millions of imported strings. An index on a 64-bit number on a 64-bit system is also much faster than an index on a VARCHAR, and it will save a lot of memory and disk space on the SQL server.
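To put a rough number on the collision risk, here is a minimal sketch of the standard birthday-bound approximation. It assumes the hash output is uniformly distributed, which a CRC does not strictly guarantee, so treat the result as an estimate only. The function name `collision_probability` is made up for this illustration.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items values.

    Birthday bound: p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits)).
    Assumes a uniformly distributed hash (an idealization).
    """
    space = 2.0 ** hash_bits
    exponent = n_items * (n_items - 1) / (2.0 * space)
    # expm1 keeps precision when the exponent is tiny
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (10_000_000, 100_000_000, 1_000_000_000):
        p64 = collision_probability(n, 64)
        p160 = collision_probability(n, 160)
        print(f"{n:>13,} strings: 64-bit p = {p64:.2e}, 160-bit p = {p160:.2e}")
```

Under that assumption, 100 million strings give a chance on the order of a few in ten thousand of any collision at 64 bits, so a collision is improbable but not impossible, and the importer would still need to tolerate one.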

This would need:

  • changing the digest from string to int64
  • rehashing all entries
  • code changes
  • a benchmark to verify it performs better (I will set up a production instance with real data to get real numbers)
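The first two steps can be sketched as follows. This is only an illustration in Python, not Foreman code: it uses `hashlib.blake2b` truncated to 8 bytes as a stand-in for CRC64 or xxHash (neither is in the Python standard library), and the function names are hypothetical. The point it shows is that an unsigned 64-bit hash has to be reinterpreted as a signed value to fit a typical SQL BIGINT column.

```python
import hashlib


def sha1_digest(message: str) -> str:
    """Current style of digest: 40 hex characters, stored as VARCHAR(40)."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


def int64_digest(message: str) -> int:
    """Proposed style of digest: a 64-bit integer that fits a signed BIGINT.

    blake2b with digest_size=8 is a stand-in here (assumption), not the
    CRC64/xxHash function named in this issue.
    """
    raw = hashlib.blake2b(message.encode("utf-8"), digest_size=8).digest()
    # signed=True maps the 8 bytes into [-2**63, 2**63 - 1]
    return int.from_bytes(raw, byteorder="big", signed=True)


def rehash(messages: list[str]) -> dict[str, int]:
    """Map each old SHA1 digest to its new integer digest (rehash step)."""
    return {sha1_digest(m): int64_digest(m) for m in messages}


if __name__ == "__main__":
    sample = ["Could not retrieve catalog", "Applied catalog in 1.23 seconds"]
    for old, new in rehash(sample).items():
        print(old, "->", new)
```

Rehashing needs the original message text, since the new digest cannot be derived from the old SHA1 value.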

Related issues 1 (0 open, 1 closed)

Related to Foreman - Refactor #22875: Limit digest fields to 40 characters (Closed, Marek Hulán, 03/13/2018)