Java

해시코드

ta_chan 2024. 1. 13. 21:38

해시코드란, 객체의 고유한 정수값이다. 

Object.hashCode는 객체에 대한 해시값을 정수로 반환한다.
이 해시코드는 상태가 같더라도 인스턴스마다 고유한 값을 가진다

static void hashCodeTest(){  
    TmpObject object1 = new TmpObject(1);  
    TmpObject object2 = new TmpObject(1);  
    System.out.println("object1 = " + object1.hashCode());  
    System.out.println("object2 = " + object2.hashCode());  
}  
  
static class TmpObject{  
    int num;  
  
    public TmpObject(int num) {  
        this.num = num;  
    }  
}

---
object1 = 793589513
object2 = 824909230


객체의 equals,hashcode를 오버라이드하면 상태가 같은 객체는 같은 해시값을 반환하게 만들 수 있다.

static class TmpObject{  
    int num;  
  
    public TmpObject(int num) {  
        this.num = num;  
    }  
  
    @Override  
    public boolean equals(Object o) {  
        if (this == o) {  
            return true;  
        }  
        if (o == null || getClass() != o.getClass()) {  
            return false;  
        }  
        TmpObject tmpObject = (TmpObject) o;  
        return num == tmpObject.num;  
    }  
  
    @Override  
    public int hashCode() {  
        return Objects.hash(num);  
    }  
}


---
object1 = 32
object2 = 32

 


이와는 별개로 System.identityHashCode를 사용하면 
오버라이딩 이전의 원본 해시코드값을 알아낼 수 있다.

System

/**  
 * Returns the same hash code for the given object as * would be returned by the default method hashCode(), * whether or not the given object's class overrides * hashCode(). * The hash code for the null reference is zero. * * @param x object for which the hashCode is to be calculated  
 * @return  the hashCode  
 * @since   1.1  
 * @see Object#hashCode  
 * @see java.util.Objects#hashCode(Object)  
 */@IntrinsicCandidate  
public static native int identityHashCode(Object x);
```


```java
TmpObject object1 = new TmpObject(1);  
TmpObject object2 = new TmpObject(1);  
System.out.println("object1 = " + object1.hashCode());  
System.out.println("object2 = " + object2.hashCode());  
int OriginalObject1Hash = System.identityHashCode(object1);  
int OriginalObject2Hash = System.identityHashCode(object2);  
System.out.println("OriginalObject1Hash = " + OriginalObject1Hash);  
System.out.println("OriginalObject2Hash = " + OriginalObject2Hash);

---
object1 = 32
object2 = 32
OriginalObject1Hash = 664223387
OriginalObject2Hash = 824909230


hashCode의 알고리즘에 대해서 궁금해서 찾아보니 
난수, 메모리 기반 생성등 다양한  주장들이 있었는데, 
이것에 대해서 명확하게 하고자 알아보았다.

Object

@IntrinsicCandidate  
public native int hashCode();


native가 선언되어있어 IDE에서는 코드를 확인할 수 는 없었지만, github에서 찾아 볼 수 있었다.

 

다음은 해시코드에 대한 C++ 코드이다.
HashCode의 값에 따라서 java의 Object.hashCode의 해시코드 생성 전략이 달라진다.

hashCode == 0 : 난수 반환
hashCode == 1 : 객체의 메모리주소에 추가연산 후 반환
hashCode == 2 : 상수 1을 반환
hashCode == 3 : 전역 카운터를 사용하여 반환
hashCode == 4 : 메모리 주소 직접 반환
hashCode == else : 스레드-로컬 난수값 반환 

// hashCode() generation :
//
// Possibilities:
// * MD5Digest of {obj,stw_random}
// * CRC32 of {obj,stw_random} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
//   2654435761 = 2^32 * Phi (golden ratio)
//   HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stw_random ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stw_random) is appealing, but can result
//   in undesirable regularity in the hashCode values of adjacent objects
//   (objects allocated back-to-back, in particular).  This could potentially
//   result in hashtable collisions and reduced hashtable efficiency.
//   There are simple ways to "diffuse" the middle address bits over the
//   generated hashCode values:

static inline intptr_t get_next_hash(Thread* current, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // This form uses global Park-Miller RNG.
    // On MP system we'll have lots of RW access to a global, so the
    // mechanism induces lots of coherency traffic.
    value = os::random();
  } else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations.  This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
  } else if (hashCode == 2) {
    value = 1;            // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hc_sequence;
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // Marsaglia's xor-shift scheme with thread-specific state
    // This is probably the best overall implementation -- we'll
    // likely make this the default in future releases.
    unsigned t = current->_hashStateX;
    t ^= (t << 11);
    current->_hashStateX = current->_hashStateY;
    current->_hashStateY = current->_hashStateZ;
    current->_hashStateZ = current->_hashStateW;
    unsigned v = current->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    current->_hashStateW = v;
    value = v;
  }

  value &= markWord::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value != markWord::no_hash, "invariant");
  return value;
}

 

HashCode값은 jdk8 이하 버전에서는 확인이 가능하나, 이후 버전부터는 확인이 불가능 한 것으로 보인다.

 

하지만 else 구문 주석에서 볼 수 있듯이, 아마 else부분이 디폴트 옵션으로 동작하지 않을까 싶다.

jdk8 이하에서는 jdk 버전/bin 에 들어가 
실제 HashCode 값을 확인 할 수 있다.
다음을 명령을 실행한다.

java -XX:+PrintFlagsFinal -version


다음과 같이 HashCode값이 5라는 것을 확인 할 수 있다.



해시코드 로직에 대한 답이 다양해서 이해가 잘 안되었는데
메모리주소, 난수 모두 가능하며, 꽤 오래전부터 해시코드는 난수로 생성한다는것을 알 수 있었다.



참고자료

https://shipilev.net/jvm/anatomy-quarks/26-identity-hash-code/#_footnotedef_4


https://github.com/openjdk/jdk/blob/4927ee426aedbeea0f4119bac0a342c6d3576762/src/hotspot/share/runtime/synchronizer.cpp#L760-L798

https://bugs.openjdk.org/browse/JDK-8199394

https://stackoverflow.com/questions/49172698/default-hashcode-implementation-for-java-objects/49175508#49175508