ThreadLocal

The ThreadLocal class helps us maintain 'per thread' semantics. Even if two threads execute the same code, and that code refers to the same ThreadLocal variable, the two threads cannot see each other's values. This is especially helpful when we are using a third-party utility library that isn't thread safe: ThreadLocal lets each thread work on its own instance, so there are no side effects due to interleaving.

One of the most common use cases is converting between date strings and dates using SimpleDateFormat. To convert a string to a java.util.Date object, we use SimpleDateFormat like this:

    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
    Date date = format.parse(dateString);

Note that SimpleDateFormat's internal implementation isn't thread safe. Consider the code snippet below, where two threads use two different date patterns to parse two different strings using the same SimpleDateFormat object.

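import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class Test {

    // Shared by both threads; SimpleDateFormat keeps mutable internal state.
    private static SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");

    public static void main(String[] args) throws Exception {

        Thread t1 = new Thread() {
            public void run() {
                // t2 may change the pattern between this call and parse().
                format.applyPattern("yyyy-MM-dd");
                try {
                    Date date = format.parse(args[0]);
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            }
        };
        Thread t2 = new Thread() {
            public void run() {
                // Likewise, t1 may change the pattern before parse() runs here.
                format.applyPattern("yy-MM-dd");
                try {
                    Date date = format.parse(args[1]);
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            }
        };
        t1.start();
        t2.start();
    }
}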

This is unsafe: the two threads race on the formatter's pattern and internal state, so there's a high possibility that at least one of them interprets its date incorrectly or fails with an unexpected exception.

A form of mutual exclusion such as synchronized or ReentrantLock solves the problem, but at a cost: acquiring and releasing the lock involves memory barriers, and the protected section effectively becomes single threaded, so threads queue up behind each other under contention. Alternatively, we can make the SimpleDateFormat instance a local variable, but that can mean too many short-lived objects being created and discarded immediately (consider an algorithmic trading application with a couple of million trades per hour: this would mean a couple of million objects being created and garbage collected almost immediately).
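The two alternatives described above might look roughly like the sketch below. The class name AlternativeParsers and the helper names parseWithLock and parseWithLocal are made up for illustration, and parse failures are wrapped in an unchecked exception only to keep the example short.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustrative sketch only; names are hypothetical.
class AlternativeParsers {

    // Shared formatter: safe only because every use below holds its lock.
    private static final SimpleDateFormat SHARED = new SimpleDateFormat("yyyy-MM-dd");

    // Option 1: mutual exclusion. Only one thread can parse at a time.
    static Date parseWithLock(String text) {
        synchronized (SHARED) {
            try {
                return SHARED.parse(text);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + text, e);
            }
        }
    }

    // Option 2: a fresh formatter per call. Nothing is shared,
    // but every parse allocates a new SimpleDateFormat.
    static Date parseWithLocal(String text) {
        try {
            return new SimpleDateFormat("yyyy-MM-dd").parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date locked = parseWithLock("2020-02-29");
        Date local = parseWithLocal("2020-02-29");
        System.out.println(locked.equals(local)); // prints "true"
    }
}
```

Both options are correct; they differ only in where the cost lands (lock contention versus allocation).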

ThreadLocal attempts to solve the problem by providing a registry service with 'per thread' semantics. With a ThreadLocal reference, each thread works on its own instance of the underlying unsafe object.

private static ThreadLocal<DateFormat> threadLocalDateFormatWrapper = new ThreadLocal<DateFormat>() {

    @Override 
    protected DateFormat initialValue() {
        return new SimpleDateFormat("yyyy-MM-dd");
    }

};
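As a rough sketch of how such a holder might be used, the snippet below has two threads parse and re-format different strings through the same static ThreadLocal. The class name PerThreadParser, the helpers roundTrip and roundTripInTwoThreads, and the sample dates are made up for illustration. ThreadLocal.withInitial is the Java 8+ shorthand for overriding initialValue().

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Illustrative sketch only; names are hypothetical.
class PerThreadParser {

    // One SimpleDateFormat per thread, created lazily on that thread's first get().
    private static final ThreadLocal<DateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    // Parses and re-formats the text with the calling thread's own formatter.
    static String roundTrip(String text) {
        DateFormat format = FORMAT.get();
        try {
            return format.format(format.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + text, e);
        }
    }

    // Runs roundTrip on two separate threads and collects both results.
    static String[] roundTripInTwoThreads(String first, String second) {
        String[] out = new String[2];
        Thread t1 = new Thread(() -> out[0] = roundTrip(first));
        Thread t2 = new Thread(() -> out[1] = roundTrip(second));
        t1.start();
        t2.start();
        try {
            t1.join(); // join() makes the writes to out visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] out = roundTripInTwoThreads("2020-02-29", "1999-12-31");
        System.out.println(out[0] + " " + out[1]);
    }
}
```

No locking is needed around parse() or format() because the two threads never touch the same SimpleDateFormat object.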

Think of a ThreadLocal object as a map-like structure that maintains a separate value per thread.
A thread can get or set its own version of the object using the following methods:

public T get()
public void set(T value)

When a thread calls get() and no value has been stored for that thread yet, get() invokes initialValue() (through setInitialValue()) and stores the result for that thread. This is a useful construct for setting a default value:

public T get() {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null) {
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            @SuppressWarnings("unchecked")
            T result = (T)e.value;
            return result;
        }
    }
    return setInitialValue();
}
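To see get(), set() and initialValue() working together, here is a minimal sketch. The class name GetSetDemo, the COUNTER field and the helper observeInNewThread are hypothetical names used only for this example.

```java
// Illustrative sketch only; names are hypothetical.
class GetSetDemo {

    // Every thread starts from the default 0 supplied by initialValue().
    static final ThreadLocal<Integer> COUNTER = new ThreadLocal<Integer>() {
        @Override
        protected Integer initialValue() {
            return 0;
        }
    };

    // Starts a new thread, records what it sees before and after its own set().
    static int[] observeInNewThread(int newValue) {
        int[] seen = new int[2];
        Thread t = new Thread(() -> {
            seen[0] = COUNTER.get(); // no entry for this thread yet, so the default is used
            COUNTER.set(newValue);   // changes this thread's value only
            seen[1] = COUNTER.get();
        });
        t.start();
        try {
            t.join(); // join() makes the writes to seen visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        COUNTER.set(42);
        int[] seen = observeInNewThread(7);
        // prints "other thread saw 0 then 7, main still sees 42"
        System.out.println("other thread saw " + seen[0] + " then " + seen[1]
                + ", main still sees " + COUNTER.get());
    }
}
```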

It's important that when we declare a class variable of type ThreadLocal, we declare it as static. Otherwise, instead of 'value per thread' semantics, we might accidentally end up with 'value per thread per instance' semantics.
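The difference is easy to demonstrate on a single thread. In the sketch below (the class PerInstanceDemo and its method names are made up for illustration), the non-static field gives every object its own ThreadLocal, so a value stored through one instance is invisible through another.

```java
// Illustrative sketch only; names are hypothetical.
class PerInstanceDemo {

    // Non-static: every PerInstanceDemo object carries its own ThreadLocal.
    private final ThreadLocal<String> perInstance = new ThreadLocal<>();

    // Static: one ThreadLocal shared by all instances of the class.
    private static final ThreadLocal<String> SHARED = new ThreadLocal<>();

    void remember(String value) {
        perInstance.set(value);
        SHARED.set(value);
    }

    String fromInstanceField() {
        return perInstance.get();
    }

    String fromStaticField() {
        return SHARED.get();
    }

    public static void main(String[] args) {
        PerInstanceDemo a = new PerInstanceDemo();
        PerInstanceDemo b = new PerInstanceDemo();
        a.remember("hello");
        System.out.println(b.fromInstanceField()); // prints "null": b has its own ThreadLocal
        System.out.println(b.fromStaticField());   // prints "hello": same thread, same ThreadLocal
    }
}
```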

So how is ThreadLocal implemented internally?

Before we look at the source code, let's think about possible ways to implement it:

  • Fundamentally, ThreadLocal maintains a unique value per thread. What if we maintain a map with thread ids as keys? The problem here is that while the JVM ensures that all live threads have unique ids at any point in time, ids may be reused: when a thread (let's call it T1) dies and another thread (let's call it T2) is spawned later, T2 can have the same id as T1 and would then see T1's stale value.

  • What if we use the Thread objects themselves as keys? Now the ThreadLocal variable holds a live reference to each thread, so a Thread object can't be garbage collected even after it has finished executing and terminated.

  • What if we use a WeakHashMap instead of a regular HashMap? That ensures that the Thread references held by the ThreadLocal object are all weak references and hence won't prevent garbage collection of the Thread object. However, WeakHashMap is not thread safe, and there could be problems if two threads call set() simultaneously and the values land in the same bucket (or if the WeakHashMap resizes internally while threads are still writing to it).

  • We can use Collections.synchronizedMap to wrap the map that has weak references to Threads as keys. That way we neither prevent garbage collection nor have a race condition. The only concern here is performance: every get() and set() from every thread contends on a single lock, which hurts most on multi-core machines where many threads really do run in parallel.
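The last option in the list above can be sketched as follows. This is not how the JDK does it; NaiveThreadLocal and readFromNewThread are hypothetical names, and the class only illustrates the synchronized, weak-keyed map idea.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only; this is NOT the JDK implementation.
class NaiveThreadLocal<T> {

    // Weak keys: a terminated Thread can still be garbage collected.
    // synchronizedMap: every get() and set() takes the same lock.
    private final Map<Thread, T> values =
            Collections.synchronizedMap(new WeakHashMap<Thread, T>());

    public T get() {
        return values.get(Thread.currentThread());
    }

    public void set(T value) {
        values.put(Thread.currentThread(), value);
    }

    // Reads the value as seen from a brand new thread (null if nothing was set there).
    static <V> V readFromNewThread(NaiveThreadLocal<V> local) {
        AtomicReference<V> seen = new AtomicReference<>();
        Thread t = new Thread(() -> seen.set(local.get()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        NaiveThreadLocal<String> local = new NaiveThreadLocal<>();
        local.set("main");
        System.out.println(local.get());              // prints "main"
        System.out.println(readFromNewThread(local)); // prints "null"
    }
}
```

The semantics are right, but all threads funnel through one lock on the shared map, which is exactly the cost the list above warns about.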

Now that we know the potential challenges in designing a ThreadLocal class, let's see how Java does it.

Clearly we have two main challenges here: ensuring that a ThreadLocal reference doesn't prevent garbage collection, and ensuring that multiple threads can work on the same ThreadLocal object without causing a race condition. Java solves the problem with a little bit of out-of-the-box thinking. Instead of modelling ThreadLocal as threads mapped to values, each thread maintains its own map of ThreadLocal instances to their corresponding values. This automatically solves the concurrency problem, as each thread now has a separate map that only it touches. It also solves the GC problem, as ThreadLocal instances now hold no references to Threads.
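Before reading the real source, the inverted design can be sketched as a toy. MiniThread, MiniThreadLocal and MiniDemo are hypothetical names. The toy only works on MiniThread instances and uses a plain WeakHashMap where the JDK uses its own ThreadLocalMap, so treat it as an illustration of the idea, not of the actual code.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Illustrative toy only; names are hypothetical and this is not the JDK code.
class MiniDemo {
    // Two threads set and read the same MiniThreadLocal; each sees only its own value.
    static String[] run() {
        MiniThreadLocal<String> name = new MiniThreadLocal<>();
        String[] seen = new String[2];
        MiniThread a = new MiniThread(() -> { name.set("a"); seen[0] = name.get(); });
        MiniThread b = new MiniThread(() -> { name.set("b"); seen[1] = name.get(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println(seen[0] + " " + seen[1]); // prints "a b"
    }
}

// Toy stand-in for java.lang.Thread: the per-thread map lives on the thread itself.
class MiniThread extends Thread {
    // Touched only by this thread, so no locking is needed.
    // Weak keys let an unreachable MiniThreadLocal be collected.
    final Map<MiniThreadLocal<?>, Object> locals = new WeakHashMap<>();

    MiniThread(Runnable body) {
        super(body);
    }
}

class MiniThreadLocal<T> {
    protected T initialValue() {
        return null;
    }

    @SuppressWarnings("unchecked")
    public T get() {
        Map<MiniThreadLocal<?>, Object> map = currentMap();
        if (map.containsKey(this)) {
            return (T) map.get(this);
        }
        T value = initialValue();
        map.put(this, value);
        return value;
    }

    public void set(T value) {
        currentMap().put(this, value);
    }

    // The lookup goes through the current thread, never through a shared structure.
    private static Map<MiniThreadLocal<?>, Object> currentMap() {
        Thread t = Thread.currentThread();
        if (!(t instanceof MiniThread)) {
            throw new IllegalStateException("this toy only works on MiniThread instances");
        }
        return ((MiniThread) t).locals;
    }
}
```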

Let's look at snippets from the source code:

The Thread class keeps a reference to a ThreadLocal.ThreadLocalMap instance, whose entries hold their keys (the ThreadLocal objects) through weak references.

public class Thread implements Runnable {
    ThreadLocal.ThreadLocalMap threadLocals = null;
}

This is what the ThreadLocalMap implementation looks like:

static class ThreadLocalMap {

    static class Entry extends WeakReference<ThreadLocal<?>> {
        /** The value associated with this ThreadLocal. */
        Object value;

        Entry(ThreadLocal<?> k, Object v) {
            super(k);
            value = v;
        }
    }

...
}

And the ThreadLocal class is implemented as:

public class ThreadLocal<T> {

    ThreadLocalMap getMap(Thread t) {
        return t.threadLocals;
    }

    public T get() {
        Thread t = Thread.currentThread();
        ThreadLocalMap map = getMap(t);
        if (map != null) {
            ThreadLocalMap.Entry e = map.getEntry(this);
            if (e != null)
                return (T) e.value;
        }
        return setInitialValue();
    }

    private T setInitialValue() {
        T value = initialValue();
        Thread t = Thread.currentThread();
        ThreadLocalMap map = getMap(t);
        if (map != null)
            map.set(this, value);
        else
            createMap(t, value);
        return value;
    }
}
