Special thread features
Published by Kaustubh Saha on December 24th, 2018
Shutdown Hook :
Shutdown hooks are pre-registered (but not yet started) threads that the JVM runs when an application is shutting down. More than one shutdown hook can be registered with a JVM. When several are registered, the JVM starts them in an unspecified order and lets them run concurrently, so no ordering between hooks can be relied upon.
We can register a shutdown hook by using the following method of Runtime class :
public void addShutdownHook(Thread hook) {
SecurityManager sm = System.getSecurityManager();
if (sm != null) {
sm.checkPermission(new RuntimePermission("shutdownHooks"));
}
ApplicationShutdownHooks.add(hook);
}
Let's try out a sample shutdown hook implementation:
public static void main(String[] args) {
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " : executing shutdown hook logic");
}
});
System.out.println("About to shut down JVM");
}
Output:
About to shut down JVM
Thread-0 : executing shutdown hook logic
Clearly, the shutdown hook thread was executed only after the last line of the main method had been executed.
A shutdown hook is triggered even when the program terminates abnormally, for example because of an uncaught exception:
public static void main(String[] args) {
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " : executing shutdown hook logic");
}
});
throw new RuntimeException();
}
Output:
Exception in thread "main" java.lang.RuntimeException
at Test.main(Test.java:21)
Thread-0 : executing shutdown hook logic
Shutdown hook is triggered even when the application is programmatically terminated using System.exit :
public static void main(String[] args) {
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " : executing shutdown hook logic");
}
});
System.exit(1);
}
Output:
Thread-0 : executing shutdown hook logic
However, if we use Runtime.halt instead of System.exit or Runtime.exit, shutdown hooks are not triggered. Runtime.halt should be used with extreme caution: it terminates the JVM immediately and does not run pending finalizers on the surviving objects either.
public static void main(String[] args) {
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " : executing shutdown hook logic");
}
});
Runtime.getRuntime().halt(1);
}
Running this won't cause shutdown hook thread to be triggered. So, nothing will be printed on console.
Can a shutdown hook thread prevent application shutdown by running forever?
public static void main(String[] args) {
Thread t = new Thread() {
@Override
public void run() {
while(true) {
try {
TimeUnit.MINUTES.sleep(2);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
};
t.setName("ShutdownHookThread");
Runtime.getRuntime().addShutdownHook(t);
Runtime.getRuntime().exit(1);
}
The JVM waits for every started shutdown hook to finish and does not impose any timeout on it, just as it would wait for any other non-daemon thread. So the above code will continue to run forever.
Interestingly, if a shutdown hook thread is configured to be a daemon thread, JVM still waits for it to finish execution. For example, in the code snippet below, the shutdown hook thread sleeps for 5 seconds before printing something on console and terminating. JVM continues to run until the shutdown hook thread is done even though the shutdown hook thread is a daemon thread.
public static void main(String[] args) {
Thread t = new Thread() {
@Override
public void run() {
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(Thread.currentThread().getName() + " : executing shutdown hook logic");
}
};
t.setDaemon(true);
Runtime.getRuntime().addShutdownHook(t);
System.out.println("About to shut down JVM");
}
Output:
About to shut down JVM
Thread-0 : executing shutdown hook logic
What if a shutdown hook thread is taking too long ? Is there a way to forcibly terminate it ?
The easiest way is to have another shutdown hook thread (let's call it the bailout shutdown hook thread) configured to call Runtime.halt after a certain amount of time has elapsed.
In the following code snippet, we have two shutdown hook threads - t1 and t2. t1 prints something, then sleeps for 5 seconds and then again prints something and terminates. t2 is the bailout shutdown hook that executes Runtime.halt() after 2 seconds. JVM shuts down without waiting for t1 to complete.
public static void main(String[] args) {
Thread t1 = new Thread() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " : executing shutdown hook logic");
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(Thread.currentThread().getName() + " : done executing shutdown hook logic");
}
};
Thread t2 = new Thread() {
@Override
public void run() {
try {
TimeUnit.SECONDS.sleep(2);
} catch (InterruptedException e) {
e.printStackTrace();
}
Runtime.getRuntime().halt(0);
}
};
Runtime.getRuntime().addShutdownHook(t1);
Runtime.getRuntime().addShutdownHook(t2);
System.out.println("About to shut down JVM");
}
Output:
About to shut down JVM
Thread-0 : executing shutdown hook logic
From the console output, we can clearly see that JVM terminated before 'Thread-0' could complete execution.
Shutdown hooks also apply to process terminations triggered from outside the JVM. Note that this isn't always guaranteed to work. If a process is forcefully terminated (for example with the SIGKILL signal on Unix or the TerminateProcess call on Microsoft Windows), the JVM won't have a chance to clean up and run shutdown hooks.
Out-of-thin-air-safety
When a thread reads a variable without synchronization, it may see a stale value, but at least it sees a value that was actually placed there by some thread rather than some random value. This safety guarantee is called out-of-thin-air safety. The only exceptions to this rule are non-volatile variables of type long and double (64-bit data types). The Java Memory Model requires fetch and store operations to be atomic, but for non-volatile long and double variables, the JVM is permitted to treat a 64-bit read or write as two separate 32-bit operations, one for the upper 32 bits and another for the lower 32 bits. If the reads and writes occur in different threads, it is therefore possible to read a non-volatile long and get back the high 32 bits of one value and the low 32 bits of another.
Thus, even if you don't care about stale values, it is not safe to use shared mutable long and double variables in multithreaded programs unless they are declared volatile or guarded by a lock.
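Here is a minimal sketch of the safe variant, assuming a single writer and a single reader. The class name TornReadDemo and the choice of the two bit patterns (all zeros and all ones) are made up for illustration. Because the field is declared volatile, every read must return one of the two values that were actually written. On many 64-bit JVMs you would likely not see tearing even without volatile, but the language specification does not promise that.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TornReadDemo {
    // volatile makes every read and write of this 64-bit field atomic
    private static volatile long shared = 0L;

    // Returns how many reads observed a value that no thread ever wrote
    static int countTornReads(int iterations) {
        AtomicBoolean writerDone = new AtomicBoolean(false);
        Thread writer = new Thread(() -> {
            // Alternate between all-zero and all-one bits so that a torn
            // read would show up as a mix of both patterns
            for (int i = 0; i < iterations; i++) {
                shared = (i % 2 == 0) ? 0L : -1L;
            }
            writerDone.set(true);
        });
        writer.start();
        int torn = 0;
        while (!writerDone.get()) {
            long observed = shared;
            if (observed != 0L && observed != -1L) {
                torn++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("Torn reads observed: " + countTornReads(1_000_000));
    }
}
```

An AtomicLong or a lock around both the read and the write would give the same atomicity guarantee.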
busy spin/adaptive spin
On a multi-core system, L1 and L2 caches are typically private to each core, whereas L3 and L4 caches may or may not be shared depending on the processor architecture. This means that when a thread loses its CPU quantum and is later picked up by the scheduler again, it might resume on another core, where the data it had cached has to be loaded all over again, which makes a context switch a really expensive affair. Most modern schedulers try to ensure that if a thread was being serviced by CPU1 earlier (and has its state stored in CPU1's L1/L2 cache) and both CPU1 and CPU2 are now available, CPU1 is preferred over CPU2, but this is done on a best-effort basis and is definitely not a guarantee.
Considering that a thread's quantum is typically a very small amount of time, it often makes more sense to wait in a loop and waste a few CPU cycles than to go through a context switch. This technique of waiting in a loop instead of relinquishing the CPU, in order to avoid losing cache contents (and having to reload them), is called busy spinning. Of course, a thread doesn't spin indefinitely. The duration for which a thread spins before actually releasing the CPU varies from application to application and is determined by heuristics, which is why this form of busy spinning is also called adaptive spinning.
Busy spin is the default behavior with synchronized blocks/methods since JDK 1.6 (however, the Lock/ReentrantLock idiom doesn't involve busy spin).
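The spin-then-back-off idea can also be sketched in user code. The snippet below is only an illustration and not how HotSpot implements adaptive spinning for synchronized. The class name SpinThenParkDemo and the fixed MAX_SPINS limit are assumptions made for this example (a real implementation would tune the limit using heuristics), and Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class SpinThenParkDemo {
    // Hypothetical fixed limit; an adaptive implementation would adjust this at runtime
    static final int MAX_SPINS = 10_000;

    // Spins for a bounded number of iterations hoping the flag flips soon,
    // then falls back to short timed parks that give up the CPU.
    // Returns true if the flag was observed during the spin phase.
    static boolean awaitFlag(AtomicBoolean flag) {
        for (int i = 0; i < MAX_SPINS; i++) {
            if (flag.get()) {
                return true;
            }
            Thread.onSpinWait(); // hint to the processor that we are in a spin loop
        }
        while (!flag.get()) {
            LockSupport.parkNanos(100_000L); // release the CPU for roughly 0.1 ms
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicBoolean flag = new AtomicBoolean(false);
        new Thread(() -> flag.set(true)).start();
        boolean seenWhileSpinning = awaitFlag(flag);
        System.out.println("Flag observed while spinning: " + seenWhileSpinning);
    }
}
```

Whether the flag is seen during the spin phase depends on timing, so the printed value can differ from run to run.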
Hyperthreading(HT)/Simultaneous Multi Threading(SMT) :
Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.
Hyperthreading is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors.
The main function of hyper-threading is to increase the number of independent instructions in the pipeline.
For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible. So for a 7th generation Intel i7 processor, which has 4 cores, enabling hyper-threading would make it behave like an 8 core processor (with some caveats). In fact, Runtime.getRuntime().availableProcessors() returns 4 and 8 on a 7th generation i7 processor with hyperthreading turned off and on respectively.
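A quick way to check what the JVM sees on your own machine is shown below. The class name LogicalCoresDemo is made up for this sketch, and sizing a thread pool by the logical core count is just one common convention, not a rule.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LogicalCoresDemo {
    // Number of logical processors currently available to the JVM
    static int logicalCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int cores = logicalCores();
        System.out.println("Logical processors visible to the JVM: " + cores);
        // One worker per logical processor is a common starting point for CPU-bound work
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        pool.shutdown();
    }
}
```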
Let's look at architecture diagram for an Intel x86 based processor
![Intel x86 architecture diagram](http://www.qdpma.com/Arch_files/Microarchitecture_SandyBridge_IDF.png "Intel x86 architecture diagram")
Enabling hyperthreading essentially means that for every port, now there are two threads processing instructions in the pipeline.
Note that hyper-threading can be properly utilized only with an operating system specifically optimized for it and Intel recommends disabling Hyperthreading when using operating systems unaware of this hardware feature
Semaphores :
In computer science parlance, a semaphore (also called a counting semaphore) is an abstraction which is used to control simultaneous access to a common resource by multiple threads in a concurrent system. The value of a semaphore is the number of access permits to the resource that are currently available. A semaphore of size 1 behaves exactly like a lock and provides mutual exclusion. A semaphore of size 1 is also called a binary semaphore since it has only two states - LOCK_AVAILABLE and LOCK_NOT_AVAILABLE.
The Semaphore (java.util.concurrent.Semaphore) class in Java mimics the concept of a counting semaphore and maintains a set of permits.
Permits can be associated with a Semaphore instance through its constructor:
public Semaphore(int permits) {
sync = new NonfairSync(permits);
}
Interestingly, Semaphore class in Java supports negative number of permits as well. A negative count indicates that permits must be released to the Semaphore before using the Semaphore.
A thread can acquire a permit from a Semaphore by calling its acquire method :
public void acquire() throws InterruptedException
There's also an overloaded form which allows acquiring multiple permits at one go:
public void acquire(int permits) throws InterruptedException
If the requested number of permits are available, the call to acquire returns immediately (and the available count of permits is reduced internally) else it blocks until permits are available (released by another thread)
Note that no actual permits are being used. The semaphore just keeps track of the number of permits available and acts accordingly. For the same reason, there is actually nothing that prevents a rogue thread from accessing the resource without acquiring a permit first.
There are other flavours of the acquire operation as well like (the signatures are self explanatory and in line with other concurrency related classes introduced in Java 1.5)
public void acquireUninterruptibly()
or
public boolean tryAcquire()
or
public boolean tryAcquire(long timeout, TimeUnit unit)
A thread can release a permit back to a Semaphore by invoking the release method :
public void release()
Just as was the case with acquire, a thread can release multiple permits at one go as well :
public void release(int permits)
Since the notion of permits isn't backed by anything physical, there is nothing to prevent a thread from calling release without having called acquire. While this can be a problem, this also provides Semaphore an additional flexibility over locks.
At any point, we can check the number of currently available permits by invoking :
public int availablePermits()
This is particularly useful while debugging/testing multithreaded applications using semaphores.
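To see the permit bookkeeping in action, here is a small single-threaded sketch that combines a negative initial permit count, a release without a prior acquire, and availablePermits. The class name PermitAccountingDemo is made up for this example.

```java
import java.util.Arrays;
import java.util.concurrent.Semaphore;

class PermitAccountingDemo {
    // Returns the values observed at each step, encoding booleans as 0/1
    static int[] run() {
        Semaphore semaphore = new Semaphore(-1);          // starts one permit "in debt"
        int atStart = semaphore.availablePermits();       // -1
        boolean acquiredEarly = semaphore.tryAcquire();   // false, nothing to hand out yet
        semaphore.release();                              // no prior acquire needed: -1 -> 0
        semaphore.release(2);                             // 0 -> 2
        int afterReleases = semaphore.availablePermits(); // 2
        boolean acquiredLater = semaphore.tryAcquire();   // true: 2 -> 1
        return new int[] {
            atStart, acquiredEarly ? 1 : 0, afterReleases,
            acquiredLater ? 1 : 0, semaphore.availablePermits()
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [-1, 0, 2, 1, 1]
    }
}
```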
A very good use case for Semaphores is building a connection pool (or a similar resource pool) where the pool is initialized with a fixed number of resources and different threads can acquire resources and release them when done.
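A minimal sketch of such a pool could look like the following. Everything here is hypothetical: the class name BoundedPool, the method names borrow and giveBack, and the use of plain strings as stand-ins for real connection objects. The semaphore only counts how many resources are free, while a separate queue holds the resources themselves.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class BoundedPool {
    private final Semaphore permits;
    private final ConcurrentLinkedQueue<String> idle = new ConcurrentLinkedQueue<>();

    BoundedPool(int size) {
        permits = new Semaphore(size);
        for (int i = 0; i < size; i++) {
            idle.add("connection-" + i); // placeholder for a real connection
        }
    }

    // Waits up to the timeout for a free resource; returns null if none became free
    String borrow(long timeout, TimeUnit unit) {
        try {
            if (!permits.tryAcquire(timeout, unit)) {
                return null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return idle.poll(); // holding a permit guarantees the queue is not empty
    }

    // Puts the resource back first, then publishes the permit
    void giveBack(String resource) {
        idle.add(resource);
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        BoundedPool pool = new BoundedPool(2);
        String first = pool.borrow(1, TimeUnit.SECONDS);
        String second = pool.borrow(1, TimeUnit.SECONDS);
        String third = pool.borrow(50, TimeUnit.MILLISECONDS); // pool exhausted, times out
        System.out.println(first + ", " + second + ", " + third);
        pool.giveBack(first);
        System.out.println("Free resources: " + pool.available());
    }
}
```

As noted earlier, the semaphore cannot stop a caller from invoking giveBack without having borrowed anything, so a production pool would need additional checks.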
work stealing:
In parallel computing, work stealing is a scheduling strategy for multithreaded computer programs. In a work stealing scheduler, each processor in a computer system has a queue of computational tasks to perform. Each work item consists of a series of instructions, to be executed sequentially, but in the course of its execution, a work item may also spawn new work items that can feasibly be executed in parallel with its other work. These new items are initially put on the queue of the processor executing the work item. When a processor runs out of work, it looks at the queues of other processors and "steals" their work items. In effect, work stealing distributes the scheduling work over idle processors, and as long as all processors have work to do, no scheduling overhead occurs. In order to avoid conflicts, a processor picks tasks from one end of its own queue while it uses a different end to pick tasks from other processors' queues. So the 'queue' here is not a FIFO buffer in the traditional sense but a double ended queue (popularly called deque)
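The two-ended access pattern can be illustrated with an ordinary deque. This is a single-threaded sketch using ConcurrentLinkedDeque purely to show which end each party uses. The class name WorkStealingDequeDemo and the task names are made up, and real work-stealing schedulers use their own specialised queue implementations.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedDeque;

class WorkStealingDequeDemo {
    // Returns {task taken by the owner, task taken by the thief, task left in the queue}
    static String[] run() {
        ConcurrentLinkedDeque<String> ownerQueue = new ConcurrentLinkedDeque<>();
        // The owner pushes newly spawned tasks onto the head of its own deque
        ownerQueue.push("task-1");
        ownerQueue.push("task-2");
        ownerQueue.push("task-3");
        // The owner takes work from the head (most recently pushed task)
        String takenByOwner = ownerQueue.pop();
        // An idle worker steals from the opposite end (oldest task)
        String stolen = ownerQueue.pollLast();
        return new String[] { takenByOwner, stolen, ownerQueue.peek() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [task-3, task-1, task-2]
    }
}
```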
Java implements work stealing through the fork/join framework. In a fork/join thread pool, each task checks whether its share of the work is too large to process directly. If it is, the task forks. 'Fork' means splitting (decomposing) a task and pushing part of it (ideally half) back to the pool so that other threads can take up (steal) that split task. 'Join' means combining the results of the tasks submitted to the pool. Tasks are recursively split into smaller parts, run in parallel and recombined. The fork/join framework and its API were introduced in Java as part of JDK 1.7.
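As a small illustration of fork and join, the sketch below sums an array with a RecursiveTask. The class name SumTask, the helper sumOfFirst and the THRESHOLD value are assumptions chosen for this example; a sensible threshold depends on the workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    // Hypothetical cut-off below which a task sums its range directly
    static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // make the left half available to other workers
        long rightSum = right.compute();  // keep working on the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine the results
    }

    // Sums 1 + 2 + ... + n using the common fork/join pool
    static long sumOfFirst(int n) {
        long[] values = new long[n];
        for (int i = 0; i < n; i++) {
            values[i] = i + 1;
        }
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, n));
    }

    public static void main(String[] args) {
        System.out.println(sumOfFirst(100_000)); // prints 5000050000
    }
}
```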
LockSupport
Provides basic thread blocking primitives for creating locks and other synchronization classes. Most of the methods in the LockSupport class are designed to be used as tools for creating higher-level synchronization utilities, and are not in themselves useful for most concurrency control applications.
Let's take a look into LockSupport API :
public static void park()
LockSupport associates a permit with each thread that uses it. Just like a Semaphore, a LockSupport permit is an abstract concept and not backed by any resource.
If permit is available, a call to park() consumes the permit and returns immediately
Otherwise, the thread calling park() is disabled for scheduling until one of the following events occurs :
- Someone invokes unpark for this thread
- The thread is interrupted
- The call returns spuriously (that is, for no reason)
There are other flavours of the park operation as well, like :
public static void parkNanos(long nanos)
This is essentially a timed version of park with a configurable timeout period
public static void parkUntil(long deadline)
This is a timed version of park that takes an absolute deadline (in milliseconds since the epoch) at which the timeout should occur
Just like the park method, there's also the unpark method :
public static void unpark(Thread thread)
Unpark makes the permit available (if it is not already available) for a particular thread. Unpark is not guaranteed to have any effect if the argument doesn't refer to a live thread. If thread t is blocked in park(), unpark(t) will cause thread t to unblock. If t isn't currently blocked, unpark(t) will ensure that the next call to park() from thread t won't block.
The following code demonstrates how LockSupport behaves :
public static void main(String[] args) throws Exception {
Thread t1 = new Thread() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " calling park ");
LockSupport.park();
System.out.println(Thread.currentThread().getName() + " consumed permit ");
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(Thread.currentThread().getName() + " calling park ");
LockSupport.park();
System.out.println(Thread.currentThread().getName() + " consumed permit ");
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(Thread.currentThread().getName() + " calling park ");
LockSupport.parkNanos(1_000_000);
System.out.println(Thread.currentThread().getName() + " consumed permit ");
}
};
Thread t2 = new Thread() {
public void run() {
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
e.printStackTrace();
}
LockSupport.unpark(t1);
}
};
t1.start();
LockSupport.unpark(t1);
t2.start();
}
We have a thread t1 which :
- Calls park
- Sleeps for 5 seconds, then calls park again
- Sleeps for 5 seconds, then calls park again with a timeout
The main thread calls LockSupport.unpark(t1) right after starting t1, so the first call to park is unblocked.
Thread t2 calls LockSupport.unpark(t1) after sleeping for a while, so the second call to park is unblocked.
The third call to park gets unblocked after the timeout period elapses.
Output:
Thread-0 calling park
Thread-0 consumed permit
Thread-0 calling park
Thread-0 consumed permit
Thread-0 calling park
Thread-0 consumed permit
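Since LockSupport is meant as a building block, here is a minimal sketch of a one-shot signal built on park and unpark. The class name OneShotSignal and its method names are made up for this example, and it assumes a single waiting thread. The waiter re-checks its condition in a loop because park is allowed to return without a matching unpark. For brevity, the sketch ignores interruption: an interrupted waiter would keep looping without blocking.

```java
import java.util.concurrent.locks.LockSupport;

class OneShotSignal {
    private volatile boolean signalled = false;
    private volatile Thread waiter;

    // Blocks the calling thread until signal() has been called
    void await() {
        waiter = Thread.currentThread();
        while (!signalled) {
            LockSupport.park(this); // may wake up early, so the condition is checked again
        }
    }

    // Sets the condition first, then wakes the waiter if one has registered
    void signal() {
        signalled = true;
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    // Starts a waiter, signals it and reports whether the waiter finished within 5 seconds
    static boolean demo() {
        OneShotSignal signal = new OneShotSignal();
        Thread waiterThread = new Thread(signal::await);
        waiterThread.start();
        signal.signal();
        try {
            waiterThread.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !waiterThread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Waiter released: " + demo());
    }
}
```

It does not matter whether signal() runs before or after the waiter parks. If it runs first, the waiter sees the flag and never parks. If it runs later, unpark wakes the waiter up.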
TLAB :
TLAB stands for Thread Local Allocation Buffer. When we do new Object(), the runtime environment has to allocate storage for the instance in question. Memory allocation is an expensive process, and simultaneous allocation requests coming from multiple threads make it even harder for the memory manager. To optimize this, the JVM lets each thread reserve an entire block of memory for its own allocations and go back to the VM only to get a new block. In HotSpot, these blocks are called Thread Local Allocation Buffers (TLABs). Assuming that a TLAB can hold 100 objects, a thread would only need to acquire a lock for claiming more memory when allocating the 101st object. Without TLABs, this would be required for every object. The downside, of course, is that you potentially waste space.
You can set the size of a TLAB using the -XX:TLABSize flag. Its default value is 0, which means the size is calculated dynamically using heuristics. TLABs are enabled by default. For the sake of experiment, we can also turn the TLAB machinery off with -XX:-UseTLAB
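If you want to try the experiment, a rough allocation-heavy program like the one below can be run once with default settings and once with -XX:-UseTLAB. This is only a sketch: the class name AllocationLoop, the thread count and the iteration counts are arbitrary choices, and the timings it prints are not a rigorous benchmark and will vary between machines and JVM versions.

```java
class AllocationLoop {
    // Allocates `count` small arrays, keeping the most recent ones reachable in a
    // ring so that the allocations are harder for the JIT compiler to optimise away.
    // Returns the number of ring slots that ended up holding an object.
    static int allocateAndCountLive(int count) {
        Object[] ring = new Object[1024];
        for (int i = 0; i < count; i++) {
            ring[i & 1023] = new byte[64];
        }
        int live = 0;
        for (Object slot : ring) {
            if (slot != null) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            // Several threads allocate at the same time to put pressure on the allocator
            workers[i] = new Thread(() -> allocateAndCountLive(5_000_000));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Allocation loop took " + elapsedMillis + " ms");
    }
}
```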