Understanding GoLang's Concurrency ⚡️
GoLang stands out for its concurrency and multithreading capabilities. Read this essay to find out how.
Understanding GoRoutines
Nature: GoRoutines are Go's version of threads. They run on top of OS threads but are managed directly by Go's runtime rather than by the operating system.
Memory Efficiency: A GoRoutine starts with a stack of roughly 2KB, in stark contrast to OS threads, which typically reserve around 8MB of stack by default on Linux.
Stack Flexibility: GoRoutine stacks grow and shrink on demand. Modern Go (since 1.3) allocates a small contiguous stack and copies it to a larger one when it fills up; earlier releases used a segmented-stack design. Either way, a GoRoutine only pays for the stack it actually uses, which is what makes spawning huge numbers of them practical, as the sketch below illustrates.
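To make the cost difference concrete, here is a minimal sketch of my own (the count of 100,000 and the trivial work inside are arbitrary choices, not from the article) that spawns a large number of GoRoutines and waits for them:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    // 100,000 OS threads would exhaust memory on most machines;
    // 100,000 GoRoutines start with ~2KB stacks and run comfortably.
    for i := 0; i < 100000; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            _ = id * 2 // trivial work; the tiny stack never needs to grow
        }(i)
    }
    wg.Wait()
    fmt.Println("all GoRoutines finished")
}

Launching the same number of OS threads would bring most machines to their knees; with GoRoutines the program finishes quickly with a modest footprint.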
Go Runtime’s Scheduling Smartness
Runtime Management: GoRoutine scheduling is handled by Go's runtime rather than by the OS, and the runtime's deeper contextual understanding of GoRoutines lets it make better-informed decisions. For instance, the garbage collector runs on its own set of GoRoutines, so the scheduler has to keep that extra work orderly, often swapping GoRoutines that need to touch the heap for ones that don't while a collection is in progress. When the GC is running, a great deal of scheduling work is therefore underway.
Cooperative Over Preemptive: GoRoutines lean towards cooperative scheduling. Instead of being stopped abruptly, they are signalled by the runtime to pause at the nearest "safe point", ensuring a consistent state for later resumption. (Since Go 1.14 the runtime can also preempt long-running GoRoutines asynchronously via signals, but it still pauses them only at points it knows are safe.) A small sketch of explicit yielding follows the two definitions below.
Preemptive Scheduling: A system-enforced pause and switch of threads.
Cooperative Scheduling: Threads are signalled to pause at predefined safe points, ensuring resumption without hitches.
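As a hands-on illustration of the cooperative side, a GoRoutine can voluntarily hand the processor back with runtime.Gosched(). This is a minimal sketch of my own, not a pattern prescribed by the article; the channel exists only to make the output deterministic:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    done := make(chan struct{})
    go func() {
        fmt.Println("background GoRoutine ran")
        close(done)
    }()
    // Voluntarily yield the processor so the scheduler can run other
    // GoRoutines; function calls and channel operations are also natural
    // points where the runtime may reschedule.
    runtime.Gosched()
    <-done
    fmt.Println("main resumes")
}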
Local Run Queues: Every logical processor, or P (an internal construct created by the Go runtime), has a local queue of ready-to-run GoRoutines. These GoRoutines are scheduled onto the OS thread currently bound to that P.
Global Run Queue: Alongside the local queues, a single global queue is shared across all P's. Newly created GoRoutines, or GoRoutines overflowing from a full local queue, may end up here.
Work Stealing: A P whose local queue is empty can "steal" GoRoutines from other P's local queues or from the global queue, keeping every processor busy. This work-stealing behaviour is crucial: it stops OS threads from going idle and being context-switched off the core, so each P (the processor construct that holds the GoRoutines destined for an OS thread) stays productive. It also balances GoRoutines across all the P's, optimising work distribution.
Sharing Information: Each P focuses mainly on its own local GoRoutines, but the scheduler lets any P discover runnable GoRoutines sitting in other P's queues or in the global queue. A small inspection sketch follows this list.
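You can peek at how many P's (and therefore how many local run queues) the runtime has set up. This is a small sketch of my own using the standard runtime package:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) reports the current number of P's without changing it;
    // each P owns one local run queue of runnable GoRoutines.
    fmt.Println("logical processors (P's):", runtime.GOMAXPROCS(0))
    fmt.Println("machine CPUs:", runtime.NumCPU())
    fmt.Println("live GoRoutines:", runtime.NumGoroutine())
}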
Network Poller
The network poller lets GoRoutines wait efficiently for I/O operations, such as reads or writes, by parking them until the data is ready, without tying up a whole OS thread for each waiting GoRoutine.
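To see this in action, here is a minimal sketch of my own that fires a few concurrent HTTP requests; the URLs are illustrative placeholders. Each GoRoutine blocks on the network, yet the runtime parks it on the poller instead of pinning an OS thread per request:

package main

import (
    "fmt"
    "net/http"
    "sync"
)

func main() {
    // Placeholder URLs for illustration only.
    urls := []string{"https://example.com", "https://example.org"}
    var wg sync.WaitGroup
    for _, u := range urls {
        wg.Add(1)
        go func(url string) {
            defer wg.Done()
            // http.Get blocks this GoRoutine, but the runtime parks it on
            // the network poller; the OS thread is free to run other work.
            resp, err := http.Get(url)
            if err != nil {
                fmt.Println(url, "failed:", err)
                return
            }
            resp.Body.Close()
            fmt.Println(url, resp.Status)
        }(u)
    }
    wg.Wait()
}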
Efficiency with Spinlocks
What are Spinlocks? Spinlocks are used for short critical sections where lock contention is low. When a thread tries to acquire a lock, it waits in a loop ("spin"), constantly checking the lock's availability.
Implementation: Spinlocks use atomic instructions to test and acquire the lock. Below is a Go example:
package main

import "sync/atomic"

// SpinLock is a simple test-and-set lock built on an atomic flag.
type SpinLock struct {
    locked int32
}

// Lock spins (busy-waits) until the compare-and-swap from 0 to 1 succeeds.
func (sl *SpinLock) Lock() {
    for !atomic.CompareAndSwapInt32(&sl.locked, 0, 1) {}
}

// Unlock releases the lock by resetting the flag to 0.
func (sl *SpinLock) Unlock() {
    atomic.StoreInt32(&sl.locked, 0)
}
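A hypothetical usage sketch, assuming the SpinLock type above sits in the same package (for example, a second file of the same main package): two GoRoutines increment a shared counter under the lock. The counts here are arbitrary. In most Go programs sync.Mutex is the better default, because it parks a waiting GoRoutine instead of burning CPU in the spin loop.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var (
        sl      SpinLock
        counter int
        wg      sync.WaitGroup
    )
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                sl.Lock()
                counter++ // critical section kept very short, as spinlocks require
                sl.Unlock()
            }
        }()
    }
    wg.Wait()
    fmt.Println("final counter:", counter) // expected: 2000
}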
Conclusion
GoRoutines are a leap over traditional OS threads in cost efficiency and performance. Much of this comes from GoRoutines living in user space: switching between them avoids system calls, kernel transitions, and some of the memory work a thread switch requires. Moreover, Go keeps blocking I/O from stalling its OS threads: when a GoRoutine blocks, the runtime parks it and hands the thread another runnable GoRoutine, so the thread stays engaged with its core instead of being context-switched away by the OS.
A traditional OS thread context switch costs about 12k instructions on average; switching between GoRoutines takes roughly 2.4k instructions, saving precious CPU time. The Go scheduler's design shows a deep understanding of the dynamics between the OS and the hardware, and that understanding translates into significantly better CPU utilisation over time.
Thanks for Reading.
Connect with me on LinkedIn. Aadhar Chandiwala