A minimal (standards-compliant) mutex implementation requires two basic ingredients:
- A way to atomically convey a state change between threads (the 'locked' state)
- Memory barriers to ensure that memory operations protected by the mutex stay inside the protected area
There is no way you can make it any simpler than this because of the 'synchronizes-with' relationship the C++ standard requires.
A minimal (correct) implementation might look like this:
    #include <atomic>

    class mutex {
        std::atomic<bool> flag{false};

    public:
        void lock()
        {
            // Spin until the flag was previously false (i.e. the lock was acquired).
            while (flag.exchange(true, std::memory_order_relaxed))
                ;
            std::atomic_thread_fence(std::memory_order_acquire);
        }

        void unlock()
        {
            std::atomic_thread_fence(std::memory_order_release);
            flag.store(false, std::memory_order_relaxed);
        }
    };
Due to its simplicity (it cannot suspend the thread of execution), it is likely that, under low contention, this implementation outperforms a `std::mutex`. But even then, it is easy to see that each integer increment protected by this mutex requires the following operations:
- an atomic store to release the mutex
- an atomic exchange (read-modify-write) to acquire the mutex (possibly multiple times)
- an integer increment
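To make that concrete, here is a minimal sketch of two threads incrementing a plain `int` under the mutex class shown above (the counter, function, and thread names are illustrative, not from the original):

    #include <thread>

    mutex m;          // the minimal mutex defined above
    int counter = 0;  // plain int, protected by m

    void increment_many(int n)
    {
        for (int i = 0; i < n; ++i) {
            m.lock();    // atomic exchange (possibly repeated) + acquire fence
            ++counter;   // the integer increment itself
            m.unlock();  // release fence + atomic store
        }
    }

    int main()
    {
        std::thread t1(increment_many, 100000);
        std::thread t2(increment_many, 100000);
        t1.join();
        t2.join();
        // counter is now 200000
    }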
If you compare that with a standalone `std::atomic<int>` that is incremented with a single (unconditional) read-modify-write (e.g. `fetch_add`), it is reasonable to expect that an atomic operation (using the same ordering model) will outperform the case where a mutex is used.
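For comparison, a sketch of the lock-free variant (same illustrative names): each increment is a single unconditional read-modify-write on the `std::atomic<int>` itself.

    #include <atomic>
    #include <thread>

    std::atomic<int> counter{0};

    void increment_many(int n)
    {
        for (int i = 0; i < n; ++i)
            // acq_rel is used here to roughly match the acquire/release
            // ordering that the mutex-based version provides
            counter.fetch_add(1, std::memory_order_acq_rel);
    }

    int main()
    {
        std::thread t1(increment_many, 100000);
        std::thread t2(increment_many, 100000);
        t1.join();
        t2.join();
        // counter.load() == 200000
    }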