If you have a counter whose type supports atomic operations, using an atomic will be more efficient than protecting the counter with a mutex.
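Technically, an atomic read-modify-write still takes a hardware-level lock on most platforms (historically the memory bus; on modern CPUs usually only the affected cache line). However, there are two mitigating details: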
- A thread cannot be suspended while it holds the hardware lock, but it can be suspended while it holds a mutex. This is what gives you a lock-free guarantee (which says nothing about never locking; it only guarantees that at least one thread always makes progress).
- Mutexes eventually end up being implemented with atomics. Since you need at least one atomic operation to lock a mutex and another to unlock it, a mutex-protected update takes at least twice as long as a single atomic operation, even in the best of cases.
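
To make the comparison concrete, here is a minimal C++ sketch of the same counter written both ways. The function names `count_with_atomic` and `count_with_mutex` are made up for illustration, and the choice of `std::memory_order_relaxed` assumes the counter is only a tally and is not used to publish other data between threads.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each bump a shared atomic counter
// `iters` times. Every increment is a single atomic read-modify-write.
std::uint64_t count_with_atomic(int threads, int iters) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                // Assumption: relaxed ordering is enough for a plain tally.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Hypothetical helper: the same work, but every increment takes and releases
// a mutex, which costs at least one atomic operation on each side.
std::uint64_t count_with_mutex(int threads, int iters) {
    std::uint64_t counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, &m, iters] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> guard(m);  // lock here, unlock at scope exit
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Both functions return `threads * iters`; only the cost per increment differs, and the actual gap depends on the platform and on how contended the counter is. If you are unsure whether atomics are really supported natively for your counter type, `std::atomic<T>::is_lock_free()` reports whether the implementation falls back to an internal lock.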