What if this were implemented as a library whose functions are inlined? When a thread is created, a syscall registers an address with the kernel. The functions that “signal” entering and exiting a critical section then simply write to that registered address, so the fast path needs no syscall at all.
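Here is a minimal sketch of that idea. The kernel does not offer such a syscall, so `sys_register_cs_flag` is a hypothetical stand-in that only records the address in a thread-local "kernel-side" variable, and `kernel_thread_in_cs` is a simulated version of the check a scheduler might make before preempting. All names are illustrative. The fences are compiler-only, on the assumption that the kernel reads the flag only while this same thread is interrupted, much like a signal handler would.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated kernel side: in the proposal this pointer would live in the
   kernel's per-thread state. Here it is just a thread-local variable. */
static _Thread_local volatile uint8_t *kernel_registered_flag;

/* Hypothetical stand-in for the registration syscall. */
static int sys_register_cs_flag(volatile uint8_t *addr) {
    if (addr == NULL)
        return -1;
    kernel_registered_flag = addr;
    return 0;
}

/* Library side: the flag itself plus a cached pointer to it. */
static _Thread_local volatile uint8_t cs_flag_storage;
static _Thread_local volatile uint8_t *cs_flag;

/* Called once when the thread is created. */
static int cs_thread_init(void) {
    cs_flag = &cs_flag_storage;
    return sys_register_cs_flag(cs_flag);
}

/* Inlined "enter": load the cached pointer, store 1 through it. */
static inline void cs_enter(void) {
    assert(cs_flag != NULL);  /* cs_thread_init() must have run */
    *cs_flag = 1;
    /* Stop the compiler from hoisting critical-section accesses above the store. */
    atomic_signal_fence(memory_order_seq_cst);
}

/* Inlined "exit": store 0 through the same pointer. */
static inline void cs_exit(void) {
    /* Stop the compiler from sinking critical-section accesses below the store. */
    atomic_signal_fence(memory_order_seq_cst);
    *cs_flag = 0;
}

/* Simulated scheduler check before preempting this thread. */
static int kernel_thread_in_cs(void) {
    return kernel_registered_flag != NULL && *kernel_registered_flag != 0;
}
```

Note that every `cs_enter`/`cs_exit` still has to load `cs_flag` before it can store through it; that pointer load is the "address lookup" the next paragraph is about.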
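Alternatively, a byte could be reserved at a fixed location in every thread's thread-local storage for this purpose. Since the kernel could then find that byte for any thread on its own, this would eliminate both the registration syscall and the address lookup on every enter/exit.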
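A sketch of the reserved-byte variant, under the assumption that the runtime places the byte at an offset the kernel knows, for example relative to the architecture's thread pointer (`fs` on x86-64). `struct thread_block`, `thread_pointer` and `kernel_thread_in_cs` are hypothetical names; `thread_pointer` simply returns the address of a thread-local struct to simulate what the kernel would compute from the thread pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-thread block with the reserved byte. The assumption is
   that every thread has it at the same known offset, so no registration
   syscall is needed. */
struct thread_block {
    volatile uint8_t in_cs;  /* the reserved byte */
};

static _Thread_local struct thread_block self;

/* Stand-in for how the kernel would locate the block (e.g. via fs on x86-64). */
static struct thread_block *thread_pointer(void) {
    return &self;
}

/* Direct TLS store: no cached pointer has to be loaded first. */
static inline void cs_enter(void) {
    self.in_cs = 1;
    atomic_signal_fence(memory_order_seq_cst);  /* compiler-only ordering */
}

static inline void cs_exit(void) {
    atomic_signal_fence(memory_order_seq_cst);
    assert(self.in_cs == 1);  /* exit must pair with an earlier enter */
    self.in_cs = 0;
}

/* Simulated scheduler check: read the byte at the agreed location. */
static int kernel_thread_in_cs(void) {
    return thread_pointer()->in_cs != 0;
}
```

Whether the store really compiles to a single thread-pointer-relative instruction depends on the TLS model; an inlined library living in a shared object may still go through `__tls_get_addr` unless it uses the initial-exec model.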