NODE_FAIL on Slurm compute nodes
Thursday 15th July 2021 12:58:44


We are currently experiencing an issue on a few Slurm compute nodes where communication with the node is lost, causing the job running on it to fail with a NODE_FAIL error. We have removed the affected nodes and are investigating.

The compute node issues that caused the NODE_FAIL errors have been resolved.