GPU as gres
Refer to
Accounting and Limits
Refer to
Core as consumable resource:
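# uncomment the SelectType line to enable the consumable-resource plugin
sed -i 's/#SelectType=select\/cons_res/SelectType=select\/cons_res/g' /etc/slurm/slurm.conf
# append SelectTypeParameters=CR_Core right after it, so that cores are the allocated unit
sed -i '/SelectType=select\/cons_res/a SelectTypeParameters=CR_Core' /etc/slurm/slurm.conf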
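To try the two edits safely before touching the live file, the same `sed` expressions can be run against a scratch copy first. This is only a minimal sketch: the temporary file and its sample contents are made up for illustration, and it assumes GNU sed (for `-i` and the one-line `a` form).

```shell
# Scratch copy standing in for /etc/slurm/slurm.conf (sample contents are hypothetical).
conf=$(mktemp)
printf '%s\n' 'ClusterName=demo' '#SelectType=select/cons_res' 'SchedulerType=sched/backfill' > "$conf"

# Same two edits as above, pointed at the scratch file instead of the live config.
sed -i 's/#SelectType=select\/cons_res/SelectType=select\/cons_res/g' "$conf"
sed -i '/SelectType=select\/cons_res/a SelectTypeParameters=CR_Core' "$conf"

# Show the resulting select settings, then clean up.
result=$(grep '^SelectType' "$conf")
echo "$result"
rm -f "$conf"
```

Note that the second `sed` is not idempotent: running it again appends another `SelectTypeParameters=CR_Core` line, so check the file before re-running it.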
Refer to:
Manually resume a cluster node that is in State=DOWN
scontrol update NodeName=node10 State=RESUME
Refer to How to “undrain” slurm nodes in drain state
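When several nodes are down or drained, it can help to list them first and generate the matching `scontrol` lines for review. The sketch below is one possible approach, not an official procedure: `down_nodes` is a hypothetical helper name, and it assumes the node list comes from `sinfo -N -h -o "%N %t"` (node name plus compact state). It only prints the commands; nothing is changed on the cluster.

```shell
# Print the names of nodes whose compact state starts with "down" or "drain".
# Input: lines of "<node> <state>", as produced by: sinfo -N -h -o "%N %t"
# sort -u removes duplicates, since sinfo -N lists a node once per partition.
down_nodes() {
    awk '$2 ~ /^(down|drain)/ { print $1 }' | sort -u
}

# Only query the cluster if the Slurm client tools are installed.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -N -h -o "%N %t" | down_nodes | while read -r node; do
        # Print the command for review instead of running it.
        echo "scontrol update NodeName=$node State=RESUME"
    done
fi
```

Check why each node went down (for example with `sinfo -R`) before resuming it, since resuming a node with a real hardware fault just sends jobs back to it.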
NFS fails to restart because of the mount error below
### The error looks like this:
May 14 00:03:30 rbx06 systemd[1]: dev-disk-by\x2duuid-62ccfba0\x2d6394\x2d42c0\x2dbd38\x2d3da2ea4893b6.device: Job dev-disk-by\x2duuid-62ccfba0\x2d6394\x2d42c0\x2dbd38\x2d3da2ea4893b6.device/start timed out.
May 14 00:03:30 rbx06 systemd[1]: Timed out waiting for device /dev/disk/by-uuid/62ccfba0-6394-42c0-bd38-3da2ea4893b6.
May 14 00:03:30 rbx06 systemd[1]: Dependency failed for /dev/disk/by-uuid/62ccfba0-6394-42c0-bd38-3da2ea4893b6.
### or like this:
Oct 31 20:28:20 thutmose kernel: EXT4-fs (sdc1): mounting ext3 file system using the ext4 subsystem
Oct 31 20:28:20 thutmose kernel: EXT4-fs (sdc1): warning: maximal mount count reached, running e2fsck is recommended
Oct 31 20:28:20 thutmose kernel: EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 20:28:20 thutmose sudo[18994]: pam_unix(sudo:session): session closed for user root
Oct 31 20:28:20 thutmose systemd[1]: mnt-attorney.mount: Unit is bound to inactive unit dev-disk-by\x2dlabel-attorney.device. Stopping, too.
Oct 31 20:28:20 thutmose systemd[1]: Unmounting /mnt/attorney...
Oct 31 20:28:21 thutmose systemd[1]: Unmounted /mnt/attorney.
Solution:
# reload systemd so it re-reads its unit files, including the mount units generated from /etc/fstab
systemctl daemon-reload
## restart rpcbind and nfs
exportfs -a # re-export all entries, in case /etc/exports was updated
systemctl restart rpcbind nfs
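The first log above shows systemd timing out while waiting for a `/dev/disk/by-uuid/...` device, which commonly means that `/etc/fstab` still names a UUID that no longer exists on the machine. As a rough sketch of how to check for that (the function names `fstab_uuids` and `check_fstab_uuids` are hypothetical, and quoted `UUID="..."` entries are not handled):

```shell
# List the UUIDs referenced by "UUID=..." entries in an fstab-style file ($1).
# Comment lines are skipped.
fstab_uuids() {
    awk '$1 !~ /^#/ && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# Report every referenced UUID that has no matching device symlink.
# $1: fstab path (default /etc/fstab)
# $2: directory of by-uuid symlinks (default /dev/disk/by-uuid)
check_fstab_uuids() {
    fstab=${1:-/etc/fstab}
    by_uuid_dir=${2:-/dev/disk/by-uuid}
    fstab_uuids "$fstab" | while read -r uuid; do
        [ -e "$by_uuid_dir/$uuid" ] || echo "missing device for UUID=$uuid"
    done
}
```

Calling `check_fstab_uuids` with no arguments checks the live `/etc/fstab`. Any UUID it reports should be corrected or removed in `/etc/fstab` before reloading systemd and restarting NFS as shown above.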
Refer to: