
SLURM administration

Dec 29, 2020 | 764 views

#Linux #HPC


GPU as gres


Refer to

Accounting and Limits 


Refer to

Core as consumable resource:  

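# Uncomment the SelectType line so Slurm uses the consumable-resource plugin
sed -i 's/#SelectType=select\/cons_res/SelectType=select\/cons_res/g' /etc/slurm/slurm.conf
# Append SelectTypeParameters=CR_Core right after it, making cores the consumable unit
sed -i '/SelectType=select\/cons_res/a SelectTypeParameters=CR_Core' /etc/slurm/slurm.conf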
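
The two edits above are not idempotent: running the second one twice appends a second SelectTypeParameters line. A minimal sketch of a safer wrapper follows, assuming GNU sed. The function name enable_cons_res and the scratch file are illustrative only and are not part of Slurm.

```shell
#!/usr/bin/env bash
# Sketch: enable cores as a consumable resource in a slurm.conf-style file.
# enable_cons_res is a hypothetical helper name, not a Slurm command.
enable_cons_res() {
    local conf="$1"
    # Uncomment the SelectType line if it is still commented out.
    sed -i 's|^#SelectType=select/cons_res|SelectType=select/cons_res|' "$conf"
    # Append SelectTypeParameters only when no such line is present yet.
    if ! grep -q '^SelectTypeParameters=' "$conf"; then
        sed -i '/^SelectType=select\/cons_res/a SelectTypeParameters=CR_Core' "$conf"
    fi
}

# Try it on a scratch file rather than the live /etc/slurm/slurm.conf.
demo_conf="$(mktemp)"
printf '%s\n' 'ClusterName=demo' '#SelectType=select/cons_res' > "$demo_conf"
enable_cons_res "$demo_conf"
enable_cons_res "$demo_conf"   # the second run should change nothing
cat "$demo_conf"
```

On a real cluster the Slurm daemons have to be restarted before a SelectType change takes effect.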



Manually resume a cluster node that is in State=DOWN
scontrol update NodeName=node10 State=RESUME

Refer to How to “undrain” slurm nodes in drain state 
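
When several nodes are down, the same command can be looped. The sketch below is a dry-run wrapper: it only prints the scontrol commands unless RUN=1 is set. The function name resume_nodes, the RUN variable, and the node name node11 are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: resume one or more nodes, printing the commands by default.
# resume_nodes and RUN are hypothetical names, not part of Slurm.
# Tip: "sinfo -R" lists down/drained nodes with the recorded reason.
resume_nodes() {
    local node
    for node in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            scontrol update NodeName="$node" State=RESUME
        else
            # Dry run: show what would be executed.
            echo "scontrol update NodeName=$node State=RESUME"
        fi
    done
}

resume_nodes node10 node11
```

Calling it as `RUN=1 resume_nodes node10` executes the update instead of printing it.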



NFS and mount failed problem

Dec 29, 2020 | 713 views

#Linux #Storage


NFS failed to restart because of the mount error below.


### The error looks like this:
May 14 00:03:30 rbx06 systemd[1]: dev-disk-by\x2duuid-62ccfba0\x2d6394\x2d42c0\x2dbd38\x2d3da2ea4893b6.device: Job dev-disk-by\x2duuid-62ccfba0\x2d6394\x2d42c0\x2dbd38\x2d3da2ea4893b6.device/start timed out.
May 14 00:03:30 rbx06 systemd[1]: Timed out waiting for device /dev/disk/by-uuid/62ccfba0-6394-42c0-bd38-3da2ea4893b6.
May 14 00:03:30 rbx06 systemd[1]: Dependency failed for /dev/disk/by-uuid/62ccfba0-6394-42c0-bd38-3da2ea4893b6.

### or like this:
Oct 31 20:28:20 thutmose kernel: EXT4-fs (sdc1): mounting ext3 file system using the ext4 subsystem
Oct 31 20:28:20 thutmose kernel: EXT4-fs (sdc1): warning: maximal mount count reached, running e2fsck is recommended
Oct 31 20:28:20 thutmose kernel: EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 20:28:20 thutmose sudo[18994]: pam_unix(sudo:session): session closed for user root
Oct 31 20:28:20 thutmose systemd[1]: mnt-attorney.mount: Unit is bound to inactive unit dev-disk-by\x2dlabel-attorney.device. Stopping, too.
Oct 31 20:28:20 thutmose systemd[1]: Unmounting /mnt/attorney...
Oct 31 20:28:21 thutmose systemd[1]: Unmounted /mnt/attorney.

Solution:


# reload systemd so it regenerates its mount units from /etc/fstab
systemctl daemon-reload

## re-export the shares, then restart rpcbind and nfs
exportfs -a # picks up any changes made to /etc/exports
systemctl restart rpcbind nfs
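
The first log excerpt shows systemd timing out while waiting for a /dev/disk/by-uuid device. One plausible cause, though not the only one, is an /etc/fstab entry that names a UUID which no longer exists on the machine. A minimal sketch of a check for that follows. The function name list_missing_uuids is hypothetical, and the script only handles the plain UUID=... form in the first fstab field.

```shell
#!/usr/bin/env bash
# Sketch: print UUIDs referenced in an fstab file that have no matching
# entry in a by-uuid directory. list_missing_uuids is a hypothetical name.
list_missing_uuids() {
    local fstab="$1" uuid_dir="$2" uuid
    # Keep only active lines whose first field starts with UUID=
    # (commented lines start with '#', so they do not match).
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
    while read -r uuid; do
        [ -e "$uuid_dir/$uuid" ] || echo "$uuid"
    done
}

# Check the live system if an fstab is present.
if [ -r /etc/fstab ]; then
    list_missing_uuids /etc/fstab /dev/disk/by-uuid
fi
```

Any UUID it prints is worth comparing with the output of `blkid` before correcting /etc/fstab and running the reload above.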


