Pitfalls when installing InfiniBand NIC drivers
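Summary
Even after the Mellanox signing key was enrolled, starting openibd still failed with the same familiar error. In the `ls -l /dev` output, columns 5 and 6 are the device number: the major comes first, the minor second. The kernel's device list shows that majors 0-233 are statically assigned and 234-254 are reserved for dynamic assignment, handed out downward from 254 (254, 253, 252, ...). So what happens when 234-254 is not enough? Kernels before 4.15 simply keep going downward, regardless of what the lower numbers are reserved for.
The fix is this commit: https://github.com/torvalds/linux/commit/a5d31a3f81c6fb13b381951bf6163444c0257e8b#diff-3b17f7c08e0e1995904f19fbfff59700e41d1fe60a11ab3a40d4e0675a12c732
Workaround: if, like me, you are stuck on an antique kernel (older than 4.15) and device number 231 happens to be taken so that the IB driver fails to start, the only option is to unload the kernel module that occupies the number, load the IB driver, and then reload the offending module. After that everything works normally.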
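1. Go to the official site and pick the driver version that matches your operating system
https://www.mellanox.com/products/ethernet-drivers/linux/mlnx_en
Note: ConnectX-3 and ConnectX-3 Pro hardware is no longer supported by the 5.x-x.x.x.x driver releases, so for these cards you need to download the LTS driver instead.
2. Two package formats are offered, ISO and tgz; download whichever you prefer
Unpack: tar -xvzf MLNX_OFED_LINUX-4.9-2.2.4.0-rhel7.7-x86_64.tgz
Install: cd MLNX_OFED_LINUX-4.9-2.2.4.0-rhel7.7-x86_64
./mlnxofedinstall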
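3. Load the driver and start the opensm subnet manager
Load the driver: sudo /etc/init.d/openibd restart
Start the subnet manager: sudo /etc/init.d/opensmd restart
(An IB network needs a subnet manager. There is no managed IB switch in this setup, so opensmd has to be started on one of the hosts.)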
Loading the driver failed, however:
# /etc/init.d/openibd start
Loading Mellanox MLX5_IB HCA driver: [FAILED]
Loading HCA driver and Access Layer: [FAILED]
Please run /usr/sbin/sysinfo-snapshot.py to collect the debug information
//support.mellanox.com/SupportWeb/service_center/SelfService :
The kernel log shows:
[Wed Dec 30 14:24:41 2020] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[Wed Dec 30 14:24:41 2020] user_verbs: couldn't register device number
[Wed Dec 30 14:24:41 2020] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[Wed Dec 30 14:24:41 2020] user_mad: couldn't register device number
[Wed Dec 30 14:24:41 2020] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[Wed Dec 30 14:24:41 2020] user_verbs: couldn't register device number
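The "Request for unknown module key" messages mean the kernel does not know the key the Mellanox modules are signed with. The key can be enrolled as a Machine Owner Key (MOK):
1. Download the x.509 public key.
# wget http://www.mellanox.com/downloads/ofed/mlnx_signing_key_pub.der
2. Add the public key to the MOK list using the mokutil utility.
# mokutil --import mlnx_signing_key_pub.der
mokutil asks for a password; the enrollment itself is completed in the MOK manager on the next reboot.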
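Starting openibd again, the familiar error was still there:
user_verbs: couldn't register device number
user_mad: couldn't register device number
user_verbs: couldn't register device number
So the signing key was not the real problem. The failure is about registering a device number, which calls for some background on how Linux device numbers work.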
[root ~]# ls /dev/ -l
total 0
crw------- 1 root root 10, 235 Oct 27 16:55 autofs
crw------- 1 root root 10, 60 Oct 27 16:55 network_latency
crw------- 1 root root 10, 59 Oct 27 16:55 network_throughput
-rw-r--r-- 1 root root 0 Dec 29 23:17 nill
crw-rw-rw- 1 root root 1, 3 Oct 27 16:55 null
crw-rw-rw- 1 root root 195, 0 Oct 27 16:58 nvidia0
crw-rw-rw- 1 root root 195, 1 Oct 27 16:58 nvidia1
crw-rw-rw- 1 root root 195, 2 Oct 27 16:58 nvidia2
drwxr-xr-x 2 root root 80 Oct 27 16:58 nvidia-caps
crw-rw-rw- 1 root root 195, 255 Oct 27 16:58 nvidiactl
crw-rw-rw- 1 root root 195, 254 Oct 27 17:31 nvidia-modeset
brw-rw---- 1 root disk 8, 0 Oct 27 16:55 sda
brw-rw---- 1 root disk 8, 1 Oct 27 16:55 sda1
brw-rw---- 1 root disk 8, 2 Oct 27 16:55 sda2
lrwxrwxrwx 1 root root 15 Oct 27 16:55 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root 15 Oct 27 16:55 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root 15 Oct 27 16:55 stdout -> /proc/self/fd/1
crw------- 1 root root 10, 58 Nov 2 16:29 tgt
crw-rw-rw- 1 root tty 5, 0 Dec 29 20:32 tty
crw--w---- 1 root tty 4, 0 Oct 27 16:55 tty0
crw--w---- 1 root tty 4, 1 Oct 27 16:55 tty1
crw--w---- 1 root tty 4, 10 Oct 27 16:55 tty10
crw--w---- 1 root tty 4, 11 Oct 27 16:55 tty11
crw--w---- 1 root tty 4, 12 Oct 27 16:55 tty12
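Columns 5 and 6 of the listing above are the device number: the first value is the major and the second is the minor. The major number identifies the driver a device uses, while the minor number identifies the specific device. In the listing above, every tty device uses driver 4, and the minor number tells the individual tty devices apart. The driver behind each major number can also be seen in /proc/devices: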
~]# cat /proc/devices
Character devices:
1 mem
4 /dev/vc/0
4 tty
4 ttyS
5 /dev/tty
........
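As a small aid for reading /proc/devices, the sketch below looks up which driver name is registered for a given character major. The function name `char_major_owner` is made up for this illustration, and the optional second argument (a file in /proc/devices format) exists only so the function can be tried against a saved copy.

```shell
# Print the driver name(s) registered for a character major.
# $1 = major number
# $2 = file in /proc/devices format (defaults to /proc/devices)
char_major_owner() {
    awk -v want="$1" '
        /^Character devices:/ { in_char = 1; next }   # start of char section
        /^Block devices:/     { in_char = 0 }         # block majors are a separate namespace
        in_char && $1 == want { print $2 }
    ' "${2:-/proc/devices}"
}

# Example: char_major_owner 4
```

Several names can share one major (as tty, ttyS and /dev/vc/0 do for major 4 above), so the function prints one name per line.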
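Inside the kernel a device number is represented by the dev_t type. dev_t and the major/minor helpers are defined as follows:
/* <linux/types.h>: */
typedef __u32 __kernel_dev_t;
typedef __kernel_dev_t dev_t;
/* <linux/kdev_t.h> */
#define MINORBITS	20
#define MINORMASK	((1U << MINORBITS) - 1)
#define MAJOR(dev)	((unsigned int) ((dev) >> MINORBITS))
#define MINOR(dev)	((unsigned int) ((dev) & MINORMASK))
#define MKDEV(ma,mi)	(((ma) << MINORBITS) | (mi))
dev_t is 32 bits wide: the high 12 bits hold the major and the low 20 bits hold the minor.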
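To make the 12/20-bit split concrete, here is a minimal sketch that redoes the same arithmetic in shell. The function names `mkdev`, `dev_major` and `dev_minor` are illustrative stand-ins for the kernel macros; this mirrors only the in-kernel dev_t layout described here, not the encoding that user-space tools see.

```shell
# In-kernel dev_t layout: high 12 bits = major, low 20 bits = minor.
MINORBITS=20
MINORMASK=$(( (1 << MINORBITS) - 1 ))

mkdev()     { echo $(( ($1 << MINORBITS) | $2 )); }   # stand-in for MKDEV(ma, mi)
dev_major() { echo $(( $1 >> MINORBITS )); }          # stand-in for MAJOR(dev)
dev_minor() { echo $(( $1 & MINORMASK )); }           # stand-in for MINOR(dev)

# Round trip for major 231, minor 64.
dev=$(mkdev 231 64)
echo "dev=$dev major=$(dev_major "$dev") minor=$(dev_minor "$dev")"
```

Because the minor occupies the low bits, consecutive minors under one major are consecutive dev_t values, which is why the registration functions below can take a starting number plus a count.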
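A driver that already knows which device number it wants registers it with:
int register_chrdev_region(dev_t from, unsigned count, const char *name)
from contains both major and minor; the minor is usually 0. count is the number of consecutive device numbers, and name is the device name. register_chrdev_region is implemented as follows: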
/**
 * register_chrdev_region() - register a range of device numbers
 * @from: the first in the desired range of device numbers; must include
 *        the major number.
 * @count: the number of consecutive device numbers required
 * @name: the name of the device or driver.
 *
 * Return value is zero on success, a negative error code on failure.
 */
int register_chrdev_region(dev_t from, unsigned count, const char *name)
{
	struct char_device_struct *cd;
	dev_t to = from + count;
	dev_t n, next;

	for (n = from; n < to; n = next) {
		next = MKDEV(MAJOR(n)+1, 0);
		if (next > to)
			next = to;
		cd = __register_chrdev_region(MAJOR(n), MINOR(n),
			       next - n, name);
		if (IS_ERR(cd))
			goto fail;
	}
	return 0;
fail:
	to = n;
	for (n = from; n < to; n = next) {
		next = MKDEV(MAJOR(n)+1, 0);
		kfree(__unregister_chrdev_region(MAJOR(n), MINOR(n), next - n));
	}
	return PTR_ERR(cd);
}
In other words, register_chrdev_region registers count consecutive device numbers (of type dev_t, so each includes a major and a minor) starting at from. For example, in drivers/tty/tty_io.c:
register_chrdev_region(MKDEV(TTYAUX_MAJOR, 1), 1, "/dev/console")
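A driver that has no fixed number asks the kernel to pick one dynamically:
int alloc_chrdev_region(dev_t *dev, unsigned baseminor, unsigned count, const char *name);
dev is an output parameter that receives the dynamically allocated device number; baseminor specifies the first minor; count and name mean the same as in register_chrdev_region. alloc_chrdev_region is implemented as follows: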
/**
 * alloc_chrdev_region() - register a range of char device numbers
 * @dev: output parameter for first assigned number
 * @baseminor: first of the requested range of minor numbers
 * @count: the number of minor numbers required
 * @name: the name of the associated device or driver
 *
 * Allocates a range of char device numbers. The major number will be
 * chosen dynamically, and returned (along with the first minor number)
 * in @dev. Returns zero or a negative error code.
 */
int alloc_chrdev_region(dev_t *dev, unsigned baseminor, unsigned count,
			const char *name)
{
	struct char_device_struct *cd;

	cd = __register_chrdev_region(0, baseminor, count, name);
	if (IS_ERR(cd))
		return PTR_ERR(cd);
	*dev = MKDEV(cd->major, cd->baseminor);
	return 0;
}
For example, in drivers/watchdog/watchdog_dev.c:
alloc_chrdev_region(&watchdog_devt, 0, MAX_DOGS, "watchdog");
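Device numbers are released again with:
void unregister_chrdev_region(dev_t first, unsigned int count);
With that background, back to the error:
user_mad: couldn't register device number
The driver sources ship inside the OFED package, so unpack them and take a look: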
# ls ./MLNX_OFED_LINUX-4.9-2.2.4.0-rhel7.7-x86_64/src/
MLNX_OFED_SRC-4.9-2.2.4.0.tgz
# tar xvf MLNX_OFED_SRC-4.9-2.2.4.0.tgz
# cd MLNX_OFED_SRC-4.9-2.2.4.0/SRPMS
# rpm2cpio mlnx-ofa_kernel-4.9-OFED.4.9.2.2.4.1.src.rpm | cpio -iv
# tar xvf mlnx-ofa_kernel-4.9.tgz
# cd mlnx-ofa_kernel-4.9
# ls
backport_includes code-metrics.txt compat-2.6.18 compat_base_tree_version COPYING Documentation LINUX_BASE_BRANCH mlnx-ofa_kernel.spec ofed_scripts
backports compat compat_base compat_version debian drivers makefile Module.supported README
block compat-2.6.16 compat_base_tree configure devtools include Makefile net scripts
First, see how the user_mad module registers its device numbers:
# cat drivers/infiniband/core/user_mad.c
......
MODULE_AUTHOR("Roland Dreier");
MODULE_DESCRIPTION("InfiniBand userspace MAD packet access");
MODULE_LICENSE("Dual BSD/GPL");

enum {
	IB_UMAD_MAX_PORTS  = RDMA_MAX_PORTS,
	IB_UMAD_MAX_AGENTS = 32,
	IB_UMAD_MAJOR      = 231,
	IB_UMAD_MINOR_BASE = 0,
	IB_UMAD_NUM_FIXED_MINOR = 64,
	IB_UMAD_NUM_DYNAMIC_MINOR = IB_UMAD_MAX_PORTS - IB_UMAD_NUM_FIXED_MINOR,
	IB_ISSM_MINOR_BASE        = IB_UMAD_NUM_FIXED_MINOR,
};
......
static const dev_t base_umad_dev = MKDEV(IB_UMAD_MAJOR, IB_UMAD_MINOR_BASE);
......
static int __init ib_umad_init(void)
{
	int ret;

	ret = register_chrdev_region(base_umad_dev,
				     IB_UMAD_NUM_FIXED_MINOR * 2,
				     umad_class.name);
	if (ret) {
		pr_err("couldn't register device number\n");
		goto out;
	}
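So user_mad statically registers major 231 (IB_UMAD_MAJOR), and the error is printed when that registration fails. Now check which majors are already taken on this machine:
# cat /proc/devices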
227 mlx5_fpga_tools
228 nvidia-uvm
229 nvidia-nvswitch
230 nvidia-nvlink
231 nvidia-caps
232 mei
233 ipmidev
234 ttyVS
235 cambr-msg
236 cambr-rpc
237 ttyMS
238 cn-mbox-test
239 cmsg
240 aux
241 megaraid_sas_ioctl
242 ptp
243 pps
244 dimmctl
245 ndctl
246 hidraw
247 usbmon
248 bsg
249 hmm_device
250 watchdog
251 iio
252 rtc
253 dax
254 tpm
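In the listing above, major 231 is already held by nvidia-caps, which is exactly the number user_mad asks for. A quick check along these lines could be scripted as below; the function name `ib_major_status` and its optional file argument are illustrative, not part of any Mellanox tooling.

```shell
# Report whether character major 231 (IB_UMAD_MAJOR) is free.
# $1 = file in /proc/devices format (defaults to /proc/devices)
ib_major_status() {
    awk -v want=231 '
        /^Character devices:/ { in_char = 1; next }
        /^Block devices:/     { in_char = 0 }
        in_char && $1 == want { owner = (owner == "" ? $2 : owner "," $2) }
        END {
            if (owner != "") print "taken by " owner
            else             print "free"
        }
    ' "${1:-/proc/devices}"
}

# Example: ib_major_status
```

Run before loading the IB stack, "free" means the static registration should succeed. Once the IB modules are loaded, the major will naturally show up as taken by the IB driver itself.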
# cat Documentation/admin-guide/devices.txt
First of all, 231 really is assigned to the InfiniBand devices:
231 char InfiniBand
0 = /dev/infiniband/umad0
1 = /dev/infiniband/umad1
...
63 = /dev/infiniband/umad63 63rd InfiniBandMad device
64 = /dev/infiniband/issm0 First InfiniBand IsSM device
65 = /dev/infiniband/issm1 Second InfiniBand IsSM device
...
127 = /dev/infiniband/issm63 63rd InfiniBand IsSM device
128 = /dev/infiniband/uverbs0 First InfiniBand verbs device
129 = /dev/infiniband/uverbs1 Second InfiniBand verbs device
...
159 = /dev/infiniband/uverbs31 31st InfiniBand verbs device
The range reserved for dynamic assignment:
234-254 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major number will
take numbers starting from 254 and downward.
Added in kernel 4.15:
384-511 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major
number will take numbers starting from 511 and downward,
once the 234-254 range is full.
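To see how close a machine is to exhausting the first dynamic range, a sketch like the following lists the majors in 234-254 that are still unused. `free_dynamic_majors` is a hypothetical helper name, and the optional file argument is only there for trying it on a saved copy of /proc/devices.

```shell
# List character majors in the dynamic range 234-254 that are still free,
# highest first (the order in which the kernel hands them out).
# $1 = file in /proc/devices format (defaults to /proc/devices)
free_dynamic_majors() {
    awk '
        /^Character devices:/ { in_char = 1; next }
        /^Block devices:/     { in_char = 0 }
        in_char && $1 ~ /^[0-9]+$/ { used[$1 + 0] = 1 }
        END {
            for (m = 254; m >= 234; m--)
                if (!(m in used)) printf "%d\n", m
        }
    ' "${1:-/proc/devices}"
}

# Example: free_dynamic_majors | wc -l
```

On the machine shown in this article every major from 234 to 254 is in use, so the function would print nothing there.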
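Putting it together: majors 0-233 are statically assigned, and 234-254 are reserved for dynamic assignment, handed out downward from 254 (254, 253, 252, ...). What happens when 234-254 is not enough? Kernels before 4.15 simply carry on downward into the static range and print nothing more than a warning:
# cat fs/char_dev.c
	if (major == 0) {
		for (i = ARRAY_SIZE(chrdevs)-1; i > 0; i--) {
			if (chrdevs[i] == NULL)
				break;
		}

		if (i < CHRDEV_MAJOR_DYN_END)
			pr_warn("CHRDEV \"%s\" major number %d goes below the dynamic allocation range\n",
				name, i);

		if (i == 0) {
			ret = -EBUSY;
......
From 4.15 on (commit a5d31a3f81c6), the search lives in find_dynamic_major(), which falls back to the 384-511 range instead of descending below 234: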
static int find_dynamic_major(void)
{
	int i;
	struct char_device_struct *cd;

	for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) {
		if (chrdevs[i] == NULL)
			return i;
	}

	for (i = CHRDEV_MAJOR_DYN_EXT_START;
	     i > CHRDEV_MAJOR_DYN_EXT_END; i--) {
		for (cd = chrdevs[major_to_index(i)]; cd; cd = cd->next)
			if (cd->major == i)
				break;

		if (cd == NULL || cd->major != i)
			return i;
	}

	return -EBUSY;
}
......
	if (major == 0) {
		ret = find_dynamic_major();
		if (ret < 0) {
			pr_err("CHRDEV \"%s\" dynamic allocation region is full\n",
			       name);
			goto out;
		}
		major = ret;
	}
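The difference between the two allocators is easiest to see by replaying them on the same set of occupied majors. The bash sketch below is a simplified model, not kernel code: `pick_dynamic_major` is a made-up name, the table size of 255 is taken from the "254 and downward" wording in devices.txt, and the loop bounds copy the excerpts above (254 down to 235, then 511 down to 385). It needs bash 4 or later for associative arrays.

```shell
# Simulate how the kernel picks a dynamic char major.
# $1 = "old" (pre-4.15 loop) or "new" (find_dynamic_major)
# remaining arguments = majors that are already in use
pick_dynamic_major() {
    local mode=$1; shift
    local -A used=()
    local m
    for m in "$@"; do used[$m]=1; done

    if [ "$mode" = old ]; then
        # Pre-4.15: scan 254 down to 1 and take the first free slot,
        # even if it lies in the statically assigned range.
        for (( m = 254; m > 0; m-- )); do
            [ -z "${used[$m]:-}" ] && { echo "$m"; return 0; }
        done
    else
        # 4.15+: scan 254 down to 235, then 511 down to 385.
        for (( m = 254; m > 234; m-- )); do
            [ -z "${used[$m]:-}" ] && { echo "$m"; return 0; }
        done
        for (( m = 511; m > 384; m-- )); do
            [ -z "${used[$m]:-}" ] && { echo "$m"; return 0; }
        done
    fi
    echo "EBUSY"
    return 1
}

# With 232-254 occupied, the old allocator lands on InfiniBand's major.
echo "old: $(pick_dynamic_major old $(seq 232 254))"
echo "new: $(pick_dynamic_major new $(seq 232 254))"
```

With majors 232 to 254 occupied, the old model returns 231 while the new one jumps to 511, which matches what happened here: a dynamically allocated driver ended up on the number reserved for InfiniBand.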
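The range boundaries used above are defined as:
/* fs/char_dev.c */
/* Marks the bottom of the first segment of free char majors */
#define CHRDEV_MAJOR_DYN_END 234
/* Marks the top and bottom of the second segment of free char majors */
#define CHRDEV_MAJOR_DYN_EXT_START 511
#define CHRDEV_MAJOR_DYN_EXT_END 384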
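The workaround described at the top of this article (unload the module that took 231, start the IB driver, reload the module) could be wrapped roughly as follows. This is only a sketch: `reload_around_openibd` is a made-up name, the module name is a placeholder you have to determine yourself, and `modprobe -r` will refuse to unload a module that is still in use, so dependent modules and processes may need to be stopped first.

```shell
# Unload the module that took major 231, start the IB stack, reload the module.
# $1 = name of the offending kernel module (placeholder; determine the real
#      one on your system, e.g. with lsmod)
# Set DRY_RUN=1 to print the commands instead of running them.
reload_around_openibd() {
    local mod=$1
    local run=""
    [ "${DRY_RUN:-0}" = 1 ] && run="echo"

    $run modprobe -r "$mod"             || return 1   # free major 231
    $run /etc/init.d/openibd restart    || return 1   # IB registers 231 statically
    $run modprobe "$mod"                              # module gets another dynamic major
}

# Example (prints the commands only):
#   DRY_RUN=1 reload_around_openibd some_module
```

The order matters: the IB driver needs a specific number, whereas the dynamically allocated module will accept whatever major is free when it is loaded again.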
When reposting, please credit the source: apollocode » Pitfalls when installing InfiniBand NIC drivers