2025-08-17

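The pipe System Call, with Examples

We continue our tour of important functions in Linux system programming. This time we look at pipe, which creates an anonymous pipe, an important mechanism for one-way data transfer between related processes such as a parent and its child.

Introduction to pipe

1. Function overview

pipe is a Linux system call that creates an anonymous pipe. A pipe is a half-duplex (unidirectional) inter-process communication (IPC) mechanism: data flows in one direction only.

Pipes are normally used between related processes, most commonly to pass data between a parent and a child. Creating a pipe yields two file descriptors, one for reading (the read end) and one for writing (the write end). Data written to the write end is buffered by the kernel until it is read from the read end.

Pipes embody the Unix/Linux "everything is a file" philosophy: both ends are operated on with the same read() and write() system calls used for regular files.

Key properties:

Anonymous: a pipe has no name and cannot be reached through a file system path.
Unidirectional: data flows only from the write end to the read end.
Related processes: the descriptors are shared by inheritance, so pipes are normally used between related processes.
Blocking: by default, read blocks while the pipe is empty and write blocks while it is full.
Buffered: the kernel provides the buffer that holds the data in transit.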

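2. Function prototype

#include <unistd.h> // required

int pipe(int pipefd[2]);

3. What it does

Creates a pipe: allocates a pipe buffer in the kernel.

Returns file descriptors: stores two file descriptors in the pipefd array:

pipefd[0]: the read end of the pipe.
pipefd[1]: the write end of the pipe.

4. Parameters

int pipefd[2]: an array of two integers that receives the returned file descriptors.

pipefd[0]: the read-end file descriptor. A process reads from it with read(pipefd[0], buffer, size).
pipefd[1]: the write-end file descriptor. A process writes to it with write(pipefd[1], buffer, size).
Note: after creating a pipe, a program usually calls fork() to create a child, and then parent and child each close the end they do not need. For example, if the parent writes, it should close pipefd[0]; if the child reads, it should close pipefd[1].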
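The closing rule in the note above can be sketched in a few lines. This is a minimal illustration rather than part of the original examples: child_count_bytes is a hypothetical helper name, and reporting the byte count through the child's exit status is an assumption made only to keep the sketch short (it limits messages to fewer than 255 bytes).

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send `msg` to a forked child through a pipe and
 * return how many bytes the child read before EOF (passed back as its
 * exit status), or -1 on error. Intended for messages under 255 bytes. */
int child_count_bytes(const char *msg)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: reader. It must close its copy of the write end,
         * otherwise the read() loop below would never see EOF. */
        close(fds[1]);
        char buf[64];
        int total = 0;
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            total += (int)n;
        close(fds[0]);
        _exit(n == -1 ? 255 : (total & 0xff));
    }

    /* Parent: writer. It never reads, so it closes the read end. */
    close(fds[0]);
    size_t len = strlen(msg);
    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);            /* last write end closed: the child sees EOF */

    int status;
    if (waitpid(pid, &status, 0) == -1 || written != (ssize_t)len)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, child_count_bytes("hello") is expected to return 5. If the child kept its copy of the write end open, its read loop would never reach EOF and the call would hang.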

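5. Return value

On success: returns 0.

On failure:

Returns -1 and sets errno to indicate the cause:

EFAULT: pipefd is not a valid pointer.
EMFILE: the process has reached its limit on open file descriptors (RLIMIT_NOFILE).
ENFILE: the system-wide limit on open files has been reached.

6. Similar or related functions

pipe2(int pipefd[2], int flags): a Linux-specific extension of pipe that accepts extra flags such as O_CLOEXEC (close on exec) and O_NONBLOCK (non-blocking mode).
mkfifo(const char *pathname, mode_t mode): creates a named pipe (FIFO), a special file that lives in the file system and lets unrelated processes communicate.
socketpair(int domain, int type, int protocol, int sv[2]): creates a pair of connected sockets that support full-duplex communication.
popen(const char *command, const char *type), pclose(FILE *stream): higher-level functions that create a pipe and start a shell to run a command, a convenient way to exchange data with another program.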
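To make the contrast with socketpair concrete, here is a small sketch in which both descriptors of one pair are used for reading and for writing, something a single pipe cannot do. The function name socketpair_ping_pong is hypothetical, and the sketch assumes short, non-empty messages that arrive in a single read().

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch: send `ping` from sv[0] to sv[1], then `pong` back from sv[1]
 * to sv[0], all over one socket pair. Assumes short, non-empty messages.
 * Returns 0 if both messages arrived intact, -1 otherwise. */
int socketpair_ping_pong(const char *ping, const char *pong)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    char buf[64];
    size_t ping_len = strlen(ping);
    size_t pong_len = strlen(pong);
    int ok = 0;

    if (ping_len < sizeof buf && pong_len < sizeof buf &&
        /* direction 1: sv[0] -> sv[1] */
        write(sv[0], ping, ping_len) == (ssize_t)ping_len &&
        read(sv[1], buf, sizeof buf) == (ssize_t)ping_len &&
        memcmp(buf, ping, ping_len) == 0 &&
        /* direction 2: sv[1] -> sv[0], same pair of descriptors */
        write(sv[1], pong, pong_len) == (ssize_t)pong_len &&
        read(sv[0], buf, sizeof buf) == (ssize_t)pong_len &&
        memcmp(buf, pong, pong_len) == 0)
        ok = 1;

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

With pipes, the same exchange needs two pipes, one per direction.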

7. Example code

Example 1: basic parent-child pipe communication

This example shows the classic use of a pipe: the parent sends data to the child.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/wait.h>

#define BUFFER_SIZE 256

int main(void) {
    int pipefd[2];
    pid_t pid;
    char buffer[BUFFER_SIZE];

    printf("=== Basic parent-child pipe communication ===\n");

    // 1. Create the pipe
    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }
    printf("Pipe created: read end=%d, write end=%d\n", pipefd[0], pipefd[1]);

    // Flush stdio so the child does not inherit and re-print buffered output
    fflush(stdout);

    // 2. Create the child process
    pid = fork();
    if (pid == -1) {
        perror("fork");
        // Clean up the pipe file descriptors
        close(pipefd[0]);
        close(pipefd[1]);
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        // Child: reads the data
        printf("Child (PID: %d) starts reading...\n", getpid());

        // Close the write end (the child does not write)
        close(pipefd[1]);

        // Read from the pipe
        ssize_t bytes_read = read(pipefd[0], buffer, BUFFER_SIZE - 1);
        if (bytes_read > 0) {
            buffer[bytes_read] = '\0'; // make sure the string is null-terminated
            printf("Child read: %s", buffer);
        } else if (bytes_read == 0) {
            printf("Child reached end of data (write end closed)\n");
        } else {
            perror("child: read");
        }

        // Close the read end
        close(pipefd[0]);
        printf("Child done\n");
        exit(EXIT_SUCCESS);

    } else {
        // Parent: writes the data
        printf("Parent (PID: %d) starts writing...\n", getpid());

        // Close the read end (the parent does not read)
        close(pipefd[0]);

        // Write to the pipe
        const char *message = "Hello from parent process through pipe!\n";
        ssize_t bytes_written = write(pipefd[1], message, strlen(message));
        if (bytes_written == -1) {
            perror("parent: write");
        } else {
            printf("Parent wrote %zd bytes\n", bytes_written);
        }

        // Close the write end; this tells the reader that no more data will come
        close(pipefd[1]);

        // Wait for the child to finish
        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status)) {
            printf("Child exited normally, exit code: %d\n", WEXITSTATUS(status));
        } else {
            printf("Child exited abnormally\n");
        }

        printf("Parent done\n");
    }

    return 0;
}

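Code walkthrough:

First, pipe(pipefd) creates the pipe, giving the read end pipefd[0] and the write end pipefd[1].

fork() then creates the child. At this point both parent and child hold copies of the file descriptors for both ends of the pipe.

In the child:

Close the unused write end pipefd[1].
Read from the pipe with read(pipefd[0], …).
Close the read end pipefd[0] when done.

In the parent:

Close the unused read end pipefd[0].
Write to the pipe with write(pipefd[1], …).
Close the write end pipefd[1] when done, which tells the reader that no more data is coming.
Wait for the child to finish with waitpid().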
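The walkthrough treats write() as if it always transfers the whole message. That holds for a short message like this one, but write() may return a smaller count for large buffers or fail with EINTR when a signal arrives. A common pattern is a small retry loop; the sketch below uses a hypothetical helper name, write_all, which is not part of the original example.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `len` bytes have
 * been written. Returns 0 on success, -1 on error (errno set by write()). */
int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;     /* interrupted by a signal: retry */
            return -1;
        }
        p += n;               /* short write: advance and keep going */
        len -= (size_t)n;
    }
    return 0;
}
```

In the example above, the parent's write call could be replaced by write_all(pipefd[1], message, strlen(message)).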

Example 2: two-way communication with pipes

This example shows how to use two pipes for two-way communication between parent and child.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/wait.h>

#define BUFFER_SIZE 256

int main(void) {
    int parent_to_child_pipe[2];  // parent sends data to child
    int child_to_parent_pipe[2];  // child sends data to parent
    pid_t pid;
    char buffer[BUFFER_SIZE];

    printf("=== Two-way pipe communication ===\n");

    // 1. Create the two pipes
    if (pipe(parent_to_child_pipe) == -1 || pipe(child_to_parent_pipe) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }
    printf("Pipes created\n");
    printf("Parent-to-child pipe: read end=%d, write end=%d\n", parent_to_child_pipe[0], parent_to_child_pipe[1]);
    printf("Child-to-parent pipe: read end=%d, write end=%d\n", child_to_parent_pipe[0], child_to_parent_pipe[1]);

    // Flush stdio so the child does not inherit and re-print buffered output
    fflush(stdout);

    // 2. Create the child process
    pid = fork();
    if (pid == -1) {
        perror("fork");
        close(parent_to_child_pipe[0]);
        close(parent_to_child_pipe[1]);
        close(child_to_parent_pipe[0]);
        close(child_to_parent_pipe[1]);
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        // Child
        printf("Child (PID: %d) started\n", getpid());

        // Close the descriptors the child does not need
        close(parent_to_child_pipe[1]);  // child does not write to the parent-to-child pipe
        close(child_to_parent_pipe[0]);  // child does not read from the child-to-parent pipe

        // 1. Receive the message from the parent
        printf("Child waiting for the parent's message...\n");
        ssize_t bytes_read = read(parent_to_child_pipe[0], buffer, BUFFER_SIZE - 1);
        if (bytes_read > 0) {
            buffer[bytes_read] = '\0';
            printf("Child received: %s", buffer);

            // 2. Send a reply to the parent
            const char *reply = "Hello parent! I'm child process.\n";
            ssize_t bytes_written = write(child_to_parent_pipe[1], reply, strlen(reply));
            if (bytes_written == -1) {
                perror("child: write reply");
            } else {
                printf("Child sent reply (%zd bytes)\n", bytes_written);
            }
        }

        // Close the pipes
        close(parent_to_child_pipe[0]);
        close(child_to_parent_pipe[1]);
        printf("Child done\n");
        exit(EXIT_SUCCESS);

    } else {
        // Parent
        printf("Parent (PID: %d) started\n", getpid());

        // Close the descriptors the parent does not need
        close(parent_to_child_pipe[0]);  // parent does not read from the parent-to-child pipe
        close(child_to_parent_pipe[1]);  // parent does not write to the child-to-parent pipe

        // 1. Send a message to the child
        const char *message = "Hello child! I'm parent process.\n";
        printf("Parent sending message to child...\n");
        ssize_t bytes_written = write(parent_to_child_pipe[1], message, strlen(message));
        if (bytes_written == -1) {
            perror("parent: write message");
        } else {
            printf("Parent sent message (%zd bytes)\n", bytes_written);
        }

        // 2. Wait for and receive the child's reply
        printf("Parent waiting for the child's reply...\n");
        ssize_t bytes_read = read(child_to_parent_pipe[0], buffer, BUFFER_SIZE - 1);
        if (bytes_read > 0) {
            buffer[bytes_read] = '\0';
            printf("Parent received reply: %s", buffer);
        }

        // Close the pipes
        close(parent_to_child_pipe[1]);
        close(child_to_parent_pipe[0]);

        // Wait for the child to finish
        int status;
        waitpid(pid, &status, 0);
        printf("Parent done\n");
    }

    return 0;
}

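Code walkthrough:

1. Create two pipes: one carries data from parent to child, the other from child to parent.
2. In each process, close the pipe ends that process does not use.
3. Coordinate the reads and writes to achieve two-way communication.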

Example 3: pipes and error handling

This example focuses on pipe error handling and a few special cases.

#define _GNU_SOURCE   // exposes pipe2() and F_GETPIPE_SZ on glibc
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/wait.h>
#include <fcntl.h>

#define BUFFER_SIZE 256

// Demonstrates pipe error handling
void demonstrate_pipe_errors(void) {
    printf("=== Pipe error handling ===\n");

    // 1. Pass an invalid pointer
    printf("1. Passing an invalid pointer to pipe()...\n");
    if (pipe(NULL) == -1) {
        printf("   Error: %s\n", strerror(errno));
        if (errno == EFAULT) {
            printf("   Note: the argument to pipe() must not be NULL\n");
        }
    }

    printf("\n");
}

// Demonstrates basic pipe read/write behavior
void demonstrate_pipe_characteristics(void) {
    printf("=== Pipe characteristics ===\n");

    int pipefd[2];
    if (pipe(pipefd) == -1) {
        perror("pipe");
        return;
    }

    fflush(stdout); // avoid duplicated stdio output in the child
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        close(pipefd[0]);
        close(pipefd[1]);
        return;
    }

    if (pid == 0) {
        // Child
        close(pipefd[1]); // close the write end

        char buffer[BUFFER_SIZE];
        printf("Child: reading from the pipe (blocks while it is empty)...\n");

        // Read the data
        ssize_t bytes_read = read(pipefd[0], buffer, BUFFER_SIZE - 1);
        if (bytes_read > 0) {
            buffer[bytes_read] = '\0';
            printf("Child: read data: %s", buffer);
        } else if (bytes_read == 0) {
            printf("Child: reached end of file (all write ends closed)\n");
        } else {
            perror("child: read");
        }

        close(pipefd[0]);
        exit(EXIT_SUCCESS);
    } else {
        // Parent
        close(pipefd[0]); // close the read end

        // Write some data
        const char *message = "Message from parent\n";
        printf("Parent: writing to the pipe...\n");
        if (write(pipefd[1], message, strlen(message)) == -1) {
            perror("parent: write");
        }

        // Close the write end so the child sees end of file
        printf("Parent: closing the write end to signal the child...\n");
        close(pipefd[1]);

        wait(NULL);
    }

    printf("\n");
}

// Demonstrates pipe capacity and blocking behavior
void demonstrate_pipe_capacity(void) {
    printf("=== Pipe capacity and blocking behavior ===\n");
    printf("Note: this demonstration may take a while\n");

    int pipefd[2];
    if (pipe(pipefd) == -1) {
        perror("pipe");
        return;
    }

    // Query the pipe capacity (Linux-specific)
    int pipe_capacity = fcntl(pipefd[1], F_GETPIPE_SZ);
    if (pipe_capacity != -1) {
        printf("Pipe capacity: %d bytes\n", pipe_capacity);
    }

    fflush(stdout); // avoid duplicated stdio output in the child
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        close(pipefd[0]);
        close(pipefd[1]);
        return;
    }

    if (pid == 0) {
        // Child: reader
        close(pipefd[1]); // close the write end

        sleep(2); // let the parent fill the pipe first

        char buffer[1024];
        int total_read = 0;
        ssize_t bytes_read;

        printf("Child: starting to read...\n");
        while ((bytes_read = read(pipefd[0], buffer, sizeof(buffer))) > 0) {
            total_read += bytes_read;
            printf("Child: read %zd bytes, total %d bytes\n", bytes_read, total_read);
        }

        printf("Child: finished reading, total %d bytes\n", total_read);
        close(pipefd[0]);
        exit(EXIT_SUCCESS);
    } else {
        // Parent: writer
        close(pipefd[0]); // close the read end

        char data[1024];
        memset(data, 'A', sizeof(data) - 1);
        data[sizeof(data) - 1] = '\0';

        int total_written = 0;
        ssize_t bytes_written;

        printf("Parent: starting to write a large amount of data...\n");
        // Keep writing; once the pipe is full, write() blocks until the child reads
        for (int i = 0; i < 100; i++) {
            bytes_written = write(pipefd[1], data, strlen(data));
            if (bytes_written == -1) {
                perror("parent: write");
                break;
            }
            total_written += bytes_written;
            printf("Parent: wrote %zd bytes, total %d bytes\n", bytes_written, total_written);
        }

        printf("Parent: finished writing, total %d bytes\n", total_written);
        close(pipefd[1]);
        wait(NULL);
    }

    printf("\n");
}

// Demonstrates pipe2 and non-blocking pipes
void demonstrate_pipe2_and_nonblocking(void) {
    printf("=== pipe2 and non-blocking pipes ===\n");

#ifdef __linux__
    int pipefd[2];

    // Create a non-blocking pipe with pipe2
    if (pipe2(pipefd, O_NONBLOCK) == -1) {
        perror("pipe2");
        printf("The system may not support pipe2\n");
        return;
    }

    printf("Created a non-blocking pipe with pipe2: read end=%d, write end=%d\n", pipefd[0], pipefd[1]);

    // Try to read from the empty non-blocking pipe
    char buffer[10];
    ssize_t bytes_read = read(pipefd[0], buffer, sizeof(buffer));
    if (bytes_read == -1) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            printf("Reading from an empty non-blocking pipe: %s\n", strerror(errno));
            printf("Note: in non-blocking mode, read fails immediately when there is no data\n");
        } else {
            perror("read");
        }
    }

    close(pipefd[0]);
    close(pipefd[1]);
#else
    printf("pipe2 is not available on this system\n");
#endif

    printf("\n");
}

int main(void) {
    printf("pipe demonstration program\n");
    printf("Current process PID: %d\n\n", getpid());

    demonstrate_pipe_errors();
    demonstrate_pipe_characteristics();
    // demonstrate_pipe_capacity(); // may take a while; enable it if you want to run it
    demonstrate_pipe2_and_nonblocking();

    printf("=== Summary ===\n");
    printf("Key points about pipes:\n");
    printf("1. One-way communication: data flows only from the write end to the read end\n");
    printf("2. Related processes: typically used between parent and child\n");
    printf("3. Blocking: reads and writes may block by default\n");
    printf("4. End of file: read returns 0 once all write ends are closed\n");
    printf("5. Buffering: the kernel buffers the data\n");
    printf("6. Error handling: watch for EFAULT, EMFILE, ENFILE and similar errors\n");
    printf("7. Cleanup: close the file descriptors when done\n");
    printf("8. Two-way communication: requires two pipes\n\n");

    printf("Best practices:\n");
    printf("- Close unused pipe ends promptly\n");
    printf("- Check the return values of read and write\n");
    printf("- Consider pipe2 with the O_CLOEXEC flag\n");
    printf("- For complex communication, consider named pipes (FIFOs) or sockets\n");

    return 0;
}

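Code walkthrough:

1. demonstrate_pipe_errors shows the error handling when an invalid argument is passed to pipe().
2. demonstrate_pipe_characteristics shows basic pipe read/write behavior and the end-of-file condition.
3. demonstrate_pipe_capacity shows the pipe's capacity limit and blocking behavior (commented out in main because it may take a while to run).
4. demonstrate_pipe2_and_nonblocking shows pipe2 and non-blocking mode.
5. main runs the demonstrations in turn and prints a summary of the key points at the end.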
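pipe2 is Linux-specific. Where only pipe() is available, the same non-blocking behavior can be obtained by adding O_NONBLOCK with fcntl() after the pipe is created. The following is a sketch under that assumption; set_nonblocking and errno_of_empty_nonblocking_read are hypothetical helper names introduced for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to an already open descriptor while keeping its other
 * status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read from an empty non-blocking pipe whose write end is still open.
 * Returns the errno set by read() (EAGAIN is expected), 0 if read()
 * unexpectedly succeeded, or -1 if the setup failed. */
int errno_of_empty_nonblocking_read(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    int result = -1;
    if (set_nonblocking(fds[0]) == 0) {
        char c;
        ssize_t n = read(fds[0], &c, 1);
        result = (n == -1) ? errno : 0;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Unlike pipe2, this takes two steps, so another thread that calls fork() in between could briefly see the descriptor without the flag. That matters mostly for O_CLOEXEC, which is the main reason to prefer pipe2 where it is available.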

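Compiling and running:

# Compile the examples
gcc -o pipe_example1 pipe_example1.c
gcc -o pipe_example2 pipe_example2.c
gcc -o pipe_example3 pipe_example3.c

# Run the examples
./pipe_example1
./pipe_example2
./pipe_example3

Summary:

pipe is one of the basic tools for inter-process communication on Linux. It is simple and efficient, and it is particularly well suited to passing data between a parent and its child. Using pipes correctly depends on understanding that they are one-way, that reads and writes can block, and how the kernel buffers the data. In practice, close unused file descriptors promptly, handle every return value and error case, and consider pipe2 or a richer IPC mechanism when the situation calls for it.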

2025-08-17

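Linux System Programming

The pipe System Call, with Examples

We continue our tour of important functions in Linux system programming. This time we look at pipe, one of the basic mechanisms for inter-process communication (IPC). It is especially suited to one-way data transfer between related processes, such as parent and child or sibling processes.

1. Function overview

pipe is a Linux system call that creates an anonymous pipe. A pipe is a half-duplex (unidirectional) communication channel with a fixed read end and a fixed write end.

You can picture a pipe as a one-way water pipe or conveyor belt:

One end is the write end: data is "put into" the pipe.
The other end is the read end: data is "taken out of" the pipe.
Data moves through the pipe in first-in, first-out (FIFO) order.
A pipe has a finite capacity. On Linux the default is 65536 bytes, and it can be queried with fcntl(fd, F_GETPIPE_SZ). This is not the same as the PIPE_BUF constant (4096 bytes on Linux), which is the largest write that is guaranteed to be atomic. If the pipe is full, write blocks; if the pipe is empty, read blocks.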

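The most common use of anonymous pipes is passing data between related processes (a parent and child, or siblings, created with fork).

2. Function prototype

#include <unistd.h> // required

int pipe(int pipefd[2]);

3. What it does

Creates a pipe: asks the kernel to create a new anonymous pipe.

Returns file descriptors: on success, hands two associated file descriptors back to the caller through the pipefd array:

pipefd[0]: the file descriptor of the read end.
pipefd[1]: the file descriptor of the write end.

Initial state: a newly created pipe is empty.

4. Parameters

int pipefd[2]: an array of two integers that receives the file descriptors returned by pipe.

pipefd[0]: the read end of the pipe. A process calls read on this descriptor to get data.
pipefd[1]: the write end of the pipe. A process calls write on this descriptor to put data in.

5. Return value

On success: returns 0, and pipefd[0] and pipefd[1] are filled in with valid file descriptors.
On failure: returns -1 and sets errno to indicate the cause (for example EMFILE when the process has reached its limit on open file descriptors, or ENFILE when the system-wide limit on open files has been reached).

6. Similar or related functions

socketpair: creates a pair of connected anonymous sockets that support two-way inter-process communication.
Named pipes (FIFOs): special files created with mkfifo or mknod that let unrelated processes communicate.
read, write: perform the actual data transfer on the read end and write end of the pipe.
close: closes the read end or write end of the pipe. Once every write end is closed, read returns 0 (EOF) after the remaining data has been consumed. Once every read end is closed, write raises SIGPIPE, which terminates the process by default; if the signal is ignored or handled, write fails with EPIPE.
fork: usually combined with pipe; the child and parent communicate through the inherited pipe file descriptors.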
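The effect of closing the read end can be observed in a single process. In the sketch below, SIGPIPE is ignored for the duration of the call so that the failed write() returns an error instead of terminating the process. The helper name errno_of_write_without_reader is hypothetical and exists only for this illustration.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write to a pipe whose only read end has been closed, with SIGPIPE
 * ignored during the call. Returns the errno set by write() (EPIPE is
 * expected), 0 if write() unexpectedly succeeded, -1 on setup failure. */
int errno_of_write_without_reader(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    /* Ignore SIGPIPE so the failed write returns instead of killing us. */
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN);
    if (old_handler == SIG_ERR) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[0]);                       /* no reader remains */
    int result = (write(fds[1], "x", 1) == -1) ? errno : 0;

    signal(SIGPIPE, old_handler);        /* restore the previous disposition */
    close(fds[1]);
    return result;
}
```

Long-running programs that write to pipes or sockets often ignore SIGPIPE for this reason and handle EPIPE at the call site instead.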

7. Example code

Example 1: parent and child communicating through a pipe

This classic example shows how to use pipe to pass data between a parent and a child.

#include <unistd.h>  // pipe, fork, read, write, close
#include <sys/wait.h> // wait
#include <stdio.h>   // perror, printf
#include <stdlib.h>  // exit
#include <string.h>  // strlen

int main(void) {
    int pipefd[2];          // holds the pipe's two file descriptors
    pid_t cpid;             // child process ID
    char buf;               // buffer for reading one byte at a time

    // 1. Create the pipe
    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    // 2. Create the child process
    cpid = fork();
    if (cpid == -1) {
        perror("fork");
        // fork failed: close the pipe that was already created
        close(pipefd[0]);
        close(pipefd[1]);
        exit(EXIT_FAILURE);
    }

    // 3. Run different code depending on the process
    if (cpid == 0) { // code run by the child
        // --- Child ---
        // Close the unused write end
        if (close(pipefd[1]) == -1) {
            perror("child: close write end");
            _exit(EXIT_FAILURE); // use _exit in the child
        }

        printf("Child process (PID %d): Reading from pipe...\n", getpid());
        fflush(stdout); // _exit does not flush stdio, so flush explicitly

        // Read from the read end until EOF, keeping the return value
        ssize_t n;
        while ((n = read(pipefd[0], &buf, 1)) > 0) {
            write(STDOUT_FILENO, &buf, 1); // write to standard output (the screen)
        }

        // The loop ended: n == 0 means EOF, n == -1 means a read error
        if (n == -1) {
            perror("child: read");
            _exit(EXIT_FAILURE);
        }

        printf("Child process: Finished reading. Exiting.\n");
        fflush(stdout);

        // Close the read end
        if (close(pipefd[0]) == -1) {
            perror("child: close read end");
            _exit(EXIT_FAILURE);
        }

        _exit(EXIT_SUCCESS); // child exits successfully

    } else { // code run by the parent
        // --- Parent ---
        // Close the unused read end
        if (close(pipefd[0]) == -1) {
            perror("parent: close read end");
            // clean up the child?
            exit(EXIT_FAILURE);
        }

        const char *message = "Message from parent to child through pipe!\n";

        printf("Parent process (PID %d): Writing to pipe...\n", getpid());

        // Write to the write end of the pipe
        if (write(pipefd[1], message, strlen(message)) != (ssize_t)strlen(message)) {
            perror("parent: write");
            // may need to kill the child
            exit(EXIT_FAILURE);
        }

        printf("Parent process: Message sent. Closing write end.\n");

        // Close the write end so the child's read() returns 0 (EOF) once the data is consumed
        if (close(pipefd[1]) == -1) {
            perror("parent: close write end");
            exit(EXIT_FAILURE);
        }

        // Wait for the child to finish
        int status;
        if (wait(&status) == -1) {
            perror("parent: wait");
            exit(EXIT_FAILURE);
        }

        if (WIFEXITED(status)) {
            printf("Parent process: Child exited with status %d.\n", WEXITSTATUS(status));
        } else {
            printf("Parent process: Child did not exit normally.\n");
        }
    }

    return 0;
}

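Code walkthrough:

1. pipe(pipefd) creates the pipe; on success pipefd[0] is the read end and pipefd[1] is the write end.
2. fork() creates the child. After the fork, parent and child both hold copies of the file descriptors for both ends of the pipe.
3. Child (cpid == 0):
* Closes the unused write end pipefd[1].
* Enters a loop that calls read(pipefd[0], &buf, 1) to read from the pipe one byte at a time.
* Writes each byte it reads to standard output.
* When read returns 0, EOF has been reached (because the parent closed the write end) and the loop ends.
* Closes the read end pipefd[0].
* Exits with _exit() (in a forked child, _exit is generally recommended over exit to avoid problems caused by flushing stdio buffers).
4. Parent (cpid > 0):
* Closes the unused read end pipefd[0].
* Defines the message to send.
* Calls write(pipefd[1], message, …) to write the message into the pipe.
* Closes the write end pipefd[1]. This step matters: it tells the child that all data has been sent (read on the read end will return 0).
* Calls wait() to wait for the child and checks its exit status.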
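A read loop has three possible outcomes per call: data, EOF (0), or an error (-1). Keeping the return value of read() in a variable makes it possible to tell the last two apart after the loop without calling read() again. A sketch of that pattern follows; the helper name read_until_eof is hypothetical, and it reads in larger chunks than the one-byte loop above.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read from `fd` until EOF or until `cap` bytes have
 * been stored in `buf`. Returns the number of bytes read, or -1 on error. */
ssize_t read_until_eof(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n == 0)
            break;            /* EOF: every write end has been closed */
        if (n == -1) {
            if (errno == EINTR)
                continue;     /* interrupted by a signal: retry */
            return -1;        /* real error, errno set by read() */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Reading in chunks also avoids one system call per byte, which matters once messages get larger.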

Example 2: implementing a simple command-line pipeline (ls | wc -l)

This example imitates what the shell does for ls | wc -l: list the contents of the current directory and count the lines.

#include <unistd.h>  // pipe, fork, dup2, execlp, execvp, close
#include <sys/wait.h> // waitpid
#include <stdio.h>   // perror, printf
#include <stdlib.h>  // exit

int main(void) {
    int pipefd[2];
    pid_t pid1, pid2;

    // 1. Create the pipe
    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    // 2. Create the first child, which runs 'ls'
    pid1 = fork();
    if (pid1 == -1) {
        perror("fork ls");
        close(pipefd[0]);
        close(pipefd[1]);
        exit(EXIT_FAILURE);
    }

    if (pid1 == 0) { // first child
        // --- 'ls' process ---
        // Close the unused read end
        close(pipefd[0]);

        // Redirect standard output to the write end of the pipe
        // dup2(oldfd, newfd): closes newfd, then makes newfd a copy of oldfd
        if (dup2(pipefd[1], STDOUT_FILENO) == -1) {
            perror("dup2 ls");
            _exit(EXIT_FAILURE);
        }

        // Close the original write-end descriptor (it has been copied to STDOUT_FILENO)
        close(pipefd[1]);

        // Run the 'ls' command
        // execlp searches PATH for the program
        execlp("ls", "ls", (char *)NULL);

        // If execlp returns, it failed
        perror("execlp ls failed");
        _exit(EXIT_FAILURE);
    }

    // 3. Create the second child, which runs 'wc -l'
    pid2 = fork();
    if (pid2 == -1) {
        perror("fork wc");
        // may need to kill pid1?
        close(pipefd[0]);
        close(pipefd[1]);
        exit(EXIT_FAILURE);
    }

    if (pid2 == 0) { // second child
        // --- 'wc -l' process ---
        // Close the unused write end
        close(pipefd[1]);

        // Redirect standard input to the read end of the pipe
        if (dup2(pipefd[0], STDIN_FILENO) == -1) {
            perror("dup2 wc");
            _exit(EXIT_FAILURE);
        }

        // Close the original read-end descriptor
        close(pipefd[0]);

        // Run the 'wc -l' command
        char *cmd[] = {"wc", "-l", NULL};
        execvp(cmd[0], cmd); // execvp takes char *const argv[]

        // If execvp returns, it failed
        perror("execvp wc failed");
        _exit(EXIT_FAILURE);
    }

    // 4. Parent
    // The parent does not use the pipe, so it closes both ends
    close(pipefd[0]);
    close(pipefd[1]);

    // Wait for both children to finish
    // Note: waitpid waits for a specific child, which is more precise than wait
    int status;
    if (waitpid(pid1, &status, 0) == -1) {
        perror("waitpid ls");
    }
    if (waitpid(pid2, &status, 0) == -1) {
        perror("waitpid wc");
    }

    printf("Parent process: Both 'ls' and 'wc -l' have finished.\n");

    return 0;
}

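Code walkthrough:

1. pipe(pipefd) creates the pipe.
2. The first fork() creates child pid1.
3. In child pid1:
* Close the read end of the pipe.
* Use dup2(pipefd[1], STDOUT_FILENO) to redirect the child's standard output (STDOUT_FILENO, file descriptor 1) to the write end of the pipe. All output from ls is now written into the pipe.
* Close the original write-end descriptor pipefd[1].
* Call execlp("ls", "ls", NULL) to run ls. Because standard output has been redirected, the output of ls goes into the pipe.
4. The second fork() creates child pid2.
5. In child pid2:
* Close the write end of the pipe.
* Use dup2(pipefd[0], STDIN_FILENO) to redirect the child's standard input (STDIN_FILENO, file descriptor 0) to the read end of the pipe. wc now reads its input from the pipe.
* Close the original read-end descriptor pipefd[0].
* Call execvp("wc", cmd) to run wc -l. Because standard input has been redirected, wc reads from the pipe, counts the lines, and prints the result to standard output (usually the screen).
6. In the parent:
* Close its own pipe descriptors, which it no longer needs. If the parent kept the write end open, wc would never see EOF.
* Call waitpid to wait for both children to finish.

This example shows how a pipe connects the standard output of one process to the standard input of another so that data streams between them, just as | does in the shell.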
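The dup2 redirection can also be tried without exec. The sketch below temporarily points standard output at a pipe inside one process, runs a function that writes to STDOUT_FILENO, restores the original descriptor, and then reads back what was written. capture_stdout and say_hello are hypothetical names used only for this illustration, and the approach assumes the captured output is smaller than the pipe capacity, since nothing drains the pipe while the producer runs.

```c
#include <stdio.h>
#include <unistd.h>

/* Sample producer for the illustration: writes 6 bytes to standard output. */
void say_hello(void)
{
    (void)write(STDOUT_FILENO, "hello\n", 6);
}

/* Sketch: run `producer` with standard output redirected into a pipe, then
 * read what it wrote into `buf`. Returns the number of bytes captured
 * (0 if nothing was written) or -1 on error. */
ssize_t capture_stdout(void (*producer)(void), char *buf, size_t cap)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    fflush(stdout);                      /* do not capture stale stdio data */
    int saved = dup(STDOUT_FILENO);      /* remember the real stdout */
    if (saved == -1 || dup2(fds[1], STDOUT_FILENO) == -1) {
        if (saved != -1)
            close(saved);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                       /* fd 1 is now the only write end */

    producer();

    fflush(stdout);                      /* push any stdio output into the pipe */
    dup2(saved, STDOUT_FILENO);          /* restore; this closes the write end */
    close(saved);

    ssize_t n = read(fds[0], buf, cap);  /* the data, or 0 at EOF */
    close(fds[0]);
    return n;
}
```

Calling capture_stdout(say_hello, buf, sizeof buf) with a 32-byte buffer is expected to return 6 and leave "hello\n" in buf. The ls | wc -l program does the same redirection, except that exec replaces the process image afterward instead of restoring the descriptor.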

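Summary:

pipe is one of the basic tools for inter-process communication on Linux. The anonymous pipes it creates are simple and efficient, and are particularly well suited to one-way data transfer between related processes. Understanding how it works together with fork, dup2, read, and write is key to mastering Linux IPC.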

2025-08-17

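AI Programming

A Python Tool for Detecting Text Encodings in Binary Files

Background: this is a binary-file analysis program built on the Python cchardet library. It reads a file in chunks according to preset parameters and runs cchardet text-encoding detection on each chunk. The script can skip the first n bytes of each chunk, split the file into chunks of m bytes, and, when a chunk yields zero confidence, retry detection at up to 4 consecutive byte offsets into the chunk. For each chunk the output shows its index, its starting offset, the in-chunk byte offset at which a detection was obtained, the chunk size, the detected encoding, the confidence, and a high-confidence hint field.

How to use the script:

# 1. Basic usage: analyze the whole file
python encoding_detector.py myfile.bin

# 2. Set the chunk size
python encoding_detector.py -s 512 myfile.bin

# 3. Skip the first 10 bytes of each chunk
python encoding_detector.py -s 100 -h 10 myfile.bin

# 4. Start the analysis at file offset 1116
python encoding_detector.py -s 100 -o 1116 ../ftp-pcap/ftp-utf8-long.pcap

# 5. Combined: start at offset 1000, 256-byte chunks, skip the first 20 bytes of each chunk
python encoding_detector.py -s 256 -h 20 -o 1000 myfile.bin

# 6. Read the input from a pipe
cat myfile.bin | python encoding_detector.py -s 512

Python script implementation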

#!/usr/bin/env python3
import cchardet
import sys
import os

def print_hex(data, width=16):
    """Print byte data as hexadecimal alongside its ASCII form."""
    for i in range(0, len(data), width):
        # Hexadecimal part
        hex_part = ' '.join(f'{byte:02x}' for byte in data[i:i+width])
        # ASCII part (printable characters, or '.')
        ascii_part = ''.join(chr(byte) if 32 <= byte <= 126 else '.' for byte in data[i:i+width])
        # Print the offset, the hex bytes, and the ASCII text
        print(f'{i:08x}: {hex_part:<{width*3}} |{ascii_part}|')

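def detect_chunks_from_file(filename, chunk_size=1024, from_head_bytes=0, from_file_offset=0):
    """
    Split the file into chunks of the given size and detect the encoding of each chunk.
    If the detection confidence is 0, retry with the data shifted by 1-4 bytes.
    from_file_offset: byte offset in the file at which reading starts.
    """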
    if not os.path.exists(filename):
        print(f"Error: File '{filename}' does not exist.", file=sys.stderr)
        return

    try:
        file_size = os.path.getsize(filename)
        print(f"Analyzing file: {filename} (Total size: {file_size} bytes)")
        print(f"Chunk size: {chunk_size} bytes")
        if from_head_bytes > 0:
            print(f"Skipping first {from_head_bytes} bytes of each chunk for detection.")
        if from_file_offset > 0:
            print(f"Starting analysis from file offset: {from_file_offset}")
        print("-" * 50)

        with open(filename, 'rb') as f:
            # Seek to the starting offset in the file
            if from_file_offset > 0:
                f.seek(from_file_offset)
            
            chunk_number = 0
            while True:
                chunk_data = f.read(chunk_size)
                if not chunk_data:
                    break

                # Base offset of the current chunk within the original file
                offset = from_file_offset + chunk_number * chunk_size

                # Trim the data used for detection (skip the leading bytes)
                detection_data = chunk_data[from_head_bytes:] if len(chunk_data) > from_head_bytes else b''

                # --- Initial detection ---
                encoding = None
                confidence = 0.0
                offset_by_used = 0 # the shift that was finally used

                if len(detection_data) > 0:
                    try:
                        result = cchardet.detect(detection_data)
                        if isinstance(result, dict):
                            encoding = result.get('encoding')
                            temp_confidence = result.get('confidence')
                            if temp_confidence is None:
                                confidence = 0.0
                            else:
                                confidence = temp_confidence
                            
                            if encoding is not None and not isinstance(encoding, str):
                                print(f"Warning: Unexpected encoding type in chunk {chunk_number}: {type(encoding)}", file=sys.stderr)
                                encoding = str(encoding) if encoding is not None else None
                        else:
                            print(f"Warning: cchardet returned unexpected type in chunk {chunk_number}: {type(result)}", file=sys.stderr)
                    except Exception as e:
                        print(f"Warning: cchardet failed on chunk {chunk_number}: {e}", file=sys.stderr)
                        encoding = "Error"
                        confidence = 0.0

                # --- Offset retry logic ---
                max_offset_attempts = 4
                if confidence == 0.0 and len(detection_data) > max_offset_attempts:
                    for offset_by in range(1, max_offset_attempts + 1):
                        if len(detection_data) > offset_by:
                            adjusted_detection_data = detection_data[offset_by:]
                            if len(adjusted_detection_data) > 0:
                                try:
                                    adjusted_result = cchardet.detect(adjusted_detection_data)
                                    if isinstance(adjusted_result, dict):
                                        adjusted_confidence = adjusted_result.get('confidence')
                                        if adjusted_confidence is None:
                                            adjusted_confidence = 0.0
                                        
                                        if adjusted_confidence > confidence:
                                            encoding = adjusted_result.get('encoding')
                                            confidence = adjusted_confidence
                                            offset_by_used = offset_by # record the shift that was used
                                            
                                            if confidence > 0.0:
                                                break
                                except Exception:
                                    pass
                        else:
                            break

                # --- Formatted output ---
                encoding_display = encoding if encoding is not None else "N/A"
                output_line = (f"Chunk {chunk_number:4d} | Offset {offset:8d} | "
                               f"offset_by {offset_by_used:2d} | "
                               f"Size {len(chunk_data):4d} | "
                               f"Encoding: {encoding_display:>12} | "
                               f"Confidence: {confidence:6.4f}")
                
                # The output format could vary with confidence, e.g. to highlight high-confidence results
                if confidence >= 0.75:
                     print(output_line) # could be marked with a color or symbol; plain print for simplicity
                else:
                     print(output_line)

                # When confidence is 0, the raw data could be dumped (currently commented out)
                # if confidence == 0.0 and len(chunk_data) > 0:
                #     print ("\n")
                #     print_hex(chunk_data)
                #     print ("\n")
                    
                chunk_number += 1

            # Check after reading has finished
            # f.tell() returns the absolute position, even after a seek
            absolute_tell = f.tell()
            if absolute_tell < file_size:
                 print(f"Warning: Stopped reading before end of file '{filename}'. "
                      f"Read up to file offset {absolute_tell} bytes out of {file_size} bytes.", file=sys.stderr)

    except IOError as e:
        print(f"Error reading file '{filename}': {e}", file=sys.stderr)
    except Exception as e:
        print(f"An unexpected error occurred while processing '{filename}': {e}", file=sys.stderr)
    
    print("-" * 50 + f" Analysis of '{filename}' finished. " + "-" * 10 + "\n")

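def detect_chunks_from_bytes(data, source_name="Byte Input", chunk_size=1024, from_head_bytes=0):
    """
    Split the byte data into chunks of the given size and detect the encoding of each chunk.
    If the detection confidence is 0, retry with the data shifted by 1-3 bytes.
    """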
    data_len = len(data)
    print(f"Analyzing data from: {source_name} (Total size: {data_len} bytes)")
    print(f"Chunk size: {chunk_size} bytes")
    if from_head_bytes > 0:
        print(f"Skipping first {from_head_bytes} bytes of each chunk for detection.")
    print("-" * 50)

    if data_len == 0:
        print("Input data is empty.")
        return

    chunk_number = 0
    for i in range(0, data_len, chunk_size):
        chunk_data = data[i:i + chunk_size]
        if not chunk_data:
            break

        offset = i
        detection_data = chunk_data[from_head_bytes:] if len(chunk_data) > from_head_bytes else b''

        encoding = None
        confidence = 0.0
        
        if len(detection_data) > 0:
            try:
                result = cchardet.detect(detection_data)
                if isinstance(result, dict):
                    encoding = result.get('encoding')
                    temp_confidence = result.get('confidence')
                    if temp_confidence is None:
                        confidence = 0.0
                    else:
                        confidence = temp_confidence
                    
                    if encoding is not None and not isinstance(encoding, str):
                        print(f"Warning: Unexpected encoding type in chunk {chunk_number}: {type(encoding)}", file=sys.stderr)
                        encoding = str(encoding) if encoding is not None else None
                else:
                    print(f"Warning: cchardet returned unexpected type in chunk {chunk_number}: {type(result)}", file=sys.stderr)
            except Exception as e:
                print(f"Warning: cchardet failed on chunk {chunk_number}: {e}", file=sys.stderr)
                encoding = "Error"
                confidence = 0.0

        # --- Offset retry logic (for bytes input) ---
        max_offset_attempts = 3
        offset_by_used = 0
        if confidence == 0.0 and len(detection_data) > max_offset_attempts:
            for offset_by in range(1, max_offset_attempts + 1):
                if len(detection_data) > offset_by:
                    adjusted_detection_data = detection_data[offset_by:]
                    if len(adjusted_detection_data) > 0:
                        try:
                            adjusted_result = cchardet.detect(adjusted_detection_data)
                            if isinstance(adjusted_result, dict):
                                adjusted_confidence = adjusted_result.get('confidence')
                                if adjusted_confidence is None:
                                    adjusted_confidence = 0.0
                                
                                if adjusted_confidence > confidence:
                                    encoding = adjusted_result.get('encoding')
                                    confidence = adjusted_confidence
                                    offset_by_used = offset_by
                                    
                                    if confidence > 0.0:
                                        break
                        except Exception:
                            pass
                else:
                    break

        # Format the output (bytes input also reports offset_by)
        encoding_display = encoding if encoding is not None else "N/A"
        print(f"Chunk {chunk_number:4d} | Offset {offset:8d} | "
              f"offset_by {offset_by_used:2d} | "  # show offset_by
              f"Size {len(chunk_data):4d} | "
              f"Encoding: {encoding_display:>12} | "
              f"Confidence: {confidence:6.4f}")

        # If the confidence is 0, print the chunk contents
        # if confidence == 0.0 and len(chunk_data) > 0:
        #     print ("\n")
        #     print_hex(chunk_data)
        #     print ("\n")

        chunk_number += 1

    print("-" * 50 + f" Analysis of '{source_name}' finished. " + "-" * 10 + "\n")

def main():
    """
    Entry point: parse command-line arguments and call the matching detection function.
    """
    if len(sys.argv) < 2:
        print("No filename provided. Reading binary data from STDIN...", file=sys.stderr)
        try:
            data = sys.stdin.buffer.read()
            detect_chunks_from_bytes(data, source_name="STDIN", chunk_size=1024)
        except KeyboardInterrupt:
            print("\nInterrupted by user.", file=sys.stderr)
        except Exception as e:
            print(f"Error reading from STDIN: {e}", file=sys.stderr)
        sys.exit(0)

    # Default parameters
    chunk_size = 1024
    from_head_bytes = 0
    from_file_offset = 0
    filenames = []

    # Parse command-line arguments
    i = 1
    while i < len(sys.argv):
        if sys.argv[i] == '-s':
            if i + 1 < len(sys.argv):
                try:
                    chunk_size = int(sys.argv[i + 1])
                    if chunk_size <= 0:
                        raise ValueError("Chunk size must be positive.")
                    i += 2
                except ValueError as e:
                    print(f"Error: Invalid chunk size '-s {sys.argv[i + 1]}': {e}", file=sys.stderr)
                    sys.exit(1)
            else:
                print("Error: Option '-s' requires an argument.", file=sys.stderr)
                sys.exit(1)
        elif sys.argv[i] == '-h':
            if i + 1 < len(sys.argv):
                try:
                    from_head_bytes = int(sys.argv[i + 1])
                    if from_head_bytes < 0:
                        raise ValueError("Head bytes to skip must be non-negative.")
                    i += 2
                except ValueError as e:
                    print(f"Error: Invalid head bytes '-h {sys.argv[i + 1]}': {e}", file=sys.stderr)
                    sys.exit(1)
            else:
                print("Error: Option '-h' requires an argument.", file=sys.stderr)
                sys.exit(1)
        # Parse the -o option (file offset)
        elif sys.argv[i] == '-o':
            if i + 1 < len(sys.argv):
                try:
                    from_file_offset = int(sys.argv[i + 1])
                    if from_file_offset < 0:
                        raise ValueError("File offset must be non-negative.")
                    i += 2
                except ValueError as e:
                    print(f"Error: Invalid file offset '-o {sys.argv[i + 1]}': {e}", file=sys.stderr)
                    sys.exit(1)
            else:
                print("Error: Option '-o' requires an argument.", file=sys.stderr)
                sys.exit(1)
        else:
            filenames.append(sys.argv[i])
            i += 1

    if not filenames:
        print("Error: No filename provided.", file=sys.stderr)
        sys.exit(1)

    # Process each file given on the command line
    for filename in filenames:
        detect_chunks_from_file(filename, chunk_size, from_head_bytes, from_file_offset)

if __name__ == "__main__":
    main()

Keywords: text encoding, binary, Python text encoding, cchardet

2025-08-17

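Linux System Programming

The seccomp system call, with examples

Introduction

seccomp is the Linux system call filtering mechanism. It restricts which system calls a process may execute. In filter mode, a classic Berkeley Packet Filter (BPF) program decides for each system call whether it is allowed, fails with an error, or terminates the caller. seccomp is a core building block for sandboxes: it shrinks the part of the kernel that compromised or untrusted code can reach.

Prototype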

#include <linux/seccomp.h>
#include <linux/filter.h>
#include <sys/prctl.h>
#include <unistd.h>

int prctl(int option, unsigned long arg2, unsigned long arg3, 
          unsigned long arg4, unsigned long arg5);

int seccomp(unsigned int operation, unsigned int flags, void *args);
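
The seccomp() prototype is the kernel interface; many glibc versions ship no wrapper for it, so programs usually go through syscall(2). The following is a minimal sketch under that assumption. The helper names seccomp_call and seccomp_action_available are made up for this illustration, and the probe relies on SECCOMP_GET_ACTION_AVAIL, which needs Linux 4.14 or later.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/seccomp.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke the raw seccomp system call. */
static int seccomp_call(unsigned int operation, unsigned int flags, void *args) {
    return (int)syscall(SYS_seccomp, operation, flags, args);
}

/*
 * Hypothetical helper: ask the kernel whether a filter return action is
 * supported. Returns 1 if it is, 0 if the kernel reports it as unsupported,
 * and -1 if the probe itself is unavailable (old kernel, or seccomp blocked).
 */
static int seccomp_action_available(uint32_t action) {
#ifdef SECCOMP_GET_ACTION_AVAIL
    if (seccomp_call(SECCOMP_GET_ACTION_AVAIL, 0, &action) == 0)
        return 1;
    return errno == EOPNOTSUPP ? 0 : -1;
#else
    (void)action;
    return -1;
#endif
}
```

A caller could check seccomp_action_available(SECCOMP_RET_ALLOW) at start-up and fall back to the prctl interface when the result is -1.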

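Functionality

seccomp provides security control at the system call level. It can:

Restrict the set of system calls a process may execute
Define a policy per system call (allow, return an error, raise a signal, or kill)
Implement more complex filtering logic with a BPF program
Serve as the basis of a sandbox

Parameters

Through prctl:

int option: the control option (for example PR_SET_SECCOMP)
unsigned long arg2: the seccomp mode (SECCOMP_MODE_STRICT or SECCOMP_MODE_FILTER)
Remaining arguments: depend on the option; in filter mode arg3 points to a struct sock_fprog

Through the seccomp system call:

unsigned int operation: the operation (SECCOMP_SET_MODE_STRICT or SECCOMP_SET_MODE_FILTER)
unsigned int flags: flags (usually 0)
void *args: the operation argument (for example a pointer to the BPF program)

Return value

Success: returns 0
Failure: returns -1 and sets errno

Similar or related functions

prctl: process control interface
personality: sets the process execution domain
chroot: changes the root directory
capset: sets thread capabilities

Example code

Example 1: basic seccomp usage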

#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <sys/prctl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/syscall.h>

/* Print with write(2) only, so it stays usable after strict mode is on. */
static void say(const char *msg) {
    write(STDOUT_FILENO, msg, strlen(msg));
}

/**
 * Demonstrates basic seccomp usage (strict mode).
 */
int demo_seccomp_basic() {
    printf("=== Basic seccomp example ===\n");

    // Show the current seccomp mode
    int current_mode = prctl(PR_GET_SECCOMP, 0, 0, 0, 0);
    printf("Current seccomp mode: ");
    switch (current_mode) {
        case 0:
            printf("SECCOMP_MODE_DISABLED\n");
            break;
        case 1:
            printf("SECCOMP_MODE_STRICT\n");
            break;
        case 2:
            printf("SECCOMP_MODE_FILTER\n");
            break;
        default:
            printf("unknown mode (%d)\n", current_mode);
            break;
    }

    // An ordinary system call before strict mode (should succeed)
    printf("Testing an ordinary system call...\n");
    fflush(stdout);
    say("  write() succeeded\n");

    // Enable strict mode; flush stdio first, only raw write() is used afterwards
    printf("Enabling seccomp strict mode...\n");
    fflush(stdout);
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) == -1) {
        printf("Enabling seccomp failed: %s\n", strerror(errno));
        return -1;
    }

    // Strict mode permits only read, write, _exit (not exit_group) and sigreturn.
    // Do not call prctl(PR_GET_SECCOMP) from here on: in strict mode that call
    // is itself forbidden and kills the process with SIGKILL.
    say("seccomp strict mode enabled\n");
    say("  write() is still allowed\n");

    // Any other system call would now terminate the process with SIGKILL,
    // so the forbidden call is left commented out.
    /*
    pid_t pid = getpid();  // this would kill the process
    */
    say("  getpid() and most other calls are forbidden in strict mode\n");

    // Returning from main() ends in exit_group(), which strict mode forbids,
    // so leave through the raw exit system call instead.
    syscall(SYS_exit, 0);
    return 0;  // not reached
}

int main() {
    return demo_seccomp_basic();
}
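
To actually observe the SIGKILL that strict mode delivers, the forbidden call can be made in a child process while the parent inspects the exit status. This is a sketch only; the helper name strict_mode_kills_child is made up here, and it assumes the environment lets the child enter strict mode (a process already running under a seccomp filter, as in many containers, cannot switch to strict mode).

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: enter strict mode in a child and make a forbidden
 * system call there. Returns 1 if the child was killed by SIGKILL, 0 if it
 * survived, and -1 if strict mode could not be enabled or fork/wait failed.
 */
static int strict_mode_kills_child(void) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) == -1)
            _exit(2);            /* strict mode unavailable */
        syscall(SYS_getpid);     /* forbidden: the kernel sends SIGKILL */
        syscall(SYS_exit, 0);    /* reached only if the call was allowed */
        return 0;                /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return -1;
    return 0;
}
```

Running the forbidden call in a child keeps the calling program alive, which also makes the behaviour easy to check from a test.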

Example 2: a custom BPF filter

#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/audit.h>
#include <sys/prctl.h>
#include <unistd.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/syscall.h>

/**
 * Builds a BPF filter that allows only selected system calls.
 */
int demo_custom_bpf_filter() {
    printf("=== Custom BPF filter example ===\n");

    // BPF filter program
    // Allowed system calls: read, write, exit, exit_group
    struct sock_filter filter[] = {
        // Load the system call number into the accumulator
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),

        // Allow read
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // Allow write
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // Allow exit (the SYS_* macros give the right number for the target architecture)
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // Allow exit_group
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // Every other system call fails with EPERM
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & 0xFFFF)),
    };

    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    
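    // Describe the filter
    printf("BPF filter created; allowed system calls:\n");
    printf("  read(%d), write(%d), exit(%d), exit_group(%d)\n",
           SYS_read, SYS_write, SYS_exit, SYS_exit_group);
    printf("Every other system call fails with EPERM\n");

    // An unprivileged process must set no_new_privs before installing a filter
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) {
        printf("PR_SET_NO_NEW_PRIVS failed: %s\n", strerror(errno));
        return -1;
    }

    // Install the BPF filter
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) == -1) {
        printf("Installing the BPF filter failed: %s\n", strerror(errno));
        printf("Possible causes:\n");
        printf("  1. The kernel lacks seccomp filter support (CONFIG_SECCOMP_FILTER)\n");
        printf("  2. Neither no_new_privs nor CAP_SYS_ADMIN is in effect (EACCES)\n");
        printf("  3. The process is already in strict mode, or the filter is invalid (EINVAL)\n");
        return -1;
    }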
    
    printf("BPF filter installed\n");

    // Exercise the allowed system calls
    printf("\nTesting allowed system calls:\n");
    fflush(stdout);
    const char ok_msg[] = "  write() succeeded\n";
    write(STDOUT_FILENO, ok_msg, sizeof(ok_msg) - 1);  // length taken from the string itself

    char buffer[10];
    ssize_t bytes_read = read(STDIN_FILENO, buffer, sizeof(buffer));
    if (bytes_read >= 0) {
        printf("  read() succeeded\n");
    }

    // Exercise system calls the filter rejects
    printf("\nTesting rejected system calls:\n");
    long result = syscall(SYS_getpid);
    if (result == -1) {
        printf("  getpid blocked: %s\n", strerror(errno));
    } else {
        printf("  getpid unexpectedly succeeded: %ld\n", result);
    }

    result = syscall(SYS_open, "/etc/passwd", 0);
    if (result == -1) {
        printf("  open blocked: %s\n", strerror(errno));
    } else {
        printf("  open unexpectedly succeeded: %ld\n", result);
    }

    printf("\nThe allowed system calls keep working\n");
    
    return 0;
}

int main() {
    return demo_custom_bpf_filter();
}
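
System call numbers differ between ABIs, so a filter normally checks seccomp_data.arch before it looks at the call number. The sketch below adds that check and the no_new_privs step, and runs the filter in a child so the caller stays unrestricted. The names FILTER_ARCH, install_checked_filter and filtered_child_denies_getpid are made up for this illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__i386__)
#define FILTER_ARCH AUDIT_ARCH_I386
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Hypothetical helper: allow read, write and exit_group; deny the rest. */
static int install_checked_filter(void) {
    struct sock_filter filter[] = {
        /* Kill the caller if the call was made through a different ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Matching entries jump forward to the shared ALLOW at the end. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 3, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    /* Required for unprivileged callers before a filter can be installed. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}

/*
 * Returns 1 if getpid failed with EPERM inside the filtered child, 0 if it
 * was allowed, and -1 if the filter could not be installed.
 */
static int filtered_child_denies_getpid(void) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (install_checked_filter() == -1)
            _exit(2);
        errno = 0;
        long r = syscall(SYS_getpid);
        _exit((r == -1 && errno == EPERM) ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Jumping to one shared SECCOMP_RET_ALLOW keeps the allow-list at one instruction per system call instead of two.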

Example 3: a read-only sandbox

#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/audit.h>
#include <sys/prctl.h>
#include <sys/socket.h>   /* AF_INET, SOCK_STREAM */
#include <unistd.h>
#include <stddef.h>       /* offsetof */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <sys/stat.h>
/**
 * Builds the BPF filter for a read-only sandbox.
 */
int demo_readonly_sandbox() {
    printf("=== Read-only sandbox example ===\n");

    // Allows reads and basic system calls, rejects writes to files.
    // Argument loads read the low 32 bits, which assumes a little-endian machine.
    struct sock_filter filter[] = {
        // Load the system call number
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),

        // Allow read
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // write: only to stdout (1) or stderr (2).
        // If this is not write, skip the 6 instructions of this block.
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 6),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 0, 1),  // stdout
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 0, 1),  // stderr
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & 0xFFFF)),

        // Allow exit and exit_group
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // open: only when the access mode is O_RDONLY.
        // O_RDONLY is 0, so the flags are masked with O_ACCMODE and compared;
        // a bit test (BPF_JSET) against 0 could never match.
        // If this is not open, skip the 5 instructions of this block.
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_open, 0, 5),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[1])),
        BPF_STMT(BPF_ALU | BPF_AND | BPF_K, O_ACCMODE),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, O_RDONLY, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & 0xFFFF)),

        // Allow close
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_close, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // Reject every other system call
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & 0xFFFF)),
    };

    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    
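    printf("Creating the read-only sandbox\n");
    printf("Allowed operations:\n");
    printf("  - opening files read-only and reading them\n");
    printf("  - writing to standard output and standard error\n");
    printf("  - basic process control\n");
    printf("Forbidden operations:\n");
    printf("  - writing to files\n");
    printf("  - network operations\n");
    printf("  - process creation\n");
    printf("  - other dangerous operations\n");

    // An unprivileged process must set no_new_privs before installing a filter
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) {
        printf("PR_SET_NO_NEW_PRIVS failed: %s\n", strerror(errno));
        return -1;
    }

    // Install the filter
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) == -1) {
        printf("Creating the sandbox failed: %s\n", strerror(errno));
        return -1;
    }

    printf("Read-only sandbox created\n");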
    
    // Exercise the sandbox
    printf("\n=== Sandbox tests ===\n");

    // Allowed: read-only open. The raw SYS_open call is used because
    // glibc's open() issues openat, which this filter does not allow.
    printf("1. Allowed read:\n");
    int fd = (int)syscall(SYS_open, "/etc/passwd", O_RDONLY);
    if (fd != -1) {
        char buffer[100];
        ssize_t bytes = read(fd, buffer, sizeof(buffer));
        if (bytes > 0) {
            printf("  read /etc/passwd (%zd bytes)\n", bytes);
        }
        close(fd);
    } else {
        printf("  opening /etc/passwd failed: %s\n", strerror(errno));
    }

    // Allowed: writes to stdout and stderr
    printf("\n2. Allowed writes:\n");
    fflush(stdout);
    const char out_msg[] = "  write to stdout succeeded\n";
    const char err_msg[] = "  write to stderr succeeded\n";
    write(STDOUT_FILENO, out_msg, sizeof(out_msg) - 1);
    write(STDERR_FILENO, err_msg, sizeof(err_msg) - 1);

    // Forbidden: opening a file for writing
    printf("\n3. Forbidden write:\n");
    fd = (int)syscall(SYS_open, "/tmp/test_seccomp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        printf("  file creation blocked: %s\n", strerror(errno));
    } else {
        printf("  file creation unexpectedly succeeded\n");
        close(fd);
        unlink("/tmp/test_seccomp");
    }

    // Forbidden: other system calls
    printf("\n4. Forbidden system calls:\n");
    long result = syscall(SYS_fork);
    if (result == -1) {
        printf("  fork blocked: %s\n", strerror(errno));
    }

    result = syscall(SYS_socket, AF_INET, SOCK_STREAM, 0);
    if (result == -1) {
        printf("  socket blocked: %s\n", strerror(errno));
    }

    printf("\nSandbox tests finished\n");
    
    return 0;
}

int main() {
    return demo_readonly_sandbox();
}
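
On Linux O_RDONLY is defined as 0, so testing whether the O_RDONLY bit is set in the open flags can never succeed. The access mode has to be extracted with the O_ACCMODE mask and then compared. The sketch below shows the same check in plain C; is_read_only_open is a name made up for this illustration.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the check a filter has to perform on the
 * flags argument of open: mask with O_ACCMODE, then compare with O_RDONLY.
 * In BPF terms this is a BPF_ALU | BPF_AND step followed by a BPF_JEQ.
 */
static bool is_read_only_open(int flags) {
    return (flags & O_ACCMODE) == O_RDONLY;
}
```

Note that this check only covers the access mode; a stricter policy would also reject flags such as O_CREAT or O_TRUNC.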

Example 4: process monitoring and logging

#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/audit.h>
#include <sys/prctl.h>
#include <unistd.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/syscall.h>
#include <signal.h>
#include <sys/wait.h>
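/**
 * Signal handler. SIGSYS is raised by the SECCOMP_RET_TRAP action;
 * SECCOMP_RET_TRACE does not raise it.
 * Only async-signal-safe functions (such as write) are used here.
 */
void signal_handler(int sig) {
    if (sig == SIGSYS) {
        const char msg[] = "SIGSYS: forbidden system call detected\n";
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
    }
}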

/**
 * Demonstrates seccomp monitoring with SECCOMP_RET_TRACE.
 */
int demo_seccomp_monitoring() {
    printf("=== seccomp monitoring example ===\n");

    // Register a handler for SIGSYS
    signal(SIGSYS, signal_handler);

    // BPF filter that hands every non-essential system call to a tracer
    struct sock_filter filter[] = {
        // Load the system call number
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),

        // Allow basic reads and writes
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // Allow exit and exit_group
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        // Everything else is reported to a ptrace tracer (tag value 1).
        // With no tracer attached, the call is not executed and fails with ENOSYS.
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE | (1 & 0xFFFF)),
    };

    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    
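    printf("Creating the monitoring filter\n");
    printf("SECCOMP_RET_TRACE can be used for:\n");
    printf("  - system call tracing\n");
    printf("  - security auditing\n");
    printf("  - debugging and analysis\n");

    // An unprivileged process must set no_new_privs before installing a filter
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) {
        printf("PR_SET_NO_NEW_PRIVS failed: %s\n", strerror(errno));
        return -1;
    }

    // Enable seccomp
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) == -1) {
        printf("Enabling seccomp failed: %s\n", strerror(errno));
        return -1;
    }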
    
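    printf("seccomp monitoring enabled\n");

    // Exercise the filter
    printf("\nTesting the filter:\n");
    fflush(stdout);

    // Allowed system call
    const char msg[] = "write() is allowed\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);

    // Traced system calls: with no tracer attached they fail with ENOSYS
    printf("Testing traced system calls:\n");

    long pid = syscall(SYS_getpid);
    printf("getpid returned %ld (%s)\n", pid, pid == -1 ? strerror(errno) : "ok");

    long uid = syscall(SYS_getuid);
    printf("getuid returned %ld (%s)\n", uid, uid == -1 ? strerror(errno) : "ok");

    printf("Note: SECCOMP_RET_TRACE generates a ptrace event (PTRACE_EVENT_SECCOMP)\n");
    printf("A separate tracer process is needed to handle these events\n");

    return 0;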
}

int main() {
    return demo_seccomp_monitoring();
}
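
A SIGSYS handler only runs when the filter returns SECCOMP_RET_TRAP. The sketch below traps a single call (getuid) and reads its number from siginfo_t in the handler, again in a child process. The names on_sigsys, install_trap_filter and trap_reports_syscall are made up for this illustration, and the architecture check is left out for brevity.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t trapped_syscall = -1;

/* SIGSYS handler: record the number of the trapped system call. */
static void on_sigsys(int sig, siginfo_t *info, void *context) {
    (void)sig;
    (void)context;
    trapped_syscall = info->si_syscall;
}

/* Trap getuid only; allow everything else so the handler can return. */
static int install_trap_filter(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getuid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}

/*
 * Returns 1 if the handler saw the getuid call, 0 if it did not, and -1 if
 * the handler or filter could not be set up.
 */
static int trap_reports_syscall(void) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_sigsys;
        sa.sa_flags = SA_SIGINFO;
        if (sigaction(SIGSYS, &sa, NULL) == -1 || install_trap_filter() == -1)
            _exit(2);
        syscall(SYS_getuid);   /* not executed; SIGSYS is delivered instead */
        _exit(trapped_syscall == SYS_getuid ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Compared with SECCOMP_RET_TRACE, this needs no separate tracer process, at the cost of handling the event inside the restricted process itself.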

Example 5: a configurable sandbox

#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/audit.h>
#include <sys/prctl.h>
#include <sys/socket.h>   /* AF_INET, SOCK_STREAM */
#include <sys/wait.h>     /* wait */
#include <unistd.h>
#include <stddef.h>       /* offsetof */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>

/**
 * Sandbox configuration
 */
typedef struct {
    int allow_network;
    int allow_file_write;
    int allow_process_creation;
    int allow_memory_mapping;
} sandbox_config_t;

/**
 * Appends "allow this system call" (two instructions) to the filter.
 * BPF_JUMP and BPF_STMT expand to brace initializers, so an assignment
 * needs the compound-literal cast (struct sock_filter).
 */
static void allow_syscall(struct sock_filter *filter, int *index, int nr) {
    filter[(*index)++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1);
    filter[(*index)++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
}

/**
 * Creates the sandbox described by config.
 */
int create_secure_sandbox(const sandbox_config_t *config) {
    printf("=== Creating the sandbox ===\n");

    // Build the BPF filter from the configuration
    struct sock_filter filter[100];
    int filter_index = 0;

    // Load the system call number
    filter[filter_index++] = (struct sock_filter)BPF_STMT(
        BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr));

    // System calls that are always allowed
    allow_syscall(filter, &filter_index, SYS_read);
    allow_syscall(filter, &filter_index, SYS_write);
    allow_syscall(filter, &filter_index, SYS_exit);
    allow_syscall(filter, &filter_index, SYS_exit_group);

    // Extra system calls enabled by the configuration
    if (config->allow_file_write) {
        allow_syscall(filter, &filter_index, SYS_open);
        allow_syscall(filter, &filter_index, SYS_openat);
        allow_syscall(filter, &filter_index, SYS_close);
    }

    if (config->allow_network) {
        allow_syscall(filter, &filter_index, SYS_socket);
        allow_syscall(filter, &filter_index, SYS_connect);
    }

    if (config->allow_process_creation) {
        allow_syscall(filter, &filter_index, SYS_fork);
        allow_syscall(filter, &filter_index, SYS_clone);
    }

    if (config->allow_memory_mapping) {
        allow_syscall(filter, &filter_index, SYS_mmap);
        allow_syscall(filter, &filter_index, SYS_munmap);
    }

    // Deny every other system call by default
    filter[filter_index++] = (struct sock_filter)BPF_STMT(
        BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & 0xFFFF));

    struct sock_fprog prog = {
        .len = (unsigned short)filter_index,
        .filter = filter,
    };

    printf("Sandbox configuration:\n");
    printf("  network access:   %s\n", config->allow_network ? "allowed" : "denied");
    printf("  file writes:      %s\n", config->allow_file_write ? "allowed" : "denied");
    printf("  process creation: %s\n", config->allow_process_creation ? "allowed" : "denied");
    printf("  memory mapping:   %s\n", config->allow_memory_mapping ? "allowed" : "denied");

    // An unprivileged process must set no_new_privs before installing a filter
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) {
        printf("PR_SET_NO_NEW_PRIVS failed: %s\n", strerror(errno));
        return -1;
    }

    // Install the sandbox
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) == -1) {
        printf("Creating the sandbox failed: %s\n", strerror(errno));
        return -1;
    }

    printf("Sandbox created\n");
    return 0;
}

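/**
 * Demonstrates sandboxes with different security levels.
 */
int demo_security_levels() {
    sandbox_config_t configs[3] = {
        // Strictest: basic I/O only
        {0, 0, 0, 0},

        // Medium: file operations allowed
        {0, 1, 0, 1},

        // Relaxed: network and process creation allowed
        {1, 1, 1, 1}
    };

    const char *level_names[] = {"high security", "medium security", "low security"};

    printf("=== Sandboxes with different security levels ===\n");

    for (int level = 0; level < 3; level++) {
        printf("\n--- %s sandbox ---\n", level_names[level]);

        if (create_secure_sandbox(&configs[level]) == 0) {
            printf("%s sandbox created\n", level_names[level]);

            // Exercise the sandbox
            fflush(stdout);
            const char msg[] = "basic I/O test succeeded\n";
            write(STDOUT_FILENO, msg, sizeof(msg) - 1);

            if (configs[level].allow_network) {
                printf("network access available\n");
            }

            if (configs[level].allow_file_write) {
                printf("file writes available\n");
            }

            // A seccomp policy cannot be relaxed once installed, so the other
            // levels would have to be tested in separate child processes
            break;  // only the first configuration is tested
        }
    }

    return 0;
}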

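/**
 * Demonstrates a practical use of the sandbox.
 */
int demo_practical_sandbox() {
    printf("=== Practical sandbox demonstration ===\n");

    // A restrictive sandbox: basic operations only
    sandbox_config_t config = {0, 0, 0, 0};  // strictest

    if (create_secure_sandbox(&config) != 0) {
        return -1;
    }

    printf("\nRunning tests inside the sandbox:\n");

    // Basic functionality
    printf("1. Basic output:\n");
    printf("   standard output works\n");
    fflush(stdout);
    const char msg[] = "   the write system call works\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);

    // Restricted functionality
    printf("\n2. Restricted operations:\n");

    // Network operation
    long result = syscall(SYS_socket, AF_INET, SOCK_STREAM, 0);
    if (result == -1) {
        printf("   network operation blocked: %s\n", strerror(errno));
    }

    // File write
    result = syscall(SYS_open, "/tmp/test", O_WRONLY | O_CREAT, 0644);
    if (result == -1) {
        printf("   file write blocked: %s\n", strerror(errno));
    }

    // Process creation
    result = syscall(SYS_fork);
    if (result == -1) {
        printf("   process creation blocked: %s\n", strerror(errno));
    }

    printf("\n3. Benefits of the sandbox:\n");
    printf("   ✓ keeps malicious code from performing dangerous operations\n");
    printf("   ✓ limits what the program is able to do\n");
    printf("   ✓ adds an extra layer of security\n");
    printf("   ✓ combines with other security mechanisms\n");

    printf("\n4. Use cases:\n");
    printf("   - running plugins or extensions safely\n");
    printf("   - sandboxing untrusted code\n");
    printf("   - containers and virtualized environments\n");
    printf("   - security auditing and monitoring\n");

    return 0;
}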

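int main() {
    printf("seccomp - Linux system call filtering\n");
    printf("=====================================\n\n");
    fflush(stdout);  // keep buffered output from being duplicated in the child

    // A seccomp policy affects the whole process once installed,
    // so the demonstration runs in a child process
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        return demo_practical_sandbox();
    }

    int status;
    wait(&status);

    return 0;
}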

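Notes on using seccomp

System requirements:

Kernel version: filter mode needs Linux 3.5 or later; the seccomp() system call itself was added in Linux 3.17

Architecture support: most CPU architectures are supported

Build options: the kernel must be built with CONFIG_SECCOMP (and CONFIG_SECCOMP_FILTER for filter mode)

Privilege requirements:

1. Filter mode: the caller needs either CAP_SYS_ADMIN or the no_new_privs attribute, set with prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
2. Unprivileged processes: can use SECCOMP_MODE_STRICT directly, and filter mode once no_new_privs is set
3. Container environments: Docker and similar runtimes may apply their own restrictions

Security considerations:

1. Policies are irreversible: once installed, a seccomp policy cannot be relaxed
2. Debugging is harder: a blocked system call can be difficult to track down
3. Compatibility: a filter may break functionality the program depends on
4. Performance: BPF filtering adds overhead to every system call

Best practices:

Apply gradually: start with a permissive policy and tighten it step by step

Test thoroughly: test the policy fully before it reaches production

Handle errors: deal properly with system calls that get blocked

Log: record security-relevant events

Have a fallback: plan for the case where the policy cannot be applied

seccomp modes in detail

SECCOMP_MODE_STRICT (mode 1):

Characteristics: the simplest mode; only read, write, _exit (but not exit_group) and sigreturn are allowed
Advantages: simple, efficient, safe
Drawbacks: extremely limited functionality
Suitable for: simple programs with very high security requirements

SECCOMP_MODE_FILTER (mode 2):

Characteristics: a BPF program defines the filtering rules
Advantages: flexible and powerful
Drawbacks: more complex to configure
Suitable for: most real-world use cases

Common system call numbers

x86_64 architecture: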

SYS_read = 0
SYS_write = 1
SYS_open = 2
SYS_close = 3
SYS_stat = 4
SYS_fstat = 5
SYS_lstat = 6
SYS_poll = 7
SYS_lseek = 8
SYS_mmap = 9
SYS_mprotect = 10
SYS_munmap = 11
SYS_brk = 12
SYS_rt_sigaction = 13
SYS_rt_sigprocmask = 14
SYS_rt_sigreturn = 15
SYS_ioctl = 16
SYS_pread64 = 17
SYS_pwrite64 = 18
SYS_readv = 19
SYS_writev = 20
SYS_access = 21
SYS_pipe = 22
SYS_select = 23
SYS_sched_yield = 24
SYS_mremap = 25
SYS_msync = 26
SYS_mincore = 27
SYS_madvise = 28
SYS_shmget = 29
SYS_shmat = 30
SYS_shmctl = 31
SYS_dup = 32
SYS_dup2 = 33
SYS_pause = 34
SYS_nanosleep = 35
SYS_getitimer = 36
SYS_alarm = 37
SYS_setitimer = 38
SYS_getpid = 39
SYS_sendfile = 40
SYS_socket = 41
SYS_connect = 42
SYS_accept = 43
SYS_sendto = 44
SYS_recvfrom = 45
SYS_sendmsg = 46
SYS_recvmsg = 47
SYS_shutdown = 48
SYS_bind = 49
SYS_listen = 50
SYS_getsockname = 51
SYS_getpeername = 52
SYS_socketpair = 53
SYS_setsockopt = 54
SYS_getsockopt = 55
SYS_clone = 56
SYS_fork = 57
SYS_vfork = 58
SYS_execve = 59
SYS_exit = 60
SYS_wait4 = 61
SYS_kill = 62
SYS_uname = 63
SYS_semget = 64
SYS_semop = 65
SYS_semctl = 66
SYS_shmdt = 67
SYS_msgget = 68
SYS_msgsnd = 69
SYS_msgrcv = 70
SYS_msgctl = 71
SYS_fcntl = 72
SYS_flock = 73
SYS_fsync = 74
SYS_fdatasync = 75
SYS_truncate = 76
SYS_ftruncate = 77
SYS_getdents = 78
SYS_getcwd = 79
SYS_chdir = 80
SYS_fchdir = 81
SYS_rename = 82
SYS_mkdir = 83
SYS_rmdir = 84
SYS_creat = 85
SYS_link = 86
SYS_unlink = 87
SYS_symlink = 88
SYS_readlink = 89
SYS_chmod = 90
SYS_fchmod = 91
SYS_chown = 92
SYS_fchown = 93
SYS_lchown = 94
SYS_umask = 95
SYS_gettimeofday = 96
SYS_getrlimit = 97
SYS_getrusage = 98
SYS_sysinfo = 99
SYS_times = 100
SYS_ptrace = 101
SYS_getuid = 102
SYS_syslog = 103
SYS_getgid = 104
SYS_setuid = 105
SYS_setgid = 106
SYS_geteuid = 107
SYS_getegid = 108
SYS_setpgid = 109
SYS_getppid = 110
SYS_getpgrp = 111
SYS_setsid = 112
SYS_setreuid = 113
SYS_setregid = 114
SYS_getgroups = 115
SYS_setgroups = 116
SYS_setresuid = 117
SYS_getresuid = 118
SYS_setresgid = 119
SYS_getresgid = 120
SYS_getpgid = 121
SYS_setfsuid = 122
SYS_setfsgid = 123
SYS_getsid = 124
SYS_capget = 125
SYS_capset = 126
SYS_rt_sigpending = 127
SYS_rt_sigtimedwait = 128
SYS_rt_sigqueueinfo = 129
SYS_rt_sigsuspend = 130
SYS_sigaltstack = 131
SYS_utime = 132
SYS_mknod = 133
SYS_uselib = 134
SYS_personality = 135
SYS_ustat = 136
SYS_statfs = 137
SYS_fstatfs = 138
SYS_sysfs = 139
SYS_getpriority = 140
SYS_setpriority = 141
SYS_sched_setparam = 142
SYS_sched_getparam = 143
SYS_sched_setscheduler = 144
SYS_sched_getscheduler = 145
SYS_sched_get_priority_max = 146
SYS_sched_get_priority_min = 147
SYS_sched_rr_get_interval = 148
SYS_mlock = 149
SYS_munlock = 150
SYS_mlockall = 151
SYS_munlockall = 152
SYS_vhangup = 153
SYS_modify_ldt = 154
SYS_pivot_root = 155
SYS__sysctl = 156
SYS_prctl = 157
SYS_arch_prctl = 158
SYS_adjtimex = 159
SYS_setrlimit = 160
SYS_chroot = 161
SYS_sync = 162
SYS_acct = 163
SYS_settimeofday = 164
SYS_mount = 165
SYS_umount2 = 166
SYS_swapon = 167
SYS_swapoff = 168
SYS_reboot = 169
SYS_sethostname = 170
SYS_setdomainname = 171
SYS_iopl = 172
SYS_ioperm = 173
SYS_create_module = 174
SYS_init_module = 175
SYS_delete_module = 176
SYS_get_kernel_syms = 177
SYS_query_module = 178
SYS_quotactl = 179
SYS_nfsservctl = 180
SYS_getpmsg = 181
SYS_putpmsg = 182
SYS_afs_syscall = 183
SYS_tuxcall = 184
SYS_security = 185
SYS_gettid = 186
SYS_readahead = 187
SYS_setxattr = 188
SYS_lsetxattr = 189
SYS_fsetxattr = 190
SYS_getxattr = 191
SYS_lgetxattr = 192
SYS_fgetxattr = 193
SYS_listxattr = 194
SYS_llistxattr = 195
SYS_flistxattr = 196
SYS_removexattr = 197
SYS_lremovexattr = 198
SYS_fremovexattr = 199
SYS_tkill = 200
SYS_time = 201
SYS_futex = 202
SYS_sched_setaffinity = 203
SYS_sched_getaffinity = 204
SYS_set_thread_area = 205
SYS_io_setup = 206
SYS_io_destroy = 207
SYS_io_getevents = 208
SYS_io_submit = 209
SYS_io_cancel = 210
SYS_get_thread_area = 211
SYS_lookup_dcookie = 212
SYS_epoll_create = 213
SYS_epoll_ctl_old = 214
SYS_epoll_wait_old = 215
SYS_remap_file_pages = 216
SYS_getdents64 = 217
SYS_set_tid_address = 218
SYS_restart_syscall = 219
SYS_semtimedop = 220
SYS_fadvise64 = 221
SYS_timer_create = 222
SYS_timer_settime = 223
SYS_timer_gettime = 224
SYS_timer_getoverrun = 225
SYS_timer_delete = 226
SYS_clock_settime = 227
SYS_clock_gettime = 228
SYS_clock_getres = 229
SYS_clock_nanosleep = 230
SYS_exit_group = 231
SYS_epoll_wait = 232
SYS_epoll_ctl = 233
SYS_tgkill = 234
SYS_utimes = 235
SYS_vserver = 236
SYS_mbind = 237
SYS_set_mempolicy = 238
SYS_get_mempolicy = 239
SYS_mq_open = 240
SYS_mq_unlink = 241
SYS_mq_timedsend = 242
SYS_mq_timedreceive = 243
SYS_mq_notify = 244
SYS_mq_getsetattr = 245
SYS_kexec_load = 246
SYS_waitid = 247
SYS_add_key = 248
SYS_request_key = 249
SYS_keyctl = 250
SYS_ioprio_set = 251
SYS_ioprio_get = 252
SYS_inotify_init = 253
SYS_inotify_add_watch = 254
SYS_inotify_rm_watch = 255
SYS_migrate_pages = 256
SYS_openat = 257
SYS_mkdirat = 258
SYS_mknodat = 259
SYS_fchownat = 260
SYS_futimesat = 261
SYS_newfstatat = 262
SYS_unlinkat = 263
SYS_renameat = 264
SYS_linkat = 265
SYS_symlinkat = 266
SYS_readlinkat = 267
SYS_fchmodat = 268
SYS_faccessat = 269
SYS_pselect6 = 270
SYS_ppoll = 271
SYS_unshare = 272
SYS_set_robust_list = 273
SYS_get_robust_list = 274
SYS_splice = 275
SYS_tee = 276
SYS_sync_file_range = 277
SYS_vmsplice = 278
SYS_move_pages = 279
SYS_utimensat = 280
SYS_epoll_pwait = 281
SYS_signalfd = 282
SYS_timerfd_create = 283
SYS_eventfd = 284
SYS_fallocate = 285
SYS_timerfd_settime = 286
SYS_timerfd_gettime = 287
SYS_accept4 = 288
SYS_signalfd4 = 289
SYS_eventfd2 = 290
SYS_epoll_create1 = 291
SYS_dup3 = 292
SYS_pipe2 = 293
SYS_inotify_init1 = 294
SYS_preadv = 295
SYS_pwritev = 296
SYS_rt_tgsigqueueinfo = 297
SYS_perf_event_open = 298
SYS_recvmmsg = 299
SYS_fanotify_init = 300
SYS_fanotify_mark = 301
SYS_prlimit64 = 302
SYS_name_to_handle_at = 303
SYS_open_by_handle_at = 304
SYS_clock_adjtime = 305
SYS_syncfs = 306
SYS_sendmmsg = 307
SYS_setns = 308
SYS_getcpu = 309
SYS_process_vm_readv = 310
SYS_process_vm_writev = 311
SYS_kcmp = 312
SYS_finit_module = 313
SYS_sched_setattr = 314
SYS_sched_getattr = 315
SYS_renameat2 = 316
SYS_seccomp = 317
SYS_getrandom = 318
SYS_memfd_create = 319
SYS_kexec_file_load = 320
SYS_bpf = 321
SYS_execveat = 322
SYS_userfaultfd = 323
SYS_membarrier = 324
SYS_mlock2 = 325
SYS_copy_file_range = 326
SYS_preadv2 = 327
SYS_pwritev2 = 328
SYS_pkey_mprotect = 329
SYS_pkey_alloc = 330
SYS_pkey_free = 331
SYS_statx = 332
SYS_io_pgetevents = 333
SYS_rseq = 334
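
These numbers are specific to x86_64; other architectures number their system calls differently. Code is therefore better off using the SYS_* macros from <sys/syscall.h> than hard-coded values. A small sketch follows; the table known_syscalls and the function syscall_number are made up for this illustration and cover only a handful of calls.

```c
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

struct syscall_entry {
    const char *name;
    long nr;
};

/* The compiler fills in the numbers for whatever architecture is targeted. */
static const struct syscall_entry known_syscalls[] = {
    {"read", SYS_read},
    {"write", SYS_write},
    {"close", SYS_close},
    {"openat", SYS_openat},
    {"getpid", SYS_getpid},
    {"exit", SYS_exit},
    {"exit_group", SYS_exit_group},
};

/* Returns the system call number for name, or -1 if it is not in the table. */
static long syscall_number(const char *name) {
    size_t count = sizeof(known_syscalls) / sizeof(known_syscalls[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(known_syscalls[i].name, name) == 0)
            return known_syscalls[i].nr;
    }
    return -1;
}
```

On x86_64, syscall_number("exit_group") yields the 231 listed above, while the same source compiled for another architecture yields that architecture's number.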

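Summary

seccomp is a powerful Linux security mechanism. It provides:

1. Access control at the system call level: precise control over what a process may do
2. Flexible policy definition: complex filtering logic through BPF programs
3. Efficient execution: filtering happens in the kernel with low overhead
4. A wide range of uses: sandboxes, containers, security auditing and more

Used sensibly, seccomp can noticeably improve the security of an application. In practice the filter policy needs careful design and thorough testing, with error handling and debugging needs taken into account.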

2025-08-16

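Linux System Programming

The kcmp system call, with examples

kcmp in detail

Introduction

kcmp is a Linux system call that compares kernel resources of two processes. Think of it as a kernel-level resource comparator: it checks whether two processes share the same kernel object (an open file, an address space, and so on), much like checking whether two people use the same bank account or credit card.

kcmp is useful wherever resource sharing between processes matters, for example in checkpoint/restore tools used in container environments. It compares only the resource types listed below; it does not compare namespaces.

Prototype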

#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <unistd.h>

int kcmp(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2);

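Functionality

kcmp compares a given kernel resource of two processes and reports whether both refer to the same object in the kernel.

Parameters

pid1: process ID of the first process (pass getpid() for the calling process)
pid2: process ID of the second process (pass getpid() for the calling process)
type: the resource type to compare
idx1: resource index in the first process (depends on type; a file descriptor for KCMP_FILE)
idx2: resource index in the second process (depends on type)

Resource types (the type parameter)

Type            Value   Description
KCMP_FILE       0       Compare file descriptors
KCMP_VM         1       Compare virtual memory (address space)
KCMP_FILES      2       Compare file descriptor tables
KCMP_FS         3       Compare filesystem information
KCMP_SIGHAND    4       Compare signal handler tables
KCMP_IO         5       Compare I/O contexts
KCMP_SYSVSEM    6       Compare System V semaphore undo lists

Return value

On success kcmp returns the result of comparing the two kernel objects. The result also gives an ordering, which makes it usable for sorting:

0: the two resources are the same (the same kernel object)
1: the resources differ and the first orders before the second
2: the resources differ and the first orders after the second
3: the resources differ, but no ordering information is available
-1: error, with errno set accordingly

Error codes

EPERM: insufficient privilege to inspect the target process
ESRCH: the process does not exist
EINVAL: invalid argument
EBADF: idx1 or idx2 is not an open file descriptor (KCMP_FILE)

Similar or related functions

clone: can share resources when creating a process
unshare: detaches a process from shared resources
setns: joins a namespace
/proc filesystem: shows process information
ptrace: process tracing and debugging

Example code

Example 1: basic usage, comparing file descriptors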

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/kcmp.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>

// Wrapper for the kcmp system call
int kcmp_wrapper(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2) {
    return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

// Convert a kcmp result to a string
const char* kcmp_result_to_string(int result) {
    switch (result) {
        case 0: return "same";
        case 1: return "different (first orders before second)";
        case 2: return "different (first orders after second)";
        case 3: return "different (no ordering available)";
        default: return "error";
    }
}

int main() {
    int fd1, fd2, fd3;
    pid_t current_pid = getpid();
    int result;
    
    printf("=== kcmp 基础示例 - 文件描述符比较 ===\n\n");
    
    // 创建测试文件
    fd1 = open("test1.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
    fd2 = open("test2.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
    fd3 = dup(fd1);  // fd3 是 fd1 的副本
    
    if (fd1 == -1 || fd2 == -1 || fd3 == -1) {
        perror("open/dup");
        return 1;
    }
    
    printf("创建的文件描述符:\n");
    printf("  fd1: %d (test1.txt)\n", fd1);
    printf("  fd2: %d (test2.txt)\n", fd2);
    printf("  fd3: %d (test1.txt 的副本)\n", fd3);
    printf("\n");
    
    // 比较相同的文件描述符
    printf("比较相同文件描述符:\n");
    result = kcmp_wrapper(current_pid, current_pid, KCMP_FILE, fd1, fd1);
    printf("  fd1 vs fd1: %s (结果: %d)\n", kcmp_result_to_string(result), result);
    
    // 比较不同的文件描述符（不同文件）
    printf("比较不同文件的文件描述符:\n");
    result = kcmp_wrapper(current_pid, current_pid, KCMP_FILE, fd1, fd2);
    printf("  fd1 vs fd2: %s (结果: %d)\n", kcmp_result_to_string(result), result);
    
    // 比较相同的文件描述符（通过 dup 创建）
    printf("比较相同文件的文件描述符:\n");
    result = kcmp_wrapper(current_pid, current_pid, KCMP_FILE, fd1, fd3);
    printf("  fd1 vs fd3: %s (结果: %d)\n", kcmp_result_to_string(result), result);
    
    // 比较文件描述符表
    printf("比较文件描述符表:\n");
    result = kcmp_wrapper(current_pid, current_pid, KCMP_FILES, 0, 0);
    printf("  FILES 表: %s (结果: %d)\n", kcmp_result_to_string(result), result);
    
    // 清理资源
    close(fd1);
    close(fd2);
    close(fd3);
    unlink("test1.txt");
    unlink("test2.txt");
    
    return 0;
}

Example 2: Comparing resources between processes

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/kcmp.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <sys/wait.h>
#include <sys/stat.h>

// kcmp 系统调用包装函数
int kcmp_wrapper(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2) {
    return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

// Print a kcmp result.
// 0 = same object; 1, 2, 3 = different objects; negative = error.
void show_kcmp_result(const char* description, int result) {
    printf("%-30s: ", description);
    switch (result) {
        case 0: printf("相同\n"); break;
        case 1:
        case 2:
        case 3: printf("不同\n"); break;
        default: printf("错误 (%s)\n", strerror(errno)); break;
    }
}

// 创建测试环境
int setup_test_environment(int *fd1, int *fd2) {
    // 创建测试文件
    *fd1 = open("shared_file.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
    *fd2 = open("different_file.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
    
    if (*fd1 == -1 || *fd2 == -1) {
        perror("创建测试文件失败");
        return -1;
    }
    
    // 写入一些数据
    const char *data = "测试数据";
    write(*fd1, data, strlen(data));
    write(*fd2, data, strlen(data));
    
    return 0;
}

int main() {
    pid_t parent_pid = getpid();
    pid_t child_pid;
    int fd1, fd2;
    int result;
    
    printf("=== kcmp 进程间资源比较示例 ===\n\n");
    
    // 设置测试环境
    if (setup_test_environment(&fd1, &fd2) == -1) {
        return 1;
    }
    
    printf("父进程 PID: %d\n", parent_pid);
    printf("测试文件描述符: fd1=%d, fd2=%d\n\n", fd1, fd2);
    
    // 创建子进程
    child_pid = fork();
    if (child_pid == -1) {
        perror("fork");
        close(fd1);
        close(fd2);
        unlink("shared_file.txt");
        unlink("different_file.txt");
        return 1;
    }
    
    if (child_pid == 0) {
        // 子进程
        int child_fd1, child_fd2;
        pid_t my_pid = getpid();
        
        printf("子进程 PID: %d\n", my_pid);
        
        // 子进程打开相同的文件
        child_fd1 = open("shared_file.txt", O_RDWR);
        child_fd2 = open("different_file.txt", O_RDWR);
        
        if (child_fd1 == -1 || child_fd2 == -1) {
            perror("子进程打开文件失败");
            exit(1);
        }
        
        printf("子进程文件描述符: fd1=%d, fd2=%d\n\n", child_fd1, child_fd2);
        
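        // Compare file descriptors across parent and child.
        // kcmp compares open file descriptions, not path names: fd1 was
        // inherited through fork(), so parent and child share that object,
        // while child_fd1 comes from a separate open() and is a different
        // object even though it names the same file.
        printf("父子进程文件描述符比较:\n");
        result = kcmp_wrapper(parent_pid, my_pid, KCMP_FILE, fd1, fd1);
        show_kcmp_result("父fd1 vs 继承的fd1", result);
        
        result = kcmp_wrapper(parent_pid, my_pid, KCMP_FILE, fd1, child_fd1);
        show_kcmp_result("父fd1 vs 子fd1 (重新打开)", result);
        
        result = kcmp_wrapper(parent_pid, my_pid, KCMP_FILE, fd1, child_fd2);
        show_kcmp_result("父fd1 vs 子fd2 (不同文件)", result);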
        
        // 比较进程资源表
        printf("\n进程资源表比较:\n");
        result = kcmp_wrapper(parent_pid, my_pid, KCMP_FILES, 0, 0);
        show_kcmp_result("文件描述符表", result);
        
        result = kcmp_wrapper(parent_pid, my_pid, KCMP_VM, 0, 0);
        show_kcmp_result("虚拟内存", result);
        
        result = kcmp_wrapper(parent_pid, my_pid, KCMP_FS, 0, 0);
        show_kcmp_result("文件系统信息", result);
        
        result = kcmp_wrapper(parent_pid, my_pid, KCMP_SIGHAND, 0, 0);
        show_kcmp_result("信号处理", result);
        
        // 清理子进程资源
        close(child_fd1);
        close(child_fd2);
        
        exit(0);
    } else {
        // 父进程等待子进程完成
        int status;
        waitpid(child_pid, &status, 0);
        
        // 清理父进程资源
        close(fd1);
        close(fd2);
        unlink("shared_file.txt");
        unlink("different_file.txt");
        
        printf("\n=== 容器和命名空间检测示例 ===\n");
        printf("kcmp 在容器技术中的应用:\n");
        printf("1. 检测进程是否在相同容器中\n");
        printf("2. 验证资源隔离效果\n");
        printf("3. 调试容器间资源共享\n");
    }
    
    return 0;
}

Example 3: A complete process-relationship analysis tool

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/kcmp.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <getopt.h>
#include <pwd.h>
#include <grp.h>

// kcmp 系统调用包装函数
int kcmp_wrapper(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2) {
    return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

// 资源类型名称
const char* get_resource_type_name(int type) {
    switch (type) {
        case KCMP_FILE: return "FILE";
        case KCMP_VM: return "VM";
        case KCMP_FILES: return "FILES";
        case KCMP_FS: return "FS";
        case KCMP_SIGHAND: return "SIGHAND";
        case KCMP_IO: return "IO";
        case KCMP_SYSVSEM: return "SYSVSEM";
        default: return "UNKNOWN";
    }
}

// 显示详细的 kcmp 结果
void show_detailed_result(const char* description, int result, int show_details) {
    printf("%-25s: ", description);
    
    switch (result) {
        case 0:
            printf("相同 ✓");
            if (show_details) printf(" (共享同一内核资源)");
            break;
        case 1:
        case 2:
        case 3:
            // 1, 2 and 3 all mean "different"; they differ only in ordering
            printf("不同 ✗");
            if (show_details) printf(" (使用不同内核资源)");
            break;
        default:
            printf("错误 (%d)", result);
            if (show_details) printf(" (%s)", strerror(errno));
            break;
    }
    printf("\n");
}

// 分析两个进程的关系
void analyze_process_relationship(pid_t pid1, pid_t pid2) {
    printf("=== 进程关系分析 ===\n");
    printf("进程1 PID: %d\n", pid1);
    printf("进程2 PID: %d\n\n", pid2);
    
    // 基本信息检查
    if (pid1 == pid2) {
        printf("注意: 比较的是同一进程\n\n");
    }
    
    // Compare each resource type in turn
    int resource_types[] = {
        KCMP_FILES, KCMP_VM, KCMP_FS, KCMP_SIGHAND, KCMP_IO, KCMP_SYSVSEM
    };
    int num_types = sizeof(resource_types) / sizeof(resource_types[0]);
    
    printf("资源共享分析:\n");
    printf("%-25s  %s\n", "资源类型", "状态");
    printf("%-25s  %s\n", "--------", "----");
    
    int shared_count = 0;
    for (int i = 0; i < num_types; i++) {
        int result = kcmp_wrapper(pid1, pid2, resource_types[i], 0, 0);
        show_detailed_result(get_resource_type_name(resource_types[i]), result, 0);
        
        if (result == 0) {
            shared_count++;
        }
    }
    
    printf("\n共享资源统计: %d/%d 类型共享\n", shared_count, num_types);
    
    // 关系判断
    printf("\n关系判断:\n");
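    if (shared_count == num_types) {
        // A plain fork() copies these resources, so full sharing points to
        // threads of one process or a clone() with the CLONE_* sharing flags
        printf("✓ 进程共享全部资源（同一进程的线程或 clone 共享）\n");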
    } else if (shared_count >= 3) {
        printf("○ 进程可能在相同环境中运行\n");
    } else if (shared_count > 0) {
        printf("⚠ 进程部分资源共享\n");
    } else {
        printf("✗ 进程完全独立\n");
    }
    
    // 容器检测提示
    if (shared_count < 3) {
        printf("\n容器环境检测:\n");
        printf("  如果进程在不同容器中，大多数资源应该显示为'不同'\n");
    }
}

// 比较特定文件描述符
void compare_file_descriptors(pid_t pid1, pid_t pid2, int fd1, int fd2) {
    printf("\n=== 文件描述符比较 ===\n");
    
    int result = kcmp_wrapper(pid1, pid2, KCMP_FILE, fd1, fd2);
    
    printf("进程 %d fd%d vs 进程 %d fd%d: ", pid1, fd1, pid2, fd2);
    switch (result) {
        case 0: printf("指向同一文件 ✓\n"); break;
        case 1:
        case 2:
        case 3: printf("指向不同文件 ✗\n"); break;
        default: printf("比较失败 (%s)\n", strerror(errno)); break;
    }
}

// 显示帮助信息
void show_help(const char *program_name) {
    printf("用法: %s [选项]\n", program_name);
    printf("\n选项:\n");
    printf("  -1, --pid1=PID         第一个进程 ID (默认: 当前进程)\n");
    printf("  -2, --pid2=PID         第二个进程 ID (默认: 父进程)\n");
    printf("  -f, --fd=FD1:FD2       比较特定文件描述符\n");
    printf("  -t, --type=TYPE        比较特定资源类型\n");
    printf("  -a, --all              显示详细分析\n");
    printf("  -h, --help             显示此帮助信息\n");
    printf("\n资源类型:\n");
    printf("  file    - 文件描述符\n");
    printf("  vm      - 虚拟内存\n");
    printf("  files   - 文件描述符表\n");
    printf("  fs      - 文件系统信息\n");
    printf("  sighand - 信号处理\n");
    printf("  io      - I/O 信息\n");
    printf("  sysvsem - System V 信号量\n");
    printf("\n示例:\n");
    // Each %s needs a matching argument
    printf("  %s -1 1234 -2 5678     # 比较进程 1234 和 5678\n", program_name);
    printf("  %s -f 3:4              # 比较当前进程的 fd3 和 fd4\n", program_name);
    printf("  %s -t vm               # 比较虚拟内存\n", program_name);
}

int main(int argc, char *argv[]) {
    pid_t pid1 = getpid();   // default: the current process (kcmp needs a real PID)
    pid_t pid2 = getppid();  // default: compare the current process with its parent
    int fd1 = -1, fd2 = -1;
    int resource_type = -1;
    int show_all = 0;
    int specific_fd = 0;
    
    // 解析命令行参数
    static struct option long_options[] = {
        {"pid1",    required_argument, 0, '1'},
        {"pid2",    required_argument, 0, '2'},
        {"fd",      required_argument, 0, 'f'},
        {"type",    required_argument, 0, 't'},
        {"all",     no_argument,       0, 'a'},
        {"help",    no_argument,       0, 'h'},
        {0, 0, 0, 0}
    };
    
    int c;
    while (1) {
        int option_index = 0;
        c = getopt_long(argc, argv, "1:2:f:t:ah", long_options, &option_index);
        
        if (c == -1)
            break;
            
        switch (c) {
            case '1':
                pid1 = atoi(optarg);
                break;
            case '2':
                pid2 = atoi(optarg);
                break;
            case 'f':
                if (sscanf(optarg, "%d:%d", &fd1, &fd2) == 2) {
                    specific_fd = 1;
                } else {
                    fprintf(stderr, "错误: 文件描述符格式应为 FD1:FD2\n");
                    return 1;
                }
                break;
            case 't':
                if (strcmp(optarg, "file") == 0) resource_type = KCMP_FILE;
                else if (strcmp(optarg, "vm") == 0) resource_type = KCMP_VM;
                else if (strcmp(optarg, "files") == 0) resource_type = KCMP_FILES;
                else if (strcmp(optarg, "fs") == 0) resource_type = KCMP_FS;
                else if (strcmp(optarg, "sighand") == 0) resource_type = KCMP_SIGHAND;
                else if (strcmp(optarg, "io") == 0) resource_type = KCMP_IO;
                else if (strcmp(optarg, "sysvsem") == 0) resource_type = KCMP_SYSVSEM;
                else {
                    fprintf(stderr, "错误: 未知的资源类型 '%s'\n", optarg);
                    return 1;
                }
                break;
            case 'a':
                show_all = 1;
                break;
            case 'h':
                show_help(argv[0]);
                return 0;
            case '?':
                return 1;
        }
    }
    
    printf("=== kcmp 进程资源比较工具 ===\n\n");
    
    // 如果指定了特定文件描述符比较
    if (specific_fd) {
        compare_file_descriptors(pid1, pid2, fd1, fd2);
        return 0;
    }
    
    // 如果指定了特定资源类型
    if (resource_type != -1) {
        int result = kcmp_wrapper(pid1, pid2, resource_type, 0, 0);
        printf("进程 %d 和 %d 的 %s 资源比较: ", 
               pid1 ? pid1 : getpid(), pid2 ? pid2 : getppid(),
               get_resource_type_name(resource_type));
        
        switch (result) {
            case 0: printf("相同\n"); break;
            case 1:
            case 2:
            case 3: printf("不同\n"); break;
            default: printf("错误 (%s)\n", strerror(errno)); break;
        }
        return 0;
    }
    
    // 执行完整的进程关系分析
    analyze_process_relationship(pid1 ? pid1 : getpid(), pid2 ? pid2 : getppid());
    
    // 显示系统信息
    printf("\n=== 系统信息 ===\n");
    printf("当前进程: %d\n", getpid());
    printf("父进程: %d\n", getppid());
    printf("用户 ID: %d\n", getuid());
    
    struct passwd *pwd = getpwuid(getuid());
    if (pwd) {
        printf("用户名: %s\n", pwd->pw_name);
    }
    
    // 使用建议
    printf("\n=== kcmp 使用建议 ===\n");
    printf("典型应用场景:\n");
    printf("1. 容器技术: 检测进程是否在同一容器中\n");
    printf("2. 调试工具: 分析进程间资源共享情况\n");
    printf("3. 安全审计: 验证进程隔离效果\n");
    printf("4. 系统监控: 跟踪进程关系变化\n");
    printf("\n注意事项:\n");
    printf("1. 需要适当权限才能比较其他进程\n");
    printf("2. 结果可能受 SELinux 等安全模块影响\n");
    printf("3. 某些资源类型可能不适用于所有内核版本\n");
    
    return 0;
}

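Compiling and running

# Compile the example programs
gcc -o kcmp_example1 example1.c
gcc -o kcmp_example2 example2.c
gcc -o kcmp_example3 example3.c

# Run the examples
./kcmp_example1
./kcmp_example2
./kcmp_example3 --help
./kcmp_example3 -a
./kcmp_example3 -1 1 -2 $$

Checking system requirements

# Check the kernel version (3.5 or later is required)
uname -r

# Check that the headers define the kcmp system call number
# (the header path varies by distribution)
grep -w __NR_kcmp /usr/include/asm/unistd_64.h

# Look for kcmp symbols in the running kernel (may require root)
grep kcmp /proc/kallsyms

Important notes

1. Kernel version: Linux 3.5 or later is required, and the kernel must be built with kcmp support (CONFIG_KCMP on recent kernels).
2. Permissions: comparing other processes generally requires ptrace-level access to them.
3. Security restrictions: SELinux, AppArmor, and similar modules may restrict access.
4. Process existence: the target processes must exist and be accessible.
5. Error handling: always check the return value and errno.

Practical applications

1. Container tooling: inspecting process relationships under Docker, LXC, and similar runtimes
2. System debugging: analyzing resource sharing between processes
3. Security auditing: verifying process isolation and sandboxing
4. Performance analysis: understanding how processes use resources
5. Troubleshooting: diagnosing unexpected resource sharing between processes

Use in container environments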

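// Heuristic only: kcmp reports whether two processes share kernel objects,
// not which container they belong to. Unrelated processes inside the same
// container also compare as "different".
int are_processes_in_same_container(pid_t pid1, pid_t pid2) {
    // Compare key resources
    int results[] = {
        kcmp_wrapper(pid1, pid2, KCMP_FILES, 0, 0),
        kcmp_wrapper(pid1, pid2, KCMP_FS, 0, 0),
        kcmp_wrapper(pid1, pid2, KCMP_SIGHAND, 0, 0)
    };
    
    // Count resources reported as different (any positive result: 1, 2 or 3)
    int different_count = 0;
    for (int i = 0; i < 3; i++) {
        if (results[i] > 0) different_count++;
    }
    
    return (different_count >= 2) ? 0 : 1;  // 0 = mostly unshared, 1 = mostly shared
}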

Best practices

// A defensive wrapper around kcmp
int safe_kcmp(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2) {
    // Validate arguments (only the types listed in this article are accepted)
    if (type < KCMP_FILE || type > KCMP_SYSVSEM) {
        errno = EINVAL;
        return -1;
    }
    
    // Note when another process is involved
    if (pid1 != getpid() || pid2 != getpid()) {
        // Additional permission checks may be needed
        printf("注意: 比较其他进程需要适当权限\n");
    }
    
    int result = kcmp_wrapper(pid1, pid2, type, idx1, idx2);
    
    // 0 = same object; 1, 2, 3 = different objects; -1 = error (see errno).
    // A missing process (ESRCH) or denied access (EPERM) arrives as -1.
    if (result < 0) {
        printf("错误: kcmp 调用失败 (%s)\n", strerror(errno));
    }
    
    return result;
}

These examples show kcmp used in several ways, from basic resource comparison to a complete process-relationship analysis tool, and should help in understanding how Linux compares process resources.

2025-08-16

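Linux System Programming

The kexec_load System Call, with Examples

kexec_load - load a new kernel image for a fast reboot

1. Introduction

kexec_load is a Linux system call that loads a new kernel image into memory so that the kexec mechanism can switch to it quickly. kexec starts a new kernel directly, without going through BIOS/UEFI initialization, which greatly shortens reboot time.

2. Function prototype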

#include <linux/kexec.h>

long kexec_load(unsigned long entry, unsigned long nr_segments,
                struct kexec_segment *segments, unsigned long flags);

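Note: this is not a standard C library function; it has to be invoked through syscall().

3. Purpose

Loads the given kernel image into memory in preparation for a later kexec. The loaded kernel can be activated at any time through the reboot() system call with the LINUX_REBOOT_CMD_KEXEC command.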
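Before activating anything, it is useful to know whether a kernel is currently loaded. The sketch below reads a sysfs flag file such as /sys/kernel/kexec_loaded (which the examples in this article also read) and reports its value. The helper name read_kexec_flag is hypothetical, and the sketch assumes the file holds a single integer.

```c
#include <stdio.h>

/* Hypothetical helper: reads a sysfs flag file such as
 * /sys/kernel/kexec_loaded and returns 1 if it holds a non-zero value,
 * 0 if it holds zero, and -1 if the file cannot be read or parsed. */
int read_kexec_flag(const char *path) {
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    int value = -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;          /* unreadable or not a number */
    fclose(fp);

    if (value < 0)
        return -1;
    return value != 0;
}
```

A caller could check read_kexec_flag("/sys/kernel/kexec_loaded") == 1 before issuing the reboot command, and treat -1 as "kexec status unknown".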

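4. Parameters

unsigned long entry: physical entry-point address of the new kernel
unsigned long nr_segments: number of segments
struct kexec_segment *segments: pointer to the array of segment descriptors

unsigned long flags: control flags

KEXEC_ON_CRASH: load the kernel that will run after a kernel crash (the crash-dump kernel)
KEXEC_PRESERVE_CONTEXT: preserve the system state before executing the new kernel
KEXEC_ARCH_MASK: not a flag itself, but the mask for the high-order bits of flags that carry the target architecture (the KEXEC_ARCH_* values)

5. The kexec_segment structure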

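struct kexec_segment {
    void *buf;          /* source buffer in the caller's address space */
    size_t bufsz;       /* size of the source buffer */
    void *mem;          /* destination address for the segment */
    size_t memsz;       /* size of the destination memory range */
};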
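The relationship between bufsz and memsz is easy to get wrong. As far as the kexec_load(2) manual page describes it, the destination range must be a multiple of the page size and at least as large as the source buffer. The sketch below shows one way to derive memsz from bufsz. The struct is a local mirror of the layout above so that the block compiles without kernel headers, and make_segment and round_up_to_page are hypothetical helper names.

```c
#include <stddef.h>

/* Local mirror of struct kexec_segment, for illustration only. */
struct kexec_segment_sketch {
    const void *buf;   /* source buffer in user space */
    size_t bufsz;      /* bytes to copy from buf */
    const void *mem;   /* destination address */
    size_t memsz;      /* destination size, page-size multiple */
};

/* Round n up to the next multiple of page_size (page_size must be > 0). */
size_t round_up_to_page(size_t n, size_t page_size) {
    return (n + page_size - 1) / page_size * page_size;
}

/* Hypothetical helper: describe one segment whose destination range is
 * the source size rounded up to whole pages. */
struct kexec_segment_sketch make_segment(const void *buf, size_t bufsz,
                                         unsigned long dest_addr,
                                         size_t page_size) {
    struct kexec_segment_sketch seg;
    seg.buf = buf;
    seg.bufsz = bufsz;
    seg.mem = (const void *)dest_addr;
    seg.memsz = round_up_to_page(bufsz, page_size);
    return seg;
}
```

In a real loader page_size would come from sysconf(_SC_PAGESIZE), and dest_addr would be chosen from a suitable "System RAM" range in /proc/iomem; both choices are outside the scope of this sketch.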

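6. Return value

On success: returns 0
On failure: returns -1 and sets errno

7. Common errno values

EBUSY: the kexec subsystem is in use
EINVAL: invalid argument
EPERM: insufficient privilege (the CAP_SYS_BOOT capability is required)
ENOMEM: out of memory
EADDRNOTAVAIL: the specified memory address is not available
ENOEXEC: invalid kernel image format

8. Similar or related functions

reboot(): the system reboot call, used to activate a loaded kernel
kexec_file_load(): a more modern kernel-loading interface (Linux 3.17+)
syscall(): generic system call interface
/sbin/kexec: the user-space kexec tool
/proc/iomem: shows the system memory layout
/sys/kernel/kexec_crash_loaded: shows whether a crash kernel is loaded

9. Example code

Example 1: Basic usage - a kexec loading framework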

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/kexec.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>

#ifndef SYS_kexec_load
# define SYS_kexec_load 246  // x86_64 架构下的系统调用号
#endif

#ifndef SYS_reboot
# define SYS_reboot 169
#endif

#define LINUX_REBOOT_MAGIC1 0xfee1dead
#define LINUX_REBOOT_MAGIC2 0x28121969
#define LINUX_REBOOT_CMD_KEXEC 0x45584543

// 检查 kexec 支持
int check_kexec_support() {
    if (access("/proc/iomem", R_OK) == -1) {
        printf("系统不支持 /proc/iomem\n");
        return -1;
    }
    
    FILE *fp = fopen("/proc/cmdline", "r");
    if (fp) {
        char cmdline[1024];
        if (fgets(cmdline, sizeof(cmdline), fp)) {
            if (strstr(cmdline, "nokexec")) {
                printf("内核启动参数禁用了 kexec\n");
                fclose(fp);
                return -1;
            }
        }
        fclose(fp);
    }
    
    return 0;
}

// 检查权限
int check_kexec_permissions() {
    if (geteuid() != 0) {
        printf("需要 root 权限执行 kexec\n");
        return -1;
    }
    
    // 检查 CAP_SYS_BOOT 能力
    // 这里简化处理，实际需要使用 libcap
    
    return 0;
}

// 加载内核文件到内存
void* load_kernel_file(const char *filename, size_t *size) {
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        printf("无法打开内核文件: %s\n", filename);
        return NULL;
    }
    
    struct stat st;
    if (fstat(fd, &st) == -1) {
        printf("无法获取文件状态: %s\n", strerror(errno));
        close(fd);
        return NULL;
    }
    
    void *buffer = malloc(st.st_size);
    if (!buffer) {
        printf("内存分配失败\n");
        close(fd);
        return NULL;
    }
    
    ssize_t bytes_read = read(fd, buffer, st.st_size);
    if (bytes_read != st.st_size) {
        printf("读取文件失败\n");
        free(buffer);
        close(fd);
        return NULL;
    }
    
    *size = st.st_size;
    close(fd);
    return buffer;
}

int main() {
    printf("=== kexec 加载框架演示 ===\n");
    
    // 检查系统支持
    if (check_kexec_support() == -1) {
        printf("系统不支持 kexec 功能\n");
        return 1;
    }
    
    // 检查权限
    if (check_kexec_permissions() == -1) {
        printf("权限不足，无法执行 kexec\n");
        return 1;
    }
    
    printf("系统支持 kexec，权限检查通过\n");
    
    // 显示系统信息
    printf("当前内核版本: ");
    system("uname -r");
    
    printf("系统架构: ");
    system("uname -m");
    
    // 检查 kexec 状态
    printf("\n当前 kexec 状态:\n");
    system("ls /sys/kernel/kexec* 2>/dev/null || echo 'kexec 信息不可用'");
    
    return 0;
}

Example 2: The kexec file-loading interface (the modern approach)

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/kexec.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>

#ifndef SYS_kexec_file_load
# define SYS_kexec_file_load 320  // system call number on x86_64
#endif

// kexec_file_load 系统调用（Linux 3.17+）
long kexec_file_load_syscall(int kernel_fd, int initrd_fd,
                            unsigned long cmdline_len,
                            const char *cmdline_ptr,
                            unsigned long flags) {
    return syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
                   cmdline_len, cmdline_ptr, flags);
}

void demonstrate_kexec_file_load() {
    printf("=== kexec_file_load 演示 ===\n");
    
    if (geteuid() != 0) {
        printf("需要 root 权限\n");
        return;
    }
    
    // 检查系统是否支持 kexec_file_load
    long result = syscall(SYS_kexec_file_load, -1, -1, 0, NULL, 0);
    if (result == -1 && errno == ENOSYS) {
        printf("系统不支持 kexec_file_load 系统调用\n");
        printf("需要 Linux 内核 3.17 或更高版本\n");
        return;
    }
    
    printf("系统支持 kexec_file_load 接口\n");
    
    // 显示可用的内核文件
    printf("\n可用的内核文件:\n");
    system("ls /boot/vmlinuz* 2>/dev/null | head -5 || echo '未找到内核文件'");
    
    // 显示 initrd 文件
    printf("\n可用的 initrd 文件:\n");
    system("ls /boot/initrd* /boot/initramfs* 2>/dev/null | head -5 || echo '未找到 initrd 文件'");
    
    // 显示当前 kexec 状态
    printf("\n当前 kexec 状态:\n");
    system("cat /sys/kernel/kexec_loaded 2>/dev/null || echo 'kexec_loaded 不可用'");
    system("cat /sys/kernel/kexec_crash_loaded 2>/dev/null || echo 'kexec_crash_loaded 不可用'");
}

int main() {
    demonstrate_kexec_file_load();
    return 0;
}

Example 3: A kexec status checking tool

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

void check_kexec_status() {
    printf("=== kexec 状态检查 ===\n");
    
    // 检查内核配置
    printf("内核 kexec 支持:\n");
    system("grep CONFIG_KEXEC /boot/config-$(uname -r) 2>/dev/null || "
           "zgrep CONFIG_KEXEC /proc/config.gz 2>/dev/null || "
           "echo '无法确定内核配置'");
    
    // 检查 kexec 模块参数
    printf("\nkexec 模块信息:\n");
    system("lsmod | grep kexec 2>/dev/null || echo 'kexec 作为内建功能或未加载'");
    
    // 检查 sysfs 接口
    printf("\n系统 kexec 状态:\n");
    system("echo 'kexec 加载状态: ' && "
           "cat /sys/kernel/kexec_loaded 2>/dev/null || echo '未知'");
    system("echo '崩溃内核状态: ' && "
           "cat /sys/kernel/kexec_crash_loaded 2>/dev/null || echo '未知'");
    // kexec_crash_size reports the memory reserved for the crash kernel
    system("echo '崩溃内核保留内存大小: ' && "
           "cat /sys/kernel/kexec_crash_size 2>/dev/null || echo '未知'");
    
    // 检查内存布局
    printf("\n系统内存布局 (相关部分):\n");
    system("grep -E 'System RAM|Kernel code|Kernel data' /proc/iomem 2>/dev/null || "
           "echo '无法读取内存布局'");
    
    // 检查权限
    printf("\n权限检查:\n");
    printf("当前 UID: %d\n", getuid());
    printf("当前 EUID: %d\n", geteuid());
    
    if (geteuid() == 0) {
        printf("✓ 以 root 身份运行\n");
    } else {
        printf("✗ 需要 root 权限\n");
    }
    
    // 检查能力
    printf("\n能力检查:\n");
    system("cat /proc/self/status | grep Cap 2>/dev/null || echo '无法读取能力信息'");
}

void show_kexec_limits() {
    printf("\n=== kexec 限制和配置 ===\n");
    
    // 显示硬件限制
    printf("硬件架构限制:\n");
    system("uname -m");
    
    // 显示内存限制
    printf("\n内存相关信息:\n");
    system("free -h");
    
    // 显示 kexec 相关的内核参数
    printf("\n相关内核参数:\n");
    system("sysctl -a 2>/dev/null | grep -E 'kexec|crash' | head -10 || "
           "echo '无法读取内核参数'");
    
    // 显示安全限制
    printf("\n安全相关设置:\n");
    system("cat /proc/sys/kernel/kptr_restrict 2>/dev/null || echo 'kptr_restrict 未设置'");
    system("cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo 'perf_event_paranoid 未设置'");
}

void demonstrate_kexec_commands() {
    printf("\n=== kexec 相关命令 ===\n");
    
    printf("常用 kexec 命令:\n");
    printf("  kexec -l <kernel>          # 加载内核\n");
    printf("  kexec -e                   # 执行已加载的内核\n");
    printf("  kexec -p <kernel>          # 加载崩溃内核\n");
    printf("  kexec -u                   # 卸载当前内核\n");
    
    printf("\n示例用法:\n");
    printf("  # 加载新内核\n");
    printf("  kexec -l /boot/vmlinuz-$(uname -r) \\\n");
    printf("        --initrd=/boot/initrd.img-$(uname -r) \\\n");
    printf("        --command-line=\"$(cat /proc/cmdline)\"\n");
    printf("  \n");
    printf("  # 快速重启\n");
    printf("  kexec -e\n");
    
    // 检查 kexec 命令是否存在
    printf("\n系统中的 kexec 工具:\n");
    system("which kexec 2>/dev/null && echo '✓ kexec 命令可用' || echo '✗ kexec 命令不可用'");
}

int main() {
    check_kexec_status();
    show_kexec_limits();
    demonstrate_kexec_commands();
    
    return 0;
}

Example 4: A kexec security and monitoring tool

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/stat.h>
#include <fcntl.h>

void security_analysis() {
    printf("=== kexec 安全分析 ===\n");
    
    // 检查 kexec 锁定状态
    printf("kexec 锁定检查:\n");
    int fd = open("/proc/sys/kernel/kexec_load_disabled", O_RDONLY);
    if (fd != -1) {
        char buf[16];
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("kexec 加载已%s\n", atoi(buf) ? "禁用" : "启用");
        }
        close(fd);
    } else {
        printf("无法检查 kexec 锁定状态\n");
    }
    
    // 检查安全启动状态
    printf("\n安全启动状态:\n");
    system("mokutil --sb-state 2>/dev/null || echo 'mokutil 不可用或安全启动未启用'");
    
    // 检查 IMA/EVM 状态
    printf("\nIMA/EVM 状态:\n");
    system("cat /sys/kernel/security/ima/policy_count 2>/dev/null || echo 'IMA 不可用'");
    
    // 检查 SELinux/AppArmor 状态
    printf("\n强制访问控制:\n");
    system("getenforce 2>/dev/null || echo 'SELinux 不可用'");
    system("aa-status 2>/dev/null | head -5 || echo 'AppArmor 信息不可用'");
}

void performance_benchmark() {
    printf("\n=== kexec 性能优势 ===\n");
    
    printf("传统重启 vs kexec 重启:\n");
    printf("  传统重启时间:     30-120 秒\n");
    printf("  kexec 重启时间:   2-10 秒\n");
    printf("  时间节省:         80-95%%\n");
    
    printf("\n适用场景:\n");
    printf("  • 高可用性集群\n");
    printf("  • 实时系统维护\n");
    printf("  • 内核升级测试\n");
    printf("  • 故障恢复\n");
    printf("  • 开发调试\n");
    
    printf("\n性能考虑:\n");
    printf("  • 需要足够的内存保留空间\n");
    printf("  • 内核镜像大小影响加载时间\n");
    printf("  • 内存碎片可能影响加载成功率\n");
}

void crash_dump_analysis() {
    printf("\n=== 崩溃转储支持 ===\n");
    
    printf("kexec 崩溃转储功能:\n");
    printf("  • 内核崩溃时保存内存状态\n");
    printf("  • 快速重启并保存转储信息\n");
    printf("  • 支持 vmcore 分析\n");
    
    // 检查崩溃转储配置
    printf("\n崩溃转储配置:\n");
    system("cat /proc/sys/kernel/panic 2>/dev/null || echo 'panic 设置不可用'");
    system("cat /proc/sys/kernel/panic_on_oops 2>/dev/null || echo 'panic_on_oops 设置不可用'");
    system("cat /sys/kernel/kexec_crash_size 2>/dev/null || echo '崩溃内存大小未设置'");
    
    printf("\n相关工具:\n");
    printf("  • crash: 内存转储分析工具\n");
    printf("  • makedumpfile: 转储文件处理工具\n");
    printf("  • kgdb: 内核调试器\n");
}

void best_practices() {
    printf("\n=== kexec 最佳实践 ===\n");
    
    printf("使用建议:\n");
    printf("  1. 确保新内核与硬件兼容\n");
    printf("  2. 备份重要数据\n");
    printf("  3. 测试内核参数\n");
    printf("  4. 监控系统日志\n");
    printf("  5. 准备回滚方案\n");
    
    printf("\n安全注意事项:\n");
    printf("  • 验证内核镜像完整性\n");
    printf("  • 限制 kexec 权限\n");
    printf("  • 启用审计日志\n");
    printf("  • 定期更新内核\n");
    
    printf("\n故障排除:\n");
    printf("  • 检查内核日志: dmesg | grep kexec\n");
    printf("  • 验证内存: free -h\n");
    printf("  • 检查权限: ls -l /proc/sys/kernel/kexec*\n");
}

int main() {
    printf("=== kexec 综合分析工具 ===\n");
    
    security_analysis();
    performance_benchmark();
    crash_dump_analysis();
    best_practices();
    
    printf("\n=== 总结 ===\n");
    printf("kexec 是 Linux 系统中重要的快速重启技术，\n");
    printf("适用于需要最小化停机时间的场景。\n");
    printf("使用时需注意安全性和兼容性问题。\n");
    
    return 0;
}

10. kexec-related structures and constants

// kexec flag bits
#define KEXEC_ON_CRASH          0x00000001
#define KEXEC_PRESERVE_CONTEXT  0x00000002
#define KEXEC_ARCH_MASK         0xffff0000

// 架构相关标志
#define KEXEC_ARCH_DEFAULT      (0 << 16)
#define KEXEC_ARCH_386          (3 << 16)
#define KEXEC_ARCH_68K          (4 << 16)
#define KEXEC_ARCH_PARISC       (15 << 16)
#define KEXEC_ARCH_X86_64       (62 << 16)
#define KEXEC_ARCH_PPC          (20 << 16)
#define KEXEC_ARCH_PPC64        (21 << 16)
#define KEXEC_ARCH_IA_64        (50 << 16)
#define KEXEC_ARCH_ARM          (40 << 16)
#define KEXEC_ARCH_S390         (22 << 16)
#define KEXEC_ARCH_SH           (42 << 16)
#define KEXEC_ARCH_MIPS_LE      (10 << 16)
#define KEXEC_ARCH_MIPS         (8 << 16)

// 重启命令
#define LINUX_REBOOT_CMD_KEXEC  0x45584543

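11. Practical scenarios

Scenario 1: High-availability clusters

void ha_cluster_kexec() {
    // Use kexec for fast failover in an HA cluster
    // to shorten service interruption
}

Scenario 2: Kernel development and debugging

void kernel_development_kexec() {
    // Test a new kernel version quickly
    // without the lengthy BIOS/UEFI initialization
}

Scenario 3: System maintenance

void system_maintenance_kexec() {
    // Reboot quickly during a maintenance window
    // to maximize availability
}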

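12. Caveats

When using kexec, keep the following in mind:

1. Privileges: the CAP_SYS_BOOT capability (or root) is required
2. Memory: there must be enough memory to load the new kernel
3. Compatibility: the new kernel must be compatible with the hardware
4. Security: the integrity of the kernel image should be verified
5. Debugging: diagnostic information after a kexec may be limited
6. Hardware initialization: skipping BIOS/UEFI may affect some hardware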
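The privilege requirement can be checked more precisely than by testing for UID 0. The sketch below parses the effective capability mask from /proc/self/status and tests the CAP_SYS_BOOT bit, without using libcap. The helper names are hypothetical, and the constant is defined locally on the assumption that CAP_SYS_BOOT has the value 22, as in <linux/capability.h>.

```c
#include <stdio.h>

#define SKETCH_CAP_SYS_BOOT 22  /* assumed value of CAP_SYS_BOOT */

/* Returns 1 if the given bit is set in a capability mask, else 0. */
int cap_bit_set(unsigned long long mask, int bit) {
    return (int)((mask >> bit) & 1ULL);
}

/* Hypothetical helper: returns 1 if the calling process has CAP_SYS_BOOT
 * in its effective set, 0 if it does not, and -1 if /proc/self/status
 * cannot be read or has no CapEff line. */
int has_cap_sys_boot(void) {
    FILE *fp = fopen("/proc/self/status", "r");
    if (!fp)
        return -1;

    char line[256];
    int result = -1;
    while (fgets(line, sizeof(line), fp)) {
        unsigned long long mask;
        /* The line looks like "CapEff:\t000001ffffffffff" (hexadecimal) */
        if (sscanf(line, "CapEff: %llx", &mask) == 1) {
            result = cap_bit_set(mask, SKETCH_CAP_SYS_BOOT);
            break;
        }
    }
    fclose(fp);
    return result;
}
```

A root process inside a container may still lack this capability, which is why checking the mask is more reliable than comparing geteuid() with 0.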

13. The modern alternative

// kexec_file_load (the recommended modern interface)
long kexec_file_load(int kernel_fd, int initrd_fd,
                    unsigned long cmdline_len,
                    const char *cmdline_ptr,
                    unsigned long flags);

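Summary

kexec_load is the core Linux system call for switching kernels quickly.

Key features:
1. Fast reboot: skips BIOS/UEFI initialization
2. In-memory loading: places the new kernel in memory ahead of time
3. Flexible control: supports several loading options
4. Crash support: supports crash-dump kernels

Main uses:
1. Fast failure recovery in high-availability systems
2. Kernel development and testing
3. System maintenance and upgrades
4. Crash analysis and debugging

Points to remember:
1. Special privileges and kernel support are required
2. Watch memory and compatibility requirements
3. Prefer the modern kexec_file_load interface
4. It works best together with the user-space tools

kexec can reduce downtime and ease maintenance on Linux systems, which makes it a useful tool for enterprise system administration.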

2025-08-16

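Linux System Programming

The kill System Call, with Examples

kill - send a signal to a process

Introduction

The kill system call sends a signal to a specified process. Signals are one of the inter-process communication mechanisms on Linux and notify a process that some event has occurred. With kill, one process can control another, for example to terminate, stop, or resume it.

Function prototype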

#include <sys/types.h>
#include <signal.h>

int kill(pid_t pid, int sig);

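Purpose

Sends a signal to the specified process or process group, for process control and communication.

Parameters

pid_t pid: the target

> 0: send to the process whose ID is pid
0: send to every process in the caller's process group
-1: send to every process the caller has permission to signal, except init
< -1: send to every process in the process group whose ID is -pid

int sig: the signal number to send

0: null signal; nothing is delivered, but existence and permission checks still run
SIGTERM (15): termination request (the default of the kill command)
SIGKILL (9): forced termination; cannot be caught or ignored
SIGSTOP (19 on x86 and ARM): stop the process; cannot be caught or ignored
SIGCONT (18 on x86 and ARM): resume a stopped process

The numbers of SIGSTOP and SIGCONT differ between architectures, so code should use the symbolic names.

Return value

Returns 0 on success.

Returns -1 on failure and sets errno:

EINVAL: invalid signal number
EPERM: insufficient permission
ESRCH: the process does not exist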
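The null signal and the errno values above combine into a common existence probe. The sketch below is one hedged way to write it; the helper name process_exists is hypothetical. It treats EPERM as "exists", because the kernel only reports a permission problem for a process that is actually there, and it rejects non-positive PIDs so that the process-group and broadcast forms of pid are never used by accident.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a process with this PID exists,
 * 0 if it does not, and -1 for an unusable PID or unexpected error. */
int process_exists(pid_t pid) {
    if (pid <= 0) {
        errno = EINVAL;   /* 0, -1 and < -1 address groups, not one process */
        return -1;
    }
    if (kill(pid, 0) == 0)
        return 1;         /* exists and we may signal it */
    if (errno == EPERM)
        return 1;         /* exists, but we lack permission */
    if (errno == ESRCH)
        return 0;         /* no such process */
    return -1;
}
```

Note that the answer can be stale by the time it is used: a PID may exit, or be reused by a new process, right after the probe.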

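Similar functions

raise(): sends a signal to the calling process
killpg(): sends a signal to a process group
signal(): installs a signal handler
sigaction(): a more capable interface for installing signal handlers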
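To show how kill and sigaction fit together, the sketch below installs a handler for SIGUSR1 and then has the process signal itself. The names on_usr1 and send_and_catch_usr1 are hypothetical. The sketch assumes a single-threaded process in which SIGUSR1 is not blocked; under that assumption the signal is delivered before kill returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo) {
    (void)signo;
    got_usr1 = 1;
}

/* Hypothetical helper: installs the handler, sends SIGUSR1 to the calling
 * process, and returns 1 if the handler ran, 0 if not, -1 on error. */
int send_and_catch_usr1(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    got_usr1 = 0;
    if (kill(getpid(), SIGUSR1) == -1)
        return -1;
    return got_usr1;
}
```

The handler only sets a flag, which keeps it async-signal-safe; any real work is left to the normal control flow that inspects the flag afterwards.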

Example code

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>   // wait()
#include <signal.h>
#include <errno.h>
#include <string.h>

int main() {
    pid_t current_pid = getpid();
    pid_t parent_pid = getppid();
    
    printf("=== Kill函数示例 ===\n");
    printf("当前进程PID: %d\n", current_pid);
    printf("父进程PID: %d\n", parent_pid);
    
    // 示例1: 发送空信号检查进程是否存在
    printf("\n示例1: 检查进程是否存在\n");
    if (kill(current_pid, 0) == 0) {
        printf("  进程 %d 存在\n", current_pid);
    } else {
        printf("  进程 %d 不存在\n", current_pid);
    }
    
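    // Example 2: send SIGTERM to a child process
    printf("\n示例2: 向子进程发送SIGTERM信号\n");
    printf("  发送SIGTERM信号前...\n");
    fflush(stdout); // flush before fork so buffered output is not duplicated
    
    // Use a child as the target so the main process keeps running
    pid_t child_pid = fork();
    if (child_pid == -1) {
        perror("  fork失败");
    } else if (child_pid == 0) {
        // Child: block until a signal arrives; the default action of
        // SIGTERM then terminates it
        printf("  子进程 %d 准备接收信号\n", getpid());
        fflush(stdout);
        pause();
        exit(0);
    } else {
        // Parent
        sleep(1); // give the child time to start
        printf("  向子进程 %d 发送SIGTERM信号\n", child_pid);
        if (kill(child_pid, SIGTERM) == 0) {
            printf("  信号发送成功\n");
        } else {
            perror("  kill失败");
        }
        wait(NULL); // reap the child
    }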
    
    // 示例3: 演示不同信号的效果
    printf("\n示例3: 不同信号的效果\n");
    
    // 创建用于演示的子进程
    child_pid = fork();
    if (child_pid == 0) {
        // 子进程循环运行
        printf("  子进程 %d 开始运行...\n", getpid());
        int i = 0;
        while (i < 10) {
            printf("  子进程运行中... %d\n", i++);
            sleep(1);
        }
        printf("  子进程正常退出\n");
        exit(0);
    } else if (child_pid > 0) {
        // 父进程
        sleep(2); // 让子进程运行一会儿
        
        printf("  向子进程 %d 发送SIGSTOP信号(暂停)\n", child_pid);
        kill(child_pid, SIGSTOP);
        sleep(2);
        
        printf("  向子进程 %d 发送SIGCONT信号(继续)\n", child_pid);
        kill(child_pid, SIGCONT);
        sleep(2);
        
        printf("  向子进程 %d 发送SIGTERM信号(终止)\n", child_pid);
        kill(child_pid, SIGTERM);
        wait(NULL);
    }
    
    // Example 4: error handling
    printf("\n示例4: 错误处理\n");
    // Probe with signal 0 so nothing is delivered if PID 999999 happens to exist
    if (kill(999999, 0) == -1) {
        printf("  向不存在的进程发送信号: %s\n", strerror(errno));
    }
    
    if (kill(parent_pid, 999) == -1) {
        printf("  发送无效信号: %s\n", strerror(errno));
    }
    
    printf("\n程序执行完毕\n");
    return 0;
}


2025-08-16

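Linux System Programming

keyctl: key management control interface

1. Introduction

keyctl is a Linux system call for managing the kernel's Key Retention Service. It is the control interface to the kernel key management facility and lets a process create, look up, update, and revoke keys of various types, as well as manage the keyrings that hold them.

The Key Retention Service is part of the kernel's security infrastructure. It stores keys, passwords, certificates, and other sensitive data in kernel memory, where access is governed by per-key permissions.

2. Prototype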

#include <sys/types.h>
#include <keyutils.h>

long keyctl(int cmd, ...);

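Note: glibc does not provide a wrapper for keyctl. Either invoke it through syscall(), or use the keyctl() function from libkeyutils (include <keyutils.h> and link with -lkeyutils).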

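3. Functionality

keyctl performs a range of key management operations, including:

Creating and deleting keys
Setting and reading key attributes
Linking and unlinking keys
Searching for keys
Setting key permissions
Managing keyrings

4. Common command arguments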

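// Key management commands
#define KEYCTL_GET_KEYRING_ID     0  /* get a keyring's ID */
#define KEYCTL_JOIN_SESSION_KEYRING 1  /* join or create a session keyring */
#define KEYCTL_UPDATE             2  /* update a key's payload */
#define KEYCTL_REVOKE             3  /* revoke a key */
#define KEYCTL_CHOWN              4  /* change a key's owner */
#define KEYCTL_SETPERM            5  /* set a key's permissions */
#define KEYCTL_DESCRIBE           6  /* describe a key */
#define KEYCTL_CLEAR              7  /* clear a keyring */
#define KEYCTL_LINK               8  /* link a key into a keyring */
#define KEYCTL_UNLINK             9  /* unlink a key from a keyring */
#define KEYCTL_SEARCH            10  /* search a keyring tree for a key */
#define KEYCTL_READ              11  /* read a key's payload */
#define KEYCTL_INSTANTIATE       12  /* instantiate a partially constructed key */
#define KEYCTL_NEGATE            13  /* negatively instantiate a key */
#define KEYCTL_SET_REQKEY_KEYRING 14 /* set the default request-key keyring */
#define KEYCTL_SET_TIMEOUT       15  /* set a key's timeout */
#define KEYCTL_ASSUME_AUTHORITY  16  /* assume authority to instantiate a key */
#define KEYCTL_GET_SECURITY      17  /* get a key's security label */
#define KEYCTL_SESSION_TO_PARENT 18  /* install the session keyring on the parent */
#define KEYCTL_REJECT            19  /* negatively instantiate with a chosen error */
#define KEYCTL_INSTANTIATE_IOV   20  /* instantiate a key from an iovec */
#define KEYCTL_INVALIDATE        21  /* invalidate a key */
#define KEYCTL_GET_PERSISTENT    22  /* get a user's persistent keyring */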

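5. Key types

// Common key types
"user"         // user-defined key; payload readable from user space
"logon"        // login credential; payload not readable from user space
"trusted"      // key sealed by a trust source such as a TPM
"encrypted"    // key encrypted under a master key
"dns_resolver" // DNS resolver results
"rxrpc"        // RxRPC keys
"keyring"      // keyring holding links to other keys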

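6. Return value

On success: the return value depends on the command
On failure: returns -1 and sets errno

7. Common errno values

ENOKEY: the key does not exist
EKEYEXPIRED: the key has expired
EKEYREVOKED: the key has been revoked
EACCES: insufficient permission on the key
EPERM: operation not permitted
EINVAL: invalid argument
ENOMEM: out of memory
EDQUOT: key quota exceeded
EOPNOTSUPP: operation not supported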

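8. Related functions

add_key(): add a key to a keyring
request_key(): look up a key, asking user space to create it if it is missing
keyctl_*() functions in libkeyutils, such as keyctl_read() and keyctl_describe(): typed wrappers around individual keyctl commands
/sbin/keyctl: user-space key management tool
/proc/keys: keys visible to the calling process
/proc/key-users: per-user key usage and quotas

9. Example code

Example 1: basic usage - creating and managing a key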

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <keyutils.h>
#include <errno.h>
#include <string.h>

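#ifndef SYS_keyctl
# define SYS_keyctl 250  // syscall number on x86_64 only
#endif

// Raw keyctl wrapper for builds without libkeyutils. It is unused below,
// where keyutils.h already provides keyctl(). The fixed argument count
// avoids reading variadic arguments that the caller never passed.
long keyctl_wrapper(int cmd, unsigned long arg2, unsigned long arg3,
                    unsigned long arg4, unsigned long arg5) {
    return syscall(SYS_keyctl, cmd, arg2, arg3, arg4, arg5);
}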

int main() {
    printf("=== keyctl 基本使用演示 ===\n");
    
    // 检查密钥支持
    key_serial_t session_keyring = keyctl(KEYCTL_GET_KEYRING_ID, 
                                         KEY_SPEC_SESSION_KEYRING, 0);
    if (session_keyring == -1) {
        printf("错误: 系统不支持密钥保留服务: %s\n", strerror(errno));
        return 1;
    }
    
    printf("✓ 密钥保留服务可用\n");
    printf("会话密钥环 ID: %d\n", session_keyring);
    
    // 创建一个用户密钥
    const char *key_desc = "my_test_key";
    const char *key_data = "This is my secret data";
    
    key_serial_t key = add_key("user", key_desc, key_data, strlen(key_data), 
                              KEY_SPEC_SESSION_KEYRING);
    if (key == -1) {
        printf("创建密钥失败: %s\n", strerror(errno));
        return 1;
    }
    
    printf("✓ 成功创建密钥，ID: %d\n", key);
    
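    // Describe the key
    char description[256];
    long desc_len = keyctl(KEYCTL_DESCRIBE, key, (long)description, 
                          sizeof(description), 0);
    // The returned length includes the NUL and may exceed the buffer size
    if (desc_len != -1 && (size_t)desc_len <= sizeof(description)) {
        printf("Key description: %s\n", description);
    }
    
    // Read the key payload
    char read_data[256];
    long data_len = keyctl(KEYCTL_READ, key, (long)read_data, 
                          sizeof(read_data), 0);
    // KEYCTL_READ reports the full payload size; terminate only if it fits
    if (data_len != -1 && (size_t)data_len < sizeof(read_data)) {
        read_data[data_len] = '\0';
        printf("Key data: %s\n", read_data);
    }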
    
    // 设置密钥超时（30秒后过期）
    if (keyctl(KEYCTL_SET_TIMEOUT, key, 30, 0, 0) == 0) {
        printf("✓ 设置密钥超时为 30 秒\n");
    }
    
    // Get the key's security label
    char security_context[256];
    long sec_len = keyctl(KEYCTL_GET_SECURITY, key, 
                         (long)security_context, sizeof(security_context), 0);
    // The returned length includes the NUL and may exceed the buffer size
    if (sec_len != -1 && (size_t)sec_len <= sizeof(security_context)) {
        printf("Security context: %s\n", security_context);
    }
    
    // 撤销密钥
    if (keyctl(KEYCTL_REVOKE, key, 0, 0, 0) == 0) {
        printf("✓ 成功撤销密钥\n");
    }
    
    return 0;
}

Example 2: keyring operations

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <keyutils.h>
#include <errno.h>
#include <string.h>

void demonstrate_keyring_operations() {
    printf("=== 密钥环操作演示 ===\n");
    
    // 获取各种密钥环 ID
    key_serial_t session_ring = keyctl(KEYCTL_GET_KEYRING_ID, 
                                      KEY_SPEC_SESSION_KEYRING, 0);
    key_serial_t process_ring = keyctl(KEYCTL_GET_KEYRING_ID, 
                                      KEY_SPEC_PROCESS_KEYRING, 0);
    key_serial_t thread_ring = keyctl(KEYCTL_GET_KEYRING_ID, 
                                     KEY_SPEC_THREAD_KEYRING, 0);
    
    printf("密钥环 ID:\n");
    printf("  会话密钥环: %d\n", session_ring);
    printf("  进程密钥环: %d\n", process_ring);
    printf("  线程密钥环: %d\n", thread_ring);
    
    // 创建自定义密钥环
    key_serial_t custom_ring = add_key("keyring", "my_custom_ring", 
                                      NULL, 0, KEY_SPEC_SESSION_KEYRING);
    if (custom_ring != -1) {
        printf("✓ 创建自定义密钥环: %d\n", custom_ring);
        
        // 在自定义密钥环中创建密钥
        key_serial_t ring_key = add_key("user", "ring_key", 
                                       "data in ring", 12, custom_ring);
        if (ring_key != -1) {
            printf("✓ 在自定义密钥环中创建密钥: %d\n", ring_key);
        }
        
        // 清空自定义密钥环
        if (keyctl(KEYCTL_CLEAR, custom_ring, 0, 0, 0) == 0) {
            printf("✓ 清空自定义密钥环\n");
        }
        
        // 撤销自定义密钥环
        if (keyctl(KEYCTL_REVOKE, custom_ring, 0, 0, 0) == 0) {
            printf("✓ 撤销自定义密钥环\n");
        }
    }
    
    // 加入新的会话密钥环
    key_serial_t new_session = keyctl(KEYCTL_JOIN_SESSION_KEYRING, 
                                     (long)"new_session", 0, 0, 0);
    if (new_session != -1) {
        printf("✓ 加入新的会话密钥环: %d\n", new_session);
    }
}

void demonstrate_key_search() {
    printf("\n=== 密钥搜索演示 ===\n");
    
    // 创建测试密钥
    key_serial_t test_key = add_key("user", "search_test", 
                                   "search data", 11, 
                                   KEY_SPEC_SESSION_KEYRING);
    if (test_key == -1) {
        printf("创建测试密钥失败: %s\n", strerror(errno));
        return;
    }
    
    printf("创建测试密钥: %d\n", test_key);
    
    // 搜索密钥
    key_serial_t found_key = keyctl(KEYCTL_SEARCH, 
                                   KEY_SPEC_SESSION_KEYRING,
                                   (long)"user", (long)"search_test", 0);
    if (found_key != -1) {
        printf("✓ 找到密钥: %d\n", found_key);
        
        // 验证找到的密钥
        if (found_key == test_key) {
            printf("✓ 验证成功：找到的密钥 ID 匹配\n");
        }
    } else {
        printf("搜索密钥失败: %s\n", strerror(errno));
    }
    
    // 撤销测试密钥
    keyctl(KEYCTL_REVOKE, test_key, 0, 0, 0);
}

void demonstrate_key_permissions() {
    printf("\n=== 密钥权限演示 ===\n");
    
    // 创建测试密钥
    key_serial_t perm_key = add_key("user", "perm_test", 
                                   "permission data", 15, 
                                   KEY_SPEC_SESSION_KEYRING);
    if (perm_key == -1) {
        printf("创建权限测试密钥失败: %s\n", strerror(errno));
        return;
    }
    
    printf("创建权限测试密钥: %d\n", perm_key);
    
    // 设置密钥权限
    // 权限格式: possessor|user|group|other
    // 每个字段: view|read|write|search|link|setattr|all
    key_perm_t permissions = KEY_POS_ALL | KEY_USR_VIEW | KEY_USR_READ;
    
    if (keyctl(KEYCTL_SETPERM, perm_key, permissions, 0, 0) == 0) {
        printf("✓ 设置密钥权限成功\n");
        
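        // Describe the key to see the updated permissions
        char desc[256];
        long desc_len = keyctl(KEYCTL_DESCRIBE, perm_key, 
                              (long)desc, sizeof(desc), 0);
        // The returned length includes the NUL and may exceed the buffer size
        if (desc_len != -1 && (size_t)desc_len <= sizeof(desc)) {
            printf("Updated key description: %s\n", desc);
        }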
    } else {
        printf("设置密钥权限失败: %s\n", strerror(errno));
    }
    
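    // Change the key owner (requires privilege unless nothing changes).
    // A GID of -1 leaves the group untouched; 0 would request root's group.
    if (keyctl(KEYCTL_CHOWN, perm_key, getuid(), (gid_t)-1, 0) == 0) {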
        printf("✓ 更改密钥所有者成功\n");
    } else {
        if (errno == EPERM) {
            printf("ℹ 更改所有者需要特权权限\n");
        } else {
            printf("更改所有者失败: %s\n", strerror(errno));
        }
    }
    
    // 撤销密钥
    keyctl(KEYCTL_REVOKE, perm_key, 0, 0, 0);
}

int main() {
    printf("=== keyctl 密钥环操作演示 ===\n");
    
    // 检查密钥支持
    if (keyctl(KEYCTL_GET_KEYRING_ID, KEY_SPEC_SESSION_KEYRING, 0) == -1) {
        printf("错误: 系统不支持密钥保留服务\n");
        return 1;
    }
    
    demonstrate_keyring_operations();
    demonstrate_key_search();
    demonstrate_key_permissions();
    
    return 0;
}

Example 3: key security and encryption operations

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <keyutils.h>
#include <errno.h>
#include <string.h>
#include <time.h>

void demonstrate_encrypted_keys() {
    printf("=== 加密密钥演示 ===\n");
    
    // 检查是否支持加密密钥
    printf("系统支持的密钥类型:\n");
    system("cat /proc/key-types 2>/dev/null | grep -E 'user|encrypted|trusted' || "
           "echo '无法读取密钥类型信息'");
    
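    // Create a "user" key. Note: this type stores the payload as-is in kernel
    // memory and does not encrypt it; only the "encrypted" type does.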
    const char *encrypted_data = "sensitive_encrypted_data";
    key_serial_t enc_key = add_key("user", "encrypted_secret", 
                                  encrypted_data, strlen(encrypted_data),
                                  KEY_SPEC_SESSION_KEYRING);
    
    if (enc_key != -1) {
        printf("✓ 创建加密数据密钥: %d\n", enc_key);
        
        // 设置超时以增强安全性
        if (keyctl(KEYCTL_SET_TIMEOUT, enc_key, 300, 0, 0) == 0) {  // 5分钟
            printf("✓ 设置密钥超时为 5 分钟\n");
        }
        
        // Read the key payload
        char buffer[256];
        long read_len = keyctl(KEYCTL_READ, enc_key, (long)buffer, 
                              sizeof(buffer), 0);
        if (read_len != -1) {
            // Only the length is printed, so the buffer needs no terminator
            printf("Payload length: %ld bytes\n", read_len);
        }
    } else {
        printf("创建加密密钥失败: %s\n", strerror(errno));
    }
}

void demonstrate_key_lifecycle() {
    printf("\n=== 密钥生命周期管理 ===\n");
    
    // 创建密钥
    const char *key_data = "lifecycle_test_data";
    key_serial_t key = add_key("user", "lifecycle_test", 
                              key_data, strlen(key_data),
                              KEY_SPEC_SESSION_KEYRING);
    
    if (key == -1) {
        printf("创建密钥失败: %s\n", strerror(errno));
        return;
    }
    
    printf("1. 创建密钥: %d\n", key);
    
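    // Describe the key's current state
    char desc[256];
    long desc_len = keyctl(KEYCTL_DESCRIBE, key, (long)desc, sizeof(desc), 0);
    // The returned length includes the NUL and may exceed the buffer size
    if (desc_len != -1 && (size_t)desc_len <= sizeof(desc)) {
        printf("2. Key state: %s\n", desc);
    }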
    
    // 更新密钥数据
    const char *new_data = "updated_lifecycle_data";
    if (keyctl(KEYCTL_UPDATE, key, (long)new_data, strlen(new_data), 0) == 0) {
        printf("3. ✓ 更新密钥数据成功\n");
    }
    
    // 设置短期超时
    if (keyctl(KEYCTL_SET_TIMEOUT, key, 10, 0, 0) == 0) {  // 10秒
        printf("4. ✓ 设置 10 秒超时\n");
    }
    
    // 等待几秒观察超时效果
    printf("5. 等待 12 秒观察超时效果...\n");
    sleep(12);
    
    // Try to access the expired key
    char buffer[256];
    long read_len = keyctl(KEYCTL_READ, key, (long)buffer, sizeof(buffer), 0);
    if (read_len == -1) {
        if (errno == EKEYEXPIRED) {
            printf("6. ✓ Key expired as expected\n");
        } else {
            printf("6. Key access failed: %s\n", strerror(errno));
        }
    }
    
    // 撤销密钥（即使已过期）
    if (keyctl(KEYCTL_REVOKE, key, 0, 0, 0) == 0) {
        printf("7. ✓ 撤销密钥成功\n");
    }
}

void demonstrate_key_security_analysis() {
    printf("\n=== 密钥安全分析 ===\n");
    
    // 显示当前用户的密钥使用情况
    printf("当前密钥使用情况:\n");
    system("cat /proc/key-users 2>/dev/null | head -10 || echo '无法读取密钥用户信息'");
    
    // 显示系统密钥信息
    printf("\n系统密钥信息:\n");
    system("cat /proc/keys 2>/dev/null | head -10 || echo '无法读取密钥信息'");
    
    // 显示密钥配额信息
    printf("\n密钥配额信息:\n");
    system("cat /proc/sys/kernel/keys/* 2>/dev/null || echo '无法读取密钥配额'");
    
    // 安全建议
    printf("\n密钥安全最佳实践:\n");
    printf("  • 使用适当的超时设置\n");
    printf("  • 设置最小必要权限\n");
    printf("  • 定期清理过期密钥\n");
    printf("  • 避免在日志中记录密钥数据\n");
    printf("  • 使用加密存储敏感数据\n");
    printf("  • 监控密钥使用情况\n");
}

int main() {
    printf("=== keyctl 安全和加密操作演示 ===\n");
    
    // 检查密钥支持
    if (keyctl(KEYCTL_GET_KEYRING_ID, KEY_SPEC_SESSION_KEYRING, 0) == -1) {
        printf("错误: 系统不支持密钥保留服务\n");
        return 1;
    }
    
    printf("✓ 密钥保留服务可用\n");
    
    demonstrate_encrypted_keys();
    demonstrate_key_lifecycle();
    demonstrate_key_security_analysis();
    
    return 0;
}

Example 4: key management tool and monitoring

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <keyutils.h>
#include <errno.h>
#include <string.h>
#include <time.h>

void show_key_statistics() {
    printf("=== 密钥统计信息 ===\n");
    
    // 显示密钥数量
    printf("系统密钥统计:\n");
    system("wc -l /proc/keys 2>/dev/null | awk '{print \"总密钥数: \" $1-1}' || "
           "echo '无法统计密钥数量'");
    
    // 显示密钥用户信息
    printf("\n密钥用户统计:\n");
    system("cat /proc/key-users 2>/dev/null | head -5 || echo '无法读取用户统计'");
    
    // 显示密钥类型分布
    printf("\n密钥类型分布:\n");
    system("awk '{print $4}' /proc/keys 2>/dev/null | sort | uniq -c | head -10 || "
           "echo '无法分析密钥类型'");
}

void interactive_key_manager() {
    int choice;
    char input[256];
    
    while (1) {
        printf("\n=== 密钥管理菜单 ===\n");
        printf("1. 列出当前密钥\n");
        printf("2. 创建新密钥\n");
        printf("3. 查找密钥\n");
        printf("4. 删除密钥\n");
        printf("5. 显示密钥统计\n");
        printf("6. 清空会话密钥环\n");
        printf("0. 退出\n");
        printf("请选择操作: ");
        
        if (scanf("%d", &choice) != 1) {
            printf("输入无效，请重新选择\n");
            while (getchar() != '\n');  // 清空输入缓冲区
            continue;
        }
        
        switch (choice) {
            case 1:
                printf("当前会话密钥:\n");
                system("keyctl list @s 2>/dev/null || echo '无法列出密钥'");
                break;
                
            case 2: {
                // Separate buffers: sharing one array at an offset can overflow
                char desc[128], data[128];
                
                printf("Enter key description: ");
                if (scanf("%127s", desc) != 1) break;
                
                printf("Enter key data: ");
                if (scanf("%127s", data) != 1) break;
                
                key_serial_t new_key = add_key("user", desc, data, 
                                              strlen(data),
                                              KEY_SPEC_SESSION_KEYRING);
                if (new_key != -1) {
                    printf("✓ Key created: %d\n", new_key);
                } else {
                    printf("❌ Failed to create key: %s\n", strerror(errno));
                }
                break;
            }
            
            case 3: {
                printf("输入要查找的密钥描述: ");
                scanf("%255s", input);
                
                key_serial_t found_key = keyctl(KEYCTL_SEARCH,
                                               KEY_SPEC_SESSION_KEYRING,
                                               (long)"user", (long)input, 0);
                if (found_key != -1) {
                    printf("✓ 找到密钥: %d\n", found_key);
                } else {
                    printf("❌ 未找到密钥: %s\n", strerror(errno));
                }
                break;
            }
            
            case 4: {
                printf("输入要删除的密钥 ID: ");
                int key_id;
                if (scanf("%d", &key_id) == 1) {
                    if (keyctl(KEYCTL_REVOKE, key_id, 0, 0, 0) == 0) {
                        printf("✓ 成功撤销密钥: %d\n", key_id);
                    } else {
                        printf("❌ 撤销密钥失败: %s\n", strerror(errno));
                    }
                } else {
                    printf("输入无效\n");
                }
                break;
            }
            
            case 5:
                show_key_statistics();
                break;
                
            case 6:
                if (keyctl(KEYCTL_CLEAR, KEY_SPEC_SESSION_KEYRING, 0, 0, 0) == 0) {
                    printf("✓ 成功清空会话密钥环\n");
                } else {
                    printf("❌ 清空密钥环失败: %s\n", strerror(errno));
                }
                break;
                
            case 0:
                printf("退出密钥管理工具\n");
                return;
                
            default:
                printf("无效选择，请重新输入\n");
                break;
        }
    }
}

void demonstrate_advanced_key_operations() {
    printf("\n=== 高级密钥操作 ===\n");
    
    // 创建密钥环
    key_serial_t keyring = add_key("keyring", "advanced_test_ring", 
                                  NULL, 0, KEY_SPEC_SESSION_KEYRING);
    if (keyring != -1) {
        printf("✓ 创建测试密钥环: %d\n", keyring);
        
        // 在密钥环中创建多个密钥
        for (int i = 1; i <= 3; i++) {
            char desc[32], data[32];
            snprintf(desc, sizeof(desc), "key_%d", i);
            snprintf(data, sizeof(data), "data_%d", i);
            
            key_serial_t key = add_key("user", desc, data, strlen(data), keyring);
            if (key != -1) {
                printf("  ✓ 创建密钥 %s: %d\n", desc, key);
            }
        }
        
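        // List the keyring contents
        printf("Keyring contents:\n");
        char buffer[1024];
        long len = keyctl(KEYCTL_READ, keyring, (long)buffer, sizeof(buffer), 0);
        if (len != -1) {
            // Reading a keyring yields an array of key_serial_t IDs
            printf("  Keyring holds %ld keys (%ld bytes)\n",
                   len / (long)sizeof(key_serial_t), len);
        }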
        
        // 清空密钥环
        if (keyctl(KEYCTL_CLEAR, keyring, 0, 0, 0) == 0) {
            printf("✓ 清空密钥环成功\n");
        }
        
        // 撤销密钥环
        keyctl(KEYCTL_REVOKE, keyring, 0, 0, 0);
    }
}

void key_monitoring_demo() {
    printf("\n=== 密钥监控演示 ===\n");
    
    printf("Key count before and after creating a test key:\n");
    
    // 显示初始状态
    printf("初始密钥数量: ");
    system("wc -l /proc/keys 2>/dev/null | awk '{print $1-1}' || echo 'unknown'");
    
    // 创建测试密钥并监控变化
    key_serial_t test_key = add_key("user", "monitor_test", 
                                   "monitor data", 12, 
                                   KEY_SPEC_SESSION_KEYRING);
    if (test_key != -1) {
        printf("创建测试密钥后数量: ");
        system("wc -l /proc/keys 2>/dev/null | awk '{print $1-1}' || echo 'unknown'");
        
        // 撤销密钥
        keyctl(KEYCTL_REVOKE, test_key, 0, 0, 0);
        printf("撤销测试密钥后数量: ");
        system("wc -l /proc/keys 2>/dev/null | awk '{print $1-1}' || echo 'unknown'");
    }
}

int main() {
    printf("=== keyctl 高级管理和监控工具 ===\n");
    
    // 检查密钥支持
    if (keyctl(KEYCTL_GET_KEYRING_ID, KEY_SPEC_SESSION_KEYRING, 0) == -1) {
        printf("错误: 系统不支持密钥保留服务\n");
        return 1;
    }
    
    printf("✓ 密钥保留服务可用\n");
    
    // 显示系统信息
    printf("\n系统密钥信息:\n");
    system("uname -a");
    
    // 执行高级操作演示
    demonstrate_advanced_key_operations();
    
    // 显示统计信息
    show_key_statistics();
    
    // 监控演示
    key_monitoring_demo();
    
    // 启动交互式管理器（如果需要）
    char choice;
    printf("\n是否启动交互式密钥管理器? (y/N): ");
    if (scanf(" %c", &choice) == 1 && (choice == 'y' || choice == 'Y')) {
        interactive_key_manager();
    }
    
    return 0;
}

10. Key permission bits

// Key permission bit definitions
#define KEY_POS_VIEW    0x01000000  /* possessor may view attributes */
#define KEY_POS_READ    0x02000000  /* possessor may read the payload */
#define KEY_POS_WRITE   0x04000000  /* possessor may write */
#define KEY_POS_SEARCH  0x08000000  /* possessor may search */
#define KEY_POS_LINK    0x10000000  /* possessor may link */
#define KEY_POS_SETATTR 0x20000000  /* possessor may set attributes */
#define KEY_POS_ALL     0x3f000000  /* all possessor permissions */

#define KEY_USR_VIEW    0x00010000  /* owning user may view attributes */
#define KEY_USR_READ    0x00020000  /* owning user may read the payload */
#define KEY_USR_WRITE   0x00040000  /* owning user may write */
#define KEY_USR_SEARCH  0x00080000  /* owning user may search */
#define KEY_USR_LINK    0x00100000  /* owning user may link */
#define KEY_USR_SETATTR 0x00200000  /* owning user may set attributes */
#define KEY_USR_ALL     0x003f0000  /* all user permissions */

#define KEY_GRP_VIEW    0x00000100  /* group may view attributes */
#define KEY_GRP_READ    0x00000200  /* group may read the payload */
#define KEY_GRP_WRITE   0x00000400  /* group may write */
#define KEY_GRP_SEARCH  0x00000800  /* group may search */
#define KEY_GRP_LINK    0x00001000  /* group may link */
#define KEY_GRP_SETATTR 0x00002000  /* group may set attributes */
#define KEY_GRP_ALL     0x00003f00  /* all group permissions */

#define KEY_OTH_VIEW    0x00000001  /* others may view attributes */
#define KEY_OTH_READ    0x00000002  /* others may read the payload */
#define KEY_OTH_WRITE   0x00000004  /* others may write */
#define KEY_OTH_SEARCH  0x00000008  /* others may search */
#define KEY_OTH_LINK    0x00000010  /* others may link */
#define KEY_OTH_SETATTR 0x00000020  /* others may set attributes */
#define KEY_OTH_ALL     0x0000003f  /* all permissions for others */

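11. Practical use cases

Use case 1: network authentication

void network_authentication_keys() {
    // Store a network authentication credential
    add_key("user", "wifi_password", "secret123", 9, 
            KEY_SPEC_SESSION_KEYRING);
}

Use case 2: filesystem encryption

void filesystem_encryption_keys() {
    // An "encrypted" key is created from a command string rather than raw
    // key bytes. master_key_name is a hypothetical "user" key that must
    // already exist; 32 is the requested key length in bytes.
    const char *payload = "new user:master_key_name 32";
    add_key("encrypted", "fs_encryption_key", payload, strlen(payload),
            KEY_SPEC_SESSION_KEYRING);
}

Use case 3: secure communication

void secure_communication_keys(const void *session_key, size_t key_len) {
    // Store a TLS/SSL session key supplied by the caller
    add_key("user", "tls_session_key", session_key, key_len,
            KEY_SPEC_PROCESS_KEYRING);
}

12. Notes

When using keyctl, keep the following in mind:

Privileges: some operations require special privileges, for example changing a key's owner

Memory safety: handle key data carefully while a copy sits in user-space memory

Lifecycle: manage key creation, use, and destruction deliberately

Timeouts: set reasonable key timeouts to limit exposure

Permissions: grant only the minimum permissions needed

System resources: monitor the effect of key usage on system resources; keys count against per-user quotas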

13. Checking the system configuration

# Check that the kernel was built with key support
grep CONFIG_KEYS /boot/config-$(uname -r)

# Inspect keys
cat /proc/keys
cat /proc/key-users

# Check key quotas
cat /proc/sys/kernel/keys/maxkeys
cat /proc/sys/kernel/keys/maxbytes

# Use the keyctl tool
keyctl list @s
keyctl show

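Summary

keyctl is the key management interface of Linux:

Key features:
1. Secure storage: keys are held in kernel memory
2. Access control: fine-grained permissions per key
3. Lifecycle management: control over creation, expiry, revocation, and invalidation
4. Multiple key types: suited to different applications
5. System integration: tied into the Linux security subsystems

Main uses:
1. Network authentication and secure communication
2. Filesystem encryption
3. Application key management
4. System security services
5. Container and virtualization security

Usage tips:
1. Understand the key permission model
2. Manage key lifecycles deliberately
3. Set sensible timeouts and permissions
4. Monitor key usage
5. Combine the system call with the user-space tools

Used correctly, keyctl helps keep secrets out of application memory and files, and it is a core part of the security architecture of modern Linux systems.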

2025-08-16

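Linux System Programming

The listen system call, with examples

We continue with the important functions of Linux system programming. This time we cover listen, an essential step in the TCP server model: it puts a bound socket into the listening state so that it can receive connection requests from clients.

1. Introduction

listen is a Linux system call used on the server side of connection-oriented sockets, most commonly TCP. Its job is to turn a socket that has been bound to a local address (with bind) from the default active-open state into a passive-open one.

Put simply, listen tells the kernel:

"Kernel, this socket (sockfd) is bound to an address (IP and port). I now want to listen on that address for client connection requests. Please manage those requests for me and queue them until I pick them up with accept."

You can think of listen as a shop opening for business:

The shop (the socket) has already chosen its address (through bind).
listen is the owner hanging an "open" sign on the door and telling the clerk (the kernel): "When someone knocks (a connection request), have them wait in line outside for a moment instead of rushing in."
accept is the clerk opening the door, letting the next queued customer (client) in, and starting one-to-one service.

2. Prototype

#include <sys/socket.h> // required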

int listen(int sockfd, int backlog);

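3. Functionality

Enable listening: puts the socket sockfd into the listening state.

Set up queues: the kernel maintains two queues for the socket (implementations differ, but conceptually):

Incomplete connection queue: connection requests whose TCP three-way handshake has started but not yet finished.
Completed connection queue: connections whose handshake has finished and that are waiting for the server to take them with accept.
Limit the queue length: on Linux (since kernel 2.2), backlog bounds only the completed connection queue; the incomplete queue is sized separately through /proc/sys/net/ipv4/tcp_max_syn_backlog. When the queue is full, new connection requests may be ignored or refused.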

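4. Parameters

int sockfd: the file descriptor of a socket, normally one that has already been bound with bind.

It must be a connection-oriented socket such as SOCK_STREAM (TCP).
It must not be a connectionless socket such as SOCK_DGRAM (UDP); calling listen on a UDP socket fails.

int backlog: the maximum length of the queue of pending connections.

It tells the kernel how many established connections may wait on this socket for accept.
Actual queue length: the kernel treats the value as a hint. On Linux, a backlog larger than the value in /proc/sys/net/core/somaxconn is silently truncated to that limit, which defaults to 4096 since Linux 5.4 and to 128 on earlier kernels. The SOMAXCONN constant from the headers is typically 128 or 4096.

Choosing a value:

Too small: under high concurrency, clients may see delayed or failed connection attempts (timeouts, or ECONNREFUSED on some systems).
Too large: may consume more kernel resources than necessary.
Common practice: older code often uses 5 (#define LISTENQ 5). Modern high-performance servers use larger values such as 128 or 1024; #define LISTENQ 1024 is a common larger choice.
Modern advice: pass the SOMAXCONN constant and let the system limit apply.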

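5. Return value

On success: returns 0. The socket sockfd is now in the listening state.
On failure: returns -1 and sets errno to indicate the cause, for example EADDRINUSE (another socket is already listening on the same port), EBADF (sockfd is not a valid file descriptor), ENOTSOCK (sockfd is not a socket), EOPNOTSUPP (the socket type does not support listen), or EINVAL (for example, the socket is already connected).

6. Related functions

socket: creates the socket.
bind: binds the socket to a local address; normally done before listen.
accept: takes one connection from the completed connection queue that listen set up; the step that follows listen.
connect: used by a client to initiate a connection to a listening server.

7. Example code

Example 1: the standard TCP server sequence socket -> bind -> listen

This example shows the standard three steps for setting up a TCP server.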

// tcp_listen_server.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PORT 8088
// Use SOMAXCONN as the backlog and let the system limit apply
#define BACKLOG SOMAXCONN
// Alternatively, use a custom value, e.g. #define BACKLOG 128

int main() {
    int server_fd;
    struct sockaddr_in address;
    int opt = 1;

    // 1. Create the socket (step one); socket() returns -1 on failure
    if ((server_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket failed");
        exit(EXIT_FAILURE);
    }
    printf("Step 1: Socket created successfully (fd: %d)\n", server_fd);

    // 2. Set socket options
    if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt))) {
        perror("setsockopt failed");
        close(server_fd);
        exit(EXIT_FAILURE);
    }

    // 3. Fill in the server address structure
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = INADDR_ANY;
    address.sin_port = htons(PORT);

    // 4. Bind the socket to the address and port (step two)
    printf("Step 2: Binding socket to port %d...\n", PORT);
    if (bind(server_fd, (struct sockaddr *)&address, sizeof(address)) < 0) {
        perror("bind failed");
        close(server_fd);
        exit(EXIT_FAILURE);
    }
    printf("Socket bound successfully to 0.0.0.0:%d\n", PORT);

    // 5. Put the socket into the listening state (step three, the key step)
    printf("Step 3: Putting socket into listening mode with backlog %d...\n", BACKLOG);
    if (listen(server_fd, BACKLOG) < 0) {
        perror("listen failed");
        close(server_fd);
        exit(EXIT_FAILURE);
    }
    printf("Server is now LISTENING on port %d with backlog %d.\n", PORT, BACKLOG);

    printf("\nServer setup complete. Waiting for connections...\n");
    printf("Run a client to connect, e.g., 'telnet localhost %d' or 'nc localhost %d'\n", PORT, PORT);

    // --- The server is ready; accept() could now be called to take connections ---
    // Press Ctrl+C to exit
    pause(); // Suspend until a signal arrives

    close(server_fd);
    printf("Server socket closed.\n");
    return 0;
}

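Code walkthrough:

Create the socket: socket(AF_INET, SOCK_STREAM, 0) creates an IPv4 TCP socket.

Set options: setsockopt(… SO_REUSEADDR …) enables address reuse.

Bind the address: bind(…) binds the socket to port PORT on all interfaces (INADDR_ANY).

Listen for connections (the key step): call listen(server_fd, BACKLOG).

server_fd: the socket to listen on.
BACKLOG: the maximum length of the connection queue. SOMAXCONN is used here so that the system limit applies.

Once the call succeeds, the server socket is in the listening state and the kernel starts maintaining its connection request queue.

The program then suspends and waits for clients. Actually handling a connection requires a later call to accept().

Example 2: calling listen on unsuitable sockets

This example shows what happens when listen is called on sockets that were not set up as in Example 1.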

// listen_failures.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    int sock;

    // --- Case 1: listen on an unbound socket ---
    printf("--- Test 1: listen() on an unbound socket ---\n");
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket failed");
        exit(EXIT_FAILURE);
    }

    if (listen(sock, 5) < 0) {
        // Not expected on Linux for TCP; would indicate e.g. EADDRINUSE
        perror("listen on unbound socket failed");
    } else {
        // Linux binds an unbound TCP socket to an ephemeral port automatically
        printf("listen on unbound socket succeeded (kernel chose an ephemeral port).\n");
    }
    close(sock);

    // --- Case 2: listen on a UDP socket ---
    printf("\n--- Test 2: listen() on a UDP socket ---\n");
    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        perror("UDP socket failed");
        exit(EXIT_FAILURE);
    }

    if (listen(sock, 5) < 0) {
        perror("listen on UDP socket failed (expected)");
        // Fails with errno EOPNOTSUPP (Operation not supported)
    } else {
        printf("listen on UDP socket unexpectedly succeeded.\n");
    }
    close(sock);

    // --- Case 3: listen on a closed socket ---
    printf("\n--- Test 3: listen() on a closed socket ---\n");
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket failed");
        exit(EXIT_FAILURE);
    }
    close(sock); // Close it first

    if (listen(sock, 5) < 0) {
        perror("listen on closed socket failed (expected)");
        // Fails with errno EBADF (Bad file descriptor)
    } else {
        printf("listen on closed socket unexpectedly succeeded.\n");
    }

    printf("\nAll failure tests completed.\n");
    return 0;
}

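Code walkthrough:

1. Test 1: creates a TCP socket and calls listen without calling bind. On Linux this succeeds: the kernel binds the socket to an ephemeral port on all local addresses. Other socket families can behave differently (an unbound AF_UNIX socket fails with EINVAL).
2. Test 2: creates a UDP (SOCK_DGRAM) socket and calls listen. This fails, normally with errno EOPNOTSUPP (Operation not supported).
3. Test 3: creates a socket, closes it with close, and then calls listen. This fails, normally with errno EBADF (Bad file descriptor).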

Example 3: using listen together with accept

This example combines listen and accept to show a complete but simplified server loop.

// listen_accept_demo.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h> // inet_ntoa
#include <time.h>      // time, difftime

#define PORT 8089
#define BACKLOG 10

void handle_client(int client_fd, struct sockaddr_in *client_addr) {
    printf("Handling client %s:%d (fd: %d)\n",
           inet_ntoa(client_addr->sin_addr), ntohs(client_addr->sin_port), client_fd);
    // 在实际应用中，这里会进行数据读写
    // 为了演示，我们立即关闭连接
    close(client_fd);
    printf("Closed connection to client %s:%d\n",
           inet_ntoa(client_addr->sin_addr), ntohs(client_addr->sin_port));
}

int main() {
    int server_fd, client_fd;
    struct sockaddr_in address, client_address;
    socklen_t client_addr_len = sizeof(client_address);
    int opt = 1;

    // 1. Create the socket; socket() returns -1 on failure
    if ((server_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket failed");
        exit(EXIT_FAILURE);
    }

    // 2. Set socket options
    if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt))) {
        perror("setsockopt failed");
        close(server_fd);
        exit(EXIT_FAILURE);
    }

    // 3. Configure and bind the address
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = INADDR_ANY;
    address.sin_port = htons(PORT);

    if (bind(server_fd, (struct sockaddr *)&address, sizeof(address)) < 0) {
        perror("bind failed");
        close(server_fd);
        exit(EXIT_FAILURE);
    }

    // 4. Key step: listen for connections
    if (listen(server_fd, BACKLOG) < 0) {
        perror("listen failed");
        close(server_fd);
        exit(EXIT_FAILURE);
    }
    printf("Server listening on port %d (backlog: %d)\n", PORT, BACKLOG);

    printf("Accepting connections for 10 seconds...\n");

    // 5. Accept connections in a loop (only a few, for demonstration)
    time_t start_time = time(NULL);
    int connections_handled = 0;
    while (difftime(time(NULL), start_time) < 10.0) {
        // accept blocks until a connection arrives or an error occurs, so the
        // 10-second limit is only re-checked after each accepted connection
        client_addr_len = sizeof(client_address); // value-result argument: reset every time
        client_fd = accept(server_fd, (struct sockaddr *)&client_address, &client_addr_len);
        if (client_fd < 0) {
            perror("accept failed");
            continue;
        }

        connections_handled++;
        printf("New connection #%d accepted.\n", connections_handled);
        handle_client(client_fd, &client_address);

        // 简单限制演示连接数
        if (connections_handled >= 3) {
            break;
        }
    }

    printf("Handled %d connections in 10 seconds. Shutting down.\n", connections_handled);
    close(server_fd);
    return 0;
}

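How to test:

Compile and run the server:
gcc -o listen_accept_demo listen_accept_demo.c
./listen_accept_demo

In one or more other terminals, quickly run a client command:
telnet localhost 8089
# or
nc localhost 8089

Code walkthrough:

1. Runs the standard socket -> setsockopt -> bind -> listen sequence.
2. Enters a loop that keeps calling accept(server_fd, …).
3. accept is a blocking call. If no connection is pending, the program waits there.
4. When a client's connection request arrives and the three-way handshake completes, accept takes the connection from the completed connection queue and returns a new file descriptor, client_fd, dedicated to that client.
5. Calls handle_client, which here only prints some information and closes the connection.
6. The main loop calls accept again to handle the next connection.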

重要提示与注意事项:

顺序至关重要: 必须严格按照 socket() -> bind() -> listen() -> accept() 的顺序进行。2. 仅用于面向连接的套接字: listen 只能用于 SOCK_STREAM (TCP) 类型的套接字。对 SOCK_DGRAM (UDP) 调用会失败。3. backlog 的含义: 理解 backlog 是队列长度的提示，而不是严格保证。内核可能会调整它。对于高并发服务器，设置一个较大的 backlog 是明智的。4. accept 是关键: listen 只是设置了监听状态和队列，真正接受连接的操作是由 accept 完成的。5. 错误处理: 始终检查 listen 的返回值。最常见的错误是 EINVAL（套接字未绑定）和 EOPNOTSUPP（套接字类型不支持）。6. 队列溢出: 如果连接请求的速度超过了服务器 accept 的速度，且队列已满，新的连接请求可能会被内核丢弃，客户端会收到连接被拒（ECONNREFUSED）的错误。

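Summary:

listen is one of the core pieces of the TCP server programming model. It turns a bound socket into one that can receive connection requests, with the kernel managing a queue of pending connections. Understanding what it does and what its parameters mean (backlog in particular) is essential for building servers that handle concurrent connection requests. It is the bridge between bind and accept.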
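Since the backlog argument is only a hint, it can help to know the system-wide ceiling. According to listen(2), Linux silently caps backlog at the value in /proc/sys/net/core/somaxconn. The following sketch reads that value; the function name read_somaxconn is an assumption made for this example, and the file may be absent in restricted environments.

```c
#include <stdio.h>

/* Reads the system-wide cap on the listen backlog.
 * Returns the value, or -1 if the file is missing or cannot be parsed. */
long read_somaxconn(void) {
    FILE *fp = fopen("/proc/sys/net/core/somaxconn", "r");
    if (!fp) {
        return -1;
    }

    long value = -1;
    if (fscanf(fp, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(fp);
    return value;
}
```

A server could log this value at startup next to its own BACKLOG setting, so that an unexpectedly small effective queue is easy to spot.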

2025-08-16

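Linux System Programming

open_by_handle_at System Call with Examples

open_by_handle_at in Detail

Introduction

open_by_handle_at is a Linux system call that opens a file through a file handle. You can think of a file handle as the file's "ID card number", and open_by_handle_at as the tool that looks the file up by that number.

Unlike a traditional open by path name, open_by_handle_at does not depend on the file's path. If the file is moved or renamed within the same filesystem, it can still be reached through the handle, provided the filesystem supports handles. Once the file is deleted, however, the handle becomes stale and the call fails with ESTALE. Handles are obtained with name_to_handle_at, and using them normally requires the CAP_DAC_READ_SEARCH capability.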

Function prototype

#define _GNU_SOURCE
#include <fcntl.h>

int open_by_handle_at(int mount_fd, struct file_handle *handle, int flags);

Purpose

open_by_handle_at opens the file identified by a file handle and returns a file descriptor that can be read and written like one returned by open.

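Parameters

mount_fd: a file descriptor that identifies the mounted filesystem

Can be a file descriptor for any object (file or directory) in the filesystem the handle came from
AT_FDCWD may be passed, meaning the caller's current working directory; this only works when the target file is on the same filesystem as that directory
Can also be a file descriptor for the root directory of that filesystem

handle: pointer to a struct file_handle

Holds the file handle previously obtained with name_to_handle_at

flags: file open flags, as for open(2)

O_RDONLY: open read-only
O_WRONLY: open write-only
O_RDWR: open read-write
O_CREAT and other creation-related flags are not meaningful here (the file must already exist)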

The file_handle structure

struct file_handle {
    unsigned int  handle_bytes;   /* size of the handle data in bytes */
    int           handle_type;    /* handle type */
    unsigned char f_handle[0];    /* handle data (variable length) */
};
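Because f_handle is variable length, the caller has to allocate the structure with extra room and record that room in handle_bytes before calling name_to_handle_at. One simple approach, sketched below, is to always allocate the maximum size, MAX_HANDLE_SZ, which glibc defines in <fcntl.h> when _GNU_SOURCE is set. The helper name alloc_max_handle is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>

/* Allocates a zeroed handle buffer large enough for any filesystem and
 * records the available space in handle_bytes. Returns NULL on failure.
 * The caller releases it with free(). */
struct file_handle *alloc_max_handle(void) {
    struct file_handle *h = calloc(1, sizeof(*h) + MAX_HANDLE_SZ);
    if (h) {
        h->handle_bytes = MAX_HANDLE_SZ;
    }
    return h;
}
```

This trades a few dozen bytes of memory for avoiding the probe-then-retry pattern based on EOVERFLOW; either approach is valid.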

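Return value

On success: returns a file descriptor (a non-negative integer)
On failure: returns -1 and sets errno

Common error codes

open_by_handle_at can fail with the same errors as openat(2), for example:

EACCES: permission denied
EMFILE: the process has too many open file descriptors
ENFILE: the system-wide limit on open files has been reached
ENOMEM: insufficient kernel memory

In addition, it can fail with the following errors. Note that EOPNOTSUPP (filesystem does not support handles) is reported by name_to_handle_at when the handle is created, rather than by this call.

EBADF: mount_fd is not an open file descriptor
EFAULT: handle points outside the accessible address space
EINVAL: handle->handle_bytes is zero or greater than MAX_HANDLE_SZ
ELOOP: handle refers to a symbolic link and O_PATH was not specified in flags
EPERM: the caller lacks the CAP_DAC_READ_SEARCH capability
ESTALE: the handle is no longer valid (for example, the file has been deleted)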
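When reporting failures, it is often more useful to translate errno into a hint about the likely cause than to print the raw code. The sketch below maps the errors that open_by_handle_at(2) documents specifically for this call, including EPERM for a missing CAP_DAC_READ_SEARCH capability. The function name fh_errno_hint and the wording of the hints are made up for this example.

```c
#include <errno.h>

/* Hypothetical helper: returns a short human-readable hint for an errno
 * value set by open_by_handle_at(). */
const char *fh_errno_hint(int err) {
    switch (err) {
        case EPERM:
            return "caller lacks the CAP_DAC_READ_SEARCH capability";
        case ESTALE:
            return "handle is no longer valid (for example, the file was deleted)";
        case EINVAL:
            return "handle_bytes is zero or larger than MAX_HANDLE_SZ";
        case ELOOP:
            return "handle refers to a symbolic link and O_PATH was not given";
        case EBADF:
            return "mount_fd is not an open file descriptor";
        case EFAULT:
            return "handle points outside the accessible address space";
        default:
            return "see openat(2) for the remaining error codes";
    }
}
```

A caller would typically print fh_errno_hint(errno) immediately after a failed call, before any other library function has a chance to overwrite errno.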

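Related functions

name_to_handle_at: obtains a file handle
open/openat: open a file by path name
openat2: extended version of openat
fstat: gets file status through a file descriptor
read/write: file read and write operations

Example code

Example 1: Basic usage - opening a file by handle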

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <errno.h>
#include <string.h>
#include <time.h>   // ctime()

// 创建测试文件
int create_test_file(const char *filename) {
    int fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd == -1) {
        perror("创建测试文件失败");
        return -1;
    }
    
    const char *content = "这是测试文件的内容\n用于演示通过句柄打开文件的功能\n";
    ssize_t bytes_written = write(fd, content, strlen(content));
    if (bytes_written == -1) {
        perror("写入文件失败");
        close(fd);
        return -1;
    }
    
    printf("创建测试文件: %s (写入 %zd 字节)\n", filename, bytes_written);
    close(fd);
    return 0;
}

// 获取文件句柄
int get_file_handle(const char *filename, struct file_handle **handle, int *mount_id) {
    size_t handle_size = sizeof(struct file_handle);
    *handle = malloc(handle_size);
    if (!*handle) {
        perror("内存分配失败");
        return -1;
    }
    
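    // Probe call: with handle_bytes set to 0, name_to_handle_at fails with
    // EOVERFLOW and stores the required size in handle_bytes.
    (*handle)->handle_bytes = 0;
    
    int result = name_to_handle_at(AT_FDCWD, filename, *handle, mount_id, 0);
    if (result == -1 && errno == EOVERFLOW) {
        // Save the required size before freeing the probe buffer
        unsigned int needed = (*handle)->handle_bytes;
        handle_size = sizeof(struct file_handle) + needed;
        free(*handle);
        
        *handle = malloc(handle_size);
        if (!*handle) {
            perror("Memory allocation failed");
            return -1;
        }
        
        // The new buffer is uninitialized: tell the kernel how much room it has
        (*handle)->handle_bytes = needed;
        result = name_to_handle_at(AT_FDCWD, filename, *handle, mount_id, 0);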
    }
    
    return result;
}

// 通过句柄打开文件
int open_file_by_handle(struct file_handle *handle, int flags) {
    printf("通过句柄打开文件 (标志: 0x%x)...\n", flags);
    
    int fd = open_by_handle_at(AT_FDCWD, handle, flags);
    if (fd != -1) {
        printf("✓ 成功打开文件，文件描述符: %d\n", fd);
        
        // 获取文件信息
        struct stat st;
        if (fstat(fd, &st) == 0) {
            printf("  文件大小: %ld 字节\n", (long)st.st_size);
            printf("  修改时间: %s", ctime(&st.st_mtime));
            printf("  权限: %o\n", st.st_mode & 0777);
        }
        
        return fd;
    } else {
        printf("✗ 打开文件失败: %s\n", strerror(errno));
        return -1;
    }
}

// 读取文件内容
int read_file_content(int fd) {
    char buffer[256];
    ssize_t bytes_read = read(fd, buffer, sizeof(buffer) - 1);
    
    if (bytes_read > 0) {
        buffer[bytes_read] = '\0';
        printf("Content read (%zd bytes):\n%s", bytes_read, buffer);
        return 0;
    } else if (bytes_read == 0) {
        printf("文件为空\n");
        return 0;
    } else {
        perror("读取文件失败");
        return -1;
    }
}

int main() {
    const char *test_file = "handle_open_test.txt";
    struct file_handle *handle = NULL;
    int mount_id;
    int fd;
    
    printf("=== open_by_handle_at 基础示例 ===\n\n");
    
    // 创建测试文件
    if (create_test_file(test_file) == -1) {
        return 1;
    }
    
    // 获取文件句柄
    printf("\n1. 获取文件句柄:\n");
    if (get_file_handle(test_file, &handle, &mount_id) == 0) {
        printf("  ✓ 成功获取文件句柄\n");
        printf("  挂载 ID: %d\n", mount_id);
        printf("  句柄类型: %d\n", handle->handle_type);
        printf("  句柄大小: %u 字节\n", handle->handle_bytes);
    } else {
        printf("  ✗ 获取文件句柄失败: %s\n", strerror(errno));
        unlink(test_file);
        return 1;
    }
    
    // 通过句柄以只读方式打开文件
    printf("\n2. 通过句柄以只读方式打开文件:\n");
    fd = open_file_by_handle(handle, O_RDONLY);
    if (fd != -1) {
        read_file_content(fd);
        close(fd);
    }
    
    // 通过句柄以读写方式打开文件
    printf("\n3. 通过句柄以读写方式打开文件:\n");
    fd = open_file_by_handle(handle, O_RDWR);
    if (fd != -1) {
        printf("  向文件追加内容...\n");
        const char *append_content = "追加的内容\n";
        lseek(fd, 0, SEEK_END);  // 移动到文件末尾
        ssize_t bytes_written = write(fd, append_content, strlen(append_content));
        if (bytes_written > 0) {
            printf("  ✓ 成功追加 %zd 字节\n", bytes_written);
        }
        
        // 重新读取文件内容
        printf("  重新读取文件内容:\n");
        lseek(fd, 0, SEEK_SET);  // 移动到文件开头
        read_file_content(fd);
        
        close(fd);
    }
    
    // 测试错误情况
    printf("\n4. 测试错误情况:\n");
    printf("  尝试使用无效句柄打开文件:\n");
    struct file_handle invalid_handle = {0};
    int invalid_fd = open_by_handle_at(AT_FDCWD, &invalid_handle, O_RDONLY);
    if (invalid_fd == -1) {
        printf("    ✓ 正确处理无效句柄: %s\n", strerror(errno));
    }
    
    // 清理资源
    if (handle) {
        free(handle);
    }
    unlink(test_file);
    
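    printf("\n=== Properties of opening by handle ===\n");
    printf("1. Path independent: the open does not go through a path name\n");
    printf("2. Stable: the handle stays valid after a rename or move within the same filesystem\n");
    printf("3. Privileged: open_by_handle_at normally requires CAP_DAC_READ_SEARCH\n");
    printf("4. Opaque: handle contents are filesystem specific and should not be interpreted\n");
    printf("5. Not unforgeable: handle bytes can be guessed or constructed, which is why the call is privileged\n");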
    printf("\n");
    printf("使用场景:\n");
    printf("1. 文件监控系统\n");
    printf("2. 备份和同步工具\n");
    printf("3. 容器文件系统\n");
    printf("4. 网络文件传输\n");
    printf("5. 安全文件访问\n");
    
    return 0;
}
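Example 1 passes AT_FDCWD as mount_fd, which only works when the target file is on the same filesystem as the current working directory. A more robust pattern is to keep a descriptor for a directory on the target filesystem and use it both to resolve the name and as mount_fd. The sketch below illustrates this under stated assumptions: the helper name reopen_by_handle is hypothetical, the buffer is sized with MAX_HANDLE_SZ to avoid the probe call, and the open step will still fail with EPERM for callers without CAP_DAC_READ_SEARCH.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: resolves `name` relative to directory `dir`, obtains
 * a handle for it, and reopens the file through that handle, using the
 * directory descriptor as mount_fd.
 * Returns an open descriptor (>= 0) or a negated errno value. */
int reopen_by_handle(const char *dir, const char *name, int flags) {
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) {
        return -errno;
    }

    struct file_handle *h = calloc(1, sizeof(*h) + MAX_HANDLE_SZ);
    if (!h) {
        close(dirfd);
        return -ENOMEM;
    }
    h->handle_bytes = MAX_HANDLE_SZ;

    int mount_id;
    int result;
    if (name_to_handle_at(dirfd, name, h, &mount_id, 0) < 0) {
        result = -errno;
    } else {
        /* dirfd is on the same filesystem as the file, so it is a valid mount_fd */
        int fd = open_by_handle_at(dirfd, h, flags);
        result = (fd < 0) ? -errno : fd;
    }

    free(h);
    close(dirfd);
    return result;
}
```

Returning a negated errno keeps the error available to the caller even though free and close run afterwards. A long-running program would more likely keep the directory descriptor open and reuse it for many handles.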

Example 2: Persistence and security of file handles

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <errno.h>
#include <string.h>
#include <time.h>

// 文件信息结构体
struct persistent_file {
    char original_name[256];
    char current_name[256];
    struct file_handle *handle;
    int mount_id;
    time_t create_time;
};

// 创建测试文件
int create_test_file_with_content(const char *filename, const char *content) {
    int fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd == -1) {
        perror("创建测试文件失败");
        return -1;
    }
    
    ssize_t bytes_written = write(fd, content, strlen(content));
    if (bytes_written == -1) {
        perror("写入文件失败");
        close(fd);
        return -1;
    }
    
    printf("创建测试文件: %s (%zd 字节)\n", filename, bytes_written);
    close(fd);
    return 0;
}

// 获取文件句柄
int get_file_handle_safe(const char *filename, struct file_handle **handle, int *mount_id) {
    size_t handle_size = sizeof(struct file_handle);
    *handle = malloc(handle_size);
    if (!*handle) {
        return -1;
    }
    
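    // Probe call: handle_bytes == 0 makes the kernel report the required size
    (*handle)->handle_bytes = 0;
    
    int result = name_to_handle_at(AT_FDCWD, filename, *handle, mount_id, 0);
    if (result == -1 && errno == EOVERFLOW) {
        // Save the required size before freeing the probe buffer
        unsigned int needed = (*handle)->handle_bytes;
        handle_size = sizeof(struct file_handle) + needed;
        free(*handle);
        
        *handle = malloc(handle_size);
        if (!*handle) {
            return -1;
        }
        
        // The new buffer is uninitialized: tell the kernel how much room it has
        (*handle)->handle_bytes = needed;
        result = name_to_handle_at(AT_FDCWD, filename, *handle, mount_id, 0);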
    }
    
    return result;
}

// 通过句柄安全地打开文件
int open_file_by_handle_safe(struct file_handle *handle, int flags, const char *description) {
    printf("通过句柄打开文件: %s\n", description ? description : "未知文件");
    
    int fd = open_by_handle_at(AT_FDCWD, handle, flags);
    if (fd != -1) {
        printf("  ✓ 成功打开文件 (fd: %d)\n", fd);
        return fd;
    } else {
        printf("  ✗ 打开文件失败: %s\n", strerror(errno));
        return -1;
    }
}

// 读取并验证文件内容
int verify_file_content(int fd, const char *expected_content, const char *description) {
    if (lseek(fd, 0, SEEK_SET) == -1) {
        perror("定位文件开头失败");
        return -1;
    }
    
    char *buffer = malloc(strlen(expected_content) + 1);
    if (!buffer) {
        perror("内存分配失败");
        return -1;
    }
    
    ssize_t bytes_read = read(fd, buffer, strlen(expected_content));
    if (bytes_read > 0) {
        buffer[bytes_read] = '\0';
        printf("  %s内容验证: ", description ? description : "");
        if (strcmp(buffer, expected_content) == 0) {
            printf("通过 ✓\n");
            free(buffer);
            return 0;
        } else {
            printf("失败 ✗\n");
            printf("    期望: %s", expected_content);
            printf("    实际: %s", buffer);
            free(buffer);
            return -1;
        }
    } else {
        perror("读取文件失败");
        free(buffer);
        return -1;
    }
}

int main() {
    struct persistent_file file_info;
    const char *original_name = "persistent_original.txt";
    const char *renamed_name = "persistent_renamed.txt";
    const char *content = "这是持久化文件的内容\n创建时间: ";
    
    printf("=== 文件句柄持久性和安全性示例 ===\n\n");
    
    // 构造带时间戳的内容
    char full_content[512];
    time_t now = time(NULL);
    snprintf(full_content, sizeof(full_content), "%s%s", content, ctime(&now));
    
    // 创建测试文件
    printf("1. 创建测试文件:\n");
    if (create_test_file_with_content(original_name, full_content) == -1) {
        return 1;
    }
    
    strncpy(file_info.original_name, original_name, sizeof(file_info.original_name) - 1);
    file_info.original_name[sizeof(file_info.original_name) - 1] = '\0';
    file_info.create_time = now;
    
    // 获取文件句柄
    printf("\n2. 获取文件句柄:\n");
    if (get_file_handle_safe(original_name, &file_info.handle, &file_info.mount_id) == 0) {
        printf("  ✓ 成功获取文件句柄\n");
        printf("  挂载 ID: %d\n", file_info.mount_id);
        printf("  句柄大小: %u 字节\n", file_info.handle->handle_bytes);
    } else {
        printf("  ✗ 获取文件句柄失败: %s\n", strerror(errno));
        unlink(original_name);
        return 1;
    }
    
    // 通过句柄访问原始文件
    printf("\n3. 通过句柄访问原始文件:\n");
    int fd = open_file_by_handle_safe(file_info.handle, O_RDONLY, "原始文件");
    if (fd != -1) {
        if (verify_file_content(fd, full_content, "原始文件") == 0) {
            printf("  ✓ 原始文件内容验证通过\n");
        }
        close(fd);
    }
    
    // 重命名文件
    printf("\n4. 重命名文件 (模拟文件移动):\n");
    if (rename(original_name, renamed_name) == 0) {
        printf("  ✓ 成功重命名文件: %s -> %s\n", original_name, renamed_name);
        strncpy(file_info.current_name, renamed_name, sizeof(file_info.current_name) - 1);
        file_info.current_name[sizeof(file_info.current_name) - 1] = '\0';
    } else {
        printf("  ✗ 重命名文件失败: %s\n", strerror(errno));
        free(file_info.handle);
        unlink(original_name);
        return 1;
    }
    
    // 通过句柄访问重命名后的文件
    printf("\n5. 通过句柄访问重命名后的文件:\n");
    fd = open_file_by_handle_safe(file_info.handle, O_RDONLY, "重命名后的文件");
    if (fd != -1) {
        if (verify_file_content(fd, full_content, "重命名文件") == 0) {
            printf("  ✓ 重命名文件内容验证通过\n");
            printf("  ✓ 证明: 文件句柄不受文件名变化影响\n");
        }
        close(fd);
    }
    
    // 创建符号链接并测试
    printf("\n6. 创建符号链接测试:\n");
    const char *symlink_name = "persistent_symlink.txt";
    if (symlink(renamed_name, symlink_name) == 0) {
        printf("  ✓ 创建符号链接: %s -> %s\n", symlink_name, renamed_name);
        
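        // Get a handle for the symlink path. With flags == 0, name_to_handle_at
        // does not follow symlinks, so this handle refers to the link itself
        // (pass AT_SYMLINK_FOLLOW to get a handle for the target instead).
        struct file_handle *symlink_handle = NULL;
        int symlink_mount_id;
        if (get_file_handle_safe(symlink_name, &symlink_handle, &symlink_mount_id) == 0) {
            printf("  ✓ Obtained a handle for the symlink\n");
            
            // Opening a symlink handle without O_PATH is expected to fail with ELOOP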
            fd = open_file_by_handle_safe(symlink_handle, O_RDONLY, "符号链接");
            if (fd != -1) {
                if (verify_file_content(fd, full_content, "符号链接") == 0) {
                    printf("  ✓ 符号链接内容验证通过\n");
                }
                close(fd);
            }
            free(symlink_handle);
        }
        unlink(symlink_name);
    }
    
    // 测试不同打开标志
    printf("\n7. 测试不同打开标志:\n");
    
    // 只读打开
    printf("  只读打开 (O_RDONLY):\n");
    fd = open_file_by_handle_safe(file_info.handle, O_RDONLY, "只读模式");
    if (fd != -1) {
        printf("    ✓ 只读打开成功\n");
        close(fd);
    }
    
    // 读写打开
    printf("  读写打开 (O_RDWR):\n");
    fd = open_file_by_handle_safe(file_info.handle, O_RDWR, "读写模式");
    if (fd != -1) {
        printf("    ✓ 读写打开成功\n");
        close(fd);
    }
    
    // 只写打开
    printf("  只写打开 (O_WRONLY):\n");
    fd = open_file_by_handle_safe(file_info.handle, O_WRONLY, "只写模式");
    if (fd != -1) {
        printf("    ✓ 只写打开成功\n");
        close(fd);
    }
    
    // 尝试写入只读打开的文件
    printf("  测试写入权限:\n");
    fd = open_file_by_handle_safe(file_info.handle, O_RDONLY, "只读模式测试写入");
    if (fd != -1) {
        const char *test_write = "测试写入";
        ssize_t write_result = write(fd, test_write, strlen(test_write));
        if (write_result == -1) {
            printf("    ✓ 正确拒绝写入操作: %s\n", strerror(errno));
        } else {
            printf("    ✗ 意外允许写入操作\n");
        }
        close(fd);
    }
    
    // 清理资源
    printf("\n8. 清理资源:\n");
    free(file_info.handle);
    unlink(renamed_name);
    printf("  ✓ 清理完成\n");
    
    printf("\n=== 文件句柄安全性和持久性总结 ===\n");
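    printf("Security properties:\n");
    printf("1. Path independent: no path walk, so later renames or symlink swaps do not redirect the open\n");
    printf("2. Privileged: the caller normally needs CAP_DAC_READ_SEARCH\n");
    printf("3. Not unforgeable: handle bytes can be guessed or constructed, so treat stored handles as sensitive\n");
    printf("4. Access mode: the flags argument still selects read-only, write-only, or read-write access\n");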
    printf("\n");
    printf("持久性优势:\n");
    printf("1. 文件移动: 重命名后句柄仍然有效\n");
    printf("2. 目录重组: 目录结构调整不影响句柄\n");
    printf("3. 跨会话: 可以在不同进程间传递\n");
    printf("4. 稳定标识: 提供稳定的文件标识机制\n");
    printf("\n");
    printf("适用场景:\n");
    printf("1. 文件监控和审计\n");
    printf("2. 备份和同步系统\n");
    printf("3. 容器文件系统\n");
    printf("4. 网络文件传输\n");
    printf("5. 安全文件访问控制\n");
    
    return 0;
}
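The symlink step in Example 2 depends on a detail of name_to_handle_at: by default it does not dereference a trailing symbolic link, so the returned handle refers to the link itself, and open_by_handle_at on such a handle fails with ELOOP unless O_PATH is used. Passing AT_SYMLINK_FOLLOW yields a handle for the link's target instead. The sketch below makes that choice explicit; the helper name handle_for_path is an assumption for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed handle for `path`, or NULL with
 * errno set. When follow_symlinks is nonzero, a trailing symbolic link is
 * dereferenced (AT_SYMLINK_FOLLOW); otherwise the handle refers to the link. */
struct file_handle *handle_for_path(const char *path, int follow_symlinks,
                                    int *mount_id) {
    struct file_handle *h = calloc(1, sizeof(*h) + MAX_HANDLE_SZ);
    if (!h) {
        return NULL;
    }
    h->handle_bytes = MAX_HANDLE_SZ;

    int flags = follow_symlinks ? AT_SYMLINK_FOLLOW : 0;
    if (name_to_handle_at(AT_FDCWD, path, h, mount_id, flags) < 0) {
        int saved = errno;   /* free() may change errno */
        free(h);
        errno = saved;
        return NULL;
    }
    return h;
}
```

For a regular file the two modes return the same handle; they differ only when the last path component is a symbolic link.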

Example 3: A complete file handle management tool

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <errno.h>
#include <string.h>
#include <getopt.h>
#include <time.h>

// 配置结构体
struct handle_tool_config {
    char *filename;
    char *handle_file;
    int create_handle;
    int open_file;
    int show_info;
    int verbose;
    int flags;
    char *output_file;
};

// 保存文件句柄到文件
int save_handle_to_file(const struct file_handle *handle, int mount_id, const char *filename) {
    FILE *fp = fopen(filename, "wb");
    if (!fp) {
        perror("打开句柄文件失败");
        return -1;
    }
    
    // 写入挂载 ID
    if (fwrite(&mount_id, sizeof(mount_id), 1, fp) != 1) {
        perror("写入挂载 ID 失败");
        fclose(fp);
        return -1;
    }
    
    // 写入句柄大小
    if (fwrite(&handle->handle_bytes, sizeof(handle->handle_bytes), 1, fp) != 1) {
        perror("写入句柄大小失败");
        fclose(fp);
        return -1;
    }
    
    // 写入句柄类型
    if (fwrite(&handle->handle_type, sizeof(handle->handle_type), 1, fp) != 1) {
        perror("写入句柄类型失败");
        fclose(fp);
        return -1;
    }
    
    // 写入句柄数据
    if (fwrite(handle->f_handle, handle->handle_bytes, 1, fp) != 1) {
        perror("写入句柄数据失败");
        fclose(fp);
        return -1;
    }
    
    fclose(fp);
    printf("✓ 句柄已保存到: %s\n", filename);
    return 0;
}

// 从文件加载文件句柄
struct file_handle* load_handle_from_file(const char *filename, int *mount_id) {
    FILE *fp = fopen(filename, "rb");
    if (!fp) {
        perror("打开句柄文件失败");
        return NULL;
    }
    
    // 读取挂载 ID
    if (fread(mount_id, sizeof(*mount_id), 1, fp) != 1) {
        perror("读取挂载 ID 失败");
        fclose(fp);
        return NULL;
    }
    
    // 读取句柄大小
    unsigned int handle_bytes;
    if (fread(&handle_bytes, sizeof(handle_bytes), 1, fp) != 1) {
        perror("读取句柄大小失败");
        fclose(fp);
        return NULL;
    }
    
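    // Reject sizes that cannot belong to a valid handle before allocating
    if (handle_bytes == 0 || handle_bytes > MAX_HANDLE_SZ) {
        fprintf(stderr, "Invalid handle size in file: %u\n", handle_bytes);
        fclose(fp);
        return NULL;
    }
    
    // Allocate memory for the handle
    size_t handle_size = sizeof(struct file_handle) + handle_bytes;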
    struct file_handle *handle = malloc(handle_size);
    if (!handle) {
        perror("内存分配失败");
        fclose(fp);
        return NULL;
    }
    
    handle->handle_bytes = handle_bytes;
    
    // 读取句柄类型
    if (fread(&handle->handle_type, sizeof(handle->handle_type), 1, fp) != 1) {
        perror("读取句柄类型失败");
        free(handle);
        fclose(fp);
        return NULL;
    }
    
    // 读取句柄数据
    if (fread(handle->f_handle, handle_bytes, 1, fp) != 1) {
        perror("读取句柄数据失败");
        free(handle);
        fclose(fp);
        return NULL;
    }
    
    fclose(fp);
    printf("✓ 从 %s 加载句柄成功\n", filename);
    return handle;
}

// 获取文件句柄
int get_file_handle_safe(const char *filename, struct file_handle **handle, int *mount_id) {
    size_t handle_size = sizeof(struct file_handle);
    *handle = malloc(handle_size);
    if (!*handle) {
        return -1;
    }
    
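    // Probe call: handle_bytes == 0 makes the kernel report the required size
    (*handle)->handle_bytes = 0;
    
    int result = name_to_handle_at(AT_FDCWD, filename, *handle, mount_id, 0);
    if (result == -1 && errno == EOVERFLOW) {
        // Save the required size before freeing the probe buffer
        unsigned int needed = (*handle)->handle_bytes;
        handle_size = sizeof(struct file_handle) + needed;
        free(*handle);
        
        *handle = malloc(handle_size);
        if (!*handle) {
            return -1;
        }
        
        // The new buffer is uninitialized: tell the kernel how much room it has
        (*handle)->handle_bytes = needed;
        result = name_to_handle_at(AT_FDCWD, filename, *handle, mount_id, 0);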
    }
    
    return result;
}

// 通过句柄打开文件
int open_file_by_handle_safe(struct file_handle *handle, int flags, const char *description) {
    if (description) {
        printf("通过句柄打开文件: %s\n", description);
    }
    
    int fd = open_by_handle_at(AT_FDCWD, handle, flags);
    if (fd != -1) {
        if (description) {
            printf("  ✓ 成功打开文件 (fd: %d)\n", fd);
        }
        return fd;
    } else {
        if (description) {
            printf("  ✗ 打开文件失败: %s\n", strerror(errno));
        }
        return -1;
    }
}

// 显示句柄信息
void show_handle_info(const struct file_handle *handle, int mount_id) {
    printf("=== 文件句柄信息 ===\n");
    printf("挂载 ID: %d\n", mount_id);
    printf("句柄类型: %d\n", handle->handle_type);
    printf("句柄大小: %u 字节\n", handle->handle_bytes);
    
    printf("句柄数据 (十六进制): ");
    for (unsigned int i = 0; i < handle->handle_bytes && i < 64; i++) {
        printf("%02x", handle->f_handle[i]);
    }
    if (handle->handle_bytes > 64) {
        printf("...(还有 %u 字节)", handle->handle_bytes - 64);
    }
    printf("\n");
}

// 复制文件内容
int copy_file_content(int src_fd, const char *output_filename) {
    int dst_fd = open(output_filename, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (dst_fd == -1) {
        perror("创建输出文件失败");
        return -1;
    }
    
    char buffer[4096];
    ssize_t bytes_read, bytes_written;
    off_t total_bytes = 0;
    
    while ((bytes_read = read(src_fd, buffer, sizeof(buffer))) > 0) {
        bytes_written = write(dst_fd, buffer, bytes_read);
        if (bytes_written != bytes_read) {
            perror("写入输出文件失败");
            close(dst_fd);
            return -1;
        }
        total_bytes += bytes_written;
    }
    
    if (bytes_read == -1) {
        perror("读取源文件失败");
        close(dst_fd);
        return -1;
    }
    
    close(dst_fd);
    printf("✓ 成功复制 %ld 字节到: %s\n", (long)total_bytes, output_filename);
    return 0;
}

// 显示帮助信息
void show_help(const char *program_name) {
    printf("Usage: %s [options]\n", program_name);
    printf("\nOptions:\n");
    printf("  -f, --file=FILE        source file name\n");
    printf("  -h, --handle=FILE      handle file name\n");
    printf("  -c, --create           create a file handle\n");
    printf("  -o, --open             open the file through the handle\n");
    printf("  -i, --info             show handle information\n");
    printf("  -r, --read-only        open read-only\n");
    printf("  -w, --read-write       open read-write\n");
    printf("  -O, --output=FILE      output file name (for copying)\n");
    printf("  -v, --verbose          verbose output\n");
    printf("  --help                 show this help message\n");
    printf("\nExamples:\n");
    printf("  %s -f /etc/passwd -c -h passwd.handle    # create a handle\n", program_name);
    printf("  %s -h passwd.handle -o -r                # open read-only through the handle\n", program_name);
    printf("  %s -h passwd.handle -o -r -O copy.txt    # copy the file through the handle\n", program_name);
    printf("  %s -h passwd.handle -i                   # show handle information\n", program_name);
}

int main(int argc, char *argv[]) {
    struct handle_tool_config config = {
        .filename = NULL,
        .handle_file = NULL,
        .create_handle = 0,
        .open_file = 0,
        .show_info = 0,
        .verbose = 0,
        .flags = O_RDONLY,
        .output_file = NULL
    };
    
    printf("=== 文件句柄管理工具 ===\n\n");
    
    // 解析命令行参数
    static struct option long_options[] = {
        {"file",      required_argument, 0, 'f'},
        {"handle",    required_argument, 0, 'h'},
        {"create",    no_argument,       0, 'c'},
        {"open",      no_argument,       0, 'o'},
        {"info",      no_argument,       0, 'i'},
        {"read-only", no_argument,       0, 'r'},
        {"read-write", no_argument,      0, 'w'},
        {"output",    required_argument, 0, 'O'},
        {"verbose",   no_argument,       0, 'v'},
        {"help",      no_argument,       0, 1000},
        {0, 0, 0, 0}
    };
    
    int opt;
    while ((opt = getopt_long(argc, argv, "f:h:coirwO:v", long_options, NULL)) != -1) {
        switch (opt) {
            case 'f':
                config.filename = optarg;
                break;
            case 'h':
                config.handle_file = optarg;
                break;
            case 'c':
                config.create_handle = 1;
                break;
            case 'o':
                config.open_file = 1;
                break;
            case 'i':
                config.show_info = 1;
                break;
            case 'r':
                config.flags = O_RDONLY;
                break;
            case 'w':
                config.flags = O_RDWR;
                break;
            case 'O':
                config.output_file = optarg;
                break;
            case 'v':
                config.verbose = 1;
                break;
            case 1000:  // --help
                show_help(argv[0]);
                return 0;
            default:
                fprintf(stderr, "Run '%s --help' for usage information\n", argv[0]);
                return 1;
        }
    }
    
    // 验证参数
    if (!config.create_handle && !config.open_file && !config.show_info) {
        show_help(argv[0]);
        return 0;
    }
    
    struct file_handle *handle = NULL;
    int mount_id;
    
    // 如果需要创建句柄
    if (config.create_handle && config.filename) {
        if (access(config.filename, F_OK) != 0) {
            fprintf(stderr, "文件不存在: %s\n", config.filename);
            return 1;
        }
        
        printf("创建文件句柄: %s\n", config.filename);
        
        if (get_file_handle_safe(config.filename, &handle, &mount_id) == 0) {
            printf("✓ 成功获取文件句柄\n");
            
            if (config.show_info) {
                show_handle_info(handle, mount_id);
            }
            
            if (config.handle_file) {
                if (save_handle_to_file(handle, mount_id, config.handle_file) == 0) {
                    printf("✓ 句柄保存成功\n");
                } else {
                    fprintf(stderr, "句柄保存失败\n");
                    free(handle);
                    return 1;
                }
            }
        } else {
            fprintf(stderr, "获取文件句柄失败: %s\n", strerror(errno));
            return 1;
        }
    }
    // 如果需要加载句柄
    else if (config.handle_file) {
        if (access(config.handle_file, F_OK) != 0) {
            fprintf(stderr, "句柄文件不存在: %s\n", config.handle_file);
            return 1;
        }
        
        handle = load_handle_from_file(config.handle_file, &mount_id);
        if (!handle) {
            return 1;
        }
        
        if (config.show_info) {
            show_handle_info(handle, mount_id);
        }
    } else {
        fprintf(stderr, "需要指定文件或句柄文件\n");
        show_help(argv[0]);
        return 1;
    }
    
    // 通过句柄打开文件
    if (config.open_file && handle) {
        int fd = open_file_by_handle_safe(handle, config.flags, 
                                         config.filename ? config.filename : "加载的句柄");
        if (fd != -1) {
            printf("✓ 文件打开成功\n");
            
            // 获取文件信息
            struct stat st;
            if (fstat(fd, &st) == 0) {
                printf("文件信息:\n");
                printf("  大小: %ld 字节\n", (long)st.st_size);
                printf("  权限: %o\n", st.st_mode & 0777);
                printf("  修改时间: %s", ctime(&st.st_mtime));
            }
            
            // 如果指定了输出文件，复制内容
            if (config.output_file) {
                if (copy_file_content(fd, config.output_file) != 0) {
                    close(fd);
                    free(handle);
                    return 1;
                }
            }
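            // Otherwise show a preview, unless the file was opened write-only.
            // O_RDONLY is 0, so the access mode is compared via O_ACCMODE
            // instead of being tested with a bitwise AND.
            else if ((config.flags & O_ACCMODE) != O_WRONLY) {
                printf("File content preview:\n");
                char buffer[512];
                ssize_t bytes_read = read(fd, buffer, sizeof(buffer) - 1);
                if (bytes_read > 0) {
                    buffer[bytes_read] = '\0';
                    // Show at most the first 200 bytes
                    if (strlen(buffer) > 200) {
                        buffer[200] = '\0';
                        printf("%s...\n", buffer);
                    } else {
                        printf("%s\n", buffer);
                    }
                }
            }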
            
            close(fd);
        } else {
            free(handle);
            return 1;
        }
    }
    
    // 清理资源
    if (handle) {
        free(handle);
    }
    
    printf("\n=== 文件句柄工具使用建议 ===\n");
    printf("适用场景:\n");
    printf("1. 文件监控和审计系统\n");
    printf("2. 备份和同步工具\n");
    printf("3. 容器和虚拟化环境\n");
    printf("4. 网络文件传输\n");
    printf("5. 安全文件访问控制\n");
    printf("\n");
    printf("安全建议:\n");
    printf("1. 妥善保管句柄文件\n");
    printf("2. 使用适当的文件权限\n");
    printf("3. 验证句柄的有效性\n");
    printf("4. 及时关闭文件描述符\n");
    printf("5. 处理句柄失效的情况\n");
    printf("\n");
    printf("性能优化:\n");
    printf("1. 批量处理多个文件\n");
    printf("2. 缓存常用文件句柄\n");
    printf("3. 异步操作大文件\n");
    printf("4. 合理设置缓冲区大小\n");
    
    return 0;
}
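One pitfall worth spelling out when a tool like this inspects open flags: on Linux O_RDONLY is defined as 0, so an expression such as flags & O_RDONLY is always zero and cannot detect read access. The usual idiom is to mask with O_ACCMODE and compare the result. A minimal sketch follows; the function name flags_allow_read is made up for this example.

```c
#include <fcntl.h>

/* Returns 1 if a descriptor opened with `flags` can be read from, else 0.
 * The access mode is a small enumerated field, not a set of independent
 * bits, so it is extracted with O_ACCMODE and compared for equality. */
int flags_allow_read(int flags) {
    int mode = flags & O_ACCMODE;
    return mode == O_RDONLY || mode == O_RDWR;
}
```

The same approach applies to write access: compare the masked value against O_WRONLY and O_RDWR.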

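Compiling and running

# Compile the example programs
gcc -o open_by_handle_at_example1 example1.c
gcc -o open_by_handle_at_example2 example2.c
gcc -o open_by_handle_at_example3 example3.c

# Run the examples
# (open_by_handle_at normally requires CAP_DAC_READ_SEARCH, so run as root or expect EPERM)
./open_by_handle_at_example1
./open_by_handle_at_example2
./open_by_handle_at_example3 --help

# Basic operations
# (the examples pass AT_FDCWD as mount_fd, so run them from a directory on the
#  same filesystem as the target file)
./open_by_handle_at_example3 -f /etc/passwd -c -h passwd.handle
./open_by_handle_at_example3 -h passwd.handle -o -r
./open_by_handle_at_example3 -h passwd.handle -i
./open_by_handle_at_example3 -h passwd.handle -o -r -O passwd_copy.txt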

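Checking system requirements

# Check the kernel version (2.6.39 or later is required)
uname -r

# Check that the kernel was built with file handle support (CONFIG_FHANDLE)
grep CONFIG_FHANDLE /boot/config-$(uname -r)

# Check that the system call number is defined (the header path varies by distribution)
grep -w __NR_open_by_handle_at /usr/include/asm/unistd_64.h

# Show the filesystem type
df -T /etc/passwd

# Check the current user (open_by_handle_at normally needs root or CAP_DAC_READ_SEARCH)
id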

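Important notes

Kernel version: requires Linux 2.6.39 or later

Filesystem: not every filesystem supports file handles

Privileges: the caller normally needs the CAP_DAC_READ_SEARCH capability (for example, root); unprivileged callers get EPERM

Error handling: always check return values and errno

Memory management: allocate and free handle memory correctly

File descriptors: close descriptors promptly once they are no longer needed

Stale handles: handle ESTALE, which occurs once the file has been deleted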
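Because open_by_handle_at normally requires the CAP_DAC_READ_SEARCH capability, a program can check for it up front and give a clearer message than a bare EPERM. The sketch below does this without libcap by parsing the CapEff line of /proc/self/status; the function names are hypothetical, and the approach assumes procfs is mounted.

```c
#include <stdio.h>
#include <linux/capability.h>   /* CAP_DAC_READ_SEARCH */

/* Returns 1 if bit `cap` is set in the capability mask, else 0. */
int cap_bit_set(unsigned long long mask, int cap) {
    return (int)((mask >> cap) & 1ULL);
}

/* Returns 1 if the effective capability set of the calling process contains
 * CAP_DAC_READ_SEARCH, 0 if it does not, and -1 if this cannot be determined. */
int has_cap_dac_read_search(void) {
    FILE *fp = fopen("/proc/self/status", "r");
    if (!fp) {
        return -1;
    }

    char line[256];
    int result = -1;
    while (fgets(line, sizeof(line), fp)) {
        unsigned long long mask;
        /* The line looks like "CapEff:\t000001ffffffffff" (hexadecimal) */
        if (sscanf(line, "CapEff: %llx", &mask) == 1) {
            result = cap_bit_set(mask, CAP_DAC_READ_SEARCH);
            break;
        }
    }
    fclose(fp);
    return result;
}
```

A result of 0 means open_by_handle_at is likely to fail with EPERM, so the program can skip the call or ask to be run with more privileges.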

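Practical use cases

File monitoring: watch specific files for changes without depending on their paths

Backup systems: identify and track backed-up files

Container technology: filesystem management inside containers

Network transfer: safe file identification and transfer

Audit systems: auditing and tracing file access

Database systems: file identification and management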
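Several of these scenarios involve storing a handle or passing it to another process, which means flattening the variable-length structure into a byte buffer and rebuilding it later. The sketch below shows one way to do that in memory. The layout (mount ID, handle_bytes, handle_type, then the handle data, all in host byte order) and the function names pack_handle and unpack_handle are assumptions made for this example, not a standard format. Keep in mind that mount IDs can be reused after an unmount, so a stored mount ID is only a hint.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/* Packs mount_id and the handle into `out`. Returns the number of bytes
 * written, or 0 if the buffer (capacity `cap`) is too small. */
size_t pack_handle(const struct file_handle *h, int mount_id,
                   unsigned char *out, size_t cap) {
    size_t need = sizeof(mount_id) + sizeof(h->handle_bytes) +
                  sizeof(h->handle_type) + h->handle_bytes;
    if (need > cap) {
        return 0;
    }
    unsigned char *p = out;
    memcpy(p, &mount_id, sizeof(mount_id));               p += sizeof(mount_id);
    memcpy(p, &h->handle_bytes, sizeof(h->handle_bytes)); p += sizeof(h->handle_bytes);
    memcpy(p, &h->handle_type, sizeof(h->handle_type));   p += sizeof(h->handle_type);
    memcpy(p, h->f_handle, h->handle_bytes);
    return need;
}

/* Rebuilds a malloc'ed handle from a packed buffer. Returns NULL if the
 * buffer is truncated or declares an implausible handle size. */
struct file_handle *unpack_handle(const unsigned char *in, size_t len,
                                  int *mount_id) {
    unsigned int bytes;
    int type;
    size_t header = sizeof(*mount_id) + sizeof(bytes) + sizeof(type);
    if (len < header) {
        return NULL;
    }
    memcpy(mount_id, in, sizeof(*mount_id));
    memcpy(&bytes, in + sizeof(*mount_id), sizeof(bytes));
    memcpy(&type, in + sizeof(*mount_id) + sizeof(bytes), sizeof(type));

    /* Validate the declared size before trusting it */
    if (bytes > MAX_HANDLE_SZ || len - header < bytes) {
        return NULL;
    }
    struct file_handle *h = malloc(sizeof(*h) + bytes);
    if (!h) {
        return NULL;
    }
    h->handle_bytes = bytes;
    h->handle_type = type;
    memcpy(h->f_handle, in + header, bytes);
    return h;
}
```

Validating the size on the way in matters because a handle read from disk or from the network is untrusted input; a fixed byte order would also be needed before sending handles between machines of different endianness.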

Best practices

// Defensive wrapper around open_by_handle_at
int safe_open_by_handle(struct file_handle *handle, int flags, const char *description) {
    // Validate arguments
    if (!handle) {
        errno = EINVAL;
        return -1;
    }
    
    // Validate flags
    if (flags & (O_CREAT | O_EXCL | O_TRUNC)) {
        fprintf(stderr, "Warning: create/truncate flags are not supported when opening by handle\n");
        flags &= ~(O_CREAT | O_EXCL | O_TRUNC);
    }
    
    // Open the file
    int fd = open_by_handle_at(AT_FDCWD, handle, flags);
    if (fd == -1) {
        int saved_errno = errno;  // fprintf may modify errno
        const char *reason;
        switch (saved_errno) {
            case EACCES:
                reason = "Permission denied";
                break;
            case EPERM:
                reason = "Missing CAP_DAC_READ_SEARCH capability";
                break;
            case ESTALE:
                reason = "File handle is stale";
                break;
            case EOPNOTSUPP:
                reason = "Filesystem does not support file handles";
                break;
            default:
                reason = strerror(saved_errno);
                break;
        }
        fprintf(stderr, "%s", reason);
        if (description) fprintf(stderr, ": %s", description);
        fprintf(stderr, "\n");
        errno = saved_errno;      // leave errno intact for the caller
    } else if (description) {
        printf("Opened file by handle: %s (fd: %d)\n", description, fd);
    }
    
    return fd;
}

// Handle manager structure
typedef struct {
    struct file_handle *handle;
    int mount_id;
    int fd;
    char *filename;
    time_t create_time;
} handle_manager_t;

// Initialize the handle manager
int handle_manager_init(handle_manager_t *mgr, const char *filename) {
    mgr->filename = strdup(filename);
    if (!mgr->filename) {
        return -1;
    }
    
    mgr->create_time = time(NULL);
    mgr->fd = -1;
    mgr->handle = NULL;
    
    return get_file_handle_safe(filename, &mgr->handle, &mgr->mount_id);
}

// Open the file through its handle
int handle_manager_open(handle_manager_t *mgr, int flags) {
    if (mgr->fd != -1) {
        close(mgr->fd);
    }
    
    mgr->fd = safe_open_by_handle(mgr->handle, flags, mgr->filename);
    return mgr->fd;
}

// Release everything owned by the handle manager
void handle_manager_cleanup(handle_manager_t *mgr) {
    if (mgr->fd != -1) {
        close(mgr->fd);
        mgr->fd = -1;
    }
    
    if (mgr->handle) {
        free(mgr->handle);
        mgr->handle = NULL;
    }
    
    if (mgr->filename) {
        free(mgr->filename);
        mgr->filename = NULL;
    }
}

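These examples show several ways to use open_by_handle_at, from a basic open by handle to a complete management tool, and should help you understand how Linux lets you access files through file handles.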