lseek System Call with Examples

This article covers lseek, the function that moves the read/write position (the file offset) within an open file.

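1. Introduction

lseek is a Linux system call that sets or queries the file offset associated with a file descriptor fd. The file offset is a non-negative integer giving the position in the file where the next read or write will begin. Think of it as a bookmark that marks how far you have read or written in the "book" that is the file.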

2. Function Prototype

#include <sys/types.h> // usually needed (defines off_t)
#include <unistd.h>    // required (declares lseek)

off_t lseek(int fd, off_t offset, int whence);

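3. Behavior

Depending on the value of whence, lseek sets the file offset to one of the following positions:

Relative to the start of the file (SEEK_SET): the offset is set to offset bytes.
Relative to the current offset (SEEK_CUR): the offset is set to the current offset plus offset bytes.
Relative to the end of the file (SEEK_END): the offset is set to the file size plus offset bytes.

lseek itself performs no I/O; it only updates the offset that the kernel keeps for the open file. The next read or write call starts at the new position.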

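lseek can also report the current offset without moving it, and it can tell whether a file descriptor refers to a seekable file (such as a regular file or block device) or to a stream (such as a pipe or socket).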

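4. Parameters

int fd: the file descriptor of an open file.
off_t offset: the offset in bytes relative to the point selected by whence. It may be positive, negative, or zero.
int whence: the reference point for offset. It is normally one of the following constants (Linux also defines SEEK_DATA and SEEK_HOLE for working with sparse files):
- SEEK_SET: the start of the file (offset 0).
- SEEK_CUR: the current file offset.
- SEEK_END: the end of the file (offset equal to the file size).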

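5. Return Value

On success: returns the new file offset, measured in bytes from the start of the file.
On failure: returns (off_t) -1 and sets errno to indicate the cause, for example EBADF (fd is not an open file descriptor), EINVAL (whence is invalid, or the resulting offset would be negative), or ESPIPE (fd refers to a pipe, FIFO, or socket).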

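Important notes:

On pipes, FIFOs, and sockets, lseek fails and sets errno to ESPIPE (Illegal seek).
lseek may set the offset beyond the end of the file. This alone does not change the file size; the file grows only when data is later written at that position. The gap between the old end and the new data is a hole: it reads back as \0 (null) bytes and, on file systems that support sparse files, takes up no disk space until data is actually written to it.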

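6. Related Functions

read, write: both operate at the current file offset and advance it by the number of bytes transferred.
pread, pwrite: take the position as an argument and leave the file offset unchanged. Each behaves like an lseek followed by a read or write, but as a single atomic call.
fstat: returns file status information, including the file size; it is sometimes used around lseek calls to find the bounds of the file.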

7. Example Code

Example 1: Getting and setting the file offset

This example uses lseek to query the current offset, move to the end of the file, and move back to the beginning.

#include <sys/types.h> // off_t
#include <unistd.h>    // lseek, read, write, close
#include <fcntl.h>     // open
#include <stdio.h>     // perror, printf
#include <stdlib.h>    // exit
#include <string.h>    // strlen

#define BUFFER_SIZE 128

int main() {
    int fd;
    off_t current_pos, file_size;
    char buffer[BUFFER_SIZE];
    ssize_t bytes_read;

    // Open a file for reading and writing (assumes "example_seek.txt" exists and has content)
    fd = open("example_seek.txt", O_RDWR);
    if (fd == -1) {
        perror("Error opening file");
        exit(EXIT_FAILURE);
    }

    // 1. Get the current file offset (0 right after open)
    current_pos = lseek(fd, 0, SEEK_CUR);
    if (current_pos == -1) {
        perror("Error getting current position");
        close(fd);
        exit(EXIT_FAILURE);
    }
    printf("Initial file offset: %ld\n", (long)current_pos);

    // 2. Read some data; the offset advances automatically
    bytes_read = read(fd, buffer, 10); // read up to 10 bytes
    if (bytes_read > 0) {
        buffer[bytes_read] = '\0'; // null-terminate the string
        printf("Read %zd bytes: '%s'\n", bytes_read, buffer);
    } else if (bytes_read == -1) {
        perror("Error reading file");
        close(fd);
        exit(EXIT_FAILURE);
    }

    // 3. Get the offset again (10 if the file holds at least 10 bytes)
    current_pos = lseek(fd, 0, SEEK_CUR);
    if (current_pos == -1) {
        perror("Error getting current position after read");
        close(fd);
        exit(EXIT_FAILURE);
    }
    printf("File offset after reading 10 bytes: %ld\n", (long)current_pos);


    // 4. Seek to the end of the file; the return value is the file size
    file_size = lseek(fd, 0, SEEK_END);
    if (file_size == -1) {
        perror("Error seeking to end of file");
        close(fd);
        exit(EXIT_FAILURE);
    }
    printf("File size is: %ld bytes\n", (long)file_size);

    // 5. Seek back to the beginning of the file
    if (lseek(fd, 0, SEEK_SET) == -1) {
        perror("Error seeking to beginning of file");
        close(fd);
        exit(EXIT_FAILURE);
    }
    printf("Moved file offset back to the beginning.\n");

    // 6. Further reads or writes could follow here...

    if (close(fd) == -1) {
        perror("Error closing file");
        exit(EXIT_FAILURE);
    }

    printf("File closed.\n");
    return 0;
}

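Code walkthrough:

The file is opened in read/write mode.
lseek(fd, 0, SEEK_CUR) returns the current offset, which is 0 right after open.
Reading 10 bytes advances the file offset to 10.
A second lseek(fd, 0, SEEK_CUR) confirms that the offset has been updated.
lseek(fd, 0, SEEK_END) moves the offset to the end of the file; its return value is the file size.
lseek(fd, 0, SEEK_SET) resets the offset to the beginning of the file.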

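Example 2: Appending to a file without O_APPEND

The O_APPEND flag of open is the convenient way to append, but lseek gives finer control over where writes land. This example appends by seeking to the end of the file manually. Note that the seek and the write are two separate calls, so if another process may write to the same file at the same time, O_APPEND is the safer choice.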

#include <sys/types.h> // off_t
#include <unistd.h>    // lseek, write, close
#include <fcntl.h>     // open
#include <stdio.h>     // perror, printf
#include <stdlib.h>    // exit
#include <string.h>    // strlen

int main() {
    int fd;
    off_t file_end_pos;
    const char *append_data = "Appended using lseek!\n";
    ssize_t bytes_written;

    // Open in read/write mode, creating the file if needed (no O_APPEND)
    fd = open("append_example.txt", O_RDWR | O_CREAT, 0644);
    if (fd == -1) {
        perror("Error opening/creating file");
        exit(EXIT_FAILURE);
    }

    printf("File 'append_example.txt' opened with fd: %d\n", fd);

    // Move the file offset to the end of the file
    file_end_pos = lseek(fd, 0, SEEK_END);
    if (file_end_pos == -1) {
        perror("Error seeking to end of file");
        close(fd);
        exit(EXIT_FAILURE);
    }
    printf("Current file size (before append): %ld bytes\n", (long)file_end_pos);

    // Write the new data at the end of the file
    bytes_written = write(fd, append_data, strlen(append_data));
    if (bytes_written == -1) {
        perror("Error writing to file");
        close(fd);
        exit(EXIT_FAILURE);
    } else if ((size_t)bytes_written != strlen(append_data)) {
         fprintf(stderr, "Warning: Incomplete write. Wrote %zd of %zu bytes.\n",
                 bytes_written, strlen(append_data));
    } else {
        printf("Successfully appended %zd bytes to the file.\n", bytes_written);
    }

    // Optional: check the file size again
    file_end_pos = lseek(fd, 0, SEEK_END);
    if (file_end_pos != -1) {
        printf("New file size (after append): %ld bytes\n", (long)file_end_pos);
    }


    if (close(fd) == -1) {
        perror("Error closing file");
        exit(EXIT_FAILURE);
    }

    printf("File closed.\n");
    return 0;
}

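Code walkthrough:

The file is opened (or created) in read/write mode without O_APPEND.
lseek(fd, 0, SEEK_END) positions the file offset exactly at the end of the file.
write then stores the new data there, which appends it to the file.
A final lseek to the end confirms the new file size.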

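Example 3: Checking whether a file descriptor is seekable

lseek can be used to test whether a file descriptor refers to something that supports random access (such as a regular file) or to a stream (such as a pipe or socket).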

#include <sys/types.h> // off_t
#include <unistd.h>    // lseek, pipe, close
#include <fcntl.h>     // open, O_RDONLY
#include <errno.h>     // errno, ESPIPE
#include <stdio.h>     // perror, printf
#include <stdlib.h>    // exit

// Check whether fd is seekable: returns 1 if yes, 0 if no, -1 on any other error
int is_seekable(int fd) {
    off_t pos;
    pos = lseek(fd, 0, SEEK_CUR); // try to query the current offset
    if (pos == -1) {
        // The call failed; inspect errno
        if (errno == ESPIPE) {
            return 0; // ESPIPE (Illegal seek): not seekable
        } else {
            // Some other error, e.g. an invalid file descriptor
            perror("lseek error when checking seekability");
            return -1;
        }
    }
    return 1; // got an offset, so the descriptor is seekable
}


int main() {
    int fd_file, fd_pipe[2];
    int result;

    // Open a regular file
    fd_file = open("/etc/passwd", O_RDONLY);
    if (fd_file == -1) {
        perror("Error opening /etc/passwd");
        exit(EXIT_FAILURE);
    }

    // Create a pipe
    if (pipe(fd_pipe) == -1) {
        perror("Error creating pipe");
        close(fd_file);
        exit(EXIT_FAILURE);
    }

    // Check the regular file
    result = is_seekable(fd_file);
    if (result == 1) {
        printf("fd %d (file) is seekable.\n", fd_file);
    } else if (result == 0) {
        printf("fd %d (file) is NOT seekable.\n", fd_file);
    } else {
        printf("Error checking seekability of fd %d (file).\n", fd_file);
    }

    result = is_seekable(fd_pipe[0]); // check the read end of the pipe
    if (result == 1) {
        printf("fd %d (pipe read end) is seekable.\n", fd_pipe[0]);
    } else if (result == 0) {
        printf("fd %d (pipe read end) is NOT seekable.\n", fd_pipe[0]);
    } else {
        printf("Error checking seekability of fd %d (pipe read end).\n", fd_pipe[0]);
    }

    result = is_seekable(fd_pipe[1]); // check the write end of the pipe
    if (result == 1) {
        printf("fd %d (pipe write end) is seekable.\n", fd_pipe[1]);
    } else if (result == 0) {
        printf("fd %d (pipe write end) is NOT seekable.\n", fd_pipe[1]);
    } else {
        printf("Error checking seekability of fd %d (pipe write end).\n", fd_pipe[1]);
    }

    // Clean up
    close(fd_file);
    close(fd_pipe[0]);
    close(fd_pipe[1]);

    return 0;
}

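Code walkthrough:

The is_seekable function calls lseek(fd, 0, SEEK_CUR) on the given fd.
If lseek returns -1 with errno set to ESPIPE, the fd is not seekable (a pipe or socket, for example).
If lseek succeeds and returns the current offset, the fd is seekable (a regular file, for example).
main runs is_seekable on a regular file and on both ends of a pipe.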

These examples show how lseek controls the position at which a file is accessed, and how it can distinguish seekable files from streams.