博远 Project Notes: Parsing Command-Line Arguments

1. DISCLAIMER
2. Introduction
3. What the hell is Option ?
4. So, how to use it?
- 4.1. Using getopt(...) to parse options
  - 4.1.1. Options, but with parameters
- 4.2. Using getopt_long(...)
5. The Rabbit Hole ?

1. DISCLAIMER

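Disclaimer: everything here is only my personal understanding and is not guaranteed to be fully correct. If you need a more precise source, please RTFM yourself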

2. Introduction

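It all started because the fake-terminal project needs to parse the commands a user types. For example, when the user enters ls -alR <dir> , our interactive interface has to break it down like this: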

ls -a -l -R <dir>

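and then map each option to an output format and print the result

That sounds easy enough, but as the arguments get more complex there is more to parse and the parsing itself gets harder, ~~to the point that I started entertaining the cursed idea of "hacking together a DSL in Racket"~~

Of course that would be a bad idea. This is a C project, so I should do it in the UNIX way.

So I went to RTFSC, and in the coreutils source I found a curious function: getopt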

3. What the hell is Option ?

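As the name suggests, getopt stands for "get options". An option is the string that follows a - or -- in a command (the -- form is called a long-option), and each option selects a different behavior. For example: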

ls -alR <dir>

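-a: show hidden files
-l: print detailed information
-R: list recursively

This runs the ls program with the three options a, l and R. Note that <dir> is not an argument to those options but an operand (a non-option argument) of ls itself. Options can also take arguments of their own, which is covered in 4.1.1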

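We can define options a bit more rigorously (short options follow the POSIX utility syntax conventions; long options are a GNU extension):

An option starts with - . A single - introduces a short-option (one character), while -- introduces a long-option
Several short-options can be combined, i.e. -a -b -c == -abc
An option may take an argument
A long-option can be equivalent to a short-option (e.g. --help and -h)

getopt() saves developers of command-line tools from hand-writing a parser for options: you declare up front which options the program supports, and it does the matching and hands back the results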

4. So, how to use it?

When in doubt, read the [Manual]…

man -k getopt
man 3 getopt

The [Manual] tells us that getopt() is declared in unistd.h, while getopt_long() and getopt_long_only() are declared in getopt.h, as follows:

#include <unistd.h>

int getopt(int argc, char *const argv[],
            const char *optstring);

extern char *optarg;
extern int optind, opterr, optopt;

#include <getopt.h>

int getopt_long(int argc, char *const argv[],
            const char *optstring,
            const struct option *longopts, int *longindex);
int getopt_long_only(int argc, char *const argv[],
            const char *optstring,
            const struct option *longopts, int *longindex);

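The use cases for these three functions are shown in the table:

Function	Use case
getopt(…)	Only `short-option` needs to be parsed
getopt_long(…)	Both `short-option` and `long-option` need to be parsed
getopt_long_only(…)	Like getopt_long(…), but a `long-option` may also be introduced by a single -

Clearly getopt_long() is the most general-purpose of the three, but before learning how to use it, let's start with the simpler getopt(...)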

4.1. Using `getopt(...)` to parse options

int getopt(int argc, char *const argv[],
            const char *optstring);

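getopt(...) takes three parameters:

argc : the number of arguments received by the program's main function
argv[] : the argument vector received by the program
optstring : a string listing every valid option character; getopt uses it to drive the parsing

(Note: argc and argv[] are outside the scope of this article; see the ISO C standard or section 25.1, Program Arguments, of the Glibc manual)

Note that each call to getopt(...) parses only one option, so it has to be called in a loop until it returns -1, at which point parsing is complete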

The following short program shows how getopt(...) is used:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    /* ... do something before parsing ... */

    while (1)
    {
        int opt = getopt(argc, argv, "abc"); /* -a, -b and -c are accepted */
        if (opt == -1)
        {
            break; /* done parsing */
        }
        printf("%c\n", opt); /* print each parsed option, one per line */
    }

    /* ... do something after parsing ... */

    return 0;
}

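If the user passes an option that is not in optstring, getopt(...) returns '?' (and, by default, prints an error message to stderr)

If we then want to know which invalid option the user typed, we can turn to the external variables listed in the [Manual]: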

extern char *optarg;
extern int optind, opterr, optopt;

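Variable	Purpose
optarg	Points to the argument of the current option
optind	Index of the next argv element to be processed
opterr	Defaults to 1; when nonzero, getopt prints an error message for an invalid option, and setting it to 0 silences that message
optopt	Holds the invalid option character the user typed

So all we need to do is read optopt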

4.1.1. Options, but with parameters

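In the definition earlier, we said that an option may take an argument. So how do we get getopt(...) to handle arguments?

Reading the examples in the [Manual] shows that a small tweak to optstring is all it takes:

: after an option character means the option requires an argument
:: after an option character means the argument is optional (a GNU extension)

"a:b::c"
- -a requires an argument
- -b optionally takes an argument
- -c takes no argument

To get the argument, read the optarg variable mentioned above directly. For an optional argument, the value must be attached to the option (-bvalue); if it is omitted, optarg is NULL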

4.2. Using `getopt_long(...)`

int getopt_long(int argc, char *const argv[],
              const char *optstring,
              const struct option *longopts, int *longindex);

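The first three parameters are the same as before, but the rest suddenly stop making sense, so back to the [Manual]…

longindex

The index in longopts of the long option that was matched

struct option

From the [Manual]: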

longopts is a pointer to the first element of an array of struct option
declared in <getopt.h> as

    struct option {
           const char *name;
           int         has_arg;
           int        *flag;
           int         val;
    };

The meanings of the different fields are:

name   is the name of the long option.

has_arg
          is: no_argument (or 0) if the option does not take an argument;
          required_argument (or 1) if the option requires an argument; or
          optional_argument (or 2) if the option takes an optional argument.

flag   specifies how results are returned for a long option. If flag
          is NULL, then getopt_long() returns val. (For example, the
          calling program may set val to the equivalent short option
          character.) Otherwise, getopt_long() returns 0, and flag points to
          a variable which is set to val if the option is found, but left
          unchanged if the option is not found.

val    is the value to return, or to load into the variable pointed to
          by flag.

The last element of the array has to be filled with zeros.

If longindex is not NULL, it points to a variable which is set  to  the
index of the long option relative to longopts.

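So, for the option struct:

name: the name of the option
has_arg: whether the option takes an argument (0 -> no argument; 1 -> required argument; 2 -> optional argument)
flag: pointer to the int variable to modify, or NULL
val: the value stored into *flag, or the function's return value when flag is NULL

That is still rather abstract. What is flag for, for instance? It seems to appear out of nowhere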

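Here is an example of the option struct in use (a fragment, not a complete program):

#include <getopt.h>

int flag;

struct option foo_options[] =
{
    {"a", no_argument,       &flag, 1},   /* --a     : store 1 in flag, return 0 */
    {"b", required_argument, &flag, 0},   /* --b ARG : store 0 in flag, return 0 */
    {"c", no_argument,       NULL,  'c'}, /* --c     : return 'c' */
    {0, 0, 0, 0} /* all-zero terminator, like '\0' */
};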

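In this example, foo_options[2] has its name set to c and its return value set to 'c', so the developer can handle the return value in a switch statement and run the code that corresponds to --c

The --a and --b options both make getopt_long() return 0, but they modify the value of flag, which the developer can then use for further decisions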

5. The Rabbit Hole ?

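At this point we can get a command-line program to parse its options, but getopt is not necessarily the best choice in every situation

The Glibc manual describes argp, a facility that is not part of POSIX; it offers more features than getopt and a friendlier interface (for example, it generates the --help output for you)

The downside is that it is less portable than getopt: on systems whose C library is not Glibc, it may have to be installed as a separate dependency

(See also: Glibc manual, section 25.3, Parsing Program Options with Argp)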