Rust Lifetime Basics

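Besides ownership, Rust's other defining feature is lifetimes. Every programming language has to deal with lifetimes in some form, but most never ask the programmer to think about them. In Rust, lifetimes matter almost as much as ownership: without understanding them, writing Rust code is a struggle.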

Each let statement introduces a scope

Lifetimes are inseparable from scopes. A lifetime describes how long a variable stays alive or, more concretely, in which scopes the variable is valid.

In Rust, every let binding implicitly opens a new scope:

rust

let x = 0;
let y = &x;
let z = &y;

With the syntactic sugar removed, the desugared form looks roughly like this:

rust

// NOTE: this is not real Rust code but desugared pseudocode; it does not compile or run.
'a: {
    let x: i32 = 0;
    'b: {
        // lifetime used is 'b because that's good enough.
        let y: &'b i32 = &'b x;
        'c: {
            // ditto on 'c
            let z: &'c &'b i32 = &'c y; // "a reference to a reference to an i32" (with lifetimes annotated)
        }
    }
}

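Each {} denotes a scope, and every let binding nests a new one. 'a, 'b and 'c are scope labels, and they are in fact lifetimes as well. Indeed, in Rust a lifetime is itself a kind of type; this is discussed in more detail later.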
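
The nesting described above can be observed in ordinary, compilable Rust. The following is a minimal sketch, not part of the original article: the function names nested_refs and copy_out_of_inner_scope are made up for illustration. It shows a reference to a reference being used while everything it points to is still in scope, and a value being copied out of an inner scope before that scope ends.

```rust
// Hypothetical helper: mirrors the x / y / z example with real code.
fn nested_refs() -> i32 {
    let x = 0;
    let y = &x; // y borrows x, so y must not outlive x
    let z = &y; // z borrows y, so z must not outlive y
    // z is a reference to a reference: two dereferences reach the i32.
    **z
}

// Hypothetical helper: an explicit inner block makes the nesting visible.
fn copy_out_of_inner_scope(outer: i32) -> i32 {
    let copied;
    {
        let inner = 5;
        let r = &inner; // r and inner both live only inside this block
        copied = *r + outer; // copy the value out before the block ends
    } // inner and r are gone here; copied holds a plain i32, not a reference
    copied
}

fn main() {
    println!("{}", nested_refs());
    println!("{}", copy_out_of_inner_scope(10));
}
```

Because copied stores an i32 rather than a reference, nothing borrowed from the inner block escapes it, which is why this version compiles.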

For functions, Rust also delimits scopes implicitly:

rust

fn as_str(data: &u32) -> &str {
    let s = format!("{}", data);
    &s // compile error ❌
}

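Note that this code does not compile. For the sake of learning, though, we want to understand at the lifetime level why it is rejected. The compiler output is: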

rust

--> src/main.rs:1:11
  |
1 | fn as_str(data: &u32) -> &str {
  |           ^^^^ help: if this is intentional, prefix it with an underscore: `_data`
  |
  = note: `#[warn(unused_variables)]` on by default

error[E0515]: cannot return reference to local variable `s`
 --> src/main.rs:3:5
  |
3 |     &s
  |     ^^ returns a reference to data owned by the current function

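Clearly, the function returns a reference to a local variable. Once the local variable s leaves its scope (the function body {}), it is dropped and the heap memory it points to is freed, so the returned &str would be a dangling pointer. Rust rules this out at compile time.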

Let's remove the syntactic sugar and look at the lifetimes involved:

rust

fn as_str<'a>(data: &'a u32) -> &'a str {
    'b: {
        let s = format!("{}", data);
        return &'a s;
    }
}

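Rust first implicitly adds the lifetime parameter 'a to the function's parameter and return value, and the function body {} gets its own lifetime (or scope label) 'b. Inside 'b we then need to return a reference to s with lifetime 'a. But 'a is chosen by the caller and must outlive the entire function call, whereas s lives only inside 'b, which is strictly shorter. A reference to s can therefore never be valid for 'a, so it cannot be returned.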

A simple fix is to change the return type to String:

rust

fn as_str(data: &u32) -> String {
    format!("{}", data)
}

Returning String means handing an owned String back to the caller instead of a reference. We won't go deeper into that here and will stay focused on lifetimes.
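
To make the difference concrete, here is a small runnable sketch built around the corrected as_str. The main function and its variable names are illustrative assumptions, not from the article. Since the returned String owns its data, it can outlive the u32 that was borrowed to produce it.

```rust
fn as_str(data: &u32) -> String {
    // format! builds a new, owned String; ownership moves to the caller.
    format!("{}", data)
}

fn main() {
    let s;
    {
        let n: u32 = 42;
        s = as_str(&n); // the borrow of n ends when the call returns
    } // n is gone here, but s owns its own heap buffer
    println!("{s}");
}
```

The borrow of n lasts only for the duration of the call, so no lifetime connects the result to n.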

Lifetime elision rules

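In principle, every reference that appears in a Rust definition needs an explicitly specified lifetime, because the compiler relies on lifetime information for its checks.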

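In some situations, however, the compiler infers lifetimes and fills them in implicitly. When it cannot infer them, or when its inference is inaccurate (to stay safe, the compiler is very conservative), you have to step in and specify the lifetimes by hand.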

The compiler fills in lifetimes according to these rules:

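Each elided lifetime in a parameter becomes its own lifetime parameter.
If there is exactly one input lifetime, Rust assigns it to all elided output lifetimes (the as_str example above is such a case).
If there are multiple input lifetimes but one of them belongs to &self or &mut self, the lifetime of self is assigned to all elided output lifetimes.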

With these rules, Rust can work out many lifetimes automatically, which saves developers a lot of boilerplate.

Examples:

rust

fn print(s: &str);                                      // elided
fn print<'a>(s: &'a str);                               // expanded

fn debug(lvl: usize, s: &str);                          // elided
fn debug<'a>(lvl: usize, s: &'a str);                   // expanded

fn substr(s: &str, until: usize) -> &str;               // elided
fn substr<'a>(s: &'a str, until: usize) -> &'a str;     // expanded

fn get_str() -> &str;                                   // ILLEGAL

fn frob(s: &str, t: &str) -> &str;                      // ILLEGAL

fn get_mut(&mut self) -> &mut T;                        // elided
fn get_mut<'a>(&'a mut self) -> &'a mut T;              // expanded

fn args<T: ToCStr>(&mut self, args: &[T]) -> &mut Command                  // elided
fn args<'a, 'b, T: ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command // expanded

fn new(buf: &mut [u8]) -> BufWriter;                    // elided
fn new(buf: &mut [u8]) -> BufWriter<'_>;                // elided (with `rust_2018_idioms`)
fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a>          // expanded
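
The signatures above are declarations only. As a runnable illustration of the same rules, here is a minimal sketch. The names first_word, first_word_explicit, Greeter and name_with_hint are hypothetical and chosen only for this example.

```rust
// Single input lifetime: it is assigned to the elided output lifetime.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same function with the lifetime written out by hand.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

struct Greeter {
    name: String,
}

impl Greeter {
    // &self rule: the output borrows from self, not from _hint,
    // even though both parameters are references.
    fn name_with_hint(&self, _hint: &str) -> &str {
        &self.name
    }
}

fn main() {
    println!("{}", first_word("hello world"));
    println!("{}", first_word_explicit("hello world"));

    let g = Greeter { name: String::from("Ferris") };
    let name;
    {
        let tmp = String::from("temporary");
        // The result is tied to g only, so it may outlive tmp.
        name = g.name_with_hint(&tmp);
    }
    println!("{name}");
}
```

Because the elided output lifetime comes from &self, a method like name_with_hint could not return _hint without explicit annotations.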

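In these cases we don't need to write lifetime parameters explicitly. But when the compiler cannot work out the lifetimes, that is, when none of the elision rules above applies, Rust requires us to write them by hand.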

rust

fn s1_or_s2(s1: &str, s2: &str) -> &str { // compile error ❌
    if s1.len() > s2.len() {
        s1
    } else {
        s2
    }
}

Code like this looks commonplace in other languages, but in Rust it fails to compile:

rust

--> src/main.rs:2:36
  |
2 | fn s1_or_s2(s1: &str, s2: &str) -> &str {
  |                 ----      ----     ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `s1` or `s2`
help: consider introducing a named lifetime parameter
  |
2 | fn s1_or_s2<'a>(s1: &'a str, s2: &'a str) -> &'a str {
  |            ++++      ++           ++          ++

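This is really where Rust's appeal lies: it forces you to think about details you may never have considered before. The error message is explicit: the compiler cannot determine a lifetime for the return value &str. The returned &str may come from s1 or from s2, and which one is only known at run time. If s1 or s2 does not live long enough, the returned &str becomes a dangling pointer. Consider this code: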

rust

fn s1_or_s2(s1: &str, s2: &str) -> &str {
    if s1.len() > s2.len() {
        s1
    } else {
        s2
    }
}

fn main() {
    let ret;
    {
        // some inner scope
        let s1 = String::from("hello");
        let s2 = String::from("world");
        ret = s1_or_s2(&s1, &s2);
        // s1 and s2 are about to go out of scope; the heap memory for hello and world will be freed
    }
    ret; // ret becomes a dangling pointer
}

This is an example of a latent memory error. In real code, the {} scope may be nested far less visibly.

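Rust strictly guarantees memory safety, so it does not allow this to happen, and the mechanism is compile-time checking. In this case, however, the compiler lacks the information to check the scopes, so it reports an error and asks the developer to declare lifetime parameters explicitly to help it catch the mistake: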

rust

fn s1_or_s2<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1.len() > s2.len() {
        s1
    } else {
        s2
    }
}

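Here we manually add the lifetime annotation 'a. 'a is in fact a generic parameter that stands for some lifetime. The parameters s1 and s2 and the return value are all declared with lifetime 'a, which means s1 and s2 must live at least as long as the return value, and possibly longer. The mathematical notion of reflexivity applies here: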

rust

a <= a; // holds for any real number

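So although everything is declared as 'a, the intended meaning is: lifetime of the inputs ≥ lifetime of the return value. Only then is the return value guaranteed not to dangle. With this change, the compiler now reports the error we expect: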

rust

fn s1_or_s2<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1.len() > s2.len() {
        s1
    } else {
        s2
    }
}

fn main() {
    let ret;
    {
        // some inner scope
        let s1 = String::from("hello");
        let s2 = String::from("world");
        ret = s1_or_s2(&s1, &s2); // compile error ❌

        // s1 and s2 are about to go out of scope; the heap memory for hello and world will be freed
    }
    ret; // ret becomes a dangling pointer
}

Error output:

rust

error[E0597]: `s1` does not live long enough
  --> src/main.rs:16:24
   |
14 |         let s1 = String::from("hello");
   |             -- binding `s1` declared here
15 |         let s2 = String::from("world");
16 |         ret = s1_or_s2(&s1, &s2);
   |                        ^^^ borrowed value does not live long enough
...
19 |     }
   |     - `s1` dropped here while still borrowed
20 |     ret; // ret becomes a dangling pointer
   |     --- borrow later used here

error[E0597]: `s2` does not live long enough
  --> src/main.rs:16:29
   |
15 |         let s2 = String::from("world");
   |             -- binding `s2` declared here
16 |         ret = s1_or_s2(&s1, &s2);
   |                             ^^^ borrowed value does not live long enough
...
19 |     }
   |     - `s2` dropped here while still borrowed
20 |     ret; // ret becomes a dangling pointer
   |     --- borrow later used here

Key error messages:

s1 does not live long enough
s2 does not live long enough
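
For completeness, here is a hedged sketch of two ways to satisfy the borrow checker. Neither appears in the original article, and the helper always_first is a hypothetical function introduced only for illustration. The first keeps both strings alive for as long as the result is used. The second gives each parameter its own lifetime, which is possible only when the function can promise that the result never borrows from the second argument.

```rust
fn s1_or_s2<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1.len() > s2.len() {
        s1
    } else {
        s2
    }
}

// Hypothetical helper: it only ever returns s1, so _s2 can carry an
// unrelated lifetime 'b that places no constraint on the result.
fn always_first<'a, 'b>(s1: &'a str, _s2: &'b str) -> &'a str {
    s1
}

fn main() {
    // Option 1: s1 and s2 both live at least as long as ret.
    let s1 = String::from("hello");
    let s2 = String::from("world");
    let ret = s1_or_s2(&s1, &s2);
    println!("{ret}");

    // Option 2: the result is tied to s1 only, so the second
    // argument may be dropped before the result is used.
    let first;
    {
        let short_lived = String::from("temporary");
        first = always_first(&s1, &short_lived);
    }
    println!("{first}");
}
```

With a single 'a on both parameters, the compiler picks a lifetime no longer than the shorter of the two borrows, which is why the original main was rejected.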

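As mentioned earlier, lifetimes in Rust are also a kind of type. To understand them better, it is time to look more closely at the relevant type system concepts. That is the subject of the next article: covariance and contravariance, explained through Rust lifetime types.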