Building Observable Ent Applications with Prometheus

August 12, 2021 · 12 min read

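Observability is a quality of a system that describes how well its internal state can be measured from the outside.
As a computer program evolves into a full-blown production system, this quality becomes increasingly important.
One way to make a software system more observable is to export metrics, that is, to report a quantitative description of the running system's state in some externally visible way. For instance, we can expose an HTTP endpoint that shows how many errors have occurred since the process started. In this post, we will explore how to build more observable Ent applications using Prometheus.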

What is Ent?

Ent is a simple yet powerful entity framework for Go that makes it easy to build and maintain applications with large data models.

What is Prometheus?

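Prometheus is an open-source monitoring system developed by engineers at SoundCloud in 2012. It includes an embedded time-series database and integrates with many third-party systems. The Prometheus client exposes a process's metrics over an HTTP endpoint (usually /metrics). Prometheus scrapers poll this endpoint periodically (typically every 30 seconds) and write the data to the time-series database.

Prometheus is just one example of a class of metric collection backends. Many others, such as AWS CloudWatch and InfluxDB, are widely used in the industry. Later in this post, we will discuss how a unified, standards-based integration with any such backend could look.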

Working with Prometheus

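To expose application metrics with Prometheus, we need to create a Prometheus Collector. A collector gathers a set of metrics from your server.

In this example, we will use two types of metrics that can be stored in a collector: Counters and Histograms. Counters are monotonically increasing cumulative metrics that represent how many times something has happened; they are commonly used to count the requests a server has processed or the errors that have occurred. Histograms sample observations into buckets of configurable size and are commonly used to represent latency distributions (for example, how many requests completed within 5 ms, 10 ms, 100 ms, 1 s, and so on). In addition, Prometheus allows metrics to be broken down by labels. This is especially useful, for example, for splitting a request counter by endpoint name.

Let's see how to create a collector with the official Go client. To do so, we will use the client's promauto package, which simplifies collector creation. Here is a simple example of a counter collector (for counting total requests or request errors, for instance):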
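To illustrate what "sampling observations into buckets" means, the following standard-library sketch implements a toy cumulative histogram: each bucket counts every observation less than or equal to its upper bound. The `histogram` type and the chosen bounds are assumptions made for illustration, not the Prometheus client's implementation.

```go
package main

import "fmt"

// histogram is a toy cumulative histogram: counts[i] is the number of
// observations less than or equal to bounds[i].
type histogram struct {
    bounds []float64 // upper bounds in ascending order
    counts []uint64  // cumulative count per bound
    total  uint64    // all observations (the implicit +Inf bucket)
    sum    float64   // sum of all observed values
}

func newHistogram(bounds ...float64) *histogram {
    return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe adds v to every bucket whose upper bound is at least v.
func (h *histogram) observe(v float64) {
    for i, b := range h.bounds {
        if v <= b {
            h.counts[i]++
        }
    }
    h.total++
    h.sum += v
}

func main() {
    h := newHistogram(0.005, 0.01, 0.1, 1)
    for _, v := range []float64{0.003, 0.007, 0.25, 2} {
        h.observe(v)
    }
    for i, b := range h.bounds {
        fmt.Printf("le=%g count=%d\n", b, h.counts[i])
    }
    fmt.Printf("le=+Inf count=%d\n", h.total)
}
```

Because the buckets are cumulative, the count for a larger bound is never smaller than the count for a smaller one, and the +Inf bucket always equals the total number of observations.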

package example

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // Names of the dynamic labels.
    labelNames = []string{"endpoint", "error_code"}

    // Create a counter collector.
    exampleCollector = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "endpoint_errors",
            Help: "Number of errors in endpoints",
        },
        labelNames,
    )
)

// When using the collector, set the values of the dynamic labels and then increment the counter.
func incrementError() {
    exampleCollector.WithLabelValues("/create-user", "400").Inc()
}

Ent Hooks

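Hooks are a feature of Ent that allows adding custom logic before and after operations that change data entities.

A mutation is an operation that changes something in the database.
There are 5 types of mutations: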

Create
UpdateOne
Update
DeleteOne
Delete

Hooks are functions that take an ent.Mutator and return a mutator. They work much like the popular HTTP middleware pattern.

package example

import (
    "context"

    "entgo.io/ent"
)

func exampleHook() ent.Hook {
    // Use this to initialize your hook.
    return func(next ent.Mutator) ent.Mutator {
        return ent.MutateFunc(func(ctx context.Context, m ent.Mutation) (ent.Value, error) {
            // Logic to run before the mutation.
            v, err := next.Mutate(ctx, m)
            if err != nil {
                // Logic to run when the mutation returned an error.
            }
            // Logic to run after the mutation.
            return v, err
        })
    }
}
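The middleware analogy can be sketched without Ent at all. In the simplified model below, `mutator` and `hook` are stand-ins for ent.Mutator and ent.Hook, and `chain` and `trace` are hypothetical helpers. The wrapping order chosen in `chain` is an assumption of this sketch, not a statement about how Ent orders its hooks internally.

```go
package main

import (
    "fmt"
    "strings"
)

// mutator and hook are simplified stand-ins for ent.Mutator and ent.Hook.
type mutator func(op string) error
type hook func(next mutator) mutator

// chain wraps base so that hooks[0] becomes the outermost layer.
func chain(base mutator, hooks ...hook) mutator {
    for i := len(hooks) - 1; i >= 0; i-- {
        base = hooks[i](base)
    }
    return base
}

// trace returns a hook that records when it is entered and when it is left.
func trace(name string, log *[]string) hook {
    return func(next mutator) mutator {
        return func(op string) error {
            *log = append(*log, name+":before")
            err := next(op)
            *log = append(*log, name+":after")
            return err
        }
    }
}

// run sends one mutation through two hooks and returns the call order.
func run() string {
    var log []string
    base := func(op string) error {
        log = append(log, "mutate:"+op)
        return nil
    }
    _ = chain(base, trace("a", &log), trace("b", &log))("create")
    return strings.Join(log, " ")
}

func main() {
    fmt.Println(run())
}
```

Each hook sees the mutation on the way in and its result on the way out, which is exactly the property we need later for timing an operation and counting its errors.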

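In Ent, there are two types of mutation hooks: schema hooks and runtime hooks.
Schema hooks are mainly used to define custom mutation logic for a specific entity type, for example, syncing entity creation to another system.
Runtime hooks, on the other hand, are used to define more global logic such as logging, metrics, and tracing.

For our use case, we should definitely use runtime hooks, because to be valuable the metrics need to be exported for all operations on all entity types: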

package example

import (
    "entprom/ent"
    "entprom/ent/hook"
)

func main() {
    client, _ := ent.Open("sqlite3", "file:ent?mode=memory&cache=shared&_fk=1")

    // Add a hook only on user mutations.
    client.User.Use(exampleHook())

    // Add a hook only on update operations.
    client.Use(hook.On(exampleHook(), ent.OpUpdate|ent.OpUpdateOne))
}
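The expression `ent.OpUpdate|ent.OpUpdateOne` in the example above combines two operations into a single bitmask. The following sketch shows how such filtering can work in principle. The `op` type, its constants, and the `on` helper are simplified stand-ins written for this post; they are not Ent's actual definitions.

```go
package main

import "fmt"

// op is a simplified stand-in for ent.Op: one bit per operation.
type op uint

const (
    opCreate op = 1 << iota
    opUpdate
    opUpdateOne
    opDelete
    opDeleteOne
)

// is reports whether o has at least one bit in common with mask.
func (o op) is(mask op) bool { return o&mask != 0 }

// on runs fn only when current is selected by mask and reports whether it ran.
func on(fn func(), mask op, current op) bool {
    if !current.is(mask) {
        return false
    }
    fn()
    return true
}

func main() {
    calls := 0
    count := func() { calls++ }
    mask := opUpdate | opUpdateOne
    for _, o := range []op{opCreate, opUpdate, opUpdateOne, opDelete} {
        fmt.Printf("op=%05b ran=%v\n", o, on(count, mask, o))
    }
    fmt.Println("calls:", calls)
}
```

Since every operation occupies its own bit, any combination of operations fits in one value and a single bitwise AND decides whether the hook applies.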

Exporting Prometheus Metrics for an Ent Application

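With all of the introductions complete, let's get right to it and see how Prometheus and Ent hooks can be used together to create an observable application. The goal of this example is to export the following metrics using a hook: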

Metric Name	Description
ent_operation_total	Number of ent mutation operations
ent_operation_error	Number of failed ent mutation operations
ent_operation_duration_seconds	Time in seconds per operation

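Each of these metrics is broken down by labels into two dimensions:

mutation_type: the entity type being mutated (User, BlogPost, Account, etc.).
mutation_op: the operation being performed (Create, Delete, etc.).

Let's start by defining our collectors: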

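// The collectors and the hook defined after them live in the same file.
package entprom

import (
    "context"
    "time"

    "entgo.io/ent"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// Ent dynamic dimensions.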
const (
    mutationType = "mutation_type"
    mutationOp   = "mutation_op"
)

var entLabels = []string{mutationType, mutationOp}

// Create a collector for the total operations counter.
func initOpsProcessedTotal() *prometheus.CounterVec {
    return promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "ent_operation_total",
            Help: "Number of ent mutation operations",
        },
        entLabels,
    )
}

// Create a collector for the error counter.
func initOpsProcessedError() *prometheus.CounterVec {
    return promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "ent_operation_error",
            Help: "Number of failed ent mutation operations",
        },
        entLabels,
    )
}

// Create a collector for the duration histogram.
func initOpsDuration() *prometheus.HistogramVec {
    return promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "ent_operation_duration_seconds",
            Help: "Time in seconds per operation",
        },
        entLabels,
    )
}
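The histogram above does not set the Buckets field of prometheus.HistogramOpts, so the client falls back to its default buckets, which range from 5 ms to 10 s. Mutations against a fast local database may all land in the smallest bucket, in which case finer bounds can be more informative. The sketch below shows one way to generate exponentially spaced bounds. `exponentialBuckets` and `firstBucket` are hypothetical helpers written for this post, and the start value and factor are assumptions to tune for your own workload.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds, beginning at start and
// multiplying by factor at each step. Hypothetical helper for this sketch.
func exponentialBuckets(start, factor float64, count int) []float64 {
    buckets := make([]float64, 0, count)
    for i := 0; i < count; i++ {
        buckets = append(buckets, start)
        start *= factor
    }
    return buckets
}

// firstBucket returns the index of the first bound that is >= v, or
// len(bounds) when v only falls into the implicit +Inf bucket.
func firstBucket(bounds []float64, v float64) int {
    for i, b := range bounds {
        if v <= b {
            return i
        }
    }
    return len(bounds)
}

func main() {
    // Six bounds from 0.1 ms up to roughly 0.1 s.
    bounds := exponentialBuckets(0.0001, 4, 6)
    fmt.Println(bounds)
    // Index of the bucket that a 0.13 ms mutation falls into.
    fmt.Println(firstBucket(bounds, 0.00013))
}
```

A slice produced this way could be assigned to the Buckets field when creating the histogram.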

Next, let's define our new hook:

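// Hook initializes the collectors, counts total operations at the start, counts errors when a mutation fails, and records the duration at the end.
// Call Hook only once per process: promauto registers the collectors with the default registry and panics on a duplicate registration.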
func Hook() ent.Hook {
    opsProcessedTotal := initOpsProcessedTotal()
    opsProcessedError := initOpsProcessedError()
    opsDuration := initOpsDuration()
    return func(next ent.Mutator) ent.Mutator {
        return ent.MutateFunc(func(ctx context.Context, m ent.Mutation) (ent.Value, error) {
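            // Before the mutation, start measuring time.
            start := time.Now()
            // Extract the dynamic labels from the mutation.
            labels := prometheus.Labels{mutationType: m.Type(), mutationOp: m.Op().String()}
            // Increment the total operations counter.
            opsProcessedTotal.With(labels).Inc()
            // Execute the mutation.
            v, err := next.Mutate(ctx, m)
            if err != nil {
                // On error, increment the error counter.
                opsProcessedError.With(labels).Inc()
            }
            // Stop measuring time.
            duration := time.Since(start)
            // Record the duration in seconds.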
            opsDuration.With(labels).Observe(duration.Seconds())
            return v, err
        })
    }
}

Connecting the Prometheus Collector to our Service

With the hook defined, let's see how to connect it to our application and how to use Prometheus to serve an endpoint that exposes the metrics:

package main

import (
    "context"
    "log"
    "net/http"

    "entprom"
    "entprom/ent"

    _ "github.com/mattn/go-sqlite3"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func createClient() *ent.Client {
    c, err := ent.Open("sqlite3", "file:ent?mode=memory&cache=shared&_fk=1")
    if err != nil {
        log.Fatalf("failed opening connection to sqlite: %v", err)
    }
    ctx := context.Background()
    // Run the auto-migration tool.
    if err := c.Schema.Create(ctx); err != nil {
        log.Fatalf("failed creating schema resources: %v", err)
    }
    return c
}

func handler(client *ent.Client) func(w http.ResponseWriter, r *http.Request) {
    return func(w http.ResponseWriter, r *http.Request) {
        // Use the request context so the operation is cancelled if the client disconnects.
        ctx := r.Context()
        // Run an operation.
        _, err := client.User.Create().SetName("a8m").Save(ctx)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
    }
}

func main() {
    // Create the Ent client and run the migration.
    client := createClient()
    // Use the hook.
    client.Use(entprom.Hook())
    // A simple handler that runs a DB operation.
    http.HandleFunc("/", handler(client))
    // This endpoint exposes the metrics for Prometheus to scrape.
    http.Handle("/metrics", promhttp.Handler())
    log.Println("server starting on port 8080")
    // Run the server.
    log.Fatal(http.ListenAndServe(":8080", nil))
}

After visiting / a few times (using curl or a browser), go to /metrics. There you will see the output of the Prometheus client:

# HELP ent_operation_duration_seconds Time in seconds per operation
# TYPE ent_operation_duration_seconds histogram
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="0.005"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="0.01"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="0.025"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="0.05"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="0.1"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="0.25"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="0.5"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="1"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="2.5"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="5"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="10"} 2
ent_operation_duration_seconds_bucket{mutation_op="OpCreate",mutation_type="User",le="+Inf"} 2
ent_operation_duration_seconds_sum{mutation_op="OpCreate",mutation_type="User"} 0.000265669
ent_operation_duration_seconds_count{mutation_op="OpCreate",mutation_type="User"} 2
# HELP ent_operation_error Number of failed ent mutation operations
# TYPE ent_operation_error counter
ent_operation_error{mutation_op="OpCreate",mutation_type="User"} 1
# HELP ent_operation_total Number of ent mutation operations
# TYPE ent_operation_total counter
ent_operation_total{mutation_op="OpCreate",mutation_type="User"} 2

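At the top, we can see the computed histogram, which shows the number of operations in each "bucket". After that come the total number of operations and the number of errors. Each metric is preceded by its description (the HELP line), which can also be seen when querying in the Prometheus dashboard.

The Prometheus client is only one component of the Prometheus architecture. To run a complete system, including a scraper that polls your endpoint, a Prometheus server that stores your metrics and answers queries, and a simple UI to interact with it, we recommend reading the official documentation or using the docker-compose.yaml in this example repo.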
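Two useful numbers can be derived directly from this output: the mean operation duration (the histogram's _sum divided by its _count) and the error ratio (errors divided by total operations). The sketch below does this arithmetic with the values copied from the sample output. `mean` and `ratio` are hypothetical helpers; in a real setup the same calculation would normally be expressed as a query against the Prometheus server.

```go
package main

import "fmt"

// mean returns the average observation given a histogram's _sum and _count.
func mean(sum float64, count uint64) float64 {
    if count == 0 {
        return 0
    }
    return sum / float64(count)
}

// ratio returns part/total, or 0 when total is 0.
func ratio(part, total uint64) float64 {
    if total == 0 {
        return 0
    }
    return float64(part) / float64(total)
}

func main() {
    // Values copied from the sample /metrics output.
    avg := mean(0.000265669, 2)
    fmt.Printf("mean duration: %.1f µs\n", avg*1e6)
    fmt.Printf("error ratio: %.2f\n", ratio(1, 2))
}
```

With only two observations these numbers carry little meaning, but the same division applies unchanged once the counters have accumulated real traffic.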

Future Work on Observability in Ent

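As mentioned above, an abundance of metric collection backends is available today, and Prometheus is just one of many successful projects. These solutions differ in many dimensions (self-hosted vs. SaaS, storage engines, query languages, and more), but from the perspective of the metric-reporting client they are virtually identical.

In cases like these, good software engineering principles suggest abstracting the concrete backend away from the client behind an interface. Backends then implement this interface, and clients can easily switch between implementations. This kind of change has been happening in many places in our industry in recent years. Consider the Open Container Initiative or the Service Mesh Interface: both strive to define a standard interface for a problem space, with the aim of fostering an ecosystem of implementations of the standard.

The same is happening in the observability space, with the consolidation of OpenCensus and OpenTracing into OpenTelemetry.

As tempting as it is to publish an Ent + Prometheus extension similar to the one presented in this post, we firmly believe that observability should be solved with a standards-based approach. We invite everyone to join the discussion on the right way to do this for Ent.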

Wrap-Up

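We started this post by presenting Prometheus, a popular open-source monitoring solution. Next, we reviewed Ent's Hooks feature and saw how to integrate the two to create observable applications. We wrapped up by discussing the future of observability in Ent and invited everyone to join the discussion and help shape it.

Have questions? Need help getting started? Feel free to join our Discord server or Slack channel.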

Note

For more Ent news and updates:

Subscribe to our Newsletter
Follow us on Twitter
Join us on #ent on the Gophers Slack
Join us on the Ent Discord Server
