Running Niri on NixOS 25.11

From GDM to systemd dependency hell

I have been using Niri for a month now and am thoroughly satisfied with it.

24/8/25 chat

The holidays happened to come around, so it was time to give NixOS [1] a try.

Symptoms #

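  1. After logging in to Niri, components such as waybar / mako sometimes fail to start, reporting either an unmet condition or a failure to connect to the Wayland display.
Example crash log: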
May 01 08:13:25 qemu systemd[2030]: Starting Lightweight Wayland notification daemon...
May 01 08:13:25 qemu systemd[2030]: Started polkit-mate-authentication-agent-1.service.
May 01 08:13:25 qemu systemd[2030]: Started swaybg.service.
May 01 08:13:25 qemu systemd[2030]: Started swayidle.service.
May 01 08:13:25 qemu systemd[2030]: Started Highly customizable Wayland bar for Sway and Wlroots based compositors..
May 01 08:13:25 qemu swayidle[146979]: 2026-05-01 08:13:25 - [Line 1086] Unable to connect to the compositor. If your compositor is running, check or set the WAYLAND_DISPLAY environment variable.
May 01 08:13:25 qemu swaybg[146974]: 2026-05-01 08:13:25 - [main.c:616] Unable to connect to the compositor. If your compositor is running, check or set the WAYLAND_DISPLAY environment variable.
May 01 08:13:25 qemu systemd[2030]: swaybg.service: Main process exited, code=exited, status=1/FAILURE
May 01 08:13:25 qemu systemd[2030]: swaybg.service: Failed with result 'exit-code'.
May 01 08:13:25 qemu systemd[2030]: swayidle.service: Main process exited, code=exited, status=253/n/a
May 01 08:13:25 qemu systemd[2030]: swayidle.service: Failed with result 'exit-code'.
May 01 08:13:25 qemu mako[146986]: failed to create display
May 01 08:13:25 qemu systemd[2030]: Started Lightweight Wayland notification daemon.
May 01 08:13:25 qemu systemd[2030]: mako.service: Main process exited, code=exited, status=1/FAILURE
May 01 08:13:25 qemu systemd[2030]: mako.service: Failed with result 'exit-code'.
May 01 08:13:25 qemu polkit-mate-authentication-agent-1[146972]: cannot open display: 
May 01 08:13:25 qemu systemd[2030]: polkit-mate-authentication-agent-1.service: Main process exited, code=exited, status=1/FAILURE
May 01 08:13:25 qemu systemd[2030]: polkit-mate-authentication-agent-1.service: Failed with result 'exit-code'.
May 01 08:13:25 qemu waybar[146980]: cannot open display: 
May 01 08:13:25 qemu systemd[2030]: waybar.service: Main process exited, code=exited, status=1/FAILURE
May 01 08:13:25 qemu systemd[2030]: waybar.service: Failed with result 'exit-code'.
May 01 08:13:25 qemu xdg-desktop-portal-gnome[147020]: Failed to open service channel Wayland connection, portals dialogs may missbehave (Cannot invoke method; proxy is for the well-known name org.gnome.Mutter.ServiceChannel without an owner, and proxy was constructed with the G_DBUS_PROXY_FLAGS_DO_NOT_AUTO_START flag).
May 01 08:13:25 qemu xdg-desktop-portal-gnome[147020]: Non-compatible display server, exposing settings only.
May 01 08:13:25 qemu xdg-desktop-portal-gtk[147042]: cannot open display: 
May 01 08:13:25 qemu systemd[2030]: xdg-desktop-portal-gtk.service: Main process exited, code=exited, status=1/FAILURE
May 01 08:13:25 qemu systemd[2030]: xdg-desktop-portal-gtk.service: Failed with result 'exit-code'.
May 01 08:13:25 qemu systemd[2030]: Failed to start Portal service (GTK/GNOME implementation).
  2. After logging out of Niri, a second login attempt makes GDM report "Authentication error"; the UI becomes unresponsive, and the only fix is to restart display-manager.service.

GDM Authentication Error

TL;DR #

  • Wait for NixOS to ship GNOME 50 (nixpkgs#501286)
  • Engage your brain when writing systemd services

Phase I: almost timing #

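Intuitively, P1 (the first symptom) looked like a freebie: probably some race at startup, so adding a delay before graphical-session.target is reached should be enough.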

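Along the way I came across the hack NixOS applies to display managers under X11 sessions: it uses a wrapper to bring up the target above. The wrapper can be bypassed by setting

{
  systemd.services.display-manager.environment = {
    XDG_CURRENT_DESKTOP = "X-NIXOS-SYSTEMD-AWARE";
  };
}

Wayland compositors clearly do not go through this path, so for now I do not know what nixpkgs#493701 is meant to achieve.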

In the end, similar to niri#3177, I added the following handy little check:

{
  # idea from uwsm's wayland-session-waitenv.service
  systemd.user.services.niri-waits-for-wldisp-env = {
    after = [ "niri.service" ];
    bindsTo = [ "niri.service" ];
    partOf = [ "niri.service" ];
    before = [ "graphical-session.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      ExecStart = pkgs.writeShellScript "niri-waits-for-wldisp-env" ''

        # poll (5 x 0.2s) until "niri" shows up in the user manager environment
        for i in {1..5}; do
          if ${pkgs.systemd}/bin/systemctl --user show-environment | grep -iq niri; then
            niri_found=1; break
          fi
          sleep 0.2
        done

        # nothing to wait for if niri never appeared
        if [ -z "$niri_found" ]; then
          echo "niri not running" >&2
          exit 0
        fi

        # poll (10 x 0.5s) until WAYLAND_DISPLAY has been imported
        for i in {1..10}; do
          ${pkgs.systemd}/bin/systemctl --user show-environment | grep -q '^WAYLAND_DISPLAY=' && exit 0
          sleep 0.5
        done

        echo "WAYLAND_DISPLAY not imported after 5s" >&2
        exit 1
      '';
    };
    wantedBy = [ "niri.service" ];
  };
}

After restarting the display manager, the components came up reliably. The problem appeared to be solved, but it was too early to call it.

Phase II: GDM #

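On to P2. I first inspected Niri's ExecMainCode and ExecMainStatus values, and everything looked normal.

GNOME 49 introduced a behavior change (gdm !289): sessions are no longer launched through dbus-run-session. niri-session already handles the D-Bus connection correctly, of course, but it was still worth a try: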

{
  systemd.user.services.niri = {
    serviceConfig = {
      # restores gdm!289
      ExecStart = [
        ""
        "${pkgs.dbus}/bin/dbus-run-session --dbus-daemon=${pkgs.dbus}/bin/dbus-daemon -- ${pkgs.niri}/bin/niri --session"
      ];
    };
  };
}

Unfortunately, that is where the guessing game ended. With GDM debugging enabled, I watched the log output and noticed:

qemu systemd[1]: Started Session c19 of User gdm-greeter.
qemu gdm[29190]: Gdm: GdmDisplay: Session never registered, failing
qemu gdm[29190]: Gdm: Child process -29228 was already dead.
qemu gdm[29190]: Gdm: GdmDisplay: Session never registered, failing
qemu gdm[29190]: Gdm: Child process -29228 was already dead.

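Admittedly, the snippet above was dug out of the journal while writing this post. Still, it was precisely the "Session never registered" warning that caught my attention. It comes from gdm-display.c#L701 and corresponds to the session-registered check there.

At this point the picture was mostly clear. In the spirit of assuming the best, the lawful-neutral approach would be to backport the corresponding behavior from 50.x, which requires at least gdm !285 and !350. A first attempt did get the session registered, but other problems remained.

Well, I tried. Since that did not work out, I reluctantly fell back on the answer I had in mind from the start (lol):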
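
For reference, one way to turn on the GDM debug output mentioned above is sketched below. It assumes the NixOS 25.11 option path services.displayManager.gdm.debug (earlier releases exposed the same option under services.xserver.displayManager.gdm), and it is not necessarily how debugging was enabled for this post.

```nix
{
  # Assumption: NixOS 25.11 option path; on older releases the option
  # lived under services.xserver.displayManager.gdm.debug instead.
  services.displayManager.gdm.debug = true;
}
```

After a rebuild and a restart of display-manager.service, the additional "Gdm:" lines should show up in the system journal.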

diff --git a/daemon/gdm-local-display-factory.c b/daemon/gdm-local-display-factory.c
index ad2e65cf7..11e0cea90 100644
--- a/daemon/gdm-local-display-factory.c
+++ b/daemon/gdm-local-display-factory.c
@@ -621,11 +621,7 @@ on_display_status_changed (GdmDisplay             *display,
                 break;
         case GDM_DISPLAY_MANAGED:
 #if defined(ENABLE_USER_DISPLAY_SERVER)
-                g_signal_connect_object (display,
-                                         "notify::session-registered",
-                                         G_CALLBACK (on_session_registered_cb),
-                                         factory,
-                                         0);
+                finish_waiting_displays_on_seat (factory, "seat0");
 #endif
                 break;
         case GDM_DISPLAY_WAITING_TO_FINISH:
-- 
2.45.4

It is worth mentioning that patching packages from nixpkgs is fairly convenient: just add an overlay, for example:

{
  # ..snip..

  outputs = inputs@{ nixpkgs, ... }:
  let
    overlays = [(import ./pkgs {})];
  in {
    nixosConfigurations = {
      qemu = nixpkgs.lib.nixosSystem {
        modules = [
          { nixpkgs.overlays = overlays; }
        ];
      };
    };
  };
}

The corresponding ./pkgs/default.nix looks like this:

# ..snip..

final: prev:
{
  gdm = prev.gdm.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [
      ./gdm-49.2/always-registered.patch
    ];
  });
}

Phase III: s-d dependency hell #

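P2 was solved, but another surprise was not hard to find: logging out and back in now worked, yet P1 was back, and the desktop components still failed to start about half the time.

Observation showed that graphical-session.target stayed active after Niri exited, so on the next login our little script could no longer provide the delay. Niri logs out by triggering niri-shutdown.target, which conflicts with graphical-session.target, so there was no reason for this to happen.

To observe the exit behavior more closely, I tried stopping graphical-session.target directly: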

{
  systemd.user.services.niri = {
    serviceConfig = {
      ExecStop = [
        ""
        "/run/current-system/systemd/bin/systemctl --user stop graphical-session.target"
      ];
    };
  };
  systemd.user.targets.niri-shutdown = {
    conflicts = lib.mkForce [ "niri.service" ];
  };
}

No difference in behavior. So, same trick again: add a check.

{
  systemd.user.services.niri-waits-for-graceful-shutdown = {
    serviceConfig = {
      Type = "oneshot";
      ExecStart = pkgs.writeShellScript "niri-waits-for-graceful-shutdown" ''

        # poll (20 x 0.5s) until graphical-session.target is no longer active
        for i in {1..20}; do
          systemctl --user is-active --quiet graphical-session.target || exit 0
          sleep 0.5
        done

        echo "graphical-session.target did not shut down after 10s" >&2
        exit 1
      '';
    };
  };
  # systemd.user.services.niri = {
  #   serviceConfig = {
  #     ExecStop = [
  #       ""
  #       "/run/current-system/systemd/bin/systemctl --user stop graphical-session.target"
  #     ];
  #   };
  # };
  systemd.user.targets.niri-shutdown = {
    # conflicts = lib.mkForce [ "niri.service" ];
    after = lib.mkForce [ "niri-waits-for-graceful-shutdown.service" ];
    wants = lib.mkForce [ "niri-waits-for-graceful-shutdown.service" ];
  };
}

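This did not seem to trigger any noticeable delay either.

Following the log output, I noticed that the three units stopped in the expected order, but about 1 s later graphical-session.target started again and then exited after about 5 s. Clearly, some service was pulling it back up!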

> systemctl --user list-dependencies --reverse graphical-session.target
graphical-session.target
● ├─niri.service
● ├─polkit-mate-authentication-agent-1.service
○ ├─swaybg.service
× ├─swayidle.service
● └─xdg-desktop-portal-gnome.service

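The swaybg and swayidle service configurations were both copied from the Niri wiki page "Example systemd Setup". After removing Requisite=, they no longer appeared in the list of dependents, and the problem was finally solved for good.

In addition, I found that some of these components also tried to start under GNOME Shell; their Wants/WantedBy should point at the compositor's service instead, for example: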

@@ -23,7 +23,10 @@ lib.mkMerge [
     # niri
     programs.waybar = {
       enable = true;
-      systemd.enable = true;
+      systemd = {
+        enable = true;
+        targets = lib.mkForce [ "niri.service" ];
+      };
     };
     home.packages = with pkgs; [ fuzzel swaybg ];
 
@@ -49,7 +52,7 @@ lib.mkMerge [
         Restart = "on-failure";
       };
       Install = {
-        WantedBy = [ "graphical-session.target" ];
+        WantedBy = [ "niri.service" ];
       };
     };
   }
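
Putting the two changes together, a swaybg unit might end up looking roughly like the sketch below. This is an illustration only, not the actual configuration: it assumes the Home Manager systemd.user.services module (Unit / Service / Install sections), the wallpaper path is hypothetical, and the remaining Unit settings follow my recollection of the Niri wiki example.

```nix
{ pkgs, ... }:
{
  systemd.user.services.swaybg = {
    Unit = {
      PartOf = [ "graphical-session.target" ];
      After = [ "graphical-session.target" ];
      # Requisite = [ "graphical-session.target" ];  # dropped, as described above
    };
    Service = {
      # hypothetical wallpaper path; %h expands to the user's home directory
      ExecStart = "${pkgs.swaybg}/bin/swaybg -m fill -i %h/Pictures/wallpaper.png";
      Restart = "on-failure";
    };
    Install = {
      # pulled in by the compositor's unit rather than graphical-session.target,
      # so it does not start under GNOME Shell
      WantedBy = [ "niri.service" ];
    };
  };
}
```

With this shape, running systemctl --user list-dependencies --reverse graphical-session.target again is a quick way to confirm that the unit no longer shows up among the dependents.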

  1. The chat was fairly casual; readers should take care to distinguish the Nix DSL, nixpkgs, and NixOS. The screenshot refers to NixOS.