Practicing Pragmatism


TL;DR: An earlier version of the software was erroneously released (15.1R{3,5}) and later recalled.. I bet the same is true for the version I attempt to install here (15.1R7.9).


Yeah, I have some aging network equipment that's entirely overkill for my home network. The switch is an EX3200-48T, which means it has plenty of ports and a little bit of power (over Ethernet :)). I use this as my top-of-rack switch.. which handles a mish-mash of WAN, LAN, and IoT traffic, all switched and (partially) routed there.

While configuring the switch, I found it was giving some odd error messages. I first checked what software versions I had: they were old. And the existing configuration? It was mine, but it was outdated & from another setup where the noise wasn't an issue.

Seeing as I was going to wipe the config and start with a clean slate, I figured I may as well also give it a freshly installed copy of JunOS. On blogs and, most importantly, Juniper's website, the outlined steps included a “hit space when the loader tells you to” step. This confused me slightly, as the loader immediately loads a “kernel” and continues on its way. So where the heck do you hit space? And where was this prompt?

Usually, I'm pretty patient with these things, but this one got me pretty well:

The boot was loading some kernel, but that kernel is the loader the docs reference. I've copied the output below from a boot where I managed to hit space during the loader's execution, which allowed me to run the install command (specifically install --format file:///$jinstall_pkg_tgz).

In other boots, this console was chock full of output and swiftly moved into starting the system proper. I suppose my eyes need another exam. Or I should have taken more time to examine things with my eyes – probably that one.


The loader loading the loader where hitting space does something:

U-Boot 1.1.6 (Mar 28 2011 - 04:05:40)

Board: EX3200-48T 4.14
EPLD:  Version 10.0 (0x82)
DRAM:  Initializing (512 MB)
FLASH: 8 MB

Firmware Version: --- 01.00.00 ---
USB:   scanning bus for devices... 3 USB Device(s) found
       scanning bus for storage devices... 2 Storage Device(s) found

ELF file is 32 bit
Consoles: U-Boot console  

FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
(builder@dagmath.juniper.net, Mon Mar 28 01:49:54 UTC 2011)
Memory: 512MB
bootsequencing is enabled
bootsuccess is set
new boot device = disk0s2:
Loading /boot/defaults/loader.conf 
/kernel data=0xb71e7c+0xd12b0 syms=[0x4+0x9dce0+0x4+0xec41d]

# [there's a significant gap in time here]

Hit [Enter] to boot immediately, or space bar for command prompt.

Type '?' for a list of commands, 'help' for more detailed help.
loader>

Here's the outcome of install --format file:///$jinstall_pkg_tgz, for those interested:

loader> install --format file:///jinstall-ex-3200-15.1R7.9-domestic-signed.tgz
Package /jinstall-ex-3200-15.1R7.9-domestic-signed.tgz is signed...
/kernel data=0x69b300+0x93624 syms=[0x4+0x67960+0x4+0xa051b]
Kernel entry at 0x800000c0 ...
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1996-2018, Juniper Networks, Inc.
All rights reserved.
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
JUNOS 15.1R7.9 #0: 2018-09-11 05:30:26 UTC
    builder@watatsumi.juniper.net:/volume/build/junos/15.1/release/15.1R7.9/obj/powerpc/junos/bsd/kernels/INSTALL-EX/kernel
can't re-use a leaf (all_slot_serialid)!
Timecounter "decrementer" frequency 37500000 Hz quality 0
cpu0: Freescale e500v2 core revision 2.3
cpu0: HID0 80004080<EMCP,TBEN,EN_MAS7_UPDATE>
real memory  = 490733568 (468 MB)
avail memory = 479338496 (457 MB)
ETHERNET SOCKET BRIDGE initialising
Initializing EXSERIES properties ...
nexus0: <Powerpc Nexus device>
ocpbus0: <on-chip peripheral bus> on nexus0
openpic0: <OpenPIC in on-chip peripheral bus> iomem 0xfef40000-0xfef600b3 on ocpbus0
memctl0: <mpc85xx memory ECC monitor> iomem 0xfef20000-0xfef20e5b,0xfef02000-0xfef02e5b irq 32,34 on ocpbus0
ECC not enabled to report errors 0xc3000000
device_attach: memctl0 attach returned 6
i2c0: <MPC85XX OnChip i2c Controller> iomem 0xfef03000-0xfef03014 irq 59 on ocpbus0
i2c1: <MPC85XX OnChip i2c Controller> iomem 0xfef03100-0xfef03114 irq 59 on ocpbus0
lbc0: <Freescale Local Bus Controller> iomem 0xfef05000-0xfef05fff,0xff000000-0xffffffff irq 22 on ocpbus0
cfi0: <AMD/Fujitsu - 8MB> iomem 0xff800000-0xffffffff on lbc0
syspld0 iomem 0xff000000-0xff00ffff on lbc0
uart0: <16550 or compatible> iomem 0xfef04500-0xfef0450f irq 58 on ocpbus0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> iomem 0xfef04600-0xfef0460f irq 58 on ocpbus0
tsec0: <eTSEC ethernet controller> iomem 0xfef24000-0xfef24fff irq 45,46,50 on ocpbus0
tsec0: hardware MAC address 40:b4:f0:63:78:ff
miibus0: <MII bus> on tsec0
e1000phy0: <Marvell 88E1112 Gigabit PHY> on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
pcib0: <Freescale MPC8544 PCI host controller> iomem 0xfef08000-0xfef08fff,0xf0000000-0xf3ffffff on ocpbus0
pci0: <PCI bus> on pcib0
pci0: <serial bus, USB> at device 18.0 (no driver attached)
ehci0: <Philips ISP156x USB 2.0 controller> mem 0xf0001000-0xf00010ff irq 22 at device 18.2 on pci0
usb0: EHCI version 1.0
usb0: <Philips ISP156x USB 2.0 controller> on ehci0
usb0: USB revision 2.0
uhub0: Philips EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
umass0: STMicroelectronics ST72682  High Speed Mode, rev 2.00/2.10, addr 2
umass1: CHIPSBNK v3.3.8.8, rev 2.00/1.00, addr 3
pcib1: <Freescale MPC8544 PCI Express host controller> iomem 0xfef0a000-0xfef0afff,0xe0000000-0xe3ffffff,0xec000000-0xec0fffff irq 42 on ocpbus0
pci1: <PCI bus> on pcib1
pcib2: <PCI-PCI bridge> at device 0.0 on pci1
pci2: <PCI bus> on pcib2
pci2: <memory> at device 0.0 (no driver attached)
pcib3: <Freescale MPC8544 PCI Express host controller> iomem 0xfef0b000-0xfef0bfff,0xe8000000-0xebffffff,0xec200000-0xec2fffff irq 43 on ocpbus0
pci3: <PCI bus> on pcib3
pcib4: <PCI-PCI bridge> at device 0.0 on pci3
pci4: <PCI bus> on pcib4
pci4: <memory> at device 0.0 (no driver attached)
Initializing product: 38 ..
###PCB Group initialized for udppcbgroup
###PCB Group initialized for tcppcbgroup
md0: Preloaded image </isofs-install-ex> 22267904 bytes at 0x808367a8
da1 at umass-sim1 bus 1 target 0 lun 0
da1: <CHIPSBNK v3.3.8.8 5.00> Removable Direct Access SCSI-2 device 
da1: 40.000MB/s transfers
da1: 3920MB (8028160 512 byte sectors: 255H 63S/T 499C)
da0 at umass-sim0 bus 0 target 0 lun 0
da0: <ST ST72682 2.10> Removable Direct Access SCSI-2 device 
da0: 40.000MB/s transfers
da0: 1000MB (2048000 512 byte sectors: 64H 32S/T 1000C)
Kernel thread "wkupdaemon" (pid 43) exited prematurely.
Trying to mount root from cd9660:/dev/md0
[: -eq: unexpected operator
1+0 records in
1+0 records out
512 bytes transferred in 0.000285 secs (1795555 bytes/sec)
Media check on da0 on ex platforms
Formatting installation disk...
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 269.710252 secs (3887787 bytes/sec)
Computing slice and partition sizes for /dev/da0 ...
Formatting target media /dev/da0 ...
Preparing to create slices on /dev/da0
/dev/da0: 2048000 sectors [C:1000 H:64 S:32 SS:512]
Shrinking slice 1 by 256 blocks for alignment
1+0 records in
1+0 records out
512 bytes transferred in 0.000290 secs (1766023 bytes/sec)
Creating slices:
g c1000 h64 s32
p 1    0xA5 256 382720
p 2    0xA5 382976 382976
p 3    0xA5 765952 1024000
p 4    0xA5 1789952 258048
a 1
******* Working on device /dev/da0 *******
fdisk: invalid fdisk partition table found
Computing layout of partitions in /dev/da0s1...
Shrinking partition a by 1792 blocks for alignment
Labeling /dev/da0s1:
bsdlabel: write to disk label supressed - label was as follows:
# /dev/da0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   380672      256    unused        0     0       
  c:   382720        0    unused        0     0         # "raw" part, don't edit
/dev/da0s1a: 185.9MB (380668 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 46.47MB, 2974 blks, 6016 inodes.
        with soft updates
super-block backups (for fsck -b #) at:
 32, 95200, 190368, 285536
Computing layout of partitions in /dev/da0s2...
Labeling /dev/da0s2:
bsdlabel: write to disk label supressed - label was as follows:
# /dev/da0s2:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   382720      256    unused        0     0       
  c:   382976        0    unused        0     0         # "raw" part, don't edit
/dev/da0s2a: 186.9MB (382716 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 46.72MB, 2990 blks, 6016 inodes.
        with soft updates
super-block backups (for fsck -b #) at:
 32, 95712, 191392, 287072
Computing layout of partitions in /dev/da0s3...
Shrinking partition d by 256 blocks for alignment
Labeling /dev/da0s3:
bsdlabel: write to disk label supressed - label was as follows:
# /dev/da0s3:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c:  1024000        0    unused        0     0         # "raw" part, don't edit
  d:   767744      256    unused        0     0       
  e:   256000   768000    unused        0     0       
/dev/da0s3d: 374.9MB (767740 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 93.72MB, 5998 blks, 12032 inodes.
        with soft updates
super-block backups (for fsck -b #) at:
 32, 191968, 383904, 575840
/dev/da0s3e: 125.0MB (255996 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 31.25MB, 2000 blks, 4096 inodes.
        with soft updates
super-block backups (for fsck -b #) at:
 32, 64032, 128032, 192032
Computing layout of partitions in /dev/da0s4...
Shrinking partition d by 256 blocks for alignment
Labeling /dev/da0s4:
bsdlabel: write to disk label supressed - label was as follows:
# /dev/da0s4:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c:   258048        0    unused        0     0         # "raw" part, don't edit
  d:   128768      256    unused        0     0       
/dev/da0s4d: 62.9MB (128764 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 15.72MB, 1006 blks, 2048 inodes.
        with soft updates
super-block backups (for fsck -b #) at:
 32, 32224, 64416, 96608
[: -eq: unexpected operator
Installing disk1:/jinstall-ex-3200-15.1R7.9-domestic-signed.tgz
/dev/da0s3d: 374.9MB (767740 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 93.72MB, 5998 blks, 12032 inodes.
        with soft updates
super-block backups (for fsck -b #) at:
 32, 191968, 383904, 575840
Verified jinstall-ex-3200-15.1R7.9-domestic.tgz signed by PackageProductionRSA_2018 method RSA2048+SHA1
date: connect: Can't assign requested address
Checking package integrity...
Running requirements check first for jbundle-ex-3200-15.1R7.9-...
Running pre-install for jbundle-ex-3200-15.1R7.9-...
Installing jbundle-ex-3200-15.1R7.9- in /tmp/installer.tmp/pa4731.35/jbundle-ex-3200-15.1R7.9-domestic.x4731...
Running post-install for jbundle-ex-3200-15.1R7.9-...
Verified SHA1 checksum of fips-mode-powerpc-15.1R7.9.tgz
Verified SHA1 checksum of jboot-ex-15.1R7.9.tgz
Verified SHA1 checksum of jdocs-ex-15.1R7.9.tgz
Verified SHA1 checksum of junos-ex-15.1R7.9.tgz
Verified SHA1 checksum of junos-ex-3200-15.1R7.9.tgz
Verified SHA1 checksum of jweb-ex-15.1R7.9.tgz
Adding fips-mode-powerpc-15.1R7.9.tgz...
Adding jdocs-ex-15.1R7.9.tgz...
Adding junos-ex-3200-15.1R7.9.tgz..

And, guess what, my problem persists.

/usr/libexec/ld-elf.so.1: Shared object "libcmd-parser.so.1" not found, required by "jdhcpd"

Wanna know why?

This technical bulletin explains why: https://kb.juniper.net/InfoCenter/index?id=TSB17033&page=content

TL;DR: An earlier version of the software was erroneously released and later recalled.. I bet the same is true for the version I attempted to install (15.1R7.9).

Sigh.

Fun fact: having CDPATH exported will cause cd to also emit any fuzzy-matched path to stdout.

I spent entirely too long digging into a Makefile that had something clever like:

check-gofmt:
    $(eval HAS_DELTA := $(shell cd src && gofmt -l $(PKGS)))
    if [[ -n "$(HAS_DELTA)" ]]; then echo "reformat your code please:\n$(HAS_DELTA)"; exit 1; fi

Turns out that the path src didn't exist, but I had mistakenly exported a CDPATH variable in my .bashrc (which has since been corrected):

export CDPATH="some/path:another/path/with/a/src/sub-directory"

There was a matching subdirectory that Make's $(shell cd src && ...) matched and emitted, which made that clever if [[ -n ... ]] check really upset. When it failed, the message printed was the fuzzy-matched path itself.. which wasn't helpful in diagnosing things.
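A minimal repro of that fun fact, using throwaway paths of my own (not the Makefile in question): when cd resolves a directory through a CDPATH entry, it prints the full resolved path on stdout – POSIX-specified behavior.

```shell
# Set up a scratch tree where ./src doesn't exist in the current directory,
# but a CDPATH entry has a src subdirectory:
mkdir -p /tmp/cdpath-demo/project/src
cd /tmp/cdpath-demo                     # no ./src here
export CDPATH=/tmp/cdpath-demo/project
# cd finds src via CDPATH and announces the resolved path on stdout:
cd src
#=> /tmp/cdpath-demo/project/src
```

That stray line on stdout is exactly what $(shell cd src && ...) captured.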

How did I diagnose this? Tracing!

# Turn on shell tracing to see where the output is printed:
SHELL = /usr/bin/bash -x

check-gofmt:
    # ...
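To see what that buys you, here's a scratch target – nothing below is from the real Makefile, and .RECIPEPREFIX (GNU make 3.82+) is used only to avoid literal tab characters in the heredoc:

```shell
# Write a throwaway Makefile whose recipes run under `bash -x`:
cat > /tmp/trace-demo.mk <<'EOF'
SHELL = /bin/bash -x
.RECIPEPREFIX = >
demo:
>@result=$$(echo hello); echo "got: $$result"
EOF
# Every command the recipe executes is echoed to stderr with a `+` prefix:
make -f /tmp/trace-demo.mk demo
```

With the trace on, the stray path emitted by that cd had nowhere to hide.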

Failed to load breakpoints for the current workspace with error: (error "Hash table data is not a list of even length") [209 times]

Emacs debugger frames

Debugger entered--Lisp error: (error "Hash table data is not a list of even length")
  read-from-string("#s(hash-table size 65 test equal rehash-size 1.5 r...")
  dap--read-from-file("/home/jake/.local/doom/etc/.dap-b...")
  dap--after-initialize()
  dap-mode(1)
  dap-auto-configure-mode(1)
  lsp-configure-buffer()
  #f(compiled-function (buf) #<bytecode 0x232e511>)(#<buffer options.go>)
  mapc(#f(compiled-function (buf) #<bytecode 0x232e511>) (#<buffer options.go> #<buffer options_test.go>))
  lsp--on-request(#s(lsp--workspace :ewoc nil :server-capabilities #<hash-table equal 21/21 0xabb17bd> :registered-server-capabilities (#s(lsp--registered-capability :id "workspace/didChangeWatchedFiles-630" :method "workspace/didChangeWatchedFiles" :options #<hash-table equal 1/1 0x1bc7c05>) #s(lsp--registered-capability :id "workspace/didChangeWatchedFiles-629" :method "workspace/didChangeWatchedFiles" :options #<hash-table equal 1/1 0xa76b4dd>)) :root "/home/jake/w/ft/src/FT" :client #s(lsp--client :language-id "go" :add-on? nil :new-connection (:connect #f(compiled-function (filter sentinel name environment-fn) #<bytecode 0x11cb031>) :test\? #f(compiled-function () #<bytecode 0x11caecd>)) :ignore-regexps nil :ignore-messages nil :notification-handlers #<hash-table equal 0/65 0x3a85529> :request-handlers #<hash-table equal 0/65 0x3a854a1> :response-handlers #<hash-table eql 0/65 0x3a853c1> :prefix-function nil :uri-handlers #<hash-table equal 0/65 0x3a85339> :action-handlers #<hash-table equal 0/65 0x3a852b1> :major-modes (go-mode go-dot-mod-mode) :activation-fn nil :priority 0 :server-id gopls :multi-root nil :initialization-options nil :custom-capabilities nil :library-folders-fn lsp-go--library-default-directories :before-file-open-fn nil :initialized-fn nil :remote? nil :completion-in-comments? t :path->uri-fn nil :uri->path-fn nil :environment-fn nil :after-open-fn #f(compiled-function () #<bytecode 0x11c3aa1>) :async-request-handlers #<hash-table equal 0/65 0x3a85229> :download-server-fn nil :download-in-progress? 
nil :buffers nil) :host-root nil :proc #<process gopls> :cmd-proc #<process gopls> :buffers (#<buffer options.go> #<buffer options_test.go>) :semantic-tokens-faces nil :semantic-tokens-modifier-faces nil :extra-client-capabilities nil :status initialized :metadata #<hash-table equal 0/65 0x4a26ef9> :watches #<hash-table equal 0/65 0x4a27021> :workspace-folders nil :last-id 0 :status-string nil :shutdown-action nil :diagnostics #<hash-table equal 4/65 0x4a27425> :work-done-tokens #<hash-table equal 0/65 0x4a277c5>) #<hash-table equal 4/4 0x266334d>)
  lsp--parser-on-message(#<hash-table equal 4/4 0x266334d> #s(lsp--workspace :ewoc nil :server-capabilities #<hash-table equal 21/21 0xabb17bd> :registered-server-capabilities (#s(lsp--registered-capability :id "workspace/didChangeWatchedFiles-630" :method "workspace/didChangeWatchedFiles" :options #<hash-table equal 1/1 0x1bc7c05>) #s(lsp--registered-capability :id "workspace/didChangeWatchedFiles-629" :method "workspace/didChangeWatchedFiles" :options #<hash-table equal 1/1 0xa76b4dd>)) :root "/home/jake/w/ft/src/FT" :client #s(lsp--client :language-id "go" :add-on? nil :new-connection (:connect #f(compiled-function (filter sentinel name environment-fn) #<bytecode 0x11cb031>) :test\? #f(compiled-function () #<bytecode 0x11caecd>)) :ignore-regexps nil :ignore-messages nil :notification-handlers #<hash-table equal 0/65 0x3a85529> :request-handlers #<hash-table equal 0/65 0x3a854a1> :response-handlers #<hash-table eql 0/65 0x3a853c1> :prefix-function nil :uri-handlers #<hash-table equal 0/65 0x3a85339> :action-handlers #<hash-table equal 0/65 0x3a852b1> :major-modes (go-mode go-dot-mod-mode) :activation-fn nil :priority 0 :server-id gopls :multi-root nil :initialization-options nil :custom-capabilities nil :library-folders-fn lsp-go--library-default-directories :before-file-open-fn nil :initialized-fn nil :remote? nil :completion-in-comments? t :path->uri-fn nil :uri->path-fn nil :environment-fn nil :after-open-fn #f(compiled-function () #<bytecode 0x11c3aa1>) :async-request-handlers #<hash-table equal 0/65 0x3a85229> :download-server-fn nil :download-in-progress? 
nil :buffers nil) :host-root nil :proc #<process gopls> :cmd-proc #<process gopls> :buffers (#<buffer options.go> #<buffer options_test.go>) :semantic-tokens-faces nil :semantic-tokens-modifier-faces nil :extra-client-capabilities nil :status initialized :metadata #<hash-table equal 0/65 0x4a26ef9> :watches #<hash-table equal 0/65 0x4a27021> :workspace-folders nil :last-id 0 :status-string nil :shutdown-action nil :diagnostics #<hash-table equal 4/65 0x4a27425> :work-done-tokens #<hash-table equal 0/65 0x4a277c5>))
  #f(compiled-function (proc input) #<bytecode 0x4a27041>)(#<process gopls> "Content-Length: 160641\15\n\15\n{\"jsonrpc\":\"2.0\",\"method...")

That's a lot of times. The upstream issue seems to suggest that this is caused by a signal interruption that left the file corrupt.

Okay, but I don't even use dap-mode, though the package is installed and configured in doom-emacs. My guess is that at some point I hit C-z (or something else) and Emacs didn't get to finish writing this file (regardless of whether or not there was user data in there). When it loads the file – to read it back into a hash table – the data is found to be invalid or unusable. I'm not sure which.

However, the thing errored 209 times in a very tight loop.

Needless to say, Emacs wasn't happy. I made it happy by deleting the file that it was trying to load:

→ rm ~/.local/doom/etc/.dap-breakpoints

While using query-replace I entered recursive edit mode – I've struggled to use this to any great effect in the past.. – so it's no wonder I managed to /leave/ this mode on.

Later on during my session I tried to start up an emacsclient (with -nw to use the tty in a terminal emulator) and found that Emacs didn't want to respond to any keypress :(

Well, what do you do? You turn to a debugging tool, of course! In this case, I have the handy M-x doom/toggle-profiler command from the excellent doom-emacs framework. This command toggles Emacs' built-in CPU and memory profilers to the “on” position and records away. So, after starting the profiler, all I had to do was start up my emacsclient again and see what the heck was going on.

In my head I thought it could be:

  1. An issue with my unusual invocation (fzf | xargs --open-tty --no-run-if-empty -t -- emacsclient -nw)
  2. The main loop was stuck on a piece of code
  3. $TERM, Emacs, and the terminal's terminfo were in disagreement and could be spewing unknown/unhandled control characters around

Without digging into things too hard, I found that the recursive edit was still in progress.. because it was the parent frame for every other frame recorded:

Note that the top frames are from first starting query-replace!

Neat.

Exiting the recursive edit fixed the locked sessions and everything is working much better now :)

I can't say it's a bug for sure – yet – because I'd need to come at this from the repro case to make a helpful report. Which I just might do.

I wanted to get home-manager installed on a system of mine. Most of my systems have been set up to support the use of flakes, and so far... it's caused a lot of headaches in practice & operation. Don't read this the wrong way – I'm happy with the results, but still learning the new pathways.

Anyhow, I wanted to do the equivalent of nix-env -iA nixpkgs.hello with a flake. None of the documentation seems to make any suggestion here, nor do the old tools actually support Nix flakes directly (nix-env is a very, very old tool in the Nix world, along with concepts like nix-channel!). So, to install something from a flake, I got creative:

# Add entry to registry for aliased name (home-manager):
nix registry add home-manager github:nix-community/home-manager
# Build and add the default package to the user profile:
nix build 'home-manager#' --profile $HOME/.nix-profile
# Observe!
command -v home-manager
#=> /home/jake/.nix-profile/bin/home-manager

It's there!

Whether the features in this particular tool (home-manager) support running under this configuration is a different story.

I'll find out either way.

Here's a problem I was trying to resolve for myself. Being me, an engineer by trade who designs robust and future-thinking solutions (so I can be safely lazy later), I have a tendency to over-design, over-think, and over-build for the present scope of need. I've delivered on many of these designs by splitting areas of high return from features & nice-to-haves.. yadayada. The point is that at some point I catch myself, before I lose the carefully stacked idea in my mind, and make sure I get it out there!

In this case: I had set up a cluster of nodes running Hashicorp's Vault that I wanted to use to vend dynamic credentials for InfluxDB, MQTT, OpenID/JWT Authentication & Verification, Secrets storage, and various KMS functions.

These nodes provide their secrets just fine.. which is good? I promise they're only telling secrets to the right folks. To this very aim, I found myself becoming more and more annoyed at the thought that I'd have to write yet another configuration template to deploy alongside my applications. Instead, I thought: huh, we have this fancy new embed package in Go, and it could probably be used to make this a bit better – it'd be neat to be able to use Vault's rich Go clients (and many, many other Go-based behemoths.. ahem, k8s).

Ideas

I have a handful of approaches I want to take here.

  1. Provide an alternative entrypoint in the ELF executable that hits an appended software entrypoint responsible for fetching credentials before the application starts up. The appended entrypoint then inserts the formatted credentials into the applicable argv entries and environment variables, configuring the desired end application with runtime-fetched secrets and configuration values (the process replaced by exec, or by literally pointing to its address and starting over? I have no idea how to make ELF magic work).

  2. Provide in-memory file descriptors to otherwise-closed files, inherited by a child process

  3. Hook into the native code, wrapping credential fetches around it (the “we have the source, Luke!” idea, where I stop getting fancy). This assumes that the application itself provides an interface to “run it”, like: PlainMain(context.Context, string, []string) error.

  4. (3a) Hook into the native code by sharing entries in the embedded filesystem tree! No idea whether this is even possible to do. I learned about the embed package just after Go 1.16 was released. Either way, I'll try this out too.

  5. (3b) Call embedded memory blobs from the embed filesystem and set memory locations with credential-bearing cstrings (probably not much better than #!/usr/bin/env sh\n\nexec ./silly -pw Hunter1, really).
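Idea 2 is easy to sketch in plain shell before any Go enters the picture – the child process never sees a regular file, only an inherited descriptor. This is just an illustration with a stand-in credential, using bash's process substitution:

```shell
# The secret lives only in this shell's memory and a pipe; the child (cat)
# inherits the pipe's read end and reads it via a /dev/fd/N path.
secret='Hunter1'   # stand-in credential, echoing the silly example above
cat <(printf '%s\n' "$secret")
#=> Hunter1
```

Nothing lands on disk, and once the descriptor is closed the secret is gone – which is the whole appeal of that approach.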

I have a very old Xbox One with aging metal hurtling around to fetch games, which gives me and my partner a chance to grab coffee, maybe take the dog out, and have 30 seconds to spare before the next loading screen. Our Xbox came with a Kinect and Sunset Overdrive – Day One Edition – a fantastic game. I really mean it when I say it's old.

Recently, though, we've gotten back to playing through some uncompleted games, and they load slower than molasses. Final Fantasy XV? Yeah, go out and get a coffee, a gluten-free bagel, and then come back, eat, and get ready to play!!

Waiting a long time between loads, whether due to Game Overs or not, is not fun.

So? What can we do? Replace the spinning platter(s) of metal in the console and move on to snappy solid state storage!

That's exactly what I did.

  1. Uncase the Xbox (while it's fully disconnected, naturally)
  2. Remove the spinning metal disk
  3. Plug the spinning metal disk into a machine, along with the new disk
  4. Use dd to copy the image from the original disk to the (ideally equally sized or larger) solid state disk that'll go back in its place. I've seen it written that there are limitations on the actual disk size supported; I went with a 2.5” 1TB Samsung 860 EVO to replace the 2.5” 5400 RPM Seagate laptop hard drive that was in there and had no issue.
  5. After dd-ing, attach the disk to a Windows “box” (I used a handy VM that I passed the block device into).
  6. With the Windows host, use Disk Management (diskmgmt.msc) to resize the largest existing NTFS partition to occupy any newly available space (on your new disk, if it's bigger than the original).
  7. Remove from the Windows machine (unmount, detach, etc.) and extract the disk to be inserted into the Xbox
  8. Insert the disk and reassemble the Xbox case with the disk secured where the old spinning hard drive was.
  9. Power back on and skip the trip to the coffee shop, because you're playing on a console with solid state storage! I think we saw Final Fantasy XV's load times drop from a minute and a half (90+ seconds) to about 30 seconds; I didn't take down good notes of the time, sorry.
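Step 4 in shell form – a sketch, not a copy-paste recipe. SRC and DST are placeholder device names; on real hardware, verify both with lsblk first, because dd will happily destroy the wrong disk.

```shell
# Hypothetical device names -- confirm with `lsblk` before running anything.
SRC=${SRC:-/dev/sdX}   # original spinning disk pulled from the Xbox
DST=${DST:-/dev/sdY}   # replacement SSD (same size or larger)
# bs=4M keeps throughput reasonable; conv=fsync flushes writes before exit.
dd if="$SRC" of="$DST" bs=4M status=progress conv=fsync
```

Afterwards the SSD carries a byte-for-byte copy of the original partition table and data, ready for the NTFS resize in step 6.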

I had hoped this post would be more “full”, but life got in the way and I had to move on short notice in winter 2020 (just before the holidays here in the US). If there are details folks would like to see added, please reach out to me – I'd be happy to add more in that case: editor AT prag.dev

Warning: this is more or less a #rant about #docker & #containers (and consists of my own strong opinions about containers).

It shouldn't matter whether or not docker is dying, dead, or otherwise. Some folks seem to be elated that Docker will no longer be supported with Kubernetes, other folks seem confused. And there are others, like myself, who see Docker as having built something incredible and then stymied itself trying to transform into a business.

The way I see it, docker's introduction fundamentally changed the way software, and its infrastructure sub-industry, operates. However, the business side of things makes me think there must have been a significant internal imbalance, where idealistic technologists, engineers, and those marketing their efforts were fully at odds. Luckily for us, moby (renamed from Docker to disassociate the tech from the business side of things) spawned what is now the Open Container Initiative and its many specifications for container technologies.

Docker isn't a fundamental player any longer. We have Docker to thank, praise, and continue to use & appreciate for their contributions to Linux primitives – cgroups, psi, some systemd developments, and more. Docker is a fantastic development tool and, despite growing beyond its core components, continues to innovate in building out tools for the community – buildkit, linuxkit, docker-compose (remember when that was Python?).

Personally, I have to thank dockerd and its roots for pulling me into its “containers” orbit, teaching me Go, and offering a platform on which to build out my own ideas.

I'm excited for what Docker can become, having been liberated from the kubelet. They've done turnabouts before, and I hope this time Docker improves its use of containerd, runc, and other CRI runtimes while leaving aside related but distracting efforts. Sheer unbridled intuition tells me we're going to see them either innovate, thriving on the shoulders of foundational tech.. or fall flat, with the community making moby the tool they want and need it to become.

As always, this is my opinion and does not reflect anyone else's or any of my employer's perspective on these matters.

Over the years I have appreciated docker, understood its purpose, exercised it to great results, and hope only to see it morph into its next phase in our collective engineering toolboxes. Kubernetes doesn't need Docker and Docker doesn't need to be the bedrock (or in the critical path, as it were) of Kubernetes. Being independent – a tool once again safely malleable – makes for interesting opportunities.

NixOS is a favorite of mine. I waste little opportunity to mention it when coworkers lament, or when ideas come up that are not only deftly solved by Nix but solved in a way that's remarkably similar to their own lofty goals or vision. I'm not concerned with even promising more posts about #Nix, #NixOS, and Nixpkgs – they're coming. Whether I want them to or not. Be prepared.

NixOS is a favorite, but it's also quirky as hell.

The #linux distribution isn't your typical one: everything is content-addressed, wrapped to deal with this, and immutable in nature (ostensibly so). Along with the advantages it brings (more to come, or read the NixOS website), it also brings its deficiencies and disadvantages. In my case, and for the focus of this post, this recently bit me when trying to make a proprietary driver work for the video capture card that I bought: a Blackmagic Design DeckLink 4K Mini Recorder (whoa, that's a stupidly long name).

The driver from the card's company is distributed behind a “click to accept the terms of this license” button and, more disappointingly, with proprietary binary blobs. Yay.

For most [^1] Linux users, this isn't an issue. Companies that “support Linux users” will usually [^1] build their blobs with the same toolchain and shared libraries used by popular distributions. When they don't, I've seen them drop in their own shared libraries and plan on the user having at least a compatible linker and kernel ABI [^2]. There's a fun story here about some ancient IBM TTS system.. for another time. In my case, NixOS being neither popular nor “typical”, I had issues.

NixOS' kernel sources don't land in /usr/src. Its kernel modules don't land in /lib/modules/$(uname -r). They're in /run/current-system/kernel-modules. Ugh. The differences don't end there, and they're nuanced. If a linker path was hardcoded into the proprietary binary blobs.. well, on my system those paths are wrong. I don't install the .deb or the .rpm. Typically, I install software my distribution packages with commands like nix-env -iA nixos.ripgrep. These commands wind up pulling in runtime dependencies (or build dependencies for a fallback build) and place the “package” in /nix/store. This works very well for appropriately licensed and distributed software – but the closed-source ones.. they take the cake for adventures to places I didn't contemplate going.

These drivers, admittedly, were not closed source. So? Was it easy? Not a walk in the park, but not too bad. I've dealt with worse on NixOS.

The Linux solution published by the fine folks at Blackmagic Design included a few utilities, helper executables, and, of course, the kernel module for the PCIe card itself. There's a separate SDK vended too – it's needed for building ffmpeg with DeckLink support, but that's very out of scope.

Back to the kernel module.

This expression handles a few things:

  • unpacking the assumptive tarball (it's basically an rpm, but not)
  • patching the upstream sources so they actually work (thanks to the Arch Linux maintainers here!)
  • building the kernel module (a more-or-less standard process)
  • ripping out “impure” [^3] or unneeded references (using nuke-refs)

I'm not explaining Nix expressions this time around – maybe another time. I'm continuing on through anyway.

{ stdenv
, fetchpatch
, nukeReferences
, linuxPackages
, kernel ? linuxPackages.kernel
, version
, src
}:

stdenv.mkDerivation {
  name = "blackmagic-${version}-module-${kernel.modDirVersion}";
  inherit version;

  buildInputs = [ nukeReferences ];

  kernel = kernel.dev;
  kernelVersion = kernel.modDirVersion;

  inherit src;
  patches = [
    (fetchpatch {
      name = "fix-get_user_pages-and-mmap_lock.patch";
      url = "https://aur.archlinux.org/cgit/aur.git/plain/02-fix-get_user_pages-and-mmap_lock.patch?h=decklink&id=8f19ef584c0603105415160d2ba4e8dfa47495ce";
      sha256 = "08m4qwrk0vg8rix59y591bjih95d2wp6bmm1p37nyfvhi2n9jw2m";
    })
    (fetchpatch {
      name = "fix-have_unlocked_ioctl.patch";
      url = "https://aur.archlinux.org/cgit/aur.git/plain/03-fix-have_unlocked_ioctl.patch?h=decklink&id=8f19ef584c0603105415160d2ba4e8dfa47495ce";
      sha256 = "0j9p62qa4mc6ir2v4fzrdapdrvi1dabrjrx1c295pwa3vmsi1x4f";
    })
  ];

  postUnpack = ''
    cd */usr/src
    sourceRoot="$(pwd -P)"
  '';

  buildPhase = ''
    cd $sourceRoot/blackmagic-''${version}*/
    # missing some "touch" commands, make sure they exist for build.
    touch .bmd-support.o.cmd
    make -C $kernel/lib/modules/$kernelVersion/build modules "M=$(pwd -P)"

    cd $sourceRoot/blackmagic-io-''${version}*/
    # missing some "touch" commands, make sure they exist for build.
    touch .blackmagic.o.cmd
    make -C $kernel/lib/modules/$kernelVersion/build modules "M=$(pwd -P)"

    cd $sourceRoot
  '';

  installPhase = ''
    mkdir -p $out/lib/modules/$kernelVersion/misc
    for x in $(find . -name '*.ko'); do
      nuke-refs $x
      cp $x $out/lib/modules/$kernelVersion/misc/
    done
  '';

  meta.platforms = [ "x86_64-linux" ];
}
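That postUnpack phase deserves a note: the tarball unpacks into a single versioned directory whose name we don't care about, so a glob cd lands us straight in usr/src. A standalone reproduction of the trick, with an invented package name and version:

```shell
# Fake up the tarball layout (names invented), then apply the same glob trick.
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p desktopvideo-0.0/usr/src/blackmagic-0.0a1
tar -cf dv.tar desktopvideo-0.0
mkdir unpack && tar -C unpack -xf dv.tar
cd unpack/*/usr/src   # '*' matches the one top-level dir, whatever it's called
sourceRoot="$(pwd -P)"
echo "$sourceRoot"
```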

Building the kernel module was a piece of cake, really. The meat of the “interesting parts” here was a snippet I kept from another round of kernel module hackery (oh, FireWire..):

    make -C $kernel/lib/modules/$kernelVersion/build modules "M=$(pwd -P)"

The above line builds the appropriate kernel module with, and for, the provided kernel. That's pretty typical of any given kernel module – in-tree and out-of-tree alike. When the above derivation is realised (ie: built), the derivation (ie: a package, but not in the traditional sense) outputs a kernel module – the .ko – such that it'll appear at /run/current-system/kernel-modules/lib/modules/5.9.10/misc. modprobe is configured on NixOS hosts to use this directory, so all is well!
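For context, out-of-tree modules are driven by a tiny kbuild makefile; the M= invocation above hands the module directory over to the kernel's own build system. A generic sketch – the object name is made up, and this is not Blackmagic's actual file:

```makefile
# Generic out-of-tree kbuild makefile; 'example' is a hypothetical module name.
obj-m += example.o

# Convenience target for a conventional distro; the derivation above points
# -C at $kernel/lib/modules/$kernelVersion/build instead.
KDIR ?= /lib/modules/$(shell uname -r)/build

all:
	$(MAKE) -C $(KDIR) modules M=$(CURDIR)
```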

But, that's not all. I mentioned that there were helper executables. They're needed in order to set up a capture device when the PCI probe matches one by way of udev. Without the helpers, the device is only halfway initialized – software that's capable of using the correct IOCTLs can't even see the devices until the helpers have a chat with the kernel module.
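The shape of that wiring, for the curious: a udev rule fires the notifier when the card appears, and the notifier/helper pair finishes initialization. This is a hypothetical sketch only – the match keys are invented, and the real contents live in the vendor's 55-blackmagic.rules; the RUN+= path mirrors the pre-substitution path seen in the tools derivation:

```
# Hypothetical sketch, not the vendor's actual rule.
ACTION=="add", SUBSYSTEM=="pci", DRIVER=="blackmagic", \
  RUN+="/usr/lib/blackmagic/DesktopVideo/DesktopVideoNotifier"
```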

In full glory, here are the tools. Just the tools. There's still the SDK too..

{ stdenv
, autoPatchelfHook
, makeWrapper
, libGL, libGLU
, libuuid
, dbus_libs
, alsaLib
, xorg
, fontconfig
, freetype
, glib
, version
, src
, mediaexpress ? null
}:

stdenv.mkDerivation {
  pname = "blackmagic-tools";
  inherit version;

  inherit src mediaexpress;

  dontStrip = true;

  nativeBuildInputs = [ makeWrapper autoPatchelfHook ];
  buildInputs = [ libGL libGLU libuuid dbus_libs freetype fontconfig glib alsaLib ]
                ++ (with xorg; [libxcb libXrender libICE libX11 libXinerama libXrandr libSM ]);


  postUnpack = ''
    if [[ -s "$mediaexpress" ]]; then
      tar -C $sourceRoot --strip-components=1 -xf $mediaexpress
    fi
  '';

  doBuild = false;

  installPhase = ''
    bins=( $(cd usr/bin; ls) )
    # Add the helpers to bin. They're needed by udev rules and the supporting
    # systemd service.
    bins+=( DesktopVideoNotifier DesktopVideoHelper )
    # Prune vended systemd units and the symlinked bin stubs.
    rm -rf usr/lib/systemd usr/bin

    cp -R usr $out

    # Replace symlinks with wrappers to include dynamically dlopen'd libraries.
    # patchelf may corrupt the executables when adding a static entry that would
    # normally influence the RPATH.
    libBin=$out/lib/blackmagic/DesktopVideo
    for x in "''${bins[@]}"; do
      towrap="$out/lib/blackmagic/MediaExpress/$x"
      if ! [[ -x "$towrap" ]]; then
        towrap="$libBin/$x"
      fi
      makeWrapper $towrap $out/bin/$x \
        --prefix LD_LIBRARY_PATH ':' $out/lib \
        --prefix QT_PLUGIN_PATH ':' $libBin/plugins
    done

    # Need to substitute the executable path in these rules to use the $out/bin
    # path.
    mkdir -p $out/lib/udev/rules.d
    substitute \
      etc/udev/rules.d/55-blackmagic.rules \
      $out/lib/udev/rules.d/55-blackmagic.rules \
      --replace /usr/lib/blackmagic/DesktopVideo/DesktopVideoNotifier \
               $out/lib/blackmagic/DesktopVideo/DesktopVideoNotifier
  '';

  preFixup = ''
    # add $out/lib to the RPATH set on executables to use bundled version of
    # Qt5.
    runtimeDependencies+=" $out"
  '';
}

The tools provide the helpful little viewer and card-settings utilities, right next to DesktopVideoNotifier and DesktopVideoHelper, which supply the other half of capture card initialization. So, because I was being pragmatic, I lumped these tools together and did “brain surgery” on the executables using patchelf (by way of autoPatchelf): the bundled shared libraries, and the runtime libraries the blobs expected to find in the usual places – as they would on the popular distributions – were all in fact shoehorned into the executables' RPATH. And then wrapped.

As it turns out, wrappers are a good thing. Nixpkgs uses them liberally to improve the flexibility and reusability of its more build-time-consuming derivations (not as a goal per se). This saves on build time and also disk space – if you wrap the thing you want to customize in a way that permits stubbing or subbing out dependencies, runtime configuration, or plain 'ol dependency resolution, then you can easily change your mind later on. If you read through packages in the Nixpkgs repository, you'll come to see that it's not uncommon for packages to be several derivations deep. Oftentimes they're layered specifically to allow users to override some characteristic, or to improve composition of multiple derivations into a final, useful environment.
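If you've never peeked inside one, a makeWrapper-generated wrapper is just a tiny shell script that exports some environment and execs the real binary. A toy reproduction – every path here is invented, built in a scratch directory:

```shell
# Build a fake "real" tool and a wrapper next to it (all paths invented).
tmp=$(mktemp -d)
mkdir -p "$tmp/lib/blackmagic" "$tmp/bin"

cat > "$tmp/lib/blackmagic/tool" <<'EOF'
#!/bin/sh
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
EOF
chmod +x "$tmp/lib/blackmagic/tool"

# The wrapper only prepends environment and execs – nothing gets rebuilt,
# which is why changing your mind later is nearly free.
cat > "$tmp/bin/tool" <<EOF
#!/bin/sh
export LD_LIBRARY_PATH="$tmp/lib\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
exec "$tmp/lib/blackmagic/tool" "\$@"
EOF
chmod +x "$tmp/bin/tool"

out=$("$tmp/bin/tool")
echo "$out"
```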

You might have noticed that everything that should connect these pieces up into something usable is missing. In this case, you'll have to deal with it. I returned the capture card after struggling to keep it and the devices plugged into it on the correct output/input formats – on top of the whole “we're NVIDIA and what is a GPL?” thing here in November of 2020, which made CUDA/NVENC workloads non-functional with Linux 5.9.

I've got the code hanging around and arguably should flesh this out more. I used more words than planned anyhow, so this'll be it.

[^1]: I'm not even going to try to quantify this.

[^2]: This is cool stuff – there's lots to read on this topic.

[^3]: Just search for “impure” in the Nix and Nixpkgs manuals; impurity is a nuisance to the solution that Nix proposes.