February 03, 2026

hackergotchi for Jonathan Dowland

Jonathan Dowland

FOSDEM 2026 talk recording available

FOSDEM 2026 was great! I hope to blog a proper postmortem in due course. But for now, the video of my talk is up, as are my slides with speaker notes and links.

03 February, 2026 04:21PM

February 01, 2026

hackergotchi for Benjamin Mako Hill

Benjamin Mako Hill

What do people do when they edit Wikipedia through Tor?

Note: I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface these for folks who missed them, I will be periodically (re)publishing blog posts about some “older” published projects. This post is closely based on a previously published post by Kaylea Champion on the Community Data Science Blog.

Many individuals use Tor to reduce their visibility to widespread internet surveillance.

One popular approach to protecting our privacy online is to use the Tor network. Tor protects users from being identified by their IP address, which can be tied to a physical location. However, if you’d like to contribute to Wikipedia using Tor, you’ll run into a problem. Although most IP addresses can edit without an account, Tor users are blocked from editing.

Tor users attempting to contribute to Wikipedia are shown a screen that informs them that they are not allowed to edit Wikipedia.

Other research by my team has shown that Wikipedia’s attempt to block Tor is imperfect and that some people have been able to edit despite the ban. As part of this work, we built a dataset of more than 11,000 contributions to Wikipedia via Tor and used quantitative analysis to show that contributions from Tor were of about the same quality as contributions from other new editors and other contributors without accounts. Of course, given the unusual circumstances Tor-based contributors face, we wondered whether a deeper look at the content of their edits might reveal more about their motives and the kinds of contributions they seek to make. Kaylea Champion (then a student, now faculty at UW Bothell) led a qualitative investigation to explore these questions.

Given the challenges of studying anonymity seekers, we designed a novel “forensic” qualitative approach inspired by techniques common in computer security and criminal investigation. We applied this new technique to a sample of 500 editing sessions and categorized each session based on what the editor seemed to be intending to do.

Most of the contributions we found fell into one of the following two categories:

  • Many contributions were quotidian attempts to add to the encyclopedia. Tor-based editors added facts, fixed typos, and updated train schedules. There’s no way to know whether these individuals knew they were just getting lucky in their ability to edit or were patiently reloading to evade the ban.
  • Second, we found harassing comments and vandalism. Unwelcome conduct is common in online environments, and it is sometimes more common when the likelihood of being identified decreases. Some of the harassing comments we observed were direct responses to being banned as a Tor user.

Although these made up most of what we observed, we also found evidence of several other types of contributor intent:

  • We observed activism: one contributor tried to draw attention to journalistic accounts of environmental and human rights abuses committed by a mining company, only to have editors linked to the company repeatedly remove their edits. Another example involved an editor trying to diminish the influence of proponents of alternative medicine.
  • We also observed quality maintenance activities when editors used Wikipedia’s rules on appropriate sourcing to remove personal websites being cited in conspiracy theories.
  • We saw edit wars with Tor editors participating in a back-and-forth removal and replacement of content as part of a dispute, in some cases countering the work of an experienced Wikipedia editor whom even other experienced editors had gauged to be biased.
  • Finally, we saw Tor-based editors participating in non-article discussions, such as investigations into administrator misconduct and protests against the Wikipedia platform’s mistrust of Tor editors.

An exploratory mapping of our themes in terms of the value a type of contribution represents to the Wikipedia community and the importance of anonymity in facilitating it. Anonymity-protecting tools play a critical role in facilitating contributions on the right side of the figure, while edits on the left are more likely to occur even when anonymity is impossible. Contributions toward the top reflect valuable forms of participation in Wikipedia, while edits at the bottom reflect damage.

In all, these themes led us to reflect on how the risks individuals face when contributing to online communities sometimes diverge from the risks the communities face in accepting their work. Expressing minoritized perspectives, maintaining community standards even when you may be targeted by the rulebreaker, highlighting injustice, or acting as a whistleblower can be very risky for an individual, and may not be possible without privacy protections. Of course, in platforms seeking to support the public good, such knowledge and accountability may be crucial.


This work was published as a paper at CSCW: Kaylea Champion, Nora McDonald, Stephanie Bankes, Joseph Zhang, Rachel Greenstadt, Andrea Forte, and Benjamin Mako Hill. 2019. A Forensic Qualitative Analysis of Contributions to Wikipedia from Anonymity Seeking Users. Proceedings of the ACM on Human-Computer Interaction. 3, CSCW, Article 53 (November 2019), 26 pages. https://doi.org/10.1145/3359155

This project was conducted by Kaylea Champion, Nora McDonald, Stephanie Bankes, Joseph Zhang, Rachel Greenstadt, Andrea Forte, and Benjamin Mako Hill. This work was supported by the National Science Foundation (awards CNS-1703736 and CNS-1703049) and included the work of two undergraduates supported through an NSF REU supplement.

01 February, 2026 12:19PM by Benjamin Mako Hill

hackergotchi for Junichi Uekawa

Junichi Uekawa

Got rid of documents I had for last year's Tax return.

Got rid of documents I had for last year's tax return. Now my bookshelf holds the fewest documents it will have all year.

01 February, 2026 05:54AM by Junichi Uekawa

Russ Allbery

Review: Paladin's Faith

Review: Paladin's Faith, by T. Kingfisher

Series: The Saint of Steel #4
Publisher: Red Wombat Studio
Copyright: 2023
ISBN: 1-61450-614-0
Format: Kindle
Pages: 515

Paladin's Faith is the fourth book in T. Kingfisher's loosely connected series of fantasy novels about the berserker former paladins of the Saint of Steel. You could read this as a standalone, but there are numerous (spoilery) references to the previous books in the series.

Marguerite, who was central to the plot of the first book in the series, Paladin's Grace, is a spy with a problem. An internal power struggle in the Red Sail, the organization that she's been working for, has left her a target. She has a plan for how to break their power sufficiently that they will hopefully leave her alone, but to pull it off she's going to need help. As the story opens, she is working to acquire that help in a very Marguerite sort of way: breaking into the office of Bishop Beartongue of the Temple of the White Rat.

The Red Sail, the powerful merchant organization Marguerite worked for, makes their money in the salt trade. Marguerite has learned that someone invented a cheap and reproducible way to extract salt from sea water, thus making the salt trade irrelevant. The Red Sail wants to ensure that invention never sees the light of day, and has forced the artificer into hiding. Marguerite doesn't know where they are, but she knows where she can find out: the Court of Smoke, where the artificer has a patron.

Having grown up in Anuket City, Marguerite was familiar with many clockwork creations, not to mention all the ways that they could go horribly wrong. (Ninety-nine times out of a hundred, it was an explosion. The hundredth time, it ran amok and stabbed innocent bystanders, and the artificer would be left standing there saying, "But I had to put blades on it, or how would it rake the leaves?" while the gutters filled up with blood.)

All Marguerite needs to put her plan into motion is some bodyguards so that she's not constantly distracted and anxious about being assassinated. Readers of this series will be unsurprised to learn that the bodyguards she asks Beartongue for are paladins, including a large broody male one with serious self-esteem problems.

This is, like the other books in this series, a slow-burn romance with infuriating communication problems and a male protagonist who would do well to seek out a sack of hammers as a mentor. However, it has two things going for it that most books in this series do not: a long and complex plot to which the romance takes a back seat, and Marguerite, who is not particularly interested in playing along with the expected romance developments. There are also two main paladins in this story, not just one, and the other is one of the two female paladins of the Saint of Steel and rather more entertaining than Shane.

I generally like court intrigue stories, which is what fills most of this book. Marguerite is an experienced operative, so the reader gets some solid competence porn, and the paladins are fish out of water but are also unexpectedly dangerous, which adds both comedy and satisfying table-turning. I thoroughly enjoyed the maneuvering and the culture clashes. Marguerite is very good at what she does, knows it, and is entirely uninterested in other people's opinions about that, which short-circuits a lot of Shane's most annoying behavior and keeps the story from devolving into mopey angst like some of the books in this series have done.

The end of this book takes the plot in a different direction that adds significantly to the world-building, but also has a (thankfully short) depths of despair segment that I endured rather than enjoyed. I am not really in the mood for bleak hopelessness in my fiction at the moment, even if the reader is fairly sure it will be temporary. But apart from that, I thoroughly enjoyed this book from beginning to end. When we finally meet the artificer, they are an absolute delight in that way that Kingfisher is so good at. The whole story is infused with the sense of determined and competent people refusing to stop trying to fix problems. As usual, the romance was not for me and I think the book would have been better without it, but it's less central to the plot and therefore annoyed me less than any of the books in this series so far.

My one major complaint is the lack of gnoles, but we get some new and intriguing world-building to make up for it, along with a setup for a fifth book that I am now extremely curious about.

By this point in the series, you probably know if you like the general formula. Compared to the previous book, Paladin's Hope, I thought Paladin's Faith was much stronger and more interesting, but it's clearly of the same type. If, like me, you like the plots but not the romance, the plot here is more substantial. You will have to decide if that makes up for a romance in the typical T. Kingfisher configuration.

Personally, I enjoyed this quite a bit, except for the short bleak part, and I'm back to eagerly awaiting the next book in the series.

Rating: 8 out of 10

01 February, 2026 04:54AM

January 31, 2026

hackergotchi for Joey Hess

Joey Hess

the local weather

Snow coming. I'm tuned into the local 24 hour slop weather stream. AI generated, narrated, up to the minute radar and forecast graphics. People popping up on the live weather map with questions "snow soon?" (They pay for the privilege.) LLM generating reply that riffs on their name. Tuned to keep the urgency up, something is always happening somewhere, scanners are pulling the police reports, live webcam description models add verisimilitude to the description of the morning commute. Weather is happening.

In the subtext, climate change is happening. Weather is a growth industry. The guy up in Kentucky coal country who put this thing together is building an empire. He started as just another local news greenscreener. Dropped out and went twitch weather stream. Hyping up tornado days and dicey snow forecasts. Nowcasting, hyper individualized, interacting with chat. Now he's automated it all. On big days when he's getting real views, the bot breaks into his live streams, gives him a break.

Only a few thousand watching this morning yet. Perfect 2026 grade slop. Details never quite right, but close enough to keep on in the background all day. Nobody expects a perfect forecast after all, and it's fed from the National Weather Center discussion too. We still fund those guys? Why bother when a bot can do it?

He knows why he's big in these states, these rural areas. Understands the target audience. Airbrushed AI aesthetics are ok with them, receive no pushback. Flying more under the radar coastally, but weather is big there and getting bigger. The local weather will come for us all.

6 inches of snow covering some ground mount solar panels with a vertical solar panel fence behind them free of snow except cute little caps

(Not fiction FYI.)

31 January, 2026 06:25PM

hackergotchi for Michael Prokop

Michael Prokop

apt, SHA-1 keys + 2026-02-01

You might have seen Policy will reject signature within a year warnings in apt(-get) update runs like this:

root@424812bd4556:/# apt update
Get:1 http://foo.example.org/debian demo InRelease [4229 B]
Hit:2 http://deb.debian.org/debian trixie InRelease
Hit:3 http://deb.debian.org/debian trixie-updates InRelease
Hit:4 http://deb.debian.org/debian-security trixie-security InRelease
Get:5 http://foo.example.org/debian demo/main amd64 Packages [1097 B]
Fetched 5326 B in 0s (43.2 kB/s)
All packages are up to date.
Warning: http://foo.example.org/debian/dists/demo/InRelease: Policy will reject signature within a year, see --audit for details

root@424812bd4556:/# apt --audit update
Hit:1 http://foo.example.org/debian demo InRelease
Hit:2 http://deb.debian.org/debian trixie InRelease
Hit:3 http://deb.debian.org/debian trixie-updates InRelease
Hit:4 http://deb.debian.org/debian-security trixie-security InRelease
All packages are up to date.    
Warning:  http://foo.example.org/debian/dists/demo/InRelease: Policy will reject signature within a year, see --audit for details
Audit:  http://foo.example.org/debian/dists/demo/InRelease: Sub-process /usr/bin/sqv returned an error code (1), error message is:
   Signing key on 54321ABCD6789ABCD0123ABCD124567ABCD89123 is not bound:
              No binding signature at time 2024-06-19T10:33:47Z
     because: Policy rejected non-revocation signature (PositiveCertification) requiring second pre-image resistance
     because: SHA1 is not considered secure since 2026-02-01T00:00:00Z
Audit: The sources.list(5) entry for 'http://foo.example.org/debian' should be upgraded to deb822 .sources
Audit: Missing Signed-By in the sources.list(5) entry for 'http://foo.example.org/debian'
Audit: Consider migrating all sources.list(5) entries to the deb822 .sources format
Audit: The deb822 .sources format supports both embedded as well as external OpenPGP keys
Audit: See apt-secure(8) for best practices in configuring repository signing.
Audit: Some sources can be modernized. Run 'apt modernize-sources' to do so.

If you ignored this for the last year, I would like to tell you that 2026-02-01 is not that far away (hello from the past if you’re reading this because you’re already affected).

Let’s simulate the future:

root@424812bd4556:/# apt --update -y install faketime
[...]
root@424812bd4556:/# export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1 FAKETIME="2026-08-29 23:42:11" 
root@424812bd4556:/# date
Sat Aug 29 23:42:11 UTC 2026

root@424812bd4556:/# apt update
Get:1 http://foo.example.org/debian demo InRelease [4229 B]
Hit:2 http://deb.debian.org/debian trixie InRelease                                 
Err:1 http://foo.example.org/debian demo InRelease
  Sub-process /usr/bin/sqv returned an error code (1), error message is: Signing key on 54321ABCD6789ABCD0123ABCD124567ABCD89123 is not bound:            No binding signature at time 2024-06-19T10:33:47Z   because: Policy rejected non-revocation signature (PositiveCertification) requiring second pre-image resistance   because: SHA1 is not considered secure since 2026-02-01T00:00:00Z
[...]
Warning: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. OpenPGP signature verification failed: http://foo.example.org/debian demo InRelease: Sub-process /usr/bin/sqv returned an error code (1), error message is: Signing key on 54321ABCD6789ABCD0123ABCD124567ABCD89123 is not bound:            No binding signature at time 2024-06-19T10:33:47Z   because: Policy rejected non-revocation signature (PositiveCertification) requiring second pre-image resistance   because: SHA1 is not considered secure since 2026-02-01T00:00:00Z
[...]
root@424812bd4556:/# echo $?
100
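
If you are unsure whether a repository key you rely on is affected, one hedged way to check (not from the original post) is to inspect its self-signatures with gpg: OpenPGP signature packets record their digest algorithm, and algorithm ID 2 means SHA-1.

# Inspect the self-signatures of a repository key; the path is just an example.
# "digest algo 2" means SHA-1 (affected); 8 is SHA-256, 10 is SHA-512.
gpg --list-packets /etc/apt/trusted.gpg.d/example-archive-keyring.gpg | grep 'digest algo'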

Now, the proper solution would have been to fix the signing key underneath (via e.g. sq cert lint --fix --cert-file $PRIVAT_KEY_FILE > $PRIVAT_KEY_FILE-fixed).

If you don’t have access to the corresponding private key (e.g. when using an upstream repository that has been ignoring this issue), you’re out of luck for a proper fix.

But there’s a workaround for the apt situation (see the related apt commit 0989275c2f7afb7a5f7698a096664a1035118ebf):

root@424812bd4556:/# cat /usr/share/apt/default-sequoia.config
# Default APT Sequoia configuration. To overwrite, consider copying this
# to /etc/crypto-policies/back-ends/apt-sequoia.config and modify the
# desired values.
[asymmetric_algorithms]
dsa2048 = 2024-02-01
dsa3072 = 2024-02-01
dsa4096 = 2024-02-01
brainpoolp256 = 2028-02-01
brainpoolp384 = 2028-02-01
brainpoolp512 = 2028-02-01
rsa2048  = 2030-02-01

[hash_algorithms]
sha1.second_preimage_resistance = 2026-02-01    # Extend the expiry for legacy repositories
sha224 = 2026-02-01

[packets]
signature.v3 = 2026-02-01   # Extend the expiry

Adjust this according to your needs:

root@424812bd4556:/# mkdir -p /etc/crypto-policies/back-ends/

root@424812bd4556:/# cp /usr/share/apt/default-sequoia.config /etc/crypto-policies/back-ends/apt-sequoia.config

root@424812bd4556:/# $EDITOR /etc/crypto-policies/back-ends/apt-sequoia.config

root@424812bd4556:/# cat /etc/crypto-policies/back-ends/apt-sequoia.config
# APT Sequoia override configuration
[asymmetric_algorithms]
dsa2048 = 2024-02-01
dsa3072 = 2024-02-01
dsa4096 = 2024-02-01
brainpoolp256 = 2028-02-01
brainpoolp384 = 2028-02-01
brainpoolp512 = 2028-02-01
rsa2048  = 2030-02-01

[hash_algorithms]
sha1.second_preimage_resistance = 2026-09-01    # Extend the expiry for legacy repositories
sha224 = 2026-09-01

[packets]
signature.v3 = 2026-02-01   # Extend the expiry

Then we’re back to the original situation, with a warning instead of an error:

root@424812bd4556:/# apt update
Hit:1 http://deb.debian.org/debian trixie InRelease
Get:2 http://foo.example.org/debian demo InRelease [4229 B]
Hit:3 http://deb.debian.org/debian trixie-updates InRelease
Hit:4 http://deb.debian.org/debian-security trixie-security InRelease
Warning: http://foo.example.org/debian/dists/demo/InRelease: Policy will reject signature within a year, see --audit for details
[..]

Please note that this is a workaround, and not a proper solution.

31 January, 2026 01:57PM by mika

hackergotchi for Benjamin Mako Hill

Benjamin Mako Hill

Dialogue

Me: Do you want your coffee in a Japanese or Western style tea cup?

M: Yunomi.

Me: Apparently not as well as you think I do!

31 January, 2026 10:59AM by Benjamin Mako Hill

Russ Allbery

Review: Dragon Pearl

Review: Dragon Pearl, by Yoon Ha Lee

Series: Thousand Worlds #1
Publisher: Rick Riordan Presents
Copyright: 2019
ISBN: 1-368-01519-0
Format: Kindle
Pages: 315

Dragon Pearl is a middle-grade space fantasy based on Korean mythology and the first book of a series.

Min is a fourteen-year-old girl living on the barely-terraformed world of Jinju with her extended family. Her older brother Jun passed the entrance exams for the Academy and left to join the Thousand Worlds Space Forces, and Min is counting the years until she can do the same. Those plans are thrown into turmoil when an official investigator appears at their door claiming that Jun deserted to search for the Dragon Pearl. A series of impulsive fourteen-year-old decisions leads to Min heading for a spaceport alone, determined to find her brother and prove his innocence.

This would be a rather improbable quest for a young girl, but Min is a gumiho, one of the supernaturals who live in the Thousand Worlds alongside non-magical humans. Unlike the more respectable dragons, tigers, goblins, and shamans, gumiho are viewed with suspicion and distrust because their powers are useful for deception. They are natural shapeshifters who can copy the shapes of others, and their Charm ability lets them influence people's thoughts and create temporary illusions of objects such as ID cards. It will take all of Min's powers, and some rather lucky coincidences, to infiltrate the Space Forces and determine what happened to her brother.

It's common for reviews of this book to open with a caution that this is a middle-grade adventure novel and you should not expect a story like Ninefox Gambit. I will be boring and repeat that caution. Dragon Pearl has a single first-person viewpoint and a very linear and straightforward plot. Adult readers are unlikely to be surprised by plot twists; the fun is the world-building and seeing how Min manages to work around plot obstacles.

The world-building is enjoyable but not very rigorous. Min uses and abuses Charm with the creative intensity of a Dungeons & Dragons min-maxer. Each individual event makes sense given the implication that Min is unusually powerful, but I'm dubious about the surrounding society and lack of protections against Charm given what Min is able to do. Min does say that gumiho are rare and many people think they're extinct, which is a bit of a fig leaf, but you'll need to bring your urban fantasy suspension of disbelief skills to this one.

I did like that the world-building conceit went more than skin deep and influenced every part of the world. There are ghosts who are critical to the plot. Terraforming is done through magic, hence the quest for the Dragon Pearl and the miserable state of Min's home planet due to its loss. Medical treatment involves the body's meridians, as does engineering: The starships have meridians similar to those of humans, and engineers partly merge with those meridians to adjust them. This is not the sort of book that tries to build rigorous scientific theories or explain them to the reader, and I'm not sure everything would hang together if you poked at it too hard, but Min isn't interested in doing that poking and the story doesn't try to justify itself. It's mostly a vibe, but it's a vibe that I enjoyed and that is rather different than other space fantasy I've read.

The characters were okay but never quite clicked for me, in part because proper character exploration would have required Min take a detour from her quest to find her brother and that was not going to happen. The reader gets occasional glimpses of a military SF cadet story and a friendship on false premises story, but neither have time to breathe because Min drops any entanglement that gets in the way of her quest. She's almost amoral in a way that I found believable but not quite aligned with my reading mood. I also felt a bit wrong-footed by how her friendships developed; saying too much more would be a spoiler, but I was expecting more human connection than I got.

I think my primary disappointment with this book was something I knew going in, not in any way its fault, and part of the reason why I'd put off reading it: This is pitched at young teenagers and didn't have quite enough plot and characterization complexity to satisfy me. It's a linear, somewhat episodic adventure story with some neat world-building, and it therefore glides over the spots where an adult novel would have added political and factional complexity. That is exactly as advertised, so it's up to you whether that's the book you're in the mood for.

One warning: The text of this book opens with an introduction by Rick Riordan that is just fluff marketing and that spoils the first few chapters of the book. It is unmarked as such at the beginning and tricked me into thinking it was the start of the book proper, and then deeply annoyed me. If you do read this book, I recommend skipping the utterly pointless introduction and going straight to chapter one.

Followed by Tiger Honor.

Rating: 6 out of 10

31 January, 2026 05:26AM

January 29, 2026

hackergotchi for C.J. Adams-Collier

C.J. Adams-Collier

Part 3: Building the Keystone – Dataproc Custom Images for Secure Boot & GPUs

In Part 1, we established a secure, proxy-only network. In Part 2, we
explored the enhanced install_gpu_driver.sh initialization
action. Now, in Part 3, we’ll focus on using the LLC-Technologies-Collier/custom-images
repository (branch proxy-exercise-2025-11) to build the
actual custom Dataproc images embedded with NVIDIA drivers signed for
Secure Boot, all within our proxied environment.

Why Custom Images?

To run NVIDIA GPUs on Shielded VMs with Secure Boot enabled, the
NVIDIA kernel modules must be signed with a key trusted by the VM’s EFI
firmware. Since standard Dataproc images don’t include these
custom-signed modules, we need to build our own. This process also
allows us to pre-install a full stack of GPU-accelerated software.

The custom-images Toolkit (examples/secure-boot)

The examples/secure-boot directory within the
custom-images repository contains the necessary scripts and
configurations, refined through significant development to handle proxy
and Secure Boot challenges.

Key Components & Development Insights:

  • env.json: The central configuration
    file (as used in Part 1) for project, network, proxy, and bucket
    details. This became the single source of truth to avoid configuration
    drift.
  • create-key-pair.sh: Manages the Secure
    Boot signing keys (PK, KEK, DB) in Google Secret Manager, essential for
    the module signing.
  • build-and-run-podman.sh: Orchestrates
    the image build process in an isolated Podman container. This was
    introduced to standardize the build environment and encapsulate
    dependencies, simplifying what the user needs to install locally.
  • pre-init.sh: Sets up the build
    environment within the container and calls
    generate_custom_image.py. It crucially passes metadata
    derived from env.json (like proxy settings and Secure Boot
    key secret names) to the temporary build VM.
  • generate_custom_image.py: The core
    Python script that automates GCE VM creation, runs the customization
    script, and creates the final GCE image.
  • gce-proxy-setup.sh: This script from
    startup_script/ is vital. It’s injected into the temporary
    build VM and runs first to configure the OS, package
    managers (apt, dnf), tools (curl, wget, GPG), Conda, and Java to use the
    proxy settings passed in the metadata. This ensures the entire build
    process is proxy-aware.
  • install_gpu_driver.sh: Used as the
    --customization-script within the build VM. As detailed in
    Part 2, this script handles the driver/CUDA/ML stack installation and
    signing, now able to function correctly due to the proxy setup by
    gce-proxy-setup.sh.

Layered Image Strategy:

The pre-init.sh script employs a layered approach:

  1. secure-boot Image: Base image with
    Secure Boot certificates injected.
  2. tf Image: Based on
    secure-boot, this image runs the full
    install_gpu_driver.sh within the proxy-configured build VM
    to install NVIDIA drivers, CUDA, ML libraries (TensorFlow, PyTorch,
    RAPIDS), and sign the modules. This is the primary target image for our
    use case.

(Note: secure-proxy and proxy-tf layers
were experiments, but the -tf image combined with runtime
metadata emerged as the most effective solution for 2.2-debian12).

Build Steps:

  1. Clone Repos & Configure env.json: Ensure you have the
    custom-images and cloud-dataproc repos and a complete
    env.json as described in Part 1.

  2. Run the Build:

    # Example: Build a 2.2-debian12 based image set
    # Run from the custom-images repository root
    bash examples/secure-boot/build-and-run-podman.sh 2.2-debian12

    This command will build the layered images, leveraging the proxy
    settings from env.json via the metadata injected into the
    build VM. Note the final image name produced (e.g.,
    dataproc-2-2-deb12-YYYYMMDD-HHMMSS-tf).
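
To confirm what the build produced, you can list the resulting images. This is a hedged sketch rather than part of the documented workflow, assuming the dataproc-2-2-deb12-* naming pattern shown above:

# List the custom images created by the build; the name filter assumes the
# example naming pattern above and is not part of the original workflow.
gcloud compute images list \
  --filter="name~^dataproc-2-2-deb12" \
  --format="table(name,creationTimestamp,status)"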

Conclusion of Part 3

Through an iterative process, we’ve developed a robust workflow
within the custom-images repository to build Secure
Boot-compatible GPU images in a proxy-only environment. The key was
isolating the build in Podman, ensuring the build VM is fully
proxy-aware using gce-proxy-setup.sh, and leveraging the
enhanced install_gpu_driver.sh from Part 2.

In Part 4, we’ll bring it all together, deploying a Dataproc cluster
using this custom -tf image within the secure network, and
verifying the end-to-end functionality.

29 January, 2026 09:08AM by C.J. Collier

hackergotchi for Bits from Debian

Bits from Debian

New Debian Developers and Maintainers (November and December 2025)

The following contributors got their Debian Developer accounts in the last two months:

  • Aquila Macedo (aquila)
  • Peter Blackman (peterb)
  • Kiran S Kunjumon (hacksk)
  • Ben Westover (bjw)

The following contributors were added as Debian Maintainers in the last two months:

  • Vladimir Petko
  • Antonin Delpeuch
  • Nadzeya Hutsko
  • Aryan Karamtoth
  • Carl Keinath
  • Richard Nelson

Congratulations!

29 January, 2026 09:00AM by Jean-Pierre Giraud

January 28, 2026

Antoine Beaupré

Qalculate hacks

This is going to be a controversial statement because some people are absolute nerds about this, but I need to say it.

Qalculate is the best calculator that has ever been made.

I am not going to try to convince you of this, I just wanted to put out my bias out there before writing down those notes. I am a total fan.

This page will collect my notes of cool hacks I do with Qalculate. Most examples are copy-pasted from the command-line interface (qalc(1)), but I typically use the graphical interface as it's slightly better at displaying complex formulas. Discoverability is obviously also better for the cornucopia of features this fantastic application ships.

Qalc commandline primer

On Debian, Qalculate's CLI interface can be installed with:

apt install qalc

Then you start it with the qalc command, and end up on a prompt:

anarcat@angela:~$ qalc
> 

Then it's a normal calculator:

anarcat@angela:~$ qalc
> 1+1

  1 + 1 = 2

> 1/7

  1 / 7 ≈ 0.1429

> pi

  pi ≈ 3.142

> 

There's a bunch of variables to control display, approximation, and so on:

> set precision 6
> 1/7

  1 / 7 ≈ 0.142857
> set precision 20
> pi

  pi ≈ 3.1415926535897932385

When I need more, I typically browse around the menus. One big issue I have with Qalculate is there are a lot of menus and features. I had to fiddle quite a bit to figure out that set precision command above. I might add more examples here as I find them.

Bandwidth estimates

I often use the data units to estimate bandwidths. For example, here's what 1 megabit per second is over a month ("about 300 GiB"):

> 1 megabit/s * 30 day to gibibyte 

  (1 megabit/second) × (30 days) ≈ 301.7 GiB

Or, "how long will it take to download X", in this case, 1GiB over a 100 mbps link:

> 1GiB/(100 megabit/s)

  (1 gibibyte) / (100 megabits/second) ≈ 1 min + 25.90 s

Password entropy

To calculate how much entropy (in bits) a given password structure has, you count the number of possibilities for each entry (say, [a-z] is 26 possibilities, "one word in an 8k dictionary" is 8000), take the base-2 logarithm, and multiply by the number of entries.

For example, an alphabetic 14-character password is:

> log2(26*2)*14

  log₂(26 × 2) × 14 ≈ 79.81

... 80 bits of entropy. To get the equivalent in a Diceware password with an 8000-word dictionary, you would need:

> log2(8k)*x = 80

  (log₂(8 × 1000) × x) = 80 ≈

  x ≈ 6.170

... about 6 words, which gives you:

> log2(8k)*6

  log₂(8 × 1000) × 6 ≈ 77.79

78 bits of entropy.

Exchange rates

You can convert between currencies!

> 1 EUR to USD

  1 EUR ≈ 1.038 USD

Even fake ones!

> 1 BTC to USD

  1 BTC ≈ 96712 USD

This relies on a database pulled from the internet (typically the European Central Bank rates, see the source). It will prompt you if it's too old:

It has been 256 days since the exchange rates last were updated.
Do you wish to update the exchange rates now? y

As a reader pointed out, you can set the refresh rate for currencies, as some countries will require way more frequent exchange rates.

The graphical version has a little graphical indicator that, when you mouse over, tells you where the rate comes from.

Other conversions

Here are other neat conversions extracted from my history:

> teaspoon to ml

  teaspoon = 5 mL

> tablespoon to ml

  tablespoon = 15 mL

> 1 cup to ml 

  1 cup ≈ 236.6 mL

> 6 L/100km to mpg

  (6 liters) / (100 kilometers) ≈ 39.20 mpg

> 100 kph to mph

  100 kph ≈ 62.14 mph

> (108km - 72km) / 110km/h

  ((108 kilometers) − (72 kilometers)) / (110 kilometers/hour) ≈
  19 min + 38.18 s

Completion time estimates

This is a more involved example I often do.

Background

Say you have started a long running copy job and you don't have the luxury of having a pipe you can insert pv(1) into to get a nice progress bar. For example, rsync or cp -R can have that problem (but not tar!).

(Yes, you can use --info=progress2 in rsync, but that estimate is incremental and therefore inaccurate unless you disable the incremental mode with --no-inc-recursive, but then you pay a huge up-front wait cost while the entire directory gets crawled.)

Extracting a process start time

First step is to gather data. Find the process start time. If you were unfortunate enough to forget to run date --iso-8601=seconds before starting, you can get a similar timestamp with stat(1) on the process tree in /proc with:

$ stat /proc/11232
  File: /proc/11232
  Size: 0               Blocks: 0          IO Block: 1024   directory
Device: 0,21    Inode: 57021       Links: 9
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-02-07 15:50:25.287220819 -0500
Modify: 2025-02-07 15:50:25.287220819 -0500
Change: 2025-02-07 15:50:25.287220819 -0500
 Birth: -

So our start time is 2025-02-07 15:50:25, we shave off the nanoseconds there, they're below our precision noise floor.

If you're not dealing with an actual UNIX process, you need to figure out a start time: this can be a SQL query, a network request, whatever, exercise for the reader.

Saving a variable

This is optional, but for the sake of demonstration, let's save this as a variable:

> start="2025-02-07 15:50:25"

  save("2025-02-07T15:50:25"; start; Temporary; ; 1) =
  "2025-02-07T15:50:25"

Estimating data size

Next, estimate your data size. That will vary wildly with the job you're running: this can be anything: number of files, documents being processed, rows to be destroyed in a database, whatever. In this case, rsync tells me how many bytes it has transferred so far:

# rsync -ASHaXx --info=progress2 /srv/ /srv-zfs/
2.968.252.503.968  94%    7,63MB/s    6:04:58  xfr#464440, ir-chk=1000/982266) 

Strip off the weird dots in there, because that will confuse qalculate, which will count this as:

  2.968252503968 bytes ≈ 2.968 B

Or, essentially, three bytes. We actually transferred almost 3TB here:

  2968252503968 bytes ≈ 2.968 TB

So let's use that. If you had the misfortune of making rsync silent, but were lucky enough to transfer entire partitions, you can use df (without -h! we want to be more precise here), in my case:

Filesystem              1K-blocks       Used  Available Use% Mounted on
/dev/mapper/vg_hdd-srv 7512681384 7258298036  179205040  98% /srv
tank/srv               7667173248 2870444032 4796729216  38% /srv-zfs

(Otherwise, of course, you use du -sh $DIRECTORY.)

Digression over bytes

Those are 1K blocks, which are actually (and rather unfortunately) KiB, or "kibibytes" (1024 bytes), not "kilobytes" (1000 bytes). Ugh.

> 2870444032 KiB

  2870444032 kibibytes ≈ 2.939 TB
> 2870444032 kB

  2870444032 kilobytes ≈ 2.870 TB

At this scale, those details matter quite a bit, we're talking about a 69GB (64GiB) difference here:

> 2870444032 KiB - 2870444032 kB

  (2870444032 kibibytes) − (2870444032 kilobytes) ≈ 68.89 GB

Anyways. Let's take 2968252503968 bytes as our current progress.

Our entire dataset is 7258298064 KiB, as seen above.

Solving a cross-multiplication

We have 3 out of four variables for our equation here, so we can already solve:

> (now-start)/x = (2996538438607 bytes)/(7258298064 KiB) to h

  ((actual − start) / x) = ((2996538438607 bytes) / (7258298064
  kibibytes))

  x ≈ 59.24 h

The entire transfer will take about 60 hours to complete! Note that's not the time left, that is the total time.

To break this down step by step, we could calculate how long it has taken so far:

> now-start

  now − start ≈ 23 h + 53 min + 6.762 s

> now-start to s

  now − start ≈ 85987 s

... and do the cross-multiplication manually, it's basically:

x/(now-start) = (total/current)

so:

x = (total/current) * (now-start)

or, in Qalc:

> ((7258298064  kibibytes) / ( 2996538438607 bytes) ) *  85987 s

  ((7258298064 kibibytes) / (2996538438607 bytes)) × (85987 secondes) ≈
  2 d + 11 h + 14 min + 38.81 s

It's interesting it gives us different units here! Not sure why.

Now and built-in variables

The now here is actually a built-in variable:

> now

  now ≈ "2025-02-08T22:25:25"

There is a bewildering list of such variables, for example:

> uptime

  uptime = 5 d + 6 h + 34 min + 12.11 s

> golden

  golden ≈ 1.618

> exact

  golden = (√(5) + 1) / 2

Computing dates

In any case, yay! We know the transfer is going to take roughly 60 hours total, and we've already spent around 24h of that, so we have 36h left.

But I did that all in my head, we can ask more of Qalc yet!

Let's make another variable, for that total estimated time:

> total=(now-start)/x = (2996538438607 bytes)/(7258298064 KiB)

  save(((now − start) / x) = ((2996538438607 bytes) / (7258298064
  kibibytes)); total; Temporary; ; 1) ≈
  2 d + 11 h + 14 min + 38.22 s

And we can plug that into another formula with our start time to figure out when we'll be done!

> start+total

  start + total ≈ "2025-02-10T03:28:52"

> start+total-now

  start + total − now ≈ 1 d + 11 h + 34 min + 48.52 s

> start+total-now to h

  start + total − now ≈ 35 h + 34 min + 32.01 s

That transfer has ~1d left, or 35h34m32s, and should complete around 3:30 in the morning on February 10th.

But that's icing on top. I typically only do the cross-multiplication and calculate the remaining time in my head.

I mostly did the last bit to show Qalculate could compute dates and time differences, as long as you use ISO timestamps. Although it can also convert to and from UNIX timestamps, it cannot parse arbitrary date strings (yet?).

Other functionality

Qalculate can:

  • Plot graphs;
  • Use RPN input;
  • Do all sorts of algebraic, calculus, matrix, statistics, trigonometry functions (and more!);
  • ... and so much more!

I have a hard time finding things it cannot do. When I get there, I typically need to resort to writing Python code or using a spreadsheet; others will turn to more complete engines like Maple, Mathematica, or R.

But for daily use, Qalculate is just fantastic.

And it's pink! Use it!

Gotchas

There are a couple of things that get me with Qalc, unfortunately.

Decimal precision

I mentioned set precision above:

> set precision 6
> 1/7

  1 / 7 ≈ 0.142857
> set precision 20
> pi

  pi ≈ 3.1415926535897932385

Fractional displays

But sometimes, I want fractional displays (obviously not for π because it is irrational). For example, sometimes I work in inches, and this would look better as a fraction:

> 8973/12

  8973 / 12 = 747.75

The trick here is to change the fraction setting, from the qalc(1) manual:

       fractions, fr (-1* = auto, 0 = off, 1 = exact, 2 = on, 3 = mixed, 4 =
       long, 5 = dual, 1/n)
               Determines how rational numbers are displayed (e.g. 5/4 = 1 + 1/4
               =  1.25).  'long' removes limits on the size of the numerator and
                denominator.

Normally, this should be set to auto, so if you've changed it, change it back:

set fractions auto

Then you get the nice mixed output:

> 8973/12

  8973 / 12 = 747 + 3/4 = 747.75

The dual setting is also nice:

> set fractions dual

  747.75 = 2991/4 = 747 + 3/4 = 747.75

Strangely, I couldn't figure out how to get the same output in the graphical interface. The closest menu item is Mode > Rational Number Form.

Further reading and installation

This is just scratching the surface, the fine manual has more information, including more examples. There is also of course a qalc(1) manual page which also ships an excellent EXAMPLES section.

Qalculate is packaged for over 30 Linux distributions, but also ships packages for Windows and MacOS. There are third-party derivatives as well including a web version and an Android app.

Updates

Colin Watson liked this blog post and was inspired to write his own hacks, similar to what's here but with extras. Check it out!

28 January, 2026 06:38PM

hackergotchi for C.J. Adams-Collier

C.J. Adams-Collier

Part 2: Taming the Beast – Deep Dive into the Proxy-Aware GPU Initialization Action

In Part 1 of this series, we laid the network foundation for running
secure Dataproc clusters. Now, let’s zoom in on the core component
responsible for installing and configuring NVIDIA GPU drivers and the
associated ML stack in this restricted environment: the
install_gpu_driver.sh script from the LLC-Technologies-Collier/initialization-actions
repository (branch gpu-202601).

This isn’t just any installation script; it has been significantly
enhanced to handle the nuances of Secure Boot and to operate seamlessly
behind an HTTP/S proxy.

The Challenge: Installing GPU Drivers Without Direct Internet

Our goal was to create a Dataproc custom image with NVIDIA GPU
drivers, sign the kernel modules for Secure Boot, and ensure the entire
process works seamlessly when the build VM and the eventual cluster
nodes have no direct internet access, relying solely on an HTTP/S proxy.
This involved:

  1. Proxy-Aware Build: Ensuring all build steps within
    the custom image creation process (package downloads, driver downloads,
    GPG keys, etc.) correctly use the customer’s proxy.
  2. Secure Boot Signing: Integrating kernel module
    signing using keys managed in GCP Secret Manager, especially when
    drivers are built from source.
  3. Conda Environment: Reliably and speedily installing
    a complex Conda environment with PyTorch, TensorFlow, Rapids, and other
    GPU-accelerated libraries through the proxy.
  4. Dataproc Integration: Making sure the custom image
    works correctly with Dataproc’s own startup, agent processes, and
    cluster-specific configurations like YARN.

The Development Journey: Key Enhancements in install_gpu_driver.sh

To address these challenges, the script incorporates several key
features:

  • Robust Proxy Handling (set_proxy
    function):

    • Challenge: Initial script versions had spotty proxy
      support. Many tools like apt, curl,
      gpg, and even gsutil failed in proxy-only
      environments.
    • Enhancements: The set_proxy function
      (also used in gce-proxy-setup.sh) was completely overhauled
      to parse various proxy metadata (http-proxy,
      https-proxy, proxy-uri,
      no-proxy). Critically, environment variables
      (HTTP_PROXY, HTTPS_PROXY,
      NO_PROXY) are now set before any network
      operations. NO_PROXY is carefully set to include
      .google.com and .googleapis.com to allow
      direct access to Google APIs via Private Google Access. System-wide
      trust stores (OS, Java, Conda) are updated with the proxy’s CA
      certificate if provided via http-proxy-pem-uri.
      gcloud, apt, dnf, and
      dirmngr are also configured to use the proxy.
  • Reliable GPG Key Fetching (import_gpg_keys
    function):

    • Challenge: Importing GPG keys for repositories
      often failed as keyservers use non-HTTP ports (e.g., 11371) blocked by
      firewalls, and gpg --recv-keys is not proxy-friendly.
    • Solution: A new import_gpg_keys
      function now fetches keys over HTTPS using curl, which
      respects the environment’s proxy settings; this replaced all direct
      gpg --recv-keys calls. A minimal sketch of this pattern
      appears after this list.
  • GCS Caching is King:
    • Challenge: Repeatedly downloading large files
      (drivers, CUDA, source code) through a proxy is slow and
      inefficient.
    • Solution: Implemented extensive GCS caching for
      NVIDIA drivers, CUDA runfiles, NVIDIA Open Kernel Module source
      tarballs, compiled kernel modules, and even packed Conda environments.
      Scripts now check a GCS bucket (dataproc-temp-bucket)
      before hitting the internet.
    • Impact: Dramatically speeds up subsequent runs and
      init action execution times on cluster nodes after the cache is
      warmed.
  • Conda Environment Stability & Speed:
    • Challenge: Large Conda environments are prone to
      solver conflicts and slow installation times.
    • Solution: Integrated Mamba for faster package
      solving. Refined package lists for better compatibility. Added logic to
      force-clean and rebuild the Conda environment cache on GCS and locally
      if inconsistencies are detected (e.g., driver installed but Conda env
      not fully set up).
  • Secure Boot & Kernel Module Signing:
    • Challenge: Custom-compiled kernel modules must be
      signed to load when Secure Boot is enabled.
    • Solution: The script integrates with GCP Secret
      Manager to fetch signing keys. The build_driver_from_github
      function now includes robust steps to compile, sign (using
      sign-file), install, and verify the signed modules.
  • Custom Image Workflow & Deferred Configuration:
    • Challenge: Cluster-specific settings (like YARN GPU
      configuration) should not be baked into the image.
    • Solution: The install_gpu_driver.sh
      script detects when it’s run during image creation
      (--metadata invocation-type=custom-images). In this mode,
      it defers cluster-specific setups to a systemd service
      (dataproc-gpu-config.service) that runs on the first boot
      of a cluster instance. This ensures that YARN and Spark configurations
      are applied in the context of the running cluster, not at image build
      time.
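
As promised above, here is a minimal sketch of the proxy-friendly key fetch described in the import_gpg_keys bullet; the URL and destination paths are placeholders, not the script's actual code. The point is that curl honours the HTTP_PROXY/HTTPS_PROXY environment variables, whereas gpg --recv-keys talks to keyserver ports the firewall blocks:

# Hedged sketch of the import_gpg_keys idea; URL and destination are placeholders.
# curl follows HTTP_PROXY/HTTPS_PROXY, unlike `gpg --recv-keys` on port 11371.
curl -fsSL "https://example.org/repo/archive-key.asc" -o /tmp/archive-key.asc
gpg --dearmor < /tmp/archive-key.asc > /etc/apt/keyrings/example-archive.gpg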

Conclusion of Part 2

The install_gpu_driver.sh initialization action is more
than just an installer; it’s a carefully crafted tool designed to handle
the complexities of secure, proxied environments. Its robust proxy
support, comprehensive GCS caching, refined Conda management, Secure
Boot signing capabilities, and awareness of the custom image build
lifecycle make it a critical enabler.

In Part 3, we’ll explore how the LLC-Technologies-Collier/custom-images
repository (branch proxy-exercise-2025-11) uses this
initialization action to build the complete, ready-to-deploy Secure Boot
GPU custom images.

28 January, 2026 10:45AM by C.J. Collier

Dataproc GPUs, Secure Boot, & Proxies

Part 1: Building a Secure Network Foundation for Dataproc with GPUs & SWP

Welcome to the first post in our series on running GPU-accelerated
Dataproc workloads in secure, enterprise-grade environments. Many
organizations need to operate within VPCs that have no direct internet
egress, instead routing all traffic through a Secure Web Proxy (SWP).
Additionally, security mandates often require the use of Shielded VMs
with Secure Boot enabled. This series will show you how to meet these
requirements for your Dataproc GPU clusters.

In this post, we’ll focus on laying the network foundation using
tools from the LLC-Technologies-Collier/cloud-dataproc
repository (branch proxy-sync-2026-01).

The Challenge: Network Isolation & Control

Before we can even think about custom images or GPU drivers, we need
a network environment that:

  1. Prevents direct internet access from Dataproc cluster nodes.
  2. Forces all egress traffic through a manageable and auditable
    SWP.
  3. Provides the necessary connectivity for Dataproc to function and for
    us to build images later.
  4. Supports Secure Boot for all VMs.

The Toolkit: LLC-Technologies-Collier/cloud-dataproc

To make setting up and tearing down these complex network
environments repeatable and consistent, we’ve developed a set of bash
scripts within the gcloud directory of the
cloud-dataproc repository. These scripts handle the
creation of VPCs, subnets, firewall rules, service accounts, and the
Secure Web Proxy itself.

Key Script: gcloud/bin/create-dpgce-private

This script is the cornerstone for creating the private, proxied
environment. It automates:

  • VPC and Subnet creation (for the cluster, SWP, and management).
  • Setup of Certificate Authority Service and Certificate Manager for
    SWP TLS interception.
  • Deployment of the SWP Gateway instance.
  • Configuration of a Gateway Security Policy to control egress.
  • Creation of necessary firewall rules.
  • Result: Cluster nodes in this VPC have NO default
    internet route and MUST use the SWP.

Configuration via env.json

We use a single env.json file to drive the
configuration. This file will also be used by the
custom-images scripts in Part 3. This env.json
should reside in your custom-images repository clone, and
you’ll symlink it into the cloud-dataproc/gcloud
directory.
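
As a rough sketch of what env.json might contain: the only key referenced verbatim later in this series is BUCKET (read via jq -r .BUCKET env.json); the remaining fields below are assumptions shown purely for illustration.

# Hypothetical env.json skeleton; only BUCKET is referenced verbatim in this
# series, the other keys are illustrative assumptions.
cat > env.json <<'EOF'
{
  "PROJECT_ID": "my-gcp-project",
  "REGION": "us-west1",
  "BUCKET": "my-dataproc-deps-bucket"
}
EOF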

Running the Setup:

# Assuming you have cloud-dataproc and custom-images cloned side-by-side
# And your env.json is in the custom-images root
cd cloud-dataproc/gcloud
# Symlink to the env.json in custom-images
ln -sf ../../custom-images/env.json env.json
# Run the creation script, but don't create a cluster yet
bash bin/create-dpgce-private --no-create-cluster
cd ../../custom-images

Node Configuration: The Metadata Startup Script for Runtime

For the Dataproc cluster nodes to function correctly in this proxied
environment, they need to be configured to use the SWP on boot. We
achieve this using a GCE metadata startup script.

The script startup_script/gce-proxy-setup.sh (from the
custom-images repository) is designed to be run on each
cluster node at boot. It reads metadata like http-proxy and
http-proxy-pem-uri (which our cluster creation scripts in
Part 4 will pass) to configure the OS environment, package managers, and
other tools to use the SWP.
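
To make the mechanism concrete, here is a hedged sketch of how a startup script can read such metadata; it is not the actual gce-proxy-setup.sh code, and only the http-proxy key name comes from the text above.

# Hedged sketch: read the http-proxy attribute from the GCE metadata server
# (not the actual gce-proxy-setup.sh implementation).
MD="http://metadata.google.internal/computeMetadata/v1/instance/attributes"
HTTP_PROXY="$(curl -fs -H 'Metadata-Flavor: Google' "${MD}/http-proxy")"
export HTTP_PROXY HTTPS_PROXY="${HTTP_PROXY}"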

Upload this script to your GCS bucket:

# Run from the custom-images repository root
gsutil cp startup_script/gce-proxy-setup.sh gs://$(jq -r .BUCKET env.json)/custom-image-deps/

This script is essential for the runtime behavior of the
cluster nodes.

Conclusion of Part 1

With the cloud-dataproc scripts, we’ve laid the
groundwork by provisioning a secure VPC with controlled egress through
an SWP. We’ve also prepared the essential node-level proxy configuration
script (gce-proxy-setup.sh) in GCS, ready to be used by our
clusters.

Stay tuned for Part 2, where we’ll dive into the
install_gpu_driver.sh initialization action from the
LLC-Technologies-Collier/initialization-actions repository
(branch gpu-202601) and how it’s been adapted to install
all GPU-related software through the proxy during the image build
process.

28 January, 2026 10:37AM by C.J. Collier

Sven Hoexter

Decrypt TLS Connection with wireshark and curl

With TLS 1.3 more parts of the handshake got encrypted (e.g. the certificate), but sometimes it's still helpful to look at the complete handshake.

curl uses the somewhat standardized SSLKEYLOGFILE env variable for the key log file, which is also supported by Firefox and Chrome. wireshark hides the setting in the UI behind Edit -> Preferences -> Protocols -> TLS -> (Pre)-Master-Secret log filename, which is uncomfortable to reach. Looking up the config setting in the Advanced settings, one can learn that it's internally called tls.keylog_file. Thus we can set it up with:

sudo wireshark -o "tls.keylog_file:/home/sven/curl.keylog"

SSLKEYLOGFILE=/home/sven/curl.keylog curl -v https://www.cloudflare.com/cdn-cgi/trace

Depending on the setup, root might be unable to access the Wayland session; that can be worked around by letting sudo keep the relevant env variables:

$ cat /etc/sudoers.d/wayland 
Defaults   env_keep += "XDG_RUNTIME_DIR"
Defaults   env_keep += "WAYLAND_DISPLAY"

Or set up wireshark properly and use the wireshark group to be able to dump traffic. That might require a sudo dpkg-reconfigure wireshark-common.
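
The keylog file also works for offline analysis with tshark; a hedged sketch, assuming the traffic was previously captured to trace.pcap:

# Decrypt a saved capture using the same keylog file
tshark -r trace.pcap -o "tls.keylog_file:/home/sven/curl.keylog"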

Regarding curl: In some situations it can be desirable to force a specific older TLS version for testing, which requires setting both a minimum and a maximum version. E.g. to force TLS 1.2 only:

curl -v --tlsv1.2 --tls-max 1.2 https://www.cloudflare.com/cdn-cgi/trace

28 January, 2026 10:32AM

January 27, 2026

Elana Hashman

A beginner's guide to improving your digital security

In 2017, I led a series of workshops aimed at teaching beginners a better understanding of encryption, how the internet works, and their digital security. Nearly a decade later, there is still a great need to share reliable resources and guides on improving these skills.

I have worked professionally in computer security one way or another for well over a decade, at many major technology companies and in many open source software projects. There are many inaccurate and unreliable resources out there on this subject, put together by well-meaning people without a background in security, which can lead to sharing misinformation, exaggeration and fearmongering.

I hope that I can offer you a trusted, curated list of high impact things that you can do right now, using whichever vetted guide you prefer. I also include how long each task should take, why you should do it, and any limitations.

This guide is aimed at improving your personal security, and does not apply to your work-owned devices. Always assume your company can monitor all of your messages and activities on work devices.

What can I do to improve my security right away?

I put together this list in order of effort, easiest tasks first. You should be able to complete many of the low effort tasks in a single hour. The medium to high effort tasks are very much worth doing, but may take you a few days or even weeks to complete them.

Low effort (<15 minutes)

Upgrade your software to the latest versions

Why? I don't know anyone who hasn't complained about software updates breaking features, introducing bugs, and causing headaches. If it ain't broke, why upgrade, right? Well, alongside all of those annoying bugs and breaking changes, software updates also include security fixes, which will protect your device from being exploited by bad actors. Security issues can be found in software at any time, even software that's been available for many years and thought to be secure. You want to install these as soon as they are available.

Recommendation: Turn on automatic upgrades and always keep your devices as up-to-date as possible. If you have some software you know will not work if you upgrade it, at least be sure to upgrade your laptop and phone operating system (iOS, Android, Windows, etc.) and web browser (Chrome, Safari, Firefox, etc.). Do not use devices that do not receive security support (e.g. old Android or iPhones).

Guides:

Limitations: This will prevent someone from exploiting known security issues on your devices, but it won't help if your device was already compromised. If this is a concern, doing a factory reset, upgrade, and turning on automatic upgrades may help. This also won't protect against all types of attacks, but it is a necessary foundation.

Use Signal

Why? Signal is a trusted, vetted, secure messaging application that allows you to send end-to-end encrypted messages and make video/phone calls. This means that only you and your intended recipient can decrypt the messages and someone cannot intercept and read your messages, in contrast to texting (SMS) and other insecure forms of messaging. Other applications advertise themselves as end-to-end encrypted, but Signal provides the strongest protections.

Recommendation: I recommend installing the Signal app and using it! My mom loves that she can video call me on Wi-Fi on my Android phone. It also supports group chats. I use it as a secure alternative to texting (SMS) and other chat platforms. I also like Signal's "disappearing messages" feature which I enable by default because it automatically deletes messages after a certain period of time. This avoids your messages taking up too much storage.

Guides:

Limitations: Signal is only able to protect your messages in transit. If someone has access to your phone or the phone of the person you sent messages to, they will still be able to read them. As a rule of thumb, if you don't want someone to read something, don't write it down! Meet in person or make an encrypted phone call where you will not be overheard. If you are talking to someone you don't know, assume your messages are as public as posting on social media.

Set passwords and turn on device encryption

Why? Passwords ensure that someone else can't unlock your device without your consent or knowledge. They also are required to turn on device encryption, which protects your information on your device from being accessed when it is locked. Biometric (fingerprint or face ID) locking provides some privacy, but your fingerprint or face ID can be used against your wishes, whereas if you are the only person who knows your password, only you can use it.

Recommendation: Always set passwords and have device encryption enabled in order to protect your personal privacy. It may be convenient to allow kids or family members access to an unlocked device, but anyone else can access it, too! Use strong passwords that cannot be guessed—avoid using names, birthdays, phone numbers, addresses, or other public information. Using a password manager will make creating and managing passwords even easier. Disable biometric unlock, or at least know how to disable it. Most devices will enable disk encryption by default, but you should double-check.

Guides:

Limitations: If your device is unlocked, the password and encryption will provide no protections; the device must be locked for this to protect your privacy. It is possible, though unlikely, for someone to gain remote access to your device (for example through malware or stalkerware), which would bypass these protections. Some forensic tools are also sophisticated enough to work with physical access to a device that is turned on and locked, but not a device that is turned off/freshly powered on and encrypted. If you lose your password or disk encryption key, you may lose access to your device. For this reason, Windows and Apple laptops can make a cloud backup of your disk encryption key. However, a cloud backup can potentially be disclosed to law enforcement.

Install an ad blocker

Why? Online ad networks are often exploited to spread malware to unsuspecting visitors. If you've ever visited a regular website and suddenly seen an urgent, flashing pop-up claiming your device was hacked, it is often due to a bad ad. Blocking ads provides an additional layer of protection against these kinds of attacks.

Recommendation: I recommend everyone uses an ad blocker at all times. Not only are ads annoying and disruptive, but they can even result in your devices being compromised!

Guides:

Limitations: Sometimes the use of ad blockers can break functionality on websites, which can be annoying, but you can temporarily disable them to fix the problem. These may not be able to block all ads or all tracking, but they make browsing the web much more pleasant and lower risk! Some people might also be concerned that blocking ads might impact the revenue of their favourite websites or creators. In this case, I recommend either donating directly or sharing the site with a wider audience, but keep using the ad blocker for your safety.

Enable HTTPS-Only Mode

Why? The "S" in "HTTPS" stands for "secure". This feature, which can be enabled on your web browser, ensures that every time you visit a website, your connection is always end-to-end encrypted (just like when you use Signal!) This ensures that someone can't intercept what you search for, what pages on websites you visit, and any information you or the website share such as your banking details.

Recommendation: I recommend enabling this for everyone, though with improvements in web browser security and adoption of HTTPS over the years, your devices will often do this by default! There is a small risk you will encounter some websites that do not support HTTPS, usually older sites.

Guides:

Limitations: HTTPS protects the information on your connection to a website. It does not hide or protect the fact that you visited that website, only the information you accessed. If the website is malicious, HTTPS does not provide any protection. In certain settings, like when you use a work-managed computer that was set up for you, it can still be possible for your IT Department to see what you are browsing, even over an HTTPS connection, because they have administrator access to your computer and the network.

Medium to high effort (1+ hours)

These tasks require more effort but are worth the investment.

Set up a password manager

Why? It is not possible for a person to remember a unique password for every single website and app that they use. I have, as of writing, 556 passwords stored in my password manager. Password managers do three important things very well:

  1. They generate secure passwords with ease. You don't need to worry about getting your digits and special characters just right; the app will do it for you, and generate long, secure passwords.
  2. They remember all your passwords for you, and you just need to remember one password to access all of them. The most common reason people's accounts get hacked online is because they used the same password across multiple websites, and one of the websites had all their passwords leaked. When you use a unique password on every website, it doesn't matter if your password gets leaked!
  3. They autofill passwords based on the website you're visiting. This is important because it helps prevent you from getting phished. If you're tricked into visiting an evil lookalike site, your password manager will refuse to fill the password.

Recommendation: These benefits are extremely important, and setting up a password manager is often one of the most impactful things you can do for your digital security. However, they take time to get used to, and migrating all of your passwords into the app (and immediately changing them!) can take a few minutes at a time... over weeks. I recommend you prioritize the most important sites, such as your email accounts, banking/financial sites, and cellphone provider. This process will feel like a lot of work, but you will get to enjoy the benefits of never having to remember new passwords and the autofill functionality for websites. My recommended password manager is 1Password, but it stores passwords in the cloud and costs money. There are some good free options as well if cost is a concern. You can also use web browser- or OS-based password managers, but I do not prefer these.

Guides:

Limitations: Many people are concerned about the risk of using a password manager causing all of their passwords to be compromised. For this reason, it's very important to use a vetted, reputable password manager that has passed audits, such as 1Password or Bitwarden. It is also extremely important to choose a strong password to unlock your password manager. 1Password makes this easier by generating a secret to strengthen your unlock password, but I recommend using a long, memorable password in any case. Another risk is that if you forget your password manager's password, you will lose access to all your passwords. This is why I recommend 1Password, which has you set up an Emergency Kit to recover access to your account.

Set up two-factor authentication (2FA) for your accounts

Why? If your password is compromised in a website leak or due to a phishing attack, two-factor authentication will require a second piece of information to log in and potentially thwart the intruder. This provides you with an extra layer of security on your accounts.

Recommendation: You don't necessarily need to enable 2FA on every account, but prioritize enabling it on your most important accounts (email, banking, cellphone, etc.) There are typically a few different kinds: email-based (which is why your email account's security is so important), text message or SMS-based (which is why your cell phone account's security is so important), app-based, and hardware token-based. Email and text message 2FA are fine for most accounts. You may want to enable app- or hardware token-based 2FA for your most sensitive accounts.

Guides:

Limitations: The major limitation is that if you lose access to 2FA, you can be locked out of an account. This can happen if you're travelling abroad and can't access your usual cellphone number, if you break your phone and you don't have a backup of your authenticator app, or if you lose your hardware-based token. For this reason, many websites will provide you with "backup tokens"—you can print them out and store them in a secure location or use your password manager. I also recommend if you use an app, you choose one that will allow you to make secure backups, such as Ente. You are also limited by the types of 2FA a website supports; many don't support app- or hardware token-based 2FA.

Remove your information from data brokers

Why? This is a problem that mostly affects people in the US. It surprises many people that information from their credit reports and other public records is scraped and available (for free or at a low cost) online through "data broker" websites. I have shocked friends who didn't believe this was an issue by searching for their full names and within 5 minutes being able to show them their birthday, home address, and phone number. This is a serious privacy problem!

Recommendation: Opt out of any and all data broker websites to remove this information from the internet. This is especially important if you are at risk of being stalked or harassed.

Guides:

Limitations: It can take time for your information to be removed once you opt out, and unfortunately search engines may have cached your information for a while longer. This is also not a one-and-done process. New data brokers are constantly popping up and some may not properly honour your opt out, so you will need to check on a regular basis (perhaps once or twice a year) to make sure your data has been properly scrubbed. This also cannot prevent someone from directly searching public records to find your information, but that requires much more effort.

"Recommended security measures" I think beginners should avoid

We've covered a lot of tasks you should do, but I also think it's important to cover what not to do. I see many of these tools recommended to security beginners, and I think that's a mistake. For each tool, I will explain my reasoning around why I don't think you should use it, and the scenarios in which it might make sense to use.

"Secure email"

What is it? Many email providers, such as Proton Mail, advertise themselves as providing secure email. They are often recommended as a "more secure" alternative to typical email providers such as GMail.

What's the problem? Email is fundamentally insecure by design. The SMTP STARTTLS specification (RFC 3207) states that a publicly referenced email server MUST NOT require the use of encryption in transit in order to accept mail. Email providers can of course provide additional security by encrypting their copies of your email, and providing you access to your email by HTTPS, but the messages themselves can always be sent without encryption. Some platforms such as Proton Mail advertise end-to-end encrypted emails so long as you email another Proton user. This is not truly email, but their own internal encrypted messaging platform that follows the email format.

What should I do instead? Use Signal to send encrypted messages. NEVER assume the contents of an email are secure.

Who should use it? I don't believe there are any major advantages to using a service such as this one. Even if you pay for a more "secure" email provider, the majority of your emails will still be delivered to people who don't. Additionally, while I don't use or necessarily recommend their service, Google offers an Advanced Protection Program for people who may be targeted by state-level actors.

PGP/GPG Encryption

What is it? PGP ("Pretty Good Privacy") and GPG ("GNU Privacy Guard") are encryption and cryptographic signing software. They are often recommended to encrypt messages or email.

What's the problem? GPG is decades old and its usability has always been terrible. It is extremely easy to accidentally send a message that you thought was encrypted without encryption! The problems with PGP/GPG have been extensively documented.

What should I do instead? Use Signal to send encrypted messages. Again, NEVER use email for sensitive information.

Who should use it? Software developers who contribute to projects where there is a requirement to use GPG should continue to use it until an adequate alternative is available. Everyone else should live their lives in PGP-free bliss.

Installing a "secure" operating system (OS) on your phone

What is it? There are a number of self-installed operating systems for Android phones, such as GrapheneOS, that advertise as being "more secure" than using the version of the Android operating system provided by your phone manufacturer. They often remove core Google APIs and services to allow you to "de-Google" your phone.

What's the problem? These projects are relatively niche, and don't have nearly enough resourcing to be able to respond to the high levels of security pressure Android experiences (such as against the forensic tools I mentioned earlier). You may suddenly lose security support with no notice, as with CalyxOS. You need a high level of technical know-how and a lot of spare time to maintain your device with a custom operating system, which is not a reasonable expectation for the average person. By stripping all Google APIs such as Google Play Services, some useful apps can no longer function. And some law enforcement organizations have gone as far as accusing people who install GrapheneOS on Pixel phones of engaging in criminal activity.

What should I do instead? For the best security on an Android device, use a phone manufactured by Google or Samsung (smaller manufacturers are more unreliable), or consider buying an iPhone. Make sure your device is receiving security updates and up-to-date.

Who should use it? These projects are great for tech enthusiasts who are interested in contributing to and developing them further. They can be used to give new life to old phones that are not receiving security or software updates. They are also great for people with an interest in free and open source software and digital autonomy. But these tools are not a good choice for a general audience, nor do they provide more practical security than using an up-to-date Google or Samsung Android phone.

Virtual Private Network (VPN) Services

What is it? A virtual private network or VPN service can provide you with a secure tunnel from your device to the location that the VPN operates. This means that if I am using my phone in Seattle connected to a VPN in Amsterdam, if I access a website, it appears to the website that my phone is located in Amsterdam.

What's the problem? VPN services are frequently advertised as providing security or protection from nefarious bad actors, or helping protect your privacy. These benefits are often far overstated, and there are predatory VPN providers that can actually be harmful. It costs money and resources to provide a VPN, so free VPN services are especially suspect. When you use a VPN, the VPN provider knows the websites you are visiting in order to provide you with the service. Free VPN providers may sell this data in order to cover the cost of providing the service, leaving you with less security and privacy. The average person does not have the knowledge to be able to determine if a VPN service is trustworthy or not. VPNs also don't provide any additional encryption benefits if you are already using HTTPS. They may provide a small amount of privacy benefit if you are connected to an untrusted network with an attacker.

What should I do instead? Always use HTTPS to access websites. Don't connect to untrusted internet providers—for example, use cellphone network data instead of a sketchy Wi-Fi access point. Your local neighbourhood coffee shop is probably fine.

Who should use it? There are three main use cases for VPNs. The first is to bypass geographic restrictions. A VPN will cause all of your web traffic to appear to be coming from another location. If you live in an area that has local internet censorship policies, you can use a VPN to access the internet from a location that lacks such policies. The second is if you know your internet service provider is actively hostile or malicious. A trusted VPN will protect the visibility of all your traffic, including which websites you visit, from your internet service provider, and the only thing they will be able to see is that you are accessing a VPN. The third use case is to access a network that isn't connected to the public internet, such as a corporate intranet. I strongly discourage the use of VPNs for "general-purpose security."

Tor

What is it? Tor, "The Onion Router", is a free and open source software project that provides anonymous networking. Unlike with a VPN, where the VPN provider knows who you are and what websites you are requesting, Tor's architecture makes it extremely difficult to determine who sent a request.

What's the problem? Tor is difficult to set up properly; similar to PGP-encrypted email, it is possible to accidentally not be connected to Tor and not know the difference. This usability has improved over the years, but Tor is still not a good tool for beginners to use. Due to the way Tor works, it is also extremely slow. If you have used cable or fiber internet, get ready to go back to dialup speeds. Tor also doesn't provide perfect privacy and without a strong understanding of its limitations, it can be possible to deanonymize someone despite using it. Additionally, many websites are able to detect connections from the Tor network and block them.

What should I do instead? If you want to use Tor to bypass censorship, it is often better to use a trusted VPN provider, particularly if you need high bandwidth (e.g. for streaming). If you want to use Tor to access a website anonymously, Tor itself might not be enough to protect you. For example, if you need to provide an email address or personal information, you can decline to provide accurate information and use a masked email address. A friend of mine once used the alias "Nunya Biznes" 🥸

Who should use it? Tor should only be used by people who are experienced users of security tools and understand its strengths and limitations. Tor also is best used on a purpose-built system, such as Tor Browser or Freedom of the Press Foundation's SecureDrop.

I want to learn more!

I hope you've found this guide to be a useful starting point. I always welcome folks reaching out to me with questions, though I might take a little bit of time to respond. You can always email me.

If there's enough interest, I might cover the following topics in a future post:

  • Threat modelling, which you can get started with by reading the EFF's or VCW's guides
  • Browser addons for privacy, which Consumer Reports has a tip for
  • Secure DNS, which you can read more about here

Stay safe out there! 🔒

27 January, 2026 02:00AM by Elana Hashman

January 24, 2026

hackergotchi for Gunnar Wolf

Gunnar Wolf

Finally some light for those who care about Debian on the Raspberry Pi

Finally, some light at the end of the tunnel!

As I have said in this blog and elsewhere, after putting quite a bit of work into generating the Debian Raspberry Pi images between late 2018 and 2023, I had to recognize I don’t have the time and energy to properly care for it.

I even registered a GSoC project for it. I mentored Kurva Prashanth, who did good work on the vmdb2 scripts we use for the image generation — but in the end, we were unable to get them built on Debian infrastructure. Maybe a different approach was needed! While I adopted the images as they were conceived by Michael Stapelberg, sometimes it’s easier to start from scratch and build a fresh approach.

So, I’m not yet pointing at a stable, proven release, but to a good promise. And I hope I’m not being pushy by making this public: in the #debian-raspberrypi channel, waldi has shared the images he has created with the Debian Cloud Team’s infrastructure.

Right now, the images built so far support Raspberry Pi families 4 and 5 (notably, not the 500 computer I have, due to a missing Device Tree, but I’ll try to help figure that bit out… Anyway, p400/500/500+ systems are not that usual). Work is underway to get the 3B+ to boot (some hackery is required, as it only understands MBR partition schemes, so a hybrid image seems to be needed).

Debian Cloud images for Raspberries

Sadly, I don’t think the effort will be extended to cover older, 32-bit-only systems (RPi 0, 1 and 2).

Anyway, as this effort stabilizes, I will phase out my (stale!) work on raspi.debian.net, and will redirect it to point at the new images.

Comments

Andrea Pappacoda tachi@d.o 2026-01-26 17:39:14 GMT+1

Are there any particular caveats compared to using the regular Raspberry Pi OS?

Are they documented anywhere?

Gunnar Wolf gwolf.blog@gwolf.org 2026-01-26 11:02:29 GMT-6

Well, the Raspberry Pi OS includes quite a bit of software that’s not packaged in Debian for various reasons — some of it because it’s non-free demo-ware, some of it because it’s RPiOS-specific configuration, some of it… I don’t care, I like running Debian wherever possible 😉

Andrea Pappacoda tachi@d.o 2026-01-26 18:20:24 GMT+1

Thanks for the reply! Yeah, sorry, I should’ve been more specific. I also just care about the Debian part. But: are there any hardware issues or unsupported stuff, like booting from an SSD (which I’m currently doing)?

Gunnar Wolf gwolf.blog@gwolf.org 2026-01-26 12:16:29 GMT-6

That’s… beyond my knowledge 😉 Although I can tell you that:

  • Raspberry Pi OS has hardware support as soon as their new boards hit the market. The ability to even boot a board can take over a year for the mainline Linux kernel (at least, it has, both in the cases of the 4 and the 5 families).

  • Also, sometimes some bits of hardware are not discovered by the Linux kernels even if the general family boots because they are not declared in the right place of the Device Tree (i.e. the wireless network interface in the 02W is in a different address than in the 3B+, or the 500 does not fully boot while the 5B now does). Usually it is a matter of “just” declaring stuff in the right place, but it’s not a skill many of us have.

  • Also, many RPi “hats” ship with their own Device Tree overlays, and they cannot always be loaded on top of mainline kernels.

Andrea Pappacoda tachi@d.o 2026-01-26 19:31:55 GMT+1

That’s… beyond my knowledge 😉 Although I can tell you that:

Raspberry Pi OS has hardware support as soon as their new boards hit the market. The ability to even boot a board can take over a year for the mainline Linux kernel (at least, it has, both in the cases of the 4 and the 5 families).

Yeah, unfortunately I’m aware of that… I’ve also been trying to boot OpenBSD on my rpi5 out of curiosity, but been blocked by my somewhat unusual setup involving an NVMe SSD as the boot drive :/

Also, sometimes some bits of hardware are not discovered by the Linux kernels even if the general family boots because they are not declared in the right place of the Device Tree (i.e. the wireless network interface in the 02W is in a different address than in the 3B+, or the 500 does not fully boot while the 5B now does). Usually it is a matter of “just” declaring stuff in the right place, but it’s not a skill many of us have.

At some point in my life I had started reading a bit about device trees and stuff, but got distracted by other stuff before I could develop any familiarity with it. So I don’t have the skills either :)

Also, many RPi “hats” ship with their own Device Tree overlays, and they cannot always be loaded on top of mainline kernels.

I’m definitely not happy to hear this!

Guess I’ll have to try, and maybe report back once some page for these new builds materializes.

24 January, 2026 04:24PM

January 23, 2026

Nazi.Compare

Judgment: French army vanquishes German FSFE on Hitler's birthday, Microsoft contract dispute (1716711)

(1716711, 1804171) Free Software Foundation Europe e.V. vs. Le ministère des armées, Tribunal Administratif de Melun.

Hitler's birthday, 20 April, was a special occasion every year under the fascist dictatorship, on par with the King's Birthday in the Commonwealth. Incidentally, the British monarchy are really Germans.

We previously considered the fact that Debianists elected a German leader on Hitler's birthday and then they did exactly the same thing again the following year.

In 2017, the German FSFE misfits began a legal case against the French military. FSFE misfits disputed the French military's decision to sign a contract with Microsoft.

The case was eventually resolved in 2021 with a judgment against the Germans / FSFE misfits. For those who failed to notice at the time, the tribunal handed down the judgment on Hitler's birthday:

FSFE, judgment, France, military, microsoft

While the full name of the FSFE is Free Software Foundation Europe, internal statistics and mailing list traffic levels demonstrate that the majority of these people are Germans and they do not represent Europe at large, they only represent themselves.

Throughout the legal procedure and trial, an employee of the French government, Dr Amandine Jambert from CNIL was part of the internal mailing lists used for discussions at the FSFE. Was this a conflict of interest when the FSFE was taking legal action against her own employer?

In the email below, Jambert does not state her full name, she only uses the sockpuppet identity "Cryptie" and she does not have any email signature disclosing her position within an agency of the French state.

The FSFE misfits claim to be a peak body for Free Software activism in Europe. They claim to comply with Transparency International guidelines. As the saying goes, do as we say, not as we do.

In 2023, when Jambert resigned from the FSFE misfits, she finally admitted she has a conflict of interest, implicitly admitting that she lied about harassment to avoid facing a basic ethical question.

Subject: 	Re: Fwd: [April - Atelier] Open Bar : Une action en justice dans les tuyaux et DSI des Armées pro-microsoft
Date: 	Wed, 30 Aug 2017 13:15:58 +0200
From: 	Cryptie <cryptie@fsfe.org>
To: 	team@lists.fsfe.org, Hugo Roy <hugo@hugoroy.eu>, france@lists.fsfe.org



Hi all,

I think it would be a good idea to tell them what we did/ plan to do as, in France, they are those that could help us if we need.

Best,
Cryptie

Le 30 août 2017 13:12:00 GMT+02:00, Hugo Roy <hugo@hugoroy.eu> a écrit :



    Hi all,

    Important info: the company (Nexedi) will be represented by a lawyer
    that I know well, because he sucks.

    I think this makes it paramount that FSFE also sues, to ensure that
    things are done right and limit any damage that this guy may do.

    Matze: maybe you remember, I talked about this guy and how he miserably
    failed a lawsuit (on a similar topic against Microsoft) last year.


    On 2017-08-30 12:43, Hugo Roy wrote:

        Hi there,

        You will find attached an April internal mailing list relating
        to the
        Microsoft - French Ministry of Defense contract, about which
        FSFE sent
        a request for information (in the potentiality of a lawsuit).

        A company (Nexedi) is, apparently, going to sue over this too.

        Thinking about replying to April that FSFE is also thinking about a
        lawsuit and to keep April updated (and that we are happy to work
        with
        them, as usual). Thoughts?

        Best,
        Hugo

    ------------------------------------------------------------------------

    Team mailing list
    Team@lists.fsfe.org
    https://lists.fsfe.org/mailman/listinfo/team


-- 
Envoyé de mon appareil Android avec K-9 Mail. Veuillez excuser ma brièveté.

Remember, in May 2018, the FSFE staff had an Extraordinary General Meeting without the developers present and they voted to remove the elections from the constitution of the association. They have not had a full election since 2017.

In July 2018, a few weeks after removing elections from the constitution, the German FSFE misfits went to RMLL in Strasbourg and they recorded a video of Jambert talking about democracy in French.

Amandine Jambert, cryptie, FSFE, balloons, RMLL, EDPB, CNIL, Strasbourg

Read more about Nazi comparisons.

23 January, 2026 06:00PM

January 22, 2026

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

Rewriting Git merge history, part 2

In part 1, we discovered the problem of rewriting git history in the presence of nontrivial merges. Today, we'll discuss the workaround I chose.

As I previously mentioned, and as Julia Evans' excellent data model document explains, a git commit is just a snapshot of a tree (suitably deduplicated by means of content hashes), a commit message and a (possibly empty) set of parents. So fundamentally, we don't really need to mess with diffs; if we can make the changes we want directly to the tree (well, technically, make a new tree that looks like what we want, and a new commit using that tree), we're good. (Diffs in git are, generally, just git diff looking at two trees and trying to make sense of it. This has the unfortunate result that there is no solid way of representing a rename; there are heuristics, but if you rename a file and change it in the same commit, they may fail and stuff like git blame or git log may be broken, depending on flags. Gerrit doesn't even seem to understand a no-change copy.)

In earlier related cases, I've taken this to the extreme by simply hand-writing a commit using git commit-tree. Create exactly the state that you want by whatever means, commit it in some dummy commit and then use that commit's tree with some suitable commit message and parent(s); voila. But it doesn't help us with history; while we can fix up an older commit in exactly the way we'd like, we also need the latter commits to have our new fixed-up commit as parent.
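
As a rough sketch of that trick (every name and hash below is a placeholder, not something from the actual repository):

tree=$(git rev-parse "fixup-dummy^{tree}")      # tree of the hand-crafted dummy commit
new=$(git commit-tree "$tree" -p parent1 -p parent2 -m "Merge master into cluster")
git update-ref refs/heads/cluster-master "$new"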

Thus, enter git filter-branch. git filter-branch comes with a suitable set of warnings about eating your repository and being deprecated (I never really figured out its supposed replacement git filter-repo, so I won't talk much about it), but it's useful when all else fails.

In particular, git filter-branch allows you to do arbitrary changes to the tree of a series of commits, updating the parent commit IDs as rewrites happen. So if you can express your desired changes in a way that's better than “run the editor” (or if you're happy running the editor and making the same edit manually 300 times!), you can just run that command over all commits in the entire branch (forgive me for breaking lines a bit):

git filter-branch -f --tree-filter \
  '! [ -f src/cluster.cpp ] || sed -i \
      "s/if (mi.rank != 0)/if (mi.rank != 0 \&\& mi.rank == rank())/" \
      src/cluster.cpp' \
  665155410753978998c8080c813da660fc64bbfe^..cluster-master

This is suitably terrible. Remember, if we only did this for one commit, the change wouldn't be there in the next one (git diff would show that it was immediately reverted), so filter-branch needs to do this over and over again, once for each commit (tree) in the branch. And I wanted multiple fixups, so I had a bunch of these; some of them were as simple as “copy this file from /tmp” and some were shell scripts that did things like running clang-format.

You can do similar things for commit messages; at some point, I figured I should write “cluster” (the official name for the branch) and not “cluster-master” (my local name) in the merge messages, so I could just do

git filter-branch \
  --commit-msg-filter 'sed s/cluster-master/cluster/g' \
  665155410753978998c8080c813da660fc64bbfe^..cluster-master

I also did a bunch of them to fix up my email address (GIT_COMMITTER_EMAIL wasn't properly set), although I cannot honestly remember whether I used --env-filter or something else. Perhaps that was actually with git rebase and `-r --exec 'git commit --amend --no-edit --author …'` or similar. There are many ways to do ugly things. :-)
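
If it was --env-filter, it would have looked roughly like this (the address being a placeholder):

git filter-branch -f --env-filter \
  'export GIT_COMMITTER_EMAIL="steinar@example.com"' \
  665155410753978998c8080c813da660fc64bbfe^..cluster-master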

Eventually, I had the branch mostly in a state where I thought it would be ready for review, but after uploading to GitHub, one reviewer commented that some of my merges against master were commits that didn't exist in master. Huh? That's… surprising.

It took a fair bit of digging to figure out what had happened: git filter-branch had rewritten some commits that it didn't actually have to; the merge sources from upstream. This is normally harmless, since git hashes are deterministic, but these commits were signed by the author! And filter-branch (or perhaps fast-export, upon which it builds?) generally assumes that it can't sign stuff with other people's keys, so it just strips the signatures, deeming that better than having invalid ones sitting around. Now, of course, these commit signatures would still be valid since we didn't change anything, but evidently, filter-branch doesn't have any special code for that.

Removing an object like this (a “gpgsig” attribute, it seems) changes the commit hash, which is where the phantom commits came from. I couldn't get filter-branch to turn it off… but again, parents can be freely changed, diffs don't exist anyway. So I wrote a little script that took in parameters suitable for git commit-tree (mostly the parent list), rewrote known-bad parents to known-good parents, gave the script to git filter-branch --commit-filter, and that solved the problem. (I guess --parent-filter would also have worked; I don't think I saw it in the man page at the time.)
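
A sketch of what such a commit filter amounts to (the two hashes here are stand-ins for the known-bad and known-good parent IDs):

# --commit-filter is invoked as <tree> [-p <parent>]... with the log message on stdin;
# git commit-tree prints the new commit ID, which is exactly what filter-branch expects back
git filter-branch -f --commit-filter '
    args=$(echo "$@" | sed "s/BADPARENT/GOODPARENT/g")
    git commit-tree $args
' 665155410753978998c8080c813da660fc64bbfe^..cluster-master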

So, well, I won't claim this is an exercise in elegance. (Perhaps my next adventure will be figuring out how this works in jj, which supposedly has conflicts as more of a first-class concept.) But it got the job done in a couple of hours after fighting with rebase for a long time, the PR was reviewed, and now the Stockfish cluster branch is a little bit more alive.

22 January, 2026 07:45AM

January 21, 2026

hackergotchi for Evgeni Golov

Evgeni Golov

Validating cloud-init configs without being root

Somehow this whole DevOps thing is all about generating the wildest things from some (usually equally wild) template.

And today we're gonna generate YAML from ERB, what could possibly go wrong?!

Well, actually, quite a lot, so one wants to validate the generated result before using it to break systems at scale.

The YAML we generate is a cloud-init cloud-config, and while checking that we generated a valid YAML document is easy (and we were already doing that), it would be much better if we could check that cloud-init can actually use it.

Enter cloud-init schema, or so I thought. Turns out running cloud-init schema is rather broken without root privileges, as it tries to load a ton of information from the running system. This seems like a bug (or multiple), as the data should not be required for the validation of the schema itself. I've not found a way to disable that behavior.
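
For reference, the invocation that should do the job (the file name here is just an example) is:

cloud-init schema --config-file cloud-config.yaml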

Luckily, I know Python.

Enter evgeni-knows-better-and-can-write-python:

#!/usr/bin/env python3

import sys
from cloudinit.config.schema import get_schema, validate_cloudconfig_file, SchemaValidationError

try:
    valid = validate_cloudconfig_file(config_path=sys.argv[1], schema=get_schema())
    if not valid:
        raise RuntimeError("Schema is not valid")
except (SchemaValidationError, RuntimeError) as e:
    print(e)
    sys.exit(1)
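
Saved as, say, validate-cloud-config.py (the name is entirely up to you), it can then be pointed at a generated config:

python3 validate-cloud-config.py cloud-config.yaml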

The canonical1 version of this lives in the Foreman git repo, so go there if you think this will ever receive any updates.

The hardest part was to understand the validate_cloudconfig_file API, as it will sometimes raise a SchemaValidationError, sometimes a RuntimeError and sometimes just return False. No idea why. But the above just turns all of that into a couple of printed lines and a non-zero exit code, unless of course there are no problems, in which case you get peaceful silence.

21 January, 2026 07:42PM by evgeni

January 19, 2026

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RApiDatetime 0.0.11 on CRAN: Micro-Maintenance

A new (micro) maintenance release of our RApiDatetime package is now on CRAN, coming only a good week after the 0.0.10 release which itself had a two year gap to its predecessor release.

RApiDatetime provides a number of entry points for C-level functions of the R API for Date and Datetime calculations. The functions asPOSIXlt and asPOSIXct convert between long and compact datetime representation, formatPOSIXlt and Rstrptime convert to and from character strings, and POSIXlt2D and D2POSIXlt convert between Date and POSIXlt datetime. Lastly, asDatePOSIXct converts to a date type. All these functions are rather useful, but were not previously exported by R for C-level use by other packages. Which this package aims to change.

This release adds a single PROTECT (and UNPROTECT) around one variable as the rchk container and service by Tomas now flagged this. Which is … somewhat peculiar, as this is old code also ‘borrowed’ from R itself, but there is no point arguing so I just added it.

Details of the release follow based on the NEWS file.

Changes in RApiDatetime version 0.0.11 (2026-01-19)

  • Add PROTECT (and UNPROTECT) to appease rchk

Courtesy of my CRANberries, there is also a diffstat report for this release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

19 January, 2026 11:21PM

hackergotchi for Jonathan Dowland

Jonathan Dowland

FOSDEM 2026

I'm going to FOSDEM 2026!

I'm presenting in the Containers dev room. My talk is Java Memory Management in Containers and it's scheduled as the first talk on the first day. I'm the warm-up act!

The Java devroom has been a stalwart at FOSDEM since 2004 (sometimes in other forms), but sadly there's no Java devroom this year. There's a story about that, but it's not mine to tell.

Please recommend to me any interesting talks! Here's a few that caught my eye:

Debian/related:

Containers:

Research:

Other:

19 January, 2026 02:12PM

Craig Small

WordPress 6.9 for Debian

The Debian packages for WordPress version 6.9 were uploaded today. The upstream website says you can add notes and move graphics around, making for an easier and nicer editing experience.

I’m not personally sure about that; generally, if things change too much people get annoyed, but it seems at least the initial simple stuff has stayed the same.

19 January, 2026 09:57AM by dropbear

Russell Coker

Furilabs FLX1s

The Aim

I have just got a Furilabs FLX1s [1] which is a phone running a modified version of Debian. I want to have a phone that runs all apps that I control and can observe and debug. Android is very good for what it does and there are security focused forks of Android which have a lot of potential, but for my use a Debian phone is what I want.

The FLX1s is not going to be my ideal phone, I am evaluating it for use as a daily-driver until a phone that meets my ideal criteria is built. In this post I aim to provide information to potential users about what it can do, how it does it, and how to get the basic functions working. I also evaluate how well it meets my usage criteria.

I am not anywhere near an average user. I don’t think an average user would ever even see one unless a more technical relative showed one to them. So while this phone could be used by an average user I am not evaluating it on that basis. But of course the features of the GUI that make a phone usable for an average user will allow a developer to rapidly get past the beginning stages and into more complex stuff.

Features

The Furilabs FLX1s [1] is a phone that is designed to run FuriOS which is a slightly modified version of Debian. The purpose of this is to run Debian instead of Android on a phone. It has switches to disable camera, phone communication, and microphone (similar to the Librem 5) but the one to disable phone communication doesn’t turn off Wifi, the only other phone I know of with such switches is the Purism Librem 5.

It has a 720*1600 display which is only slightly better than the 720*1440 display in the Librem 5 and PinePhone Pro. This doesn’t compare well to the OnePlus 6 from early 2018 with 2280*1080 or the Note9 from late 2018 with 2960*1440 – which are both phones that I’ve run Debian on. The current price is $US499 which isn’t that good when compared to the latest Google Pixel series, a Pixel 10 costs $US649 and has a 2424*1080 display and it also has 12G of RAM while the FLX1s only has 8G. Another annoying thing is how rounded the corners are, it seems that round corners that cut off the content are a standard practice nowadays, in my collection of phones the latest one I found with hard right angles on the display was a Huawei Mate 10 Pro which was released in 2017. The corners are rounder than the Note 9, this annoys me because the screen is not high resolution by today’s standards so losing the corners matters.

The default installation is Phosh (the GNOME shell for phones) and it is very well configured. Based on my experience with older phone users I think I could give a phone with this configuration to a relative in the 70+ age range who has minimal computer knowledge and they would be happy with it. Additionally I could set it up to allow ssh login and instead of going through the phone support thing of trying to describe every GUI setting to click on based on a web page describing menus for the version of Android they are running I could just ssh in and run diff on the .config directory to find out what they changed. Furilabs have done a very good job of setting up the default configuration, while Debian developers deserve a lot of credit for packaging the apps the Furilabs people have chosen a good set of default apps to install to get it going and appear to have made some noteworthy changes to some of them.

Droidian

The OS is based on Android drivers (using the same techniques as Droidian [2]) and the storage device has the huge number of partitions you expect from Android as well as a 110G Ext4 filesystem for the main OS.

The first issue with the Droidian approach of using an Android kernel and containers for user space code to deal with drivers is that it doesn’t work that well. There are 3 D state processes (uninterruptible sleep – which usually means a kernel bug if the process remains in that state) after booting and doing nothing special. My tests running Droidian on the Note 9 also had D state processes; in this case they are D state kernel threads (I can’t remember if the Note 9 had regular processes or kernel threads stuck in D state). It is possible for a system to have full functionality in spite of some kernel threads in D state but generally it’s a symptom of things not working as well as you would hope.

The design of Droidian is inherently fragile. You use a kernel and user space code from Android and then use Debian for the rest. You can’t do everything the Android way (with the full OS updates etc) and you also can’t do everything the Debian way. The TOW Boot functionality in the PinePhone Pro is really handy for recovery [3], it allows the internal storage to be accessed as a USB mass storage device. The full Android setup with ADB has some OK options for recovery, but part Android and part Debian has less options. While it probably is technically possible to do the same things in regard to OS repair and reinstall the fact that it’s different from most other devices means that fixes can’t be done in the same way.

Applications

GUI

The system uses Phosh and Phoc, the GNOME system for handheld devices. It’s a very different UI from Android, I prefer Android but it is usable with Phosh.

IM

Chatty works well for Jabber (XMPP) in my tests. It supports Matrix which I didn’t test because I don’t desire the same program doing Matrix and Jabber and because Matrix is a heavy protocol which establishes new security keys for each login so I don’t want to keep logging in on new applications.

Chatty also does SMS but I couldn’t test that without the SIM caddy.

I use Nheko for Matrix which has worked very well for me on desktops and laptops running Debian.

Email

I am currently using Geary for email. It works reasonably well but is lacking proper management of folders, so I can’t just subscribe to the important email on my phone so that bandwidth isn’t wasted on less important email (there is a GNOME gitlab issue about this – see the Debian Wiki page about Mobile apps [4]).

Music

Music playing isn’t a noteworthy thing for a desktop or laptop, but a good music player is important for phone use. The Lollypop music player generally does everything you expect along with support for all the encoding formats including FLAC – a major limitation of most Android music players seems to be lack of support for some of the common encoding formats. Lollypop has its controls for pause/play and going forward and backward one track on the lock screen.

Maps

The installed map program is gnome-maps which works reasonably well. It gets directions via the Graphhopper API [5]. One thing we really need is a FOSS replacement for Graphhopper in GNOME Maps.

Delivery and Unboxing

I received my FLX1s on the 13th of Jan [1]. I had paid for it on the 16th of Oct but hadn’t received the email with the confirmation link so the order had been put on hold. But after I contacted support about that on the 5th of Jan they rapidly got it to me which was good. They also gave me a free case and screen protector to apologise, I don’t usually use screen protectors but in this case it might be useful as the edges of the case don’t even extend 0.5mm above the screen. So if it falls face down the case won’t help much.

When I got it there was an open space at the bottom where the caddy for SIMs is supposed to be. So I couldn’t immediately test VoLTE functionality. The contact form on their web site wasn’t working when I tried to report that and the email for support was bouncing.

Bluetooth

As a test of Bluetooth I connected it to my Nissan LEAF which worked well for playing music and I connected it to several Bluetooth headphones. My Thinkpad running Debian/Trixie doesn’t connect to the LEAF and to headphones which have worked on previous laptops running Debian and Ubuntu. A friend’s laptop running Debian/Trixie also wouldn’t connect to the LEAF so I suspect a bug in Trixie, I need to spend more time investigating this.

Wifi

Currently 5GHz wifi doesn’t work, this is a software bug that the Furilabs people are working on. 2.4GHz wifi works fine. I haven’t tested running a hotspot due to being unable to get 4G working as they haven’t yet shipped me the SIM caddy.

Docking

This phone doesn’t support DP Alt-mode or Thunderbolt docking so it can’t drive an external monitor. This is disappointing, Samsung phones and tablets have supported such things since long before USB-C was invented. Samsung DeX is quite handy for Android devices and that type feature is much more useful on a device running Debian than on an Android device.

Camera

The camera works reasonably well on the FLX1s. Until recently the camera on the Librem 5 didn’t work, and the camera on my PinePhone Pro currently doesn’t work. Here are samples of the regular camera and the selfie camera on the FLX1s and the Note 9. I think this shows that the camera is pretty decent. The selfie looks better and the front camera is worse for the relatively close photo of a laptop screen – taking photos of computer screens is an important part of my work but I can probably work around that.

I wasn’t assessing this camera to find out if it’s great, just to find out whether I’d have the sorts of problems I had before – and it just worked. The Samsung Galaxy Note series of phones has always had decent specs including good cameras. Even though the Note 9 is old, comparing well to it is a respectable performance. The lighting was poor for all photos.

FLX1s


Note 9


Power Use

In 93 minutes having the PinePhone Pro, Librem 5, and FLX1s online with open ssh sessions from my workstation the PinePhone Pro went from 100% battery to 26%, the Librem 5 went from 95% to 69%, and the FLX1s went from 100% to 99%. The battery discharge rate of them was reported as 3.0W, 2.6W, and 0.39W respectively. Based on having a 16.7Wh battery 93 minutes of use should have been close to 4% battery use, but in any case all measurements make it clear that the FLX1s will have a much longer battery life. Including the measurement of just putting my fingers on the phones and feeling the temperature (FLX1s felt cool and the others felt hot).

The PinePhone Pro and the Librem 5 have an optional “Caffeine mode” which I enabled for this test; without it enabled the phone goes into a sleep state and disconnects from Wifi. So those phones would use much less power with caffeine mode disabled, but they also couldn’t get fast responses to notifications etc. I found the option to enable a Caffeine mode switch on the FLX1s but the power use was reported as being the same both with and without it.

Charging

One problem I found with my phone is that in every case it takes 22 seconds to negotiate power. Even when using straight USB charging (no BC or PD) it doesn’t draw any current for 22 seconds. When I connect it, it will stay at 5V, varying between 0W and 0.1W (current rounded off to zero), for 22 seconds or so and then start charging. After the 22 second delay the phone will make the tick sound indicating that it’s charging and the power meter will measure that it’s drawing some current.

I added the table from my previous post about phone charging speed [6] with an extra row for the FLX1s. For charging from my PC USB ports the results were the worst ever: the port that does BC did not work at all, it kept looping trying to negotiate, and after a 22 second negotiation delay the port would turn off. The non-BC port gave only 2.4W which matches the 2.5W given by the spec for a “High-power device” which is what that port is designed to give. In a discussion on the Purism forum about the Librem5 charging speed one of their engineers told me that the reason why their phone would draw 2A from that port was because the cable was identifying itself as a USB-C port not a “High-power device” port. But for some reason out of the 7 phones I tested the FLX1s and the One Plus 6 are the only ones to limit themselves to what the port is apparently supposed to do. Also the One Plus 6 charges slowly on every power supply so I don’t know if it is obeying the spec or just sucking.

On a cheap AliExpress charger the FLX1s gets 5.9V and on a USB battery it gets 5.8V. Out of all 42 combinations of device and charger I tested these were the only ones to involve more than 5.1V but less than 9V. I welcome comments suggesting an explanation.

The case that I received has a hole for the USB-C connector that isn’t wide enough for the plastic surrounds on most of my USB-C cables (including the Dell dock). Also to make a connection requires a fairly deep insertion (deeper than the One Plus 6 or the Note 9). So without adjustment I have to take the case off to charge it. It’s no big deal to adjust the hole (I have done it with other cases) but it’s an annoyance.

Phone | Top z640 | Bottom Z640 | Monitor | Ali Charger | Dell Dock | Battery | Best | Worst
FLX1s | FAIL | 5.0V 0.49A 2.4W | 4.8V 1.9A 9.0W | 5.9V 1.8A 11W | 4.8V 2.1A 10W | 5.8V 2.1A 12W | 5.8V 2.1A 12W | 5.0V 0.49A 2.4W
Note9 | 4.8V 1.0A 5.2W | 4.8V 1.6A 7.5W | 4.9V 2.0A 9.5W | 5.1V 1.9A 9.7W | 4.8V 2.1A 10W | 5.1V 2.1A 10W | 5.1V 2.1A 10W | 4.8V 1.0A 5.2W
Pixel 7 pro | 4.9V 0.80A 4.2W | 4.8V 1.2A 5.9W | 9.1V 1.3A 12W | 9.1V 1.2A 11W | 4.9V 1.8A 8.7W | 9.0V 1.3A 12W | 9.1V 1.3A 12W | 4.9V 0.80A 4.2W
Pixel 8 | 4.7V 1.2A 5.4W | 4.7V 1.5A 7.2W | 8.9V 2.1A 19W | 9.1V 2.7A 24W | 4.8V 2.3A 11.0W | 9.1V 2.6A 24W | 9.1V 2.7A 24W | 4.7V 1.2A 5.4W
PPP | 4.7V 1.2A 6.0W | 4.8V 1.3A 6.8W | 4.9V 1.4A 6.6W | 5.0V 1.2A 5.8W | 4.9V 1.4A 5.9W | 5.1V 1.2A 6.3W | 4.8V 1.3A 6.8W | 5.0V 1.2A 5.8W
Librem 5 | 4.4V 1.5A 6.7W | 4.6V 2.0A 9.2W | 4.8V 2.4A 11.2W | 12V 0.48A 5.8W | 5.0V 0.56A 2.7W | 5.1V 2.0A 10W | 4.8V 2.4A 11.2W | 5.0V 0.56A 2.7W
OnePlus6 | 5.0V 0.51A 2.5W | 5.0V 0.50A 2.5W | 5.0V 0.81A 4.0W | 5.0V 0.75A 3.7W | 5.0V 0.77A 3.7W | 5.0V 0.77A 3.9W | 5.0V 0.81A 4.0W | 5.0V 0.50A 2.5W
Best | 4.4V 1.5A 6.7W | 4.6V 2.0A 9.2W | 8.9V 2.1A 19W | 9.1V 2.7A 24W | 4.8V 2.3A 11.0W | 9.1V 2.6A 24W | |

Conclusion

The Furilabs support people are friendly and enthusiastic but my customer experience wasn’t ideal. It was good that they could quickly respond to my missing order status and the missing SIM caddy (which I still haven’t received but believe is in the mail) but it would be better if such things just didn’t happen.

The phone is quite user friendly and could be used by a novice.

I paid $US577 for the FLX1s which is $AU863 by today’s exchange rates. For comparison I could get a refurbished Pixel 9 Pro Fold for $891 from Kogan (the major Australian mail-order company for technology) or a refurbished Pixel 9 Pro XL for $842. The Pixel 9 series has security support until 2031 which is probably longer than you can expect a phone to be used without being broken. So a phone with a much higher resolution screen that’s only one generation behind the latest high end phones and is refurbished will cost less. For a brand new phone a Pixel 8 Pro which has security updates until 2030 costs $874 and a Pixel 9A which has security updates until 2032 costs $861.

Doing what the Furilabs people have done is not a small project. It’s a significant amount of work and the prices of their products need to cover that. I’m not saying that the prices are bad, just that economies of scale and the large quantity of older stock makes the older Google products quite good value for money. The new Pixel phones of the latest models are unreasonably expensive. The Pixel 10 is selling new from Google for $AU1,149 which I consider a ridiculous price that I would not pay given the market for used phones etc. If I had a choice of $1,149 or a “feature phone” I’d pay $1,149. But the FLX1s for $863 is a much better option for me. If all I had to choose from was a new Pixel 10 or a FLX1s for my parents I’d get them the FLX1s.

For a FOSS developer a FLX1s could be a mobile test and development system which could be lent to a relative when their main phone breaks and the replacement is on order. It seems to be fit for use as a commodity phone. Note that I give this review on the assumption that SMS and VoLTE will just work, I haven’t tested them yet.

The UI on the FLX1s is functional and easy enough for a new user while allowing an advanced user to do the things they desire. I prefer the Android style and the Plasma Mobile style is closer to Android than Phosh is, but changing it is something I can do later. Generally I think that the differences between UIs matter more when on a desktop environment that could be used for more complex tasks than on a phone which limits what can be done by the size of the screen.

I am comparing the FLX1s to Android phones on the basis of what technology is available. But most people who would consider buying this phone will compare it to the PinePhone Pro and the Librem 5 as they have similar uses. The FLX1s beats both those phones handily in terms of battery life and of having everything just work. But it has the most non free software of the three and the people who want the $2000 Librem 5 that’s entirely made in the US won’t want the FLX1s.

This isn’t the destination for Debian based phones, but it’s a good step on the way to it and I don’t think I’ll regret this purchase.

19 January, 2026 06:43AM by etbe

Vincent Bernat

RAID 5 with mixed-capacity disks on Linux

Standard RAID solutions waste space when disks have different sizes. Linux software RAID with LVM uses the full capacity of each disk and lets you grow storage by replacing one or two disks at a time.

We start with four disks of equal size:

$ lsblk -Mo NAME,TYPE,SIZE
NAME TYPE  SIZE
vda  disk  101M
vdb  disk  101M
vdc  disk  101M
vdd  disk  101M

We create one partition on each of them:

$ sgdisk --zap-all --new=0:0:0 -t 0:fd00 /dev/vda
$ sgdisk --zap-all --new=0:0:0 -t 0:fd00 /dev/vdb
$ sgdisk --zap-all --new=0:0:0 -t 0:fd00 /dev/vdc
$ sgdisk --zap-all --new=0:0:0 -t 0:fd00 /dev/vdd
$ lsblk -Mo NAME,TYPE,SIZE
NAME   TYPE  SIZE
vda    disk  101M
└─vda1 part  100M
vdb    disk  101M
└─vdb1 part  100M
vdc    disk  101M
└─vdc1 part  100M
vdd    disk  101M
└─vdd1 part  100M

We set up a RAID 5 device by assembling the four partitions:1

$ mdadm --create /dev/md0 --level=raid5 --bitmap=internal --raid-devices=4 \
>   /dev/vda1 /dev/vdb1 /dev/vdc1 /dev/vdd1
$ lsblk -Mo NAME,TYPE,SIZE
    NAME          TYPE    SIZE
    vda           disk    101M
┌┈▶ └─vda1        part    100M
┆   vdb           disk    101M
├┈▶ └─vdb1        part    100M
┆   vdc           disk    101M
├┈▶ └─vdc1        part    100M
┆   vdd           disk    101M
└┬▶ └─vdd1        part    100M
 └┈┈md0           raid5 292.5M
$ cat /proc/mdstat
md0 : active raid5 vdd1[4] vdc1[2] vdb1[1] vda1[0]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

We use LVM to create logical volumes on top of the RAID 5 device.

$ pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created.
$ vgcreate data /dev/md0
  Volume group "data" successfully created
$ lvcreate -L 100m -n bits data
  Logical volume "bits" created.
$ lvcreate -L 100m -n pieces data
  Logical volume "pieces" created.
$ mkfs.ext4 -q /dev/data/bits
$ mkfs.ext4 -q /dev/data/pieces
$ lsblk -Mo NAME,TYPE,SIZE
    NAME          TYPE    SIZE
    vda           disk    101M
┌┈▶ └─vda1        part    100M
┆   vdb           disk    101M
├┈▶ └─vdb1        part    100M
┆   vdc           disk    101M
├┈▶ └─vdc1        part    100M
┆   vdd           disk    101M
└┬▶ └─vdd1        part    100M
 └┈┈md0           raid5 292.5M
    ├─data-bits   lvm     100M
    └─data-pieces lvm     100M
$ vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  data   1   2   0 wz--n- 288.00m 88.00m

This gives us the following setup:

One RAID 5 device built from four partitions from four disks of equal capacity. The RAID device is part of an LVM volume group with two logical volumes.
RAID 5 setup with disks of equal capacity

We replace /dev/vda with a bigger disk. We add it back to the RAID 5 array after copying the partition table from /dev/vdb:

$ cat /proc/mdstat
md0 : active (auto-read-only) raid5 vdb1[1] vdd1[4] vdc1[2]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
$ sgdisk --replicate=/dev/vda /dev/vdb
$ sgdisk --randomize-guids /dev/vda
$ mdadm --manage /dev/md0 --add /dev/vda1
$ cat /proc/mdstat
md0 : active raid5 vda1[5] vdb1[1] vdd1[4] vdc1[2]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

We do not use the additional capacity yet: anything placed there would not survive the loss of /dev/vda because it would have no redundancy. We need to replace a second disk, such as /dev/vdb:

$ cat /proc/mdstat
md0 : active (auto-read-only) raid5 vda1[5] vdd1[4] vdc1[2]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
$ sgdisk --replicate=/dev/vdb /dev/vdc
$ sgdisk --randomize-guids /dev/vdb
$ mdadm --manage /dev/md0 --add /dev/vdb1
$ cat /proc/mdstat
md0 : active raid5 vdb1[6] vda1[5] vdd1[4] vdc1[2]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

We create a new RAID 1 array by using the free space on /dev/vda and /dev/vdb:

$ sgdisk --new=0:0:0 -t 0:fd00 /dev/vda
$ sgdisk --new=0:0:0 -t 0:fd00 /dev/vdb
$ mdadm --create /dev/md1 --level=raid1 --bitmap=internal --raid-devices=2 \
>   /dev/vda2 /dev/vdb2
$ cat /proc/mdstat
md1 : active raid1 vdb2[1] vda2[0]
      101312 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid5 vdb1[6] vda1[5] vdd1[4] vdc1[2]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

We add /dev/md1 to the volume group:

$ pvcreate /dev/md1
  Physical volume "/dev/md1" successfully created.
$ vgextend data /dev/md1
  Volume group "data" successfully extended
$ vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  data   2   2   0 wz--n- 384.00m 184.00m
$  lsblk -Mo NAME,TYPE,SIZE
       NAME          TYPE    SIZE
       vda           disk    201M
   ┌┈▶ ├─vda1        part    100M
┌┈▶┆   └─vda2        part    100M
┆  ┆   vdb           disk    201M
┆  ├┈▶ ├─vdb1        part    100M
└┬▶┆   └─vdb2        part    100M
 └┈┆┈┈┈md1           raid1  98.9M
   ┆   vdc           disk    101M
   ├┈▶ └─vdc1        part    100M
   ┆   vdd           disk    101M
   └┬▶ └─vdd1        part    100M
    └┈┈md0           raid5 292.5M
       ├─data-bits   lvm     100M
       └─data-pieces lvm     100M

This gives us the following setup:2

One RAID 5 device built from four partitions and one RAID 1 device built from two partitions. The two last disks are smaller. The two RAID devices are part of a single LVM volume group.
Setup mixing both RAID 1 and RAID 5

We extend our capacity further by replacing /dev/vdc:

$ cat /proc/mdstat
md1 : active (auto-read-only) raid1 vda2[0] vdb2[1]
      101312 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active (auto-read-only) raid5 vda1[5] vdd1[4] vdb1[6]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk
$ sgdisk --replicate=/dev/vdc /dev/vdb
$ sgdisk --randomize-guids /dev/vdc
$ mdadm --manage /dev/md0 --add /dev/vdc1
$ cat /proc/mdstat
md1 : active (auto-read-only) raid1 vda2[0] vdb2[1]
      101312 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid5 vdc1[7] vda1[5] vdd1[4] vdb1[6]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Then, we convert /dev/md1 from RAID 1 to RAID 5:

$ mdadm --grow /dev/md1 --level=5 --raid-devices=3 --add /dev/vdc2
mdadm: level of /dev/md1 changed to raid5
mdadm: added /dev/vdc2
$ cat /proc/mdstat
md1 : active raid5 vdc2[2] vda2[0] vdb2[1]
      202624 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid5 vdc1[7] vda1[5] vdd1[4] vdb1[6]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
$ pvresize /dev/md1
$ vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  data   2   2   0 wz--n- 482.00m 282.00m

This gives us the following layout:

Two RAID 5 devices built from four disks of different sizes. The last disk is smaller and contains only one partition, while the others have two partitions: one for /dev/md0 and one for /dev/md1. The two RAID devices are part of a single LVM volume group.
RAID 5 setup with mixed-capacity disks using partitions and LVM

We further extend our capacity by replacing /dev/vdd:

$ cat /proc/mdstat
md0 : active (auto-read-only) raid5 vda1[5] vdc1[7] vdb1[6]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active (auto-read-only) raid5 vda2[0] vdc2[2] vdb2[1]
      202624 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
$ sgdisk --replicate=/dev/vdd /dev/vdc
$ sgdisk --randomize-guids /dev/vdd
$ mdadm --manage /dev/md0 --add /dev/vdd1
$ cat /proc/mdstat
md0 : active raid5 vdd1[4] vda1[5] vdc1[7] vdb1[6]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active (auto-read-only) raid5 vda2[0] vdc2[2] vdb2[1]
      202624 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

We grow the second RAID 5 array:

$ mdadm --grow /dev/md1 --raid-devices=4 --add /dev/vdd2
mdadm: added /dev/vdd2
$ cat /proc/mdstat
md0 : active raid5 vdd1[4] vda1[5] vdc1[7] vdb1[6]
      299520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid5 vdd2[3] vda2[0] vdc2[2] vdb2[1]
      303936 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
$ pvresize /dev/md1
$ vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  data   2   2   0 wz--n- 580.00m 380.00m
$ lsblk -Mo NAME,TYPE,SIZE
       NAME          TYPE    SIZE
       vda           disk    201M
   ┌┈▶ ├─vda1        part    100M
┌┈▶┆   └─vda2        part    100M
┆  ┆   vdb           disk    201M
┆  ├┈▶ ├─vdb1        part    100M
├┈▶┆   └─vdb2        part    100M
┆  ┆   vdc           disk    201M
┆  ├┈▶ ├─vdc1        part    100M
├┈▶┆   └─vdc2        part    100M
┆  ┆   vdd           disk    301M
┆  └┬▶ ├─vdd1        part    100M
└┬▶ ┆  └─vdd2        part    100M
 ┆  └┈┈md0           raid5 292.5M
 ┆     ├─data-bits   lvm     100M
 ┆     └─data-pieces lvm     100M
 └┈┈┈┈┈md1           raid5 296.8M

You can continue by replacing each disk one by one using the same steps. ♾️
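The recurring per-disk pattern can be summarized in a rough sketch. This is not a tested script: NEW and TEMPLATE are placeholders for the freshly installed disk and for an already-migrated disk whose partition layout is copied, and it assumes the old disk has already dropped out of the arrays.

NEW=/dev/vdX; TEMPLATE=/dev/vdY
sgdisk --replicate=$NEW $TEMPLATE        # copy the partition table onto NEW
sgdisk --randomize-guids $NEW            # give the copy unique GUIDs
mdadm --manage /dev/md0 --add ${NEW}1    # let md0 rebuild onto the new disk
mdadm --manage /dev/md1 --add ${NEW}2    # same for md1, if NEW carries it
# If NEW is larger than its predecessor, create an extra partition and either
# build a new array or grow an existing one, then run pvresize on it.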


  1. Write-intent bitmaps speed up recovery of the RAID array after a power failure by marking unsynchronized regions as dirty. They have an impact on performance, but I did not measure it myself. ↩︎

  2. In the lsblk output, /dev/md1 appears unused because the logical volumes do not use any space from it yet. Once you create more logical volumes or extend them, lsblk will reflect the usage. ↩︎
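As an illustration only (not a step from the walkthrough above), one way to start consuming the new space is to grow an existing logical volume together with its filesystem:

$ lvextend -L +100m -r data/bits    # -r resizes the ext4 filesystem as well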

19 January, 2026 05:49AM by Vincent Bernat

Dima Kogan

mrcal 2.5 released!

mrcal 2.5 is out: the release notes. Once again, this is mostly a bug-fix release en route to the big new features coming in 3.0.

One cool thing is that these tools have now matured enough to no longer be considered experimental. They have been used with great success in lots of contexts across many different projects and organizations. Some highlights:

  • I've calibrated extremely wide lenses
  • and extremely narrow lenses
  • and joint systems containing many different kinds of lenses
  • with lots of cameras at the same time. The biggest single joint calibration I've done to date had 10 cameras, but I'll almost certainly encounter bigger systems in the future
  • mrcal has been used to process both visible and thermal cameras
  • The new triangulated-feature capability has been used in a structure-from-motion context to compute the world geometry on-line.
  • mrcal has been used with weird experimental setups employing custom calibration objects and single-view solves
  • mrcal has calibrated joint camera-LIDAR systems
  • and joint camera-IMU systems
  • Lots of students use mrcal as part of PhotonVision, the toolkit used by teams in the FIRST Robotics Competition

Some of the above is new, and not yet fully polished and documented and tested, but it works.

In mrcal 2.5, most of the implementation of some new big features is written and committed, but it's still incomplete. The new stuff is there, but is lightly tested and documented. This will be completed eventually in mrcal 3.0:

  • Cross-reprojection uncertainty, to be able to perform full calibrations with a splined model and without a chessboard. mrcal-show-projection-uncertainty --method cross-reprojection-rrp-Jfp is available today, and works in the usual moving-chessboard-stationary camera case. Fully boardless coming later.
  • More general view of uncertainty and diffs. I want to support extrinsics-only and/or intrinsics-only computations in lots of scenarios. Uncertainty in point solves is already available in some conditions, for instance if the points are fixed. New mrcal-show-stereo-pair-diff tool reports an extrinsics+intrinsics diff between two calibrations of a stereo pair; experimental analyses/extrinsics-stability.py tool reports an extrinsics-only diff. These are in contrast to the intrinsics-only uncertainty and diffs in the existing mrcal-show-projection-diff and mrcal-show-projection-uncertainty tools. Some documentation in the uncertainty and differencing pages.
  • Implicit point solves, using the triangulation routines in the optimization cost function. Should produce much more efficient structure-from-motion solves. This is all the "triangulated-features" stuff. The cost function is primarily built around _mrcal_triangulated_error(). This is demoed in test/test-sfm-triangulated-points.py. And I've been using _mrcal_triangulated_error() in structure-from-motion implementations within other optimization routines.

mrcal is quite good already, and will be even better in the future. Try it today!

19 January, 2026 12:00AM by Dima Kogan

January 17, 2026

Simon Josefsson

Backup of S3 Objects Using rsnapshot

I’ve been using rsnapshot to take backups of around 10 servers and laptops for well over 15 years, and it is a remarkably reliable tool that has proven itself many times. Rsnapshot uses rsync over SSH and maintains a temporal hard-link file pool. Once rsnapshot is configured and running, on the backup server, you get a hardlink farm with directories like this for the remote server:

/backup/serverA.domain/.sync/foo
/backup/serverA.domain/daily.0/foo
/backup/serverA.domain/daily.1/foo
/backup/serverA.domain/daily.2/foo
...
/backup/serverA.domain/daily.6/foo
/backup/serverA.domain/weekly.0/foo
/backup/serverA.domain/weekly.1/foo
...
/backup/serverA.domain/monthly.0/foo
/backup/serverA.domain/monthly.1/foo
...
/backup/serverA.domain/yearly.0/foo

I can browse and rescue files easily, going back in time when needed.

The rsnapshot project README explains more, and there is a long rsnapshot HOWTO, although I usually find the rsnapshot man page the easiest to digest.

I have stored multi-TB Git-LFS data on GitLab.com for some time. The yearly renewal is coming up, and the price for Git-LFS storage on GitLab.com is now excessive (~$10,000/year). I have reworked my workflow and finally migrated debdistget to only store Git-LFS stubs on GitLab.com and push the real files to S3 object storage. The cost for this is barely measurable; I have yet to run into the €25/month warning threshold.

But how do you backup stuff stored in S3?

For some time, my S3 backup solution has been to run the minio-client mirror command to download all S3 objects to my laptop, and rely on rsnapshot to keep backups of this. While 4TB NVMe drives are relatively cheap, I have felt for quite some time that this disk and network churn on my laptop is unsatisfactory.

What is a better approach?

I find S3 hosting sites fairly unreliable by design. It only takes a couple of clicks in your web browser to drop 100TB of data, or someone else can do it after stealing your plaintext-equivalent cookie. Thus, I haven't really felt comfortable using any S3-based backup option. I prefer to self-host, although continuously running a mirror job is not sufficient: if I accidentally drop the entire S3 object store, my mirror run will remove all the files locally too.

The rsnapshot approach that allows going back in time and having data on self-managed servers feels superior to me.

What if we could use rsnapshot with a S3 client instead of rsync?

Someone else asked about this several years ago, and the suggestion was to use the fuse-based s3fs, which sounded unreliable to me. After some experimentation, and after working around some hard-coded assumptions in the rsnapshot implementation, I came up with a small configuration pattern and a wrapper tool to implement what I desired.

Here is my configuration snippet:

cmd_rsync    /backup/s3/s3rsync
rsync_short_args    -Q
rsync_long_args    --json --remove
lockfile    /backup/s3/rsnapshot.pid
snapshot_root    /backup/s3
backup    s3:://hetzner/debdistget-gnuinos    ./debdistget-gnuinos
backup    s3:://hetzner/debdistget-tacos  ./debdistget-tacos
backup    s3:://hetzner/debdistget-diffos ./debdistget-diffos
backup    s3:://hetzner/debdistget-pureos ./debdistget-pureos
backup    s3:://hetzner/debdistget-kali   ./debdistget-kali
backup    s3:://hetzner/debdistget-devuan ./debdistget-devuan
backup    s3:://hetzner/debdistget-trisquel   ./debdistget-trisquel
backup    s3:://hetzner/debdistget-debian ./debdistget-debian

The idea is to save a backup of a couple of S3 buckets under /backup/s3/.

I have some scripts that take a complete rsnapshot.conf file and append my per-directory configuration so that this becomes a complete configuration. If you are curious how I roll this, backup-all invokes backup-one appending my rsnapshot.conf template with the snippet above.

The s3rsync wrapper script is the essential hack to convert rsnapshot’s rsync parameters into something that talks S3 and the script is as follows:

#!/bin/sh

set -eu

S3ARG=
for ARG in "$@"; do
    case $ARG in
    s3:://*) S3ARG="$S3ARG "$(echo $ARG | sed -e 's,s3:://,,');;
    -Q*) ;;
    *) S3ARG="$S3ARG $ARG";;
    esac
done

echo /backup/s3/mc mirror $S3ARG

exec /backup/s3/mc mirror $S3ARG

It uses the minio-client tool. I first tried s3cmd but its sync command reads all files to compute MD5 checksums every time you invoke it, which is very slow. The mc mirror command is blazingly fast since it only compares mtimes, just like rsync or git.

First you need to store credentials for your S3 bucket. These are stored in plaintext in ~/.mc/config.json, which I find to be sloppy security practice, but I don't know of any better way to do this. Replace AKEY and SKEY with your access token and secret token from your S3 provider:

/backup/s3/mc alias set hetzner AKEY SKEY

If I invoke a sync job for a fully synced up directory the output looks like this:

root@hamster /backup# /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf -V sync
Setting locale to POSIX "C"
echo 1443 > /backup/s3/rsnapshot.pid 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-gnuinos \
    /backup/s3/.sync//debdistget-gnuinos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-gnuinos /backup/s3/.sync//debdistget-gnuinos
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-tacos \
    /backup/s3/.sync//debdistget-tacos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-tacos /backup/s3/.sync//debdistget-tacos
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-diffos \
    /backup/s3/.sync//debdistget-diffos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-diffos /backup/s3/.sync//debdistget-diffos
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-pureos \
    /backup/s3/.sync//debdistget-pureos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-pureos /backup/s3/.sync//debdistget-pureos
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-kali \
    /backup/s3/.sync//debdistget-kali 
/backup/s3/mc mirror --json --remove hetzner/debdistget-kali /backup/s3/.sync//debdistget-kali
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-devuan \
    /backup/s3/.sync//debdistget-devuan 
/backup/s3/mc mirror --json --remove hetzner/debdistget-devuan /backup/s3/.sync//debdistget-devuan
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-trisquel \
    /backup/s3/.sync//debdistget-trisquel 
/backup/s3/mc mirror --json --remove hetzner/debdistget-trisquel /backup/s3/.sync//debdistget-trisquel
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-debian \
    /backup/s3/.sync//debdistget-debian 
/backup/s3/mc mirror --json --remove hetzner/debdistget-debian /backup/s3/.sync//debdistget-debian
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
touch /backup/s3/.sync/ 
rm -f /backup/s3/rsnapshot.pid 
/run/current-system/profile/bin/logger -p user.info -t rsnapshot[1443] \
    /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf \
    -V sync: completed successfully 
root@hamster /backup# 

You can tell from the paths that this machine runs Guix. This was the first production use of the Guix System for me, and the machine has been running since 2015 (with the occasional new hard drive). Before, I used rsnapshot on Debian, but some stable release of Debian dropped the rsnapshot package, paving the way for me to test Guix in production on a non-Internet exposed machine. Unfortunately, mc is not packaged in Guix, so you will have to install it from the MinIO Client GitHub page manually.

Running the daily rotation looks like this:

root@hamster /backup# /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf -V daily
Setting locale to POSIX "C"
echo 1549 > /backup/s3/rsnapshot.pid 
mv /backup/s3/daily.5/ /backup/s3/daily.6/ 
mv /backup/s3/daily.4/ /backup/s3/daily.5/ 
mv /backup/s3/daily.3/ /backup/s3/daily.4/ 
mv /backup/s3/daily.2/ /backup/s3/daily.3/ 
mv /backup/s3/daily.1/ /backup/s3/daily.2/ 
mv /backup/s3/daily.0/ /backup/s3/daily.1/ 
/run/current-system/profile/bin/cp -al /backup/s3/.sync /backup/s3/daily.0 
rm -f /backup/s3/rsnapshot.pid 
/run/current-system/profile/bin/logger -p user.info -t rsnapshot[1549] \
    /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf \
    -V daily: completed successfully 
root@hamster /backup# 
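To run the sync and rotation automatically, rsnapshot is typically driven from cron. A minimal /etc/cron.d style sketch (times are illustrative, the rsnapshot path may differ on Guix, and it assumes sync_first is enabled as the sync run above suggests):

# /etc/cron.d/rsnapshot-s3 -- illustrative schedule only
30 3 * * *  root  rsnapshot -c /backup/s3/rsnapshot.conf sync && rsnapshot -c /backup/s3/rsnapshot.conf daily
# add weekly/monthly lines to match whatever retain levels the configuration defines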

Hopefully you will feel inspired to take backups of your S3 buckets now!

17 January, 2026 10:04PM by simon

hackergotchi for Jonathan Dowland

Jonathan Dowland

Honest Jon's lightly-used Starships

No man’s Sky (or as it’s known in our house, "spaceship game") is a space exploration/sandbox game that was originally released 10 years ago. Back then I tried it on my brother‘s PS4 but I couldn’t get into it. In 2022 it launched for the Nintendo Switch1 and the game finally clicked for me.

I play it very casually. I mostly don’t play at all, except sometimes when there are time-limited “expeditions” running, which I find refreshing, and usually have some exclusives as a reward for play.

One of the many things you can do in the game is collect star ships. I started keeping a list of notable ones I’ve found, and I’ve decided to occasionally blog about them.

The Horizon Vector NX spaceship

The Horizon Vector NX is a small sporty ship that players on Nintendo Switch could claim within the first month or so after the game launched there. The colour scheme resembles the original "neon" Switch controllers. Although the ship type occurs naturally in the game in other configurations, I think the differently-painted wings are unique to this ship.

For most of the last 4 years, my copy of this ship was confined to the Switch, until November 2024, when they added cross-save capability to the game. I was then able to access the ship when playing on Linux (or Mac).


  1. The game runs very well natively on Mac, flawlessly on Steam for Linux, but struggles on the original Switch. It's a marvel it runs there at all.

17 January, 2026 08:02PM

January 16, 2026

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RcppSpdlog 0.0.26 on CRAN: Another Microfix

Version 0.0.26 of RcppSpdlog arrived on CRAN moments ago, and will be uploaded to Debian and built for r2u shortly. The (nice) documentation site has been refreshed too. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich. You can learn more at the nice package documentation site.

Brian Ripley noticed an infelicity when building under C++20, which he is testing hard and fast. Sadly, this came a day late for yesterday's upload of release 0.0.25 with another trivial fix, so another incremental release was called for. We already accommodated C++20 and its use of std::format (in lieu of the included fmt::format) but had not turned it on unconditionally. We do so now, but offer an opt-out for those who prefer the previous build type.

The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.26 (2026-01-16)

  • Under C++20 or later, switch to using std::format to avoid a compiler nag that CRAN now complains about

Courtesy of my CRANberries, there is also a diffstat report detailing changes. More detailed information is on the RcppSpdlog page, or the package documentation site.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

16 January, 2026 02:26PM

hackergotchi for Jonathan Dowland

Jonathan Dowland

Ye Gods

Via (I think) @mcc on the Fediverse, I learned of GetMusic: a sort-of "clearing house" for Free Bandcamp codes. I think the way it works is: some artists release a limited set of download codes for their albums in order to promote them, and GetMusic helps them keep track of that and helps listeners discover them.

GetMusic mail me occasionally, and once they highlighted an album The Arcane & Paranormal Earth which they described as "Post-Industrial in the vein of Coil and Nurse With Wound with shades of Aphex Twin, Autechre and assorted film music."

Well that description hooked me immediately but I missed out on the code. However, I sampled the album on Bandcamp directly a few times as well as a few of his others (Ye Gods is a side-project of Antoni Maiovvi, which itself is a pen-name) and liked them very much. I picked up the full collection of Ye Gods albums in one go for 30% off.

Here's a stand-out track:

On Earth by Ye Gods

So I guess this service works! Although I didn't actually get a free code in this instance, it promoted the artist, introduced me to something I really liked and drove a sale.

16 January, 2026 10:14AM

January 15, 2026

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RcppSpdlog 0.0.25 on CRAN: Microfix

Version 0.0.25 of RcppSpdlog arrived on CRAN right now, and will be uploaded to Debian and built for r2u shortly along with a minimal refresh of the documentation site. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich. You can learn more at the nice package documentation site.

This release fixes a minuscule cosmetic issue from the previous release a week ago. We rely on two #defines that R sets to signal to spdlog that we are building in the R context (which matters for the R-specific logging sink, and picks up something Gabi added upon my suggestion at the very start of this package). But I use the same #defines to now check in Rcpp that we are building with R and, in this case, wrongly conclude that R headers have already been included, so Rcpp (incorrectly) nags about that. The solution is to add two #undef directives and proceed as normal (with Rcpp controlling and taking care of R header inclusion too), and that is what we do here. All good now, no nags from a false positive.

The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.25 (2026-01-15)

  • Ensure #define signaling R build (needed with spdlog) is unset before including R headers to not falsely triggering message from Rcpp

Courtesy of my CRANberries, there is also a diffstat report detailing changes. More detailed information is on the RcppSpdlog page, or the package documentation site.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

15 January, 2026 01:20PM

January 14, 2026

RcppSimdJson 0.1.15 on CRAN: New Upstream, Some Maintenance

A brand new release 0.1.15 of the RcppSimdJson package is now on CRAN.

RcppSimdJson wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators. Via very clever algorithmic engineering to obtain largely branch-free code, coupled with modern C++ and newer compiler instructions, it parses gigabytes of JSON per second, which is quite mindboggling. The best-case performance is ‘faster than CPU speed’ as use of parallel SIMD instructions and careful branch avoidance can lead to less than one cpu cycle per byte parsed; see the video of the talk by Daniel Lemire at QCon.

This version updates to the current 4.2.4 upstream release. It also updates the RcppExports.cpp file with the ‘glue’ between C++ and R. We want to move away from using Rf_error() (as Rcpp::stop() is generally preferable). Packages (such as this one) that declare an interface have an actual Rf_error() call generated in RcppExports.cpp, which is something current Rcpp code generation can protect against. Long story short, a minor internal reason.

The short NEWS entry for this release follows.

Changes in version 0.1.15 (2026-01-14)

  • simdjson was upgraded to version 4.2.4 (Dirk in #97)

  • RcppExports.cpp was regenerated to aid a Rcpp transition

  • Standard maintenance updates for continuous integration and URLs

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

14 January, 2026 04:11PM

gunsales 0.1.3 on CRAN: Maintenance

An update to the gunsales package is now on CRAN. As in the last update nine years ago (!!), changes are mostly internal. An upcoming dplyr change requires a switch from the old and soon-to-be-removed ‘underscored’ verb forms; that was kindly addressed in an incoming pull request. We also updated the CI scripts a few times during this period as needed, switched to using Authors@R, and refreshed and updated a number of URL references.

Courtesy of my CRANberries, there is also a diffstat report for this release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

14 January, 2026 11:21AM

hackergotchi for Gunnar Wolf

Gunnar Wolf

The Innovation Engine • Government-funded Academic Research

This post is an unpublished review for The Innovation Engine • Government-funded Academic Research

David Patterson does not need an introduction. Being the brain behind many of the inventions that shaped the computing industry (repeatedly) over the past 40 years, when he put forward an opinion article in Communications of the ACM targeting the current day political waves in the USA, I could not avoid choosing it to write this review.

Patterson worked at a public university (the University of California at Berkeley) between 1976 and 2016, and in this article he argues that government-funded academic research (GoFAR) allows for faster, more effective and freer development than private-sector-funded research would, offering his own career milestones as examples of how public money that went into his research has easily been amplified by a factor of 10,000:1 for the country's economy, and 1,000:1 for the government in particular.

Patterson illustrates this by describing five of the “home-run” research projects he started and pursued with government funding, eventually spinning them off as successful startups:

  • RISC (Reduced Instruction Set Computing): Microprocessor architecture that reduces the complexity and power consumption of CPUs, yielding much smaller and more efficient processors.
  • RAID (Redundant Array of Inexpensive Disks): Patterson experimented with a way to present a series of independent hard drive units as if they were a single, larger one, leading to increases in capacity and reliability beyond what the industry could provide in single drives, for a fraction of the price.
  • NOW (Network Of Workstations): Introduced what we now know as computer clusters (in contrast to large-scale, massively multiprocessed, cache-coherent systems known as “supercomputers”), which nowadays power over 80% of the Top500 supercomputer list and are the computing platform of choice for practically all data centers.
  • RAD Lab (Reliable Adaptive Distributed Systems Lab): Pursued the technology for data centers to be self-healing and self-managing, testing and pushing early cloud-scalability limits
  • ParLab (Parallel Computing Lab): Given the development of massively parallel processing inside even simple microprocessors, this lab explored how to improve the design of parallel software and hardware, laying the groundwork that proved inherently parallel GPUs were better than CPUs at machine learning tasks. It also developed the RISC-V open instruction set architecture.

Patterson identifies principles for the projects he has led that are especially compatible with the way research works in university systems: multidisciplinary teams, demonstrable usable artifacts, seven- to ten-year impact horizons, five-year sunset clauses (to create urgency and to lower opportunity costs), physical proximity of collaborators, and leadership based on team success rather than individual recognition.

While it could be argued that it is easy to point to Patterson's work as a success story when he is far from being the average academic, the points he makes about how GoFAR research has been fundamental to advances in science and technology, as well as in biology, medicine, and several other fields, are very clear.

14 January, 2026 12:29AM

January 13, 2026

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

plocate 1.1.24 released

I've released version 1.1.24 of plocate, as usual dominated by small patches from external contributors. The changelog is below:

plocate 1.1.24, January 13th, 2026

  - Improve error handling on synchronous reads. Reported by
    Björn Försterling.

  - Remove ConditionACPower=true from the systemd unit file,
    to fix an issue where certain charging patterns prevent
    updatedb from ever running on laptops. Patch by Manfred Schwarb.

  - Add a new option --config-file for changing the path of
    updatedb.conf. Patch by Yehuda Bernáth.

As always, you can get it from the plocate page or your favourite Linux distribution (packages to Debian unstable are on their way up, others will surely follow soon).

13 January, 2026 10:57PM

hackergotchi for Thomas Lange

Thomas Lange

30.000 FAIme jobs created in 7 years

The number of FAIme jobs has reached 30.000. Yeah!
At the end of this November the FAIme web service for building customized ISOs turns 7 years old. It reached 10.000 jobs in March 2021 and 20.000 jobs in June 2023. A nice increase in usage.

Here are some statistics for the jobs processed in 2024:

Type of jobs

3%     cloud image
11%     live ISO
86%     install ISO

Distribution

2%     bullseye
8%     trixie
12%     ubuntu 24.04
78%     bookworm

Misc

  • 18%   used a custom postinst script
  • 11%   provided their ssh pub key for passwordless root login
  • 50%   of the jobs didn't include a desktop environment at all; the others mostly used GNOME, XFCE, KDE or the Ubuntu desktop.
  • The biggest ISO was from a FAIme job which created a live ISO with a desktop and some additional packages. This job took 30 min to finish and the resulting ISO was 18G in size.

Execution Times

The cloud and live ISOs need more time for their creation because the FAIme server needs to unpack and install all packages. For the install ISO the packages are only downloaded. The number of software packages also affects the build time. Every ISO is built in a VM on an old 6-core E5-1650 v2. The times given are calculated from the jobs of the past two weeks.

Job type     Avg     Max
install no desktop     1 min     2 min
install GNOME     2 min     5 min

The times for Ubuntu without and with desktop are one minute higher than those mentioned above.

Job type     Avg     Max
live no desktop     4 min     6 min
live GNOME     8 min     11 min

The times for cloud images are similar to live images.

A New Feature

For a few weeks now, the system has been showing the number of jobs ahead of you in the queue when you submit a job that cannot be processed immediately.

The Next Milestone

At the end of this year the FAI project will be 25 years old. If you have a success story of your FAI usage to share, please post it to the linux-fai mailing list or send it to me. Do you know the FAI questionnaire? A lot of reports are already available.

Here's an overview of what happened in the past 20 years of the FAI project.

About FAIme

FAIme is the service for building your own customized ISO via a web interface. You can create an installation or live ISO or a cloud image. Several Debian releases can be selected and also Ubuntu server or Ubuntu desktop installation ISOs can be customized. Multiple options are available like selecting a desktop and the language, adding your own package list, choosing a partition layout, adding a user, choosing a backports kernel, adding a postinst script and some more.

13 January, 2026 02:23PM

Simon Josefsson

Debian Libre Live 13.3.0 is released!

Following up on my initial announcement about Debian Libre Live I am happy to report on continued progress and the release of Debian Libre Live version 13.3.0.

Since both this and the previous 13.2.0 release are based on the stable Debian trixie release, there really aren't a lot of major changes, but instead incremental minor progress on the installation process. Repeated installations have a tendency to reveal bugs, and we have resolved the apt sources list confusion for Calamares-based installations and a couple of other nits. This release is more polished and we are not aware of any remaining issues (unlike earlier versions, which were released with known problems), although we conservatively regard the project as still in beta. A Debian Libre Live logo is needed before marking this as stable; any graphically talented takers? (Please base it on the Debian SVG upstream logo image.)

We provide GNOME, KDE, and XFCE desktop images, as well as a text-only “standard” image, which match the regular Debian Live images (the ones with non-free software on them), and we also provide a “slim” variant which is merely 750MB compared to the 1.9GB “standard” image. The slim image can still start the Debian installer, and can still boot into a minimal text-based live system.

The GNOME, KDE and XFCE desktop images feature the Calamares installer, and we have performed testing on a variety of machines. The standard and slim images do not have an installer available from the running live system, but all images have a boot menu entry to start the installer.

With this release we also extend our arm64 support to two tested platforms. The current list of successfully installed and supported systems now includes the following hardware:

This is a very limited set of machines, but the diversity in CPUs and architectures should hopefully translate well to a wide variety of commonly available machines. Several of these machines are crippled (usually GPU or WiFi) without adding non-free software; complain to your hardware vendor and adapt your use cases and future purchases.

The images are as follows, with SHA256SUM checksums and GnuPG signature on the 13.3.0 release page.
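As a rough sketch (the exact file names come from the release page, so treat these as placeholders), verifying a downloaded image would look something like:

$ gpg --verify SHA256SUMS.sig SHA256SUMS      # check the GnuPG signature
$ sha256sum -c SHA256SUMS --ignore-missing    # check the image you downloaded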

Curious how the images were made? Fear not, for the Debian Libre Live project README has documentation, the run.sh script is short, and the .gitlab-ci.yml CI/CD pipeline definition file is brief.

Happy Libre OS hacking!

13 January, 2026 01:53PM by simon

January 12, 2026

hackergotchi for Louis-Philippe Véronneau

Louis-Philippe Véronneau

Reducing the size of initramfs kernel images

In the past few years, the size of the kernel images in Debian has been steadily growing. I don't see this as a problem per se, but it has been causing me trouble, as my /boot partition has become too small to accommodate two kernel images at the same time.

Since I'm running Debian Unstable on my personal systems and keep them updated with unattended-upgrade, this meant each (frequent) kernel upgrade triggered an error like this one:

update-initramfs: failed for /boot/initrd.img-6.17.11+deb14-amd64 with 1.
dpkg: error processing package initramfs-tools (--configure):
 installed initramfs-tools package post-installation script subprocess returned
 error exit status 1
Errors were encountered while processing:
 initramfs-tools
E: Sub-process /usr/bin/dpkg returned an error code (1)

This would in turn break the automated upgrade process and require me to manually delete the currently running kernel (which works, but isn't great) to complete the upgrade.

The "obvious" solution would have been to increase the size of my /boot partition to something larger than the default 456M. Since my systems use full-disk encryption and LVM, this isn't trivial and would have required me to play Tetris and swap files back and forth using another drive.

Another solution proposed by anarcat was to migrate to systemd-boot (I'm still using grub), use Unified Kernel Images (UKI) and merge the /boot and /boot/efi partitions. Since I already have a bunch of configurations using grub and I am not too keen on systemd taking over all the things on my computer, I was somewhat reluctant.

As my computers are all configured by Puppet, I could of course have done a complete system reinstallation, but again, this was somewhat more involved than what I wanted it to be.

After looking online for a while, I finally stumbled on this blog post by Neil Brown detailing how to shrink the size of initramfs images. With MODULES=dep my images shrank from 188M to 41M, fixing my issue. Thanks Neil!
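For reference, the change itself is tiny. A minimal sketch, assuming the stock Debian initramfs-tools layout (the blog post linked above may do it slightly differently):

$ echo 'MODULES=dep' | sudo tee /etc/initramfs-tools/conf.d/modules-dep
$ sudo update-initramfs -u -k all    # regenerate the images
$ ls -lh /boot/initrd.img-*          # confirm the smaller size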

I was somewhat worried that removing kernel modules would break something on my systems, but so far I have only had to manually load the i2c_dev module, which I need to manage my home monitor's brightness using ddcutil.
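If that module should load automatically at boot, one common way (shown here purely as an illustration) is a modules-load.d snippet:

$ echo i2c_dev | sudo tee /etc/modules-load.d/i2c_dev.conf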

12 January, 2026 08:59PM by Louis-Philippe Véronneau

hackergotchi for Gunnar Wolf

Gunnar Wolf

Python Workout 2nd edition

This post is an unpublished review for Python Workout 2nd edition

Note: While I often post the reviews I write for Computing Reviews, this is a shorter review requested to me by Manning. They kindly invited me several months ago to be a reviewer for Python Workout, 2nd edition; after giving them my opinions, I am happy to widely recommend this book to interested readers.

Python is a relatively easy programming language to learn, allowing you to start coding pretty quickly. However, there's a significant gap between being able to “throw code” in Python and truly mastering the language. To write efficient, maintainable code that's easy for others to understand, practice is essential. And that's often where many of us get stuck. This book begins by stating that it “is not designed to teach you Python (…) but rather to improve your understanding of Python and how to use it to solve problems.”

The author’s structure and writing style are very didactic. Each chapter addresses a different aspect of the language: from the simplest (numbers, strings, lists) to the most challenging for beginners (iterators and generators), Lerner presents several problems for us to solve as examples, emphasizing the less obvious details of each aspect.

I was invited to review the preprint version of the book, and I am now very pleased to recommend it to all interested readers. The author presents a pleasant and easy-to-read text, with a wealth of content that I am sure will improve the Python skills of all its readers.

12 January, 2026 07:23PM

hackergotchi for Daniel Kahn Gillmor

Daniel Kahn Gillmor

AI as a Compression Problem

A recent article in The Atlantic makes the case that very large language models effectively contain much of the works they're trained on. This article is an attempt to popularize the insights in the recent academic paper Extracting books from production language models from Ahmed et al. The authors of the paper demonstrate convincingly that well-known copyrighted textual material can be extracted from the chatbot interfaces of popular commercial LLM services.

The Atlantic article cites a podcast quote about the Stable Diffusion AI image-generator model, saying "We took 100,000 gigabytes of images and compressed it to a two-gigabyte file that can re-create any of those and iterations of those". By analogy, this suggests we might think of LLMs (which work on text, not the images handled by Stable Diffusion) as a form of lossy textual compression.

The entire text of Moby Dick, the canonical Big American Novel is merely 1.2MiB uncompressed (and less than 0.4MiB losslessly compressed with bzip2 -9). It's not surprising to imagine that a model with hundreds of billions of parameters might contain copies of these works.

Warning: The next paragraph contains fuzzy math with no real concrete engineering practice behind it!

Consider a hypothetical model with 100 billion parameters, where each parameter is stored as a 16-bit floating point value. The model weights would take 200 GB of storage. If you were to fill the parameter space only with losslessly compressed copies of books like Moby Dick, you could still fit half a million books, more than anyone can read in a lifetime. And lossy compression is typically orders of magnitude less in size than lossless compression, so we're talking about millions of works effectively encoded, with the acceptance of some artifacts being injected in the output.
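To make the back-of-the-envelope arithmetic explicit (same rough figures, plain shell integer math):

$ echo "$((100000000000 * 2 / 1000000000)) GB"   # 100e9 parameters at 2 bytes each
200 GB
$ echo "$((200000000000 / 419430)) books"        # 200 GB over ~0.4 MiB per compressed novel
476837 books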

I first encountered this "compression" view of AI nearly three years ago, in Ted Chiang's insightful ChatGPT is a Blurry JPEG of the Web. I was surprised that The Atlantic article didn't cite Chiang's piece. If you haven't read Ted Chiang, i strongly recommend his work, and this piece is a great place to start.

Chiang aside, the more recent writing that focuses on the idea of compressed works being "contained" in the model weights seems to be used by people interested in wielding some sort of copyright claims against the AI companies that maintain or provide access to these models. There are many many problems with AI today, but attacking AI companies based on copyright concerns seems similar to going after Al Capone for tax evasion.

We should be much more concerned with the effect these projects have on cultural homogeneity, mental health, labor rights, privacy, and social control than whether they're violating copyright in some specific instance.

12 January, 2026 05:00AM by Daniel Kahn Gillmor

January 11, 2026

Russell Coker

Terminal Emulator Security

I just read this informative article on ANSI terminal security [1]. The author has written a tool named vt-houdini for testing for these issues [2]. They used to host an instance on their server but appear to have stopped it. When you run that tool, you can ssh to the system in question without needing a password; once connected, the server probes your terminal emulator for vulnerabilities. The versions of Kitty and Konsole in Debian/Trixie have just passed those tests on my system.

This will always be a potential security problem due to the purpose of a terminal emulator. A terminal emulator will often display untrusted data, and often data which is known to come from hostile sources (e.g. logs of attempted attacks). So what could be done in this regard?

Memory Protection

Due to the complexity of terminal emulation there is the possibility of buffer overflows and other memory management issues that could be used to compromise the emulator.

The Fil-C compiler is an interesting project [3]; it compiles existing C/C++ code with memory checks. It is reported to have no noticeable impact on the performance of the bash shell, which sounds like a useful option for addressing some of these issues, as shell security issues are connected to terminal security issues. The performance impact on a terminal emulator would likely be more noticeable. Also note that Fil-C compilation apparently requires compiling all libraries with it. This isn't a problem for bash, as the only libraries it uses nowadays are libtinfo and libc. The kitty terminal emulator doesn't have many libraries, but libpython is one of them; it's an essential part of Kitty and a complex library to compile in a different way. Konsole has about 160 libraries and it isn't plausible to recompile so many libraries at this time.

Choosing a terminal emulator that has a simpler design might help in this regard. Emulators that call libraries for 3D effects etc and native support for displaying in-line graphics have a much greater attack surface.

Access Control

A terminal emulator could be run in a container to prevent it from doing any damage if it is compromised. But the terminal emulator has full control over the shell it runs, and if that shell has the access needed for commands like scp/rsync to do what is expected of them, then no useful level of containment is possible.

It would be possible to run a terminal emulator in a container for the purpose of connecting to an insecure or hostile system and not allow scp/rsync to/from any directory other than /tmp (or other directories to use for sharing files). You could run “exec ssh $SERVER” so the terminal emulator session ends when the ssh connection ends.

Conclusion

There aren’t good solutions to the problems of terminal emulation security. But testing every terminal emulator with vt-houdini and fuzzing the popular ones would be a good start.

Qubes-level isolation will help in some situations, but if you need to connect to a server with privileged access to read log files containing potentially hostile data (which is a common sysadmin use case) then there aren't good options.

11 January, 2026 03:46AM by etbe

January 09, 2026

Simon Josefsson

Debian Taco – Towards a GitSecDevOps Debian

One of my holiday projects was to understand and gain more trust in how Debian binaries are built, and as the holidays are coming to an end, I'd like to introduce a new research project called Debian Taco. I apparently need more holidays, because there is still more work to be done here, so at the end I'll summarize some pending work.

Debian Taco, or TacOS, is a GitSecDevOps rebuild of Debian GNU/Linux.

The Debian Taco project publish rebuilt binary packages, package repository metadata (InRelease, Packages, etc), container images, cloud images and live images.

All packages are built from pristine source packages in the Debian archive. Debian Taco does not modify any Debian source code nor add or remove any packages found in Debian.

No servers are involved! Everything is built in GitLab pipelines and results are published through modern GitDevOps mechanisms like GitLab Pages and S3 object storage. You can fork the individual projects below on GitLab.com and you will have your own Debian-derived OS available for tweaking. (Of course, at some level, servers are always involved, so this claim is a bit of hyperbole.)

Goals

The goal of TacOS is to be bit-by-bit identical with official Debian GNU/Linux, and until that has been completed, publish diffoscope output with differences.

The idea is to further categorize all artifact differences into one of the following categories:

1) An obvious bug in Debian. For example, if a package does not build reproducible.

2) An obvious bug in TacOS. For example, if our build environment does not manage to build a package.

3) Something else. This would be input for further research and consideration. This category also include things where it isn’t obvious if it is a bug in Debian or in TacOS. Known examples:

3A) Packages in TacOS are rebuilt from the latest available source code, not from the (potentially) older packages that were used to build the official Debian packages. This could lead to differences in the resulting packages. These differences may be useful to analyze to identify supply-chain attacks. See some discussion about idempotent rebuilds.

Our packages are all built from source code, unless we have not yet managed to build something. In the latter situation, Debian Taco falls back and uses the official Debian artifact. This allows an incremental publication of Debian Taco that still is 100% complete without requiring that everything is rebuilt instantly. The goal is that everything should be rebuilt, and until that has been completed, publish a list of artifacts that we use verbatim from Debian.

Debian Taco Archive

The Debian Taco Archive project generate and publish the package archive (dists/tacos-trixie/InRelease, dists/tacos-trixie/main/binary-amd64/Packages.gz, pool/* etc), similar to what is published at https://deb.debian.org/debian/.

The output of the Debian Taco Archive is available from https://debdistutils.gitlab.io/tacos/archive/.
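As an illustration only: the archive publishes an InRelease file, but the signing key is not described here, so the following sketch disables verification with [trusted=yes] and should not be used for anything you need to trust.

# /etc/apt/sources.list.d/tacos.list -- illustrative, unverified
deb [trusted=yes] https://debdistutils.gitlab.io/tacos/archive/ tacos-trixie main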

Debian Taco Container Images

The Debian Taco Container Images project provide container images of Debian Taco for trixie, forky and sid on the amd64, arm64, ppc64el and riscv64 architectures.

These images allow quick and simple use of Debian Taco interactively, but makes it easy to deploy for container orchestration frameworks.

Debian Taco Cloud Images

The Debian Taco Cloud Images project provide cloud images of Debian Taco for trixie, forky and sid on the amd64, arm64, ppc64el and riscv64 architectures.

Launch and install Debian Taco for your cloud environment!

Debian Taco Live Images

The Debian Taco Live Images project provide live images of Debian Taco for trixie, forky and sid on the amd64 and arm64 architectures.

These images allows running Debian Taco on physical hardware (or virtual machines), and even installation for permanent use.

Debian Taco Build Images and Packages

Packages are built using debdistbuild, which was introduced in a blog about Build Debian in a GitLab Pipeline.

The first step is to prepare build images, which is done by the Debian Taco Build Images project. They are similar to the Debian Taco containers but have build-essential and debdistbuild installed on them.

Debdistbuild is launched in a per-architecture per-suite CI/CD project. Currently only trixie-amd64 is available. That project has built some essential early packages like base-files, debian-archive-keyring and hostname. They are stored in Git LFS backed by S3 object storage. These packages were all built reproducibly. So this means Debian Taco is still 100% bit-by-bit identical to Debian, except for the renaming.

I’ve yet to launch a more massive wide-scale package rebuild until some outstanding issues have been resolved. I earlier rebuilt around 7000 packages from Trixie on amd64, so I know that the method easily scales.

Remaining work

Where are the diffoscope outputs and the list of package differences? That is for another holiday! Clearly this is an important remaining work item.

Another important outstanding issue is how to orchestrate launching the build of all packages. Clearly a list of packages is needed, and some trigger mechanism to understand when new packages are added to Debian.

One goal was to build packages from the tag2upload browse.dgit.debian.org archive, before checking the Debian Archive. This ought to be really simple to implement, but other matters came first.

GitLab or Codeberg?

Everything is written using basic POSIX /bin/sh shell scripts. Debian Taco uses the GitLab CI/CD Pipeline mechanism together with a Hetzner S3 object storage to serve packages. The scripts have only weak reliance on GitLab-specific principles, and were designed with the intention to support other platforms. I believe reliance on a particular CI/CD platform is a limitation, so I’d like to explore shipping Debian Taco through a Forgejo-based architecture, possibly via Codeberg as soon as I manage to deploy reliable Forgejo runners.

The important aspects that are required are:

1) Pipelines that can build and publish web sites similar to GitLab Pages. Codeberg has a pipeline mechanism. I've successfully used Codeberg Pages to publish the OATH Toolkit homepage. Gluing this together seems feasible.

2) Container Registry. It seems Forgejo supports a Container Registry, but I’ve not worked with it at Codeberg enough to understand whether there are any limitations.

3) Package Registry. The Debian Taco live images are uploaded into a package registry, because they are too big to be served through GitLab Pages. This may be converted to using a Pages mechanism, or possibly Release Artifacts, if multi-GB artifacts are supported on other platforms.

I hope to continue this work and to explain more details in a series of posts. Stay tuned!

09 January, 2026 04:33PM by simon

Russell Coker

LEAF ZE1 After 6 Months

About 6 months ago I got a Nissan LEAF ZE1 (2019 model) [1]. Generally it’s going well and I’m happy with most things about it.

One issue is that, as there isn’t a lot of weight in the front (the batteries are in the centre of the car), the front wheels slip easily when accelerating. It’s a minor thing but a good reason for wanting AWD in an electric car.

When I got the car I got two charging devices: one to charge from a regular 240V 10A power point (often referred to as a “granny charger”) and a cable with a special EV charging connector on each end. The cable with an EV connector on each end is designed for charging that’s faster than the “granny charger” but not as fast as the rapid chargers, which have the cable permanently connected to the supply so the cable temperature can be monitored and/or controlled. That cable can be used if you get a fast charger installed at your home (which I never plan to do) and apparently at some small hotels and other places with home-style EV charging. I’m considering just selling that cable on eBay as I don’t think I have any need to personally own a cable other than the “granny charger”.

The key fob for the LEAF has a battery installed: either a CR2032 or a CR2025 – mine has a CR2025. Some reports on the Internet suggest that you can stuff a CR2032 battery in anyway, but that didn’t work for me as the thickness of the battery stopped some of the contacts from making a good connection. I think I could have got it going by putting some metal in between, but the batteries aren’t expensive enough to make that worth the effort and risk. It would be nice if I could use batteries from my stockpile of CR2032 batteries that came from old PCs, but I can afford to spend a few dollars on this.

My driveway is short and if I left the charger out it would be visible from the street and at risk of being stolen. I’m thinking of chaining the charger to a tree and having some sort of waterproof enclosure for it so I don’t have to go to the effort of taking it out of the boot every time I use it. Then I could also configure the car to only charge during the peak sunlight hours when the solar power my home feeds into the grid has a negative price (we have so much solar power that it’s causing grid problems).

The cruise control is a pain to use, so much so that I haven’t ever managed to get it to work usefully. The features look good in the documentation, but in practice it’s not as good as the Kia system I’ve used previously, where I could just press one button to turn it on, another to set the current speed as the cruise control speed, and then have it just work.

The electronic compass built into the dash turned out to be surprisingly useful. I regret not gluing a compass to the dash of previous cars. One example is when I start Google navigation for a journey and it says “go South on street X” and I need to know which direction is South so I don’t start in the wrong direction. Another example is when I know that I’m North of a major road that I need to take to get to my destination, so I just need to go roughly South and that is enough to get me to a road I recognise.

In the past, when there was a bird in the way I didn’t do anything different: I kept driving at the same speed and relied on the bird to see me and move out of the way. Birds have faster reactions than humans and have evolved to cope with the speeds cars travel at on all roads other than freeways; also, birds that are on roads usually have an eye on each side of their head, so they can’t fail to see my car approaching. For decades this worked, but recently a bird just stood on the road and got squashed. So I guess I should honk when there are birds on the road.

Generally everything about the car is fine and I’m happy to keep driving it.

09 January, 2026 03:32AM by etbe

January 08, 2026

Reproducible Builds

Reproducible Builds in December 2025

Welcome to the December 2025 report from the Reproducible Builds project!

Our monthly reports outline what we’ve been up to over the past month, highlighting items of news from elsewhere in the increasingly-important area of software supply-chain security. As ever, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.

  1. New orig-check service to validate Debian upstream tarballs
  2. Distribution work
  3. disorderfs updated to FUSE 3
  4. Mailing list updates
  5. Three new academic papers published
  6. Website updates
  7. Upstream patches

New orig-check service to validate Debian upstream tarballs

This month, Debian Developer Lucas Nussbaum announced the orig-check service, which attempts to automatically reproduce the generation of upstream tarballs (i.e. the “original source” component of a Debian source package), comparing the result to the upstream tarball actually shipped with Debian.

As of the time of writing, it is possible for a Debian developer to upload a source archive that does not actually correspond to upstream’s version. Whilst this is not inherently malicious (it typically indicates some tooling/process issue), the very possibility that a maintainer’s version may differ potentially permits a maintainer to make (malicious) changes that would be misattributed to upstream.

This service therefore nicely complements the whatsrc.org service, which was reported in our reports for both April and August. The orig-check is dedicated to Lunar, who sadly passed away a year ago.


Distribution work

In Arch Linux this month, Robin Candau and Mark Hegreberg worked on making the Arch Linux WSL image bit-for-bit reproducible. Robin also shared some implementation details and future related work on our mailing list.

Continuing a series reported in these reports for March, April and July 2025 (etc.), Simon Josefsson has published another interesting article this month, itself a followup to a post Simon published in December 2024 regarding GNU Guix Container Images that are hosted on GitLab.

In Debian this month, Micha Lenk posted to the debian-backports-announce mailing list with the news that the Backports archive will now discard binaries generated and uploaded by maintainers: “The benefit is that all binary packages [will] get built by the Debian buildds before we distribute them within the archive.”

Felix Moessbauer of Siemens then filed a bug in the Debian bug tracker to signal their intention to package debsbom, a software bill of materials (SBOM) generator for distributions based on Debian. This generated a discussion on the bug inquiring about the output format as well as a question about how these SBOMs might be distributed.

Holger Levsen merged a number of significant changes written by Alper Nebi Yasak to the Debian Installer in order to improve its reproducibility. As noted in Alper’s merge request, “These are the reproducibility fixes I looked into before bookworm release, but was a bit afraid to send as it’s just before the release, because the things like the xorriso conversion changes the content of the files to try to make them reproducible.”

In addition, 76 reviews of Debian packages were added, 8 were updated and 27 were removed this month adding to our knowledge about identified issues. A new different_package_content_when_built_with_nocheck issue type was added by Holger Levsen. []

Arnout Engelen posted to our mailing list reporting that they successfully reproduced the NixOS minimal installation ISO for the 25.11 release without relying on a pre-compiled package archive, with more details on their blog.

Lastly, Bernhard M. Wiedemann posted another openSUSE monthly update for his work there.


disorderfs updated to FUSE 3

disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into system calls to reliably flush out reproducibility issues.

This month, however, Roland Clobus upgraded disorderfs from FUSE 2 to FUSE 3 after its package was automatically removed from Debian testing. Some tests in Debian currently require disorderfs to make the Debian live images reproducible, although disorderfs is not a Debian-specific tool.
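
As a reminder of how it is typically used (a minimal sketch written from memory of the disorderfs manual, so do double-check the option names), a source tree is mounted through disorderfs and the build is run against the mount point:

# expose ./source at ./build-tree with directory entries returned in random order
disorderfs --shuffle-dirents=yes source/ build-tree/
# ... run the build against build-tree/, then unmount ...
fusermount -u build-tree/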


Mailing list updates

On our mailing list this month:

  • Luca Di Maio announced stampdalf, a “filesystem timestamp preservation” tool that wraps “arbitrary commands and ensures filesystem timestamp reproducibility”:

    stampdalf allows you to run any command that modifies files in a directory tree, then automatically resets all timestamps back to their original values. Any new files created during command execution are set to [the UNIX epoch] or a custom timestamp via SOURCE_DATE_EPOCH.

    The project’s GitHub page helpfully reveals that the project is “pronounced: stamp-dalf (stamp like time-stamp, dalf like Gandalf the wizard)” as “it’s a wizard of time and stamps”.

  • Lastly, Reproducible Builds developer cen1 posted to our list announcing that “early/experimental/alpha” support for FreeBSD was added to rebuilderd. In their post, cen1 reports that the “initial builds are in progress and look quite decent”. cen1 also interestingly notes that “since the upstream is currently not technically reproducible I had to relax the bit-for-bit identical requirement of rebuilderd [—] I consider the pkg to be reproducible if the tar is content-identical (via diffoscope), ignoring timestamps and some of the manifest files.”.


Three new academic papers published

Yogya Gamage and Benoit Baudry of Université de Montréal, Canada together with Deepika Tiwari and Martin Monperrus of KTH Royal Institute of Technology, Sweden published a paper on The Design Space of Lockfiles Across Package Managers:

Most package managers also generate a lockfile, which records the exact set of resolved dependency versions. Lockfiles are used to reduce build times; to verify the integrity of resolved packages; and to support build reproducibility across environments and time. Despite these beneficial features, developers often struggle with their maintenance, usage, and interpretation. In this study, we unveil the major challenges related to lockfiles, such that future researchers and engineers can address them. […]

A PDF of their paper is available online.

Benoit Baudry also posted an announcement to our mailing list, which generated a number of replies.


Betul Gokkaya, Leonardo Aniello and Basel Halak of the University of Southampton then published A taxonomy of attacks, mitigations and risk assessment strategies within the software supply chain:

While existing studies primarily focus on software supply chain attacks’ prevention and detection methods, there is a need for a broad overview of attacks and comprehensive risk assessment for software supply chain security. This study conducts a systematic literature review to fill this gap. By analyzing 96 papers published between 2015-2023, we identified 19 distinct SSC attacks, including 6 novel attacks highlighted in recent studies. Additionally, we developed 25 specific security controls and established a precisely mapped taxonomy that transparently links each control to one or more specific attacks. […]

A PDF of the paper is available online via the article’s canonical page.


Aman Sharma and Martin Monperrus of the KTH Royal Institute of Technology, Sweden along with Benoit Baudry of Université de Montréal, Canada published a paper this month on Causes and Canonicalization of Unreproducible Builds in Java. The abstract of the paper is as follows:

[Achieving] reproducibility at scale remains difficult, especially in Java, due to a range of non-deterministic factors and caveats in the build process. In this work, we focus on reproducibility in Java-based software, archetypal of enterprise applications. We introduce a conceptual framework for reproducible builds, we analyze a large dataset from Reproducible Central, and we develop a novel taxonomy of six root causes of unreproducibility. […]

A PDF of the paper is available online.


Website updates

Once again, there were a number of improvements made to our website this month including:


Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:



Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

08 January, 2026 10:51PM

Dima Kogan

Meshroom packaged for Debian

Like the title says, I just packaged Meshroom (and all the adjacent dependencies) for Debian! This is a fancy photogrammetry toolkit that uses modern software development methods. "Modern" meaning that it has a multitude of dependencies that come from lots of disparate places, which make it impossible for a mere mortal to build the thing. The Linux "installer" is 13GB and probably is some sort of container, or something.

But now, if you have a Debian/sid box with the non-free repos enabled, you can

sudo apt install meshroom

And then you can generate and 3D-print a life-size, geometrically-accurate statue of your cat. The colmap package does a similar thing, and has been in Debian for a while. I think it can't do as many things, but it's good to have both tools easily available.

These packages are all in contrib, because they depend on a number of non-free things, most notably CUDA.

This is currently in Debian/sid, but should be picked up by the downstream distros as they're released. The next noteworthy one is Ubuntu 26.04. Testing and feedback welcome.

08 January, 2026 03:34PM by Dima Kogan

Sven Hoexter

Moving from hexchat to Halloy

I'm not hanging around on IRC a lot these days, but when I do I use hexchat (and used xchat before that). Probably a bad habit of clinging to what I got used to over the past 25 years. But in the light of the planned removal of GTK2, it felt like it was time to look for an alternative.

Halloy looked interesting, albeit not packaged for Debian. But upstream references a flatpak (another party I have not joined so far), which was good enough to give it a try.

$ sudo apt install flatpak
$ flatpak remote-add --if-not-exists flathub https://dl.flathub.org/repo/flathub.flatpakrepo
$ flatpak install org.squidowl.halloy
$ flatpak run org.squidowl.halloy

Configuration ends up at ~/.var/app/org.squidowl.halloy/config/halloy/config.toml, which I linked for convenience to ~/.halloy.toml.
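
For reference, that symlink is nothing more than:

ln -s ~/.var/app/org.squidowl.halloy/config/halloy/config.toml ~/.halloy.toml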

Since I connect via ZNC in an odd old setup without those virtual networks but with several accounts, and of course never bothered to replace the self-signed certificate, it requires some additional configuration to be able to connect. Each account gets its own servers.<foo> block like this:

[servers.bnc-oftc]
nickname = "my-znc-user-for-this-network"
server = "sven.stormbind.net"
dangerously_accept_invalid_certs = true
password = "mypassword"
port = 4711
use_tls = true

Halloy also has a small ZNC guide.

I'm growing old, so a bigger font size is useful. Be aware that font changes require an application restart to take effect.

[font]
size = 16
family = "Noto Mono"

I also prefer the single-pane mode, the configuration for which could be copy & pasted as documented.

Works well enough for now. hexchat was also the last non-Wayland application I'd been using (the xlsclients output is finally empty).

08 January, 2026 10:35AM

January 07, 2026

hackergotchi for Gunnar Wolf

Gunnar Wolf

Artificial Intelligence • Play or break the deck

This post is an unpublished review for Artificial Intelligence • Play or break the deck

As a little disclaimer, I usually review books or articles written in English, and although I will offer this review to Computing Reviews as usual, it is likely it will not be published. The title of this book in Spanish is Inteligencia artificial: jugar o romper la baraja.

I was pointed at this book, published last October by Margarita Padilla García, a well-known Free Software activist from Spain who has long worked on analyzing (and shaping) aspects of socio-technological change. As with other books published by Traficantes de sueños, this book is published as Open Access, under a CC BY-NC license, and can be downloaded in full. I started casually looking at it, with too long a backlog of material to read, but soon realized I could just not put it down: it completely captured me.

This book presents several aspects of Artificial Intelligence (AI), written for a general, non-technical audience. Many books with a similar target audience have been published, but this one is quite distinctive; first of all, it is written in a personal, informal tone. Contrary to what’s usual in my reading, the author made the explicit decision not to fill the book with references to her sources (“because searching on Internet, it’s very easy to find things”), making the book easier to read linearly — a decision I somewhat regret, but recognize helps develop the author’s style.

The book has seven sections, dealing with different aspects of AI. They are the “Visions” (historical framing of the development of AI); “Spectacular” (why do we feel AI to be so disrupting, digging particularly into game engines and search space); “Strategies”, explaining how multilayer neural networks work and linking the various branches of historic AI together, arriving at Natural Language Processing; “On the inside”, tackling technical details such as algorithms, the importance of training data, bias, discrimination; “On the outside”, presenting several example AI implementations with socio-ethical implications; “Philosophy”, presenting the works of Marx, Heidegger and Simondon in their relation with AI, work, justice, ownership; and “Doing”, presenting aspects of social activism in relation to AI. Each part ends with yet another personal note: Margarita Padilla includes a letter to one of her friends related to said part.

Totalling 272 pages (A5, or roughly half-letter, format), this is a rather small book; I read it over probably a week. So, while it does not provide a lot of new information to me, the way it was written made it a very pleasing experience, and it will surely influence the way I understand or explain several concepts in this domain.

07 January, 2026 07:46PM

Thorsten Alteholz

My Debian Activities in December 2025

Debian LTS/ELTS

This was the hundred-thirty-eighth month in which I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. (As the LTS and ELTS teams have been merged now, there is only one paragraph left for both activities.)

During my allocated time I uploaded or worked on:

  • [cups] upload to unstable to fix an issue with the latest security upload
  • [libcoap3] uploaded to unstable to fix ten CVEs
  • [gcal] check whether security bug reports are really security bug reports (no, they are not and no CVEs have been issued yet)
  • [#1124284] trixie-pu for libcoap3 to fix ten CVEs in Trixie.
  • [#1121342] trixie-pu bug; debdiff has been approved and libcupsfilters uploaded.
  • [#1121391] trixie-pu bug; debdiff has been approved and cups-filter uploaded.
  • [#1121392] bookworm-pu bug; debdiff has been approved and cups-filter uploaded.
  • [#1121433] trixie-pu bug; debdiff has been approved and rlottie uploaded.
  • [#1121437] bookworm-pu bug; debdiff has been approved and rlottie uploaded.
  • [#1124284] trixie-pu bug; debdiff has been approved and libcoap3 uploaded.

I also tried to backport the libcoap3-patches to Bookworm, but I am afraid the changes would be too intrusive.

When I stumbled upon a comment for 7zip about “finding the patches might be a hard”, I couldn’t believe it. Well, Daniel was right and I didn’t find any.

Furthermore I worked on suricata, marked some CVEs as not-affected or ignored, and added some new patches. Unfortunately my allocated time was spent before I could do a new upload.

I also attended the monthly LTS/ELTS meeting.

Last but not least I injected some packages for uploads to security-master.

Debian Printing

This month I uploaded a new upstream version or a bugfix version of:

  • cups to unstable.

This work is generously funded by Freexian!

Debian Lomiri

I started to contribute to Lomiri packages, which are part of the Debian UBports Team. As a first step I took care of failing CI pipelines and tried to fix them. A next step would be to package some new Applications.

This work is generously funded by Fre(i)e Software GmbH!

Debian Astro

This month I uploaded a new upstream version or a bugfix version of:

Debian IoT

This month I uploaded a new upstream version or a bugfix version of:

Debian Mobcom

Unfortunately I didn’t find any time to work on this topic.

misc

This month I uploaded a new upstream version or a bugfix version of:

Last but not least, I wish (almost) everybody a Happy New Year and hope that you are able to stick to your New Year’s resolutions.

07 January, 2026 02:54PM by alteholz

January 06, 2026

Ingo Juergensmann

Outages on Nerdculture.de due to Ceph – Part 2

Last weekend I had “fun” with Ceph again on a Saturday evening. But let’s start at the beginning….

Before the weekend I announced a downtime/maintenance window to upgrade PostgreSQL from v15 to v17 – because of the Debian upgrade from Bookworm to Trixie. After some tests with a cloned VM I decided to use the quick path of pg_upgradecluster 15 main -v 17 -m upgrade --clone. As this would be my first time upgrading PostgreSQL that way, I made several backups. In the end everything went smoothly and the database is now on v17.

However, there was also a new Proxmox kernel and new packages, so I also upgraded one Proxmox node and rebooted it. And then the issues began:

But before that I also encountered an issue with Redis for Mastodon. It complained about this:

Unable to obtain the AOF file appendonly.aof.4398.base.rdb

The solution to this was to change the Redis configuration to autoappend no.

And then CephFS was unavailable again, complaining about a laggy MDS or no MDS at all, which – of course – was totally wrong. I searched for solutions and read many posts in the Proxmox forum, but nothing helped. I also read the official Ceph documentation. After a whole day of all services being offline for my thousands of users, I somehow managed to get CephFS mounted again with systemctl reset-failed mnt-pve-cephfs && systemctl start mnt-pve-cephfs. Shortly before that I had followed the advice in the Ceph docs on RADOS health, especially the section about troubleshooting monitors.
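
For anyone in a similar situation, these are generic Ceph health checks worth running first (standard commands, not specific to this incident; interpreting the output of course still depends on the setup):

# overall cluster state and any health warnings
ceph -s
ceph health detail
# state of the CephFS filesystem and its MDS daemons
ceph fs status
ceph mds stat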

In the end, I can’t say which step exactly did the trick that CephFS was working again. But as it seems, I will have one or two more chances to find out, because only one server out of three is currently updated.

Another issue during the downtime was that one server crashed/rebooted and didn’t come back. It hung in the middle of an upgrade, at the point of running update-grub. Usually that wouldn’t be a big deal: just go to the IPMI web interface and reboot the server.

Nah! That’s too simple!

For some unknown reason the IPMI interfaces had lost their DHCP leases: the DHCP server at the colocation was not serving IPs. So I opened a ticket, got some acknowledgement from the support staff, but also a statement of “maybe tomorrow or on Monday…”. Hmpf!

On Sunday evening I managed to bring back CephFS. As said: no idea which specific step did the trick. But the story continues: on Monday before lunch time the IPMI DHCP was working again and I could access the web interfaces again, logged in… and was forcefully locked out again:

Your session has timed out. You will need to open a new session

I hit the problem described here. But cold-resetting the BMC didn’t work. So still no working web interface to deal with the issue. But I have the “IPMIView” app on my phone, and that still worked and showed the KVM console. What I saw there didn’t make me happy either:

The reason for this is apparently the crash while running update-grub. Anyway, using the GRUB bootloader and selecting an older kernel works fine. The server boots, Proxmox shows the node as up and… the working CephFS is stalled again! Fsck!

Rebooting the node or stopping Ceph on that node immediately results in a working CephFS again.

Currently I’m moving everything off of Ceph to the local disks of the two nodes. If everything is on local disks I can work on debugging CephFS without interrupting the service for the users (hopefully). But this also means that there will be no redundancy for Mastodon and mail.

When I have more detailed information about possible reasons and such, I may post to the Proxmox forum.

06 January, 2026 03:57PM by ij

January 05, 2026

hackergotchi for Matthew Garrett

Matthew Garrett

Not here

Hello! I am not posting here any more. You can find me here instead. Most Planets should be updated already (I've an MR open for Planet Gnome), but if you're subscribed to my feed directly please update it.

05 January, 2026 10:26PM

hackergotchi for Colin Watson

Colin Watson

Free software activity in December 2025

About 95% of my Debian contributions this month were sponsored by Freexian.

You can also support my work directly via Liberapay or GitHub Sponsors.

Python packaging

I upgraded these packages to new upstream versions:

Python 3.14 is now a supported version in unstable, and we’re working to get that into testing. As usual this is a pretty arduous effort because it requires going round and fixing lots of odds and ends across the whole ecosystem. We can deal with a fair number of problems by keeping up with upstream (see above), but there tends to be a long tail of packages whose upstreams are less active and where we need to chase them, or where problems only show up in Debian for one reason or another. I spent a lot of time working on this:

Fixes for pytest 9:

I filed lintian: Report Python egg-info files/directories to help us track the migration to pybuild-plugin-pyproject.

I did some work on dh-python: Normalize names in pydist lookups and pyproject plugin: Support headers (the latter of which allowed converting python-persistent and zope.proxy to pybuild-plugin-pyproject, although it needed a follow-up fix).
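
For context, converting a package to pybuild-plugin-pyproject usually amounts to adding the plugin to Build-Depends in debian/control and rebuilding with the existing pybuild buildsystem; a rough sketch (run inside an unpacked source package, names illustrative):

grep -q pybuild-plugin-pyproject debian/control \
  || echo "add pybuild-plugin-pyproject to Build-Depends in debian/control"
dpkg-buildpackage -us -uc   # rebuild to confirm the pyproject plugin is picked up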

I fixed or helped to fix several other build/test failures:

Other bugs:

Other bits and pieces

Code reviews

05 January, 2026 01:08PM by Colin Watson

hackergotchi for Bits from Debian

Bits from Debian

Debian welcomes Outreachy interns for December 2025-March 2026 round

Outreachy logo

Debian continues participating in Outreachy, and as you might have already noticed, Debian has selected two interns for the Outreachy December 2025 - March 2026 round.

After a busy contribution phase and a competitive selection process, Hellen Chemtai Taylor and Isoken Ibizugbe have officially been working as interns on Debian Images Testing with OpenQA for the past month, mentored by Tássia Camões Araújo, Roland Clobus and Philip Hands.

Congratulations and welcome Hellen Chemtai Taylor and Isoken Ibizugbe!

The team also congratulates all candidates for their valuable contributions, with special thanks to those who have managed to continue participating as volunteers.

From the official website: Outreachy provides three-month internships for people from groups traditionally underrepresented in tech. Interns work remotely with mentors from Free and Open Source Software (FOSS) communities on projects ranging from programming, user experience, documentation, illustration and graphical design, to data science.

The Outreachy programme is possible in Debian thanks to the efforts of Debian developers and contributors who dedicate their free time to mentoring students and to outreach tasks, and the Software Freedom Conservancy's administrative support, as well as the continued support of Debian's donors, who provide funding for the internships.

Join us and help to improve Debian! You can follow the work of the Outreachy interns reading their blog posts (syndicated in Planet Debian), and chat with the team at the debian-openqa matrix channel. For Outreachy matters, the programme admins can be reached on #debian-outreach IRC/matrix channel and mailing list.

05 January, 2026 09:00AM by Anupa Ann Joseph, Tássia Camões Araújo

Vincent Bernat

Using eBPF to load-balance traffic across UDP sockets with Go

Akvorado collects sFlow and IPFIX flows over UDP. Because UDP does not retransmit lost packets, it needs to process them quickly. Akvorado runs several workers listening to the same port. The kernel should load-balance received packets fairly between these workers. However, this does not work as expected. A couple of workers exhibit high packet loss:

$ curl -s 127.0.0.1:8080/api/v0/inlet/metrics \
> | sed -n s/akvorado_inlet_flow_input_udp_in_dropped//p
packets_total{listener="0.0.0.0:2055",worker="0"} 0
packets_total{listener="0.0.0.0:2055",worker="1"} 0
packets_total{listener="0.0.0.0:2055",worker="2"} 0
packets_total{listener="0.0.0.0:2055",worker="3"} 1.614933572278264e+15
packets_total{listener="0.0.0.0:2055",worker="4"} 0
packets_total{listener="0.0.0.0:2055",worker="5"} 0
packets_total{listener="0.0.0.0:2055",worker="6"} 9.59964121598348e+14
packets_total{listener="0.0.0.0:2055",worker="7"} 0

eBPF can help by implementing an alternate balancing algorithm.

Options for load-balancing

There are three methods to load-balance UDP packets across workers:

  1. One worker receives the packets and dispatches them to the other workers.
  2. All workers share the same socket.
  3. Each worker has its own socket, listening to the same port, with the SO_REUSEPORT socket option.

SO_REUSEPORT option

Tom Herbert added the SO_REUSEPORT socket option in Linux 3.9. The cover letter for his patch series explains why this new option is better than the two existing ones from a performance point of view:

SO_REUSEPORT allows multiple listener sockets to be bound to the same port. […] Received packets are distributed to multiple sockets bound to the same port using a 4-tuple hash.

The motivating case for SO_RESUSEPORT in TCP would be something like a web server binding to port 80 running with multiple threads, where each thread might have it’s own listener socket. This could be done as an alternative to other models:

  1. have one listener thread which dispatches completed connections to workers, or
  2. accept on a single listener socket from multiple threads.

In case #1, the listener thread can easily become the bottleneck with high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load. […] We have seen the disproportion to be as high as 3:1 ratio between thread accepting most connections and the one accepting the fewest. With SO_REUSEPORT the distribution is uniform.

The motivating case for SO_REUSEPORT in UDP would be something like a DNS server. An alternative would be to receive on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contection on the socket lock.

Akvorado uses the SO_REUSEPORT option to dispatch the packets across the workers. However, because the distribution uses a 4-tuple hash, a single socket handles all the flows from one exporter.

SO_ATTACH_REUSEPORT_EBPF option

In Linux 4.5, Craig Gallek added the SO_ATTACH_REUSEPORT_EBPF option to attach an eBPF program to select the target UDP socket. In Linux 4.6, he extended it to support TCP. The socket(7) manual page documents this mechanism:1

The BPF program must return an index between 0 and N-1 representing the socket which should receive the packet (where N is the number of sockets in the group). If the BPF program returns an invalid index, socket selection will fall back to the plain SO_REUSEPORT mechanism.

In Linux 4.19, Martin KaFai Lau added the BPF_PROG_TYPE_SK_REUSEPORT program type. Such an eBPF program selects the socket from a BPF_MAP_TYPE_REUSEPORT_ARRAY map instead. This new approach is more reliable when switching target sockets from one instance to another—for example, when upgrading, a new instance can add its sockets and remove the old ones.

Load-balancing with eBPF and Go

Altering the load-balancing algorithm for a group of sockets requires two steps:

  1. write and compile an eBPF program in C,2 and
  2. load it and attach it in Go.

eBPF program in C

A simple load-balancing algorithm is to randomly choose the destination socket. The kernel provides the bpf_get_prandom_u32() helper function to get a pseudo-random number.

volatile const __u32 num_sockets; // ❶

struct {
    __uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
    __type(key, __u32);
    __type(value, __u64);
    __uint(max_entries, 256);
} socket_map SEC(".maps"); // ❷

SEC("sk_reuseport")
int reuseport_balance_prog(struct sk_reuseport_md *reuse_md)
{
    __u32 index = bpf_get_prandom_u32() % num_sockets; // ❸
    bpf_sk_select_reuseport(reuse_md, &socket_map, &index, 0); // ❹
    return SK_PASS; // ❺
}

char _license[] SEC("license") = "GPL";

In ❶, we declare a volatile constant for the number of sockets in the group. We will initialize this constant before loading the eBPF program into the kernel. In ❷, we define the socket map. We will populate it with the socket file descriptors. In ❸, we randomly select the index of the target socket.3 In ❹, we invoke the bpf_sk_select_reuseport() helper to record our decision. Finally, in ❺, we accept the packet.

Header files

If you compile the C source with clang, you get errors due to missing headers. The recommended way to solve this is to generate a vmlinux.h file with bpftool:

$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

Then, include the following headers:4

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

For my 6.17 kernel, the generated vmlinux.h is quite big: 2.7 MiB. Moreover, bpf/bpf_helpers.h is shipped with libbpf. This adds another dependency for users. As the eBPF program is quite small, I prefer to put the strict minimum in vmlinux.h by cherry-picking the definitions I need.

Compilation

The eBPF Library for Go ships bpf2go, a tool to compile eBPF programs and to generate some scaffolding code. We create a gen.go file with the following content:

package main

//go:generate go tool bpf2go -tags linux reuseport reuseport_kern.c

After running go generate ./..., we can inspect the resulting objects with readelf and llvm-objdump:

$ readelf -S reuseport_bpfeb.o
There are 14 section headers, starting at offset 0x840:
  [Nr] Name              Type             Address           Offset
[…]
  [ 3] sk_reuseport      PROGBITS         0000000000000000  00000040
  [ 6] .maps             PROGBITS         0000000000000000  000000c8
  [ 7] license           PROGBITS         0000000000000000  000000e8
[…]
$ llvm-objdump -S reuseport_bpfeb.o
reuseport_bpfeb.o:  file format elf64-bpf
Disassembly of section sk_reuseport:
0000000000000000 <reuseport_balance_prog>:
; {
       0:   bf 61 00 00 00 00 00 00     r6 = r1
;     __u32 index = bpf_get_prandom_u32() % num_sockets;
       1:   85 00 00 00 00 00 00 07     call 0x7
[…]

Usage from Go

Let’s set up 10 workers listening to the same port.5 Each socket enables the SO_REUSEPORT option before binding:6

var (
    err error
    fds []uintptr
    conns []*net.UDPConn
)
workers := 10
listenAddr := "127.0.0.1:0"
listenConfig := net.ListenConfig{
    Control: func(_, _ string, c syscall.RawConn) error {
        c.Control(func(fd uintptr) {
            err = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
            fds = append(fds, fd)
        })
        return err
    },
}
for range workers {
    pconn, err := listenConfig.ListenPacket(t.Context(), "udp", listenAddr)
    if err != nil {
        t.Fatalf("ListenPacket() error:\n%+v", err)
    }
    udpConn := pconn.(*net.UDPConn)
    listenAddr = udpConn.LocalAddr().String()
    conns = append(conns, udpConn)
}

The second step is to load the eBPF program, initialize the num_sockets variable, populate the socket map, and attach the program to the first socket.7

// Load the eBPF collection.
spec, err := loadReuseport()
if err != nil {
    t.Fatalf("loadVariables() error:\n%+v", err)
}

// Set "num_sockets" global variable to the number of file descriptors we will register
if err := spec.Variables["num_sockets"].Set(uint32(len(fds))); err != nil {
    t.Fatalf("NumSockets.Set() error:\n%+v", err)
}

// Load the map and the program into the kernel.
var objs reuseportObjects
if err := spec.LoadAndAssign(&objs, nil); err != nil {
    t.Fatalf("loadReuseportObjects() error:\n%+v", err)
}
t.Cleanup(func() { objs.Close() })

// Assign the file descriptors to the socket map.
for worker, fd := range fds {
    if err := objs.reuseportMaps.SocketMap.Put(uint32(worker), uint64(fd)); err != nil {
        t.Fatalf("SocketMap.Put() error:\n%+v", err)
    }
}

// Attach the eBPF program to the first socket.
socketFD := int(fds[0])
progFD := objs.reuseportPrograms.ReuseportBalanceProg.FD()
if err := unix.SetsockoptInt(socketFD, unix.SOL_SOCKET, unix.SO_ATTACH_REUSEPORT_EBPF, progFD); err != nil {
    t.Fatalf("SetsockoptInt() error:\n%+v", err)
}

We are now ready to process incoming packets. Each worker is a Go routine incrementing a counter for each received packet:8

var wg sync.WaitGroup
receivedPackets := make([]int, workers)
for worker := range workers {
    conn := conns[worker]
    packets := &receivedPackets[worker]
    wg.Go(func() {
        payload := make([]byte, 9000)
        for {
            if _, err := conn.Read(payload); err != nil {
                if errors.Is(err, net.ErrClosed) {
                    return
                }
                t.Logf("Read() error:\n%+v", err)
            }
            *packets++
        }
    })
}

Let’s send 1000 packets:

sentPackets := 1000
conn, err := net.Dial("udp", conns[0].LocalAddr().String())
if err != nil {
    t.Fatalf("Dial() error:\n%+v", err)
}
defer conn.Close()
for range sentPackets {
    if _, err := conn.Write([]byte("hello world!")); err != nil {
        t.Fatalf("Write() error:\n%+v", err)
    }
}

If we print the content of the receivedPackets array, we can check the balancing works as expected, with each worker getting about 100 packets:

=== RUN   TestUDPWorkerBalancing
    balancing_test.go:84: receivedPackets[0] = 107
    balancing_test.go:84: receivedPackets[1] = 92
    balancing_test.go:84: receivedPackets[2] = 99
    balancing_test.go:84: receivedPackets[3] = 105
    balancing_test.go:84: receivedPackets[4] = 107
    balancing_test.go:84: receivedPackets[5] = 96
    balancing_test.go:84: receivedPackets[6] = 102
    balancing_test.go:84: receivedPackets[7] = 105
    balancing_test.go:84: receivedPackets[8] = 99
    balancing_test.go:84: receivedPackets[9] = 88

    balancing_test.go:91: receivedPackets = 1000
    balancing_test.go:92: sentPackets     = 1000

Graceful restart

You can also use SO_ATTACH_REUSEPORT_EBPF to gracefully restart an application. A new instance of the application binds to the same address and prepares its own version of the socket map. Once it attaches the eBPF program to the first socket, the kernel steers incoming packets to this new instance. The old instance needs to drain the already received packets before shutting down.

To check we are not losing any packet, we spawn a Go routine to send as many packets as possible:

sentPackets := 0
notSentPackets := 0
done := make(chan bool)
conn, err := net.Dial("udp", conns1[0].LocalAddr().String())
if err != nil {
    t.Fatalf("Dial() error:\n%+v", err)
}
defer conn.Close()
go func() {
    for {
        if _, err := conn.Write([]byte("hello world!")); err != nil {
            notSentPackets++
        } else {
            sentPackets++
        }
        select {
        case <-done:
            return
        default:
        }
    }
}()

Then, while the Go routine runs, we start the second set of workers. Once they are running, they start receiving packets. If we gracefully stop the initial set of workers, not a single packet is lost!9

=== RUN   TestGracefulRestart
    graceful_test.go:135: receivedPackets1[0] = 165
    graceful_test.go:135: receivedPackets1[1] = 195
    graceful_test.go:135: receivedPackets1[2] = 194
    graceful_test.go:135: receivedPackets1[3] = 190
    graceful_test.go:135: receivedPackets1[4] = 213
    graceful_test.go:135: receivedPackets1[5] = 187
    graceful_test.go:135: receivedPackets1[6] = 170
    graceful_test.go:135: receivedPackets1[7] = 190
    graceful_test.go:135: receivedPackets1[8] = 194
    graceful_test.go:135: receivedPackets1[9] = 155

    graceful_test.go:139: receivedPackets2[0] = 1631
    graceful_test.go:139: receivedPackets2[1] = 1582
    graceful_test.go:139: receivedPackets2[2] = 1594
    graceful_test.go:139: receivedPackets2[3] = 1611
    graceful_test.go:139: receivedPackets2[4] = 1571
    graceful_test.go:139: receivedPackets2[5] = 1660
    graceful_test.go:139: receivedPackets2[6] = 1587
    graceful_test.go:139: receivedPackets2[7] = 1605
    graceful_test.go:139: receivedPackets2[8] = 1631
    graceful_test.go:139: receivedPackets2[9] = 1689

    graceful_test.go:147: receivedPackets = 18014
    graceful_test.go:148: sentPackets     = 18014

Unfortunately, gracefully shutting down a UDP socket is not trivial in Go.10 Previously, we were terminating workers by closing their sockets. However, if we close them too soon, the application loses packets that were assigned to them but not yet processed. Before stopping, a worker needs to call conn.Read() until there are no more packets. A solution is to set a deadline for conn.Read() and check if we should stop the Go routine when the deadline is exceeded:

payload := make([]byte, 9000)
for {
    conn.SetReadDeadline(time.Now().Add(50 * time.Millisecond))
    if _, err := conn.Read(payload); err != nil {
        if errors.Is(err, os.ErrDeadlineExceeded) {
            select {
            case <-done:
                return
            default:
                continue
            }
        }
        t.Logf("Read() error:\n%+v", err)
    }
    *packets++
}

With TCP, this aspect is simpler: after enabling the net.ipv4.tcp_migrate_req sysctl, the kernel automatically migrates waiting connections to a random socket in the same group. Alternatively, eBPF can also control this migration. Both features are available since Linux 5.14.
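
Enabling that sysctl is a one-liner (shown here for completeness; it only affects TCP request sockets and does not change anything for the UDP case discussed above):

sudo sysctl -w net.ipv4.tcp_migrate_req=1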

Addendum

After implementing this strategy in Akvorado, all workers now drop packets! 😱

$ curl -s 127.0.0.1:8080/api/v0/inlet/metrics \
> | sed -n s/akvorado_inlet_flow_input_udp_in_dropped//p
packets_total{listener="0.0.0.0:2055",worker="0"} 838673
packets_total{listener="0.0.0.0:2055",worker="1"} 843675
packets_total{listener="0.0.0.0:2055",worker="2"} 837922
packets_total{listener="0.0.0.0:2055",worker="3"} 841443
packets_total{listener="0.0.0.0:2055",worker="4"} 840668
packets_total{listener="0.0.0.0:2055",worker="5"} 850274
packets_total{listener="0.0.0.0:2055",worker="6"} 835488
packets_total{listener="0.0.0.0:2055",worker="7"} 834479

The root cause is the default limit of 32 records for Kafka batch sizes. This limit is too low because the brokers have a large overhead when handling each batch: they need to ensure each batch is correctly persisted before acknowledging it. Increasing the limit to 4096 records fixes this issue.

While load-balancing incoming flows with eBPF remains useful, it did not solve the main issue. At least the even distribution of dropped packets helped identify the real bottleneck. 😅


  1. The current version of the manual page is incomplete and does not cover the evolution introduced in Linux 4.19. There is a pending patch about this. ↩

  2. Rust is another option. However, the program we use is so trivial that it does not make sense to use Rust. ↩

  3. As bpf_get_prandom_u32() returns a pseudo-random 32-bit unsigned value, this method exhibits a very slight bias towards the first indexes. This is unlikely to be worth fixing. ↩

  4. Some examples include <linux/bpf.h> instead of "vmlinux.h". This makes your eBPF program dependent on the installed kernel headers. ↩

  5. listenAddr is initially set to 127.0.0.1:0 to allocate a random port. After the first iteration, it is updated with the allocated port. ↩

  6. This is the setupSockets() function in fixtures_test.go. ↩

  7. This is the setupEBPF() function in fixtures_test.go. ↩

  8. The complete code is in balancing_test.go ↩

  9. The complete code is in graceful_test.go ↩

  10. In C, we would poll() both the socket and a pipe used to signal for shutdown. When the second condition is triggered, we drain the socket by executing a series of non-blocking read() until we get EWOULDBLOCK. ↩

05 January, 2026 08:51AM by Vincent Bernat

hackergotchi for Jonathan McDowell

Jonathan McDowell

Free Software Activities for 2025

Given we’ve entered a new year it’s time for my annual recap of my Free Software activities for the previous calendar year. For previous years see 2019, 2020, 2021, 2022, 2023 + 2024.

Conferences

My first conference of the year was FOSDEM. I’d submitted a talk proposal about system attestation in production environments for the attestation devroom, but they had a lot of good submissions and mine was a bit more “this is how we do it” rather than “here’s some neat Free Software that does it”. I’m still trying to work out how to make some of the bits we do more open, but the problem is a lot of the neat stuff is about taking internal knowledge about what should be running and making sure that’s the case, and what you end up with if you abstract that is a toolkit that still needs a lot of work to get something useful.

I had more luck at DebConf25, where I gave a talk (Don’t fear the TPM) trying to explain how TPMs could be useful in a Debian context. Naturally the comments section descended into a discussion about UEFI Secure Boot, which is a separate, if related, thing. DebConf also featured the usual catch-up with fellow team members, hanging out with folk I hadn’t seen in ages, and generally feeling a bit more invigorated about Debian.

Other conferences I considered, but couldn’t justify, were All Systems Go! and the Linux Plumbers Conference. I’ve no doubt both would have had a bunch of interesting and relevant talks + discussions, but not enough this year.

I’m going to have to miss FOSDEM this year, due to travel later in the month, and I’m uncertain if I’m going to make DebConf (for a variety of reasons). That means I don’t have a Free Software conference planned for 2026. Ironically FOSSY moving away from Portland makes it a less appealing option (I have Portland friends it would be good to visit). Other than potential Debian MiniConfs, anything else European I should consider?

Debian

I continue to try and keep RetroArch in shape, with 1.22.2+dfsg-1 (and, shortly after, 1.22.2+dfsg-2 - git-buildpackage in trixie seems more strict about Build-Depends existing in the outside environment, and I keep forgetting I need Build-Depends-Arch and Build-Depends-Indep to be pretty much the same with a minimal Build-Depends that just has enough for the clean target) getting uploaded in December, and 1.20.0+dfsg-1, 1.20+dfsg-2 + 1.20+dfsg-3 all being uploaded earlier in the year. retroarch-assets had 1.20.0+dfsg-1 uploaded back in April. I need to find some time to get 1.22.0 packaged. libretro-snes9x got updated to 1.63+dfsg-1.

sdcc saw 4.5.0+dfsg-1, 4.5.0+dfsg-2, 4.5.0+dfsg-3 (I love major GCC upgrades) and 4.5.0-dfsg-4 uploads. There’s an outstanding bug around a LaTeX error building the manual, but this turns out to be a bug in the 2.5 RC for LyX. Huge credit to Tobias Quathamer for engaging with this, and Pavel Sanda + Jürgen Spitzmüller from the LyX upstream for figuring out the issue + a fix.

Pulseview saw 0.4.2-4 uploaded to fix issues with the GCC 15 + CMake upgrades. I should probably chase the sigrok upstream about new releases; I think there are a bunch of devices that have gained support in git without seeing a tagged release yet.

I did an Electronics Team upload for gputils 1.5.2-2 to fix compilation with GCC 15.

While I don’t do a lot with storage devices these days if I can help it, I still pay a little bit of attention to sg3-utils. That resulted in 1.48-2 and 1.48-3 uploads in 2025.

libcli got a 1.10.7-3 upload to deal with the libcrypt-dev split out.

Finally I got more up-to-date versions of libtorrent (0.15.7-1) and rtorrent (also 0.15.7-1) uploaded to experimental. There’s a ppc64el build failure in libtorrent, but having asked on debian-powerpc this looks like a flaky test/code and I should probably go ahead and upload to unstable.

I sponsored some uploads for Michel Lind - the initial uploads of plymouth-theme-hot-dog, and the separated out pykdumpfile package.

Recognising that I wasn’t contributing usefully to the Data Protection Team, I set about trying to resign in an orderly fashion – see Andreas’ call for volunteers that went out in the last week. Shout out to Enrico for pointing out in the past that we should gracefully step down from things we’re not actually managing to do, to avoid the perception that it’s all fine and no one else needs to step up. It took me too long to act on that.

The Debian keyring team continues to operate smoothly, maintaining our monthly release cadence with a 3 month rotation ensuring all team members stay familiar with the process, and ensure their setups are still operational (especially important after Debian releases). I handled the 2025.03.23, 2025.06.24, 2025.06.27, 2025.09.18, 2025.12.08 + 2025.12.26 pushes.

Linux

TPM related fixes were the theme of my kernel contributions in 2025, all within a work context. Some were just cleanups, but several fixed real issues that were causing us issues. I’ve also tried to be more proactive about reviewing diffs in the TPM subsystem; it feels like a useful way to contribute, as well as making me more actively pay attention to what’s going on there.

Personal projects

I did some work on onak, my OpenPGP keyserver. That resulted in a 0.6.4 release, mainly driven by fixes for building with more recent CMake + GCC versions in Debian. I’ve got a set of changes that should add RFC9580 (v6) support, but there’s not a lot of test keys out there at present for making sure I’m handling things properly. Equally there’s a plan to remove Berkeley DB from Debian, which I’m completely down with, but that means I need a new primary backend. I’ve got a draft of LMDB support to replace that, but I need to go back and confirm I’ve got all the important bits implemented before publishing it and committing to a DB layout. I’d also like to add sqlite support as an option, but that needs some thought about trying to take proper advantage of its features, rather than just treating it as a key-value store.

(I know everyone likes to hate on OpenPGP these days, but I continue to be interested by the whole web-of-trust piece of it, which nothing else I’m aware of offers.)

That about wraps up 2025. Nothing particularly earth-shaking in there, more a case of continuing to tread water on the various things I’m involved in. I highly doubt 2026 will be much different, but I think that’s ok. I scratch my own itches, and if that helps out other folk too then that’s lovely, but it’s not the primary goal.

05 January, 2026 07:57AM

Russell Coker

Phone Charging Speeds With Debian/Trixie

One of the problems I encountered with the PinePhone Pro (PPP) when I tried using it as a daily driver [1] was the charge speed, both slow charging and a bad ratio of charge speed to discharge speed. I also tried using a One Plus 6 (OP6) which had a better charge speed and battery life but I never got VoLTE to work [2] and VoLTE is a requirement for use in Australia and an increasing number of other countries. In my tests with the Librem 5 from Purism I had similar issues with charge speed [3].

What I want to do is get an acceptable ratio of charge time to use time for a free software phone. I don’t necessarily object to a phone that can’t last an 8 hour day on a charge, but I can’t use a phone that needs to be on charge for 4 hours during the day. For this part I’m testing the charge speed and will test the discharge speed when I have solved some issues with excessive CPU use.

I tested with a cheap USB power monitoring device that sits inline between the power cable and the phone. The device has no export method, so I just watched it, and when the numbers fluctuated I tried to estimate the average. I only give the results to two significant digits, which is about all the accuracy that is available; as I copied the numbers separately, the V*A might not exactly equal the W. I idly considered rounding off voltages to the nearest volt and currents to the nearest half amp, but the way the PC USB ports have a voltage drop at higher currents is interesting.

This post should be useful for people who want to try out FOSS phones but don’t want to buy the range of phones and chargers that I have bought.

Phones Tested

I have seen claims about improvements with charging speed on the Librem 5 with recent updates so I decided to compare a number of phones running Debian/Trixie as well as some Android phones. I’m comparing an old Samsung phone (which I tried running Droidian on but is now on Android) and a couple of Pixel phones with the three phones that I currently have running Debian for charging.

Chargers Tested

HP Z640

The Librem 5 had problems with charging on a port on the HP ML110 Gen9 I was using as a workstation. I have sold the ML110 and can’t repeat that exact test but I tested on the HP z640 that I use now. The z640 is a much better workstation (quieter and better support for audio and other desktop features) and is also sold as a workstation.

The z640 documentation says that, of the front USB ports, the top one can do “fast charge (up to 1.5A)” with “USB Battery Charging Specification 1.2”. The only phone that would draw 1.5A on that port was the Librem 5, but the computer would only supply 4.4V at that current, which is poor. For every phone I tested, the bottom port on the front (which apparently doesn’t have USB-BC or USB-PD) charged at least as fast as the top port, and every phone other than the OP6 charged faster on the bottom port. The Librem 5 also had the fastest charge rate on the bottom port. So the rumours about the Librem 5 being updated to address the charge speed on PC ports seem to be correct.

The Wikipedia page about USB hardware says that the only way to get more than 1.5A from a USB port while operating within specifications is via USB-PD, so as USB 3.0 ports the bottom three ports should be limited to 5V at 0.9A, which is 4.5W. The Librem 5 takes 2.0A and the voltage drops to 4.6V, which gives 9.2W. This shows that the z640 doesn’t correctly limit power output and that the Librem 5 will take considerably more power than the specs allow. It would be really interesting to get a powerful PSU and see how much power a Librem 5 will take without negotiating USB-PD, and it would also be interesting to see what happens when you short circuit a USB port on an HP z640. But I recommend not doing such tests on hardware you plan to keep using!

Of the phones I tested the only one that was within specifications on the bottom port of the z640 was the OP6. I think that is more about it just charging slowly in every test than conforming to specs.

Monitor

The next test target is my 5120*2160 Kogan monitor with a USB-C port [4]. This worked quite well and apart from being a few percent slower on the PPP it outperformed the PC ports for every device due to using USB-PD (the only way to get more than 5V) and due to just having a more powerful PSU that doesn’t have a voltage drop when more than 1A is drawn.

Ali Charger

The Ali Charger is a 240W GaN charger from AliExpress that supports multiple USB-PD devices. I tested with the top USB-C port, which can supply 100W to laptops.

The Librem 5 has charging cut out repeatedly on the Ali charger and doesn’t charge properly. It’s also the only charger for which the Librem 5 requests a voltage higher than 5V, so it seems that the Librem 5 has some issues with USB-PD. It would be interesting to know why this problem happens, but I expect that a USB signal debugger is needed to find that out. On AliExpress, USB 2.0 sniffers go for about $50 each, and with a quick search I couldn’t see a USB 3.x or USB-C sniffer. So I’m not going to spend my own money on a sniffer, but if anyone in Melbourne, Australia owns a sniffer and wants to visit me and try it out then let me know. I’ll also bring it to Everything Open 2026.

Generally the Ali charger was about the best charger in my collection, apart from the case of the Librem 5.

Dell Dock

I got a number of free Dell WD15 (aka K17A) USB-C powered docks as they are obsolete. They have a VGA port among other connections, and the HDMI and DisplayPort outputs don’t support resolutions higher than FullHD if both ports are in use, or 4K if only a single port is in use. The resolutions aren’t directly relevant to charging, but they do indicate the age of the design.

The Dell dock seems to support no voltages other than 5V for phones and 19V (20V requested) for laptops, and certainly not the 9V requested by the Pixel 7 Pro and Pixel 8 phones. I wonder if not supporting most fast charging speeds for phones was part of the reason why other people didn’t want those docks and I got some for free. I hope that the newer Dell docks support 9V; a phone running Samsung DeX will display 4K output on a Dell dock and can productively use a keyboard and mouse. Getting equivalent functionality to DeX working properly on Debian phones is something I’m interested in.

Battery

The “Battery” I tested with is a Chinese battery for charging phones and laptops; it’s allegedly capable of 67W USB-PD supply, but so far all I’ve seen it supply is 20V 2.5A for my laptop. I bought the 67W battery just in case I need it for other laptops in future; the Thinkpad X1 Carbon I’m using now will charge from a 30W battery.

There seems to be an overall trend of the most shonky devices giving the best charging speeds. Dell and HP make quality gear, although my tests show that some HP ports exceed the specs. Kogan doesn’t make monitors; they just put their brand on something cheap. Having bought one of the cheapest chargers from AliExpress and one of the cheaper batteries from China, I don’t expect the highest quality, and I am slightly relieved to have done enough tests with both of them that a fire now seems extremely unlikely. But the battery is one of the fastest charging devices I own, and with the exception of the Librem 5 (which charges slowly on all ports and unreliably on several) the Ali charger is also one of the fastest. The Kogan monitor isn’t far behind.

Conclusion

Voltage and Age

The Samsung Galaxy Note 9 was released in 2018, as was the OP6. The PPP was first released in 2022 and the Librem 5 in 2020, but I think they are both at a similar technology level to the Note 9 and OP6, as the companies that specialise in phones have a pipeline for bringing new features to market.

The Pixel phones are newer and support USB-PD voltage selection, while the other phones either don’t support USB-PD or support it but only request 5V, apart from the Librem 5, which requests a higher voltage but draws only a low current and repeatedly disconnects.

Idle Power

One of the major problems that prevented me from using a Debian phone as my daily driver in the past is the ratio of idle power use to charging power. Now that the phones seem to charge faster, if I can get the idle power use under control they will be usable.

Currently the Librem 5 running Trixie uses 6% CPU time (24% of a core) while idle with the screen off (but with “Caffeine” mode enabled, so no deep sleep). On the PPP the CPU use varies between about 2% and 20% (12% to 120% of one core), mainly from plasmashell and kwin_wayland. The OP6 has idle CPU use of a bit under 1% CPU time, which means a bit under 8% of one core.

The Librem 5 and PPP seem to have configuration issues with KDE Mobile and Pipewire that result in needless CPU use. With those issues addressed I might be able to make a Librem 5 or PPP a usable phone if I have a battery to charge it.

The OP6 is an interesting point of comparison as a Debian phone but is not a viable option as a daily driver due to problems with VoLTE and also some instability – it sometimes crashes or drops off Wifi.

The Librem 5 charges at 9.2W from a PC that doesn’t obey the specs and at 10W from the battery. That’s a reasonable charge rate, and the fact that it can request 12V (unsuccessfully) opens the possibility of higher charge rates in future. That could allow a reasonable ratio of charge time to use time.

The PPP has lower charging speeds than the Librem 5 but works more consistently: I found no charger that didn’t work well with it. This is useful for the common case of charging from a random device in the office. But the fact that the Librem 5 takes 10W from the battery while the PPP only takes 6.3W would be an issue when using the phone while charging.

Now that I know the charge rates for different scenarios, I can work on getting the phones to use significantly less power than that on average.

Specifics for a Usable Phone

The 67W battery, or something equivalent, is something I think I will always need to have around when using a PPP or Librem 5 as a daily driver.

The ability to charge fast while at a desk is also an important criterion. The charge speed from my home PC is good in that regard, and the charge speed from my monitor is even better. Getting something equivalent at a desk in an office I work in is a possibility.

Improving the Debian distribution for phones is necessary. That’s something I plan to work on although the code is complex and in many cases I’ll have to just file upstream bug reports.

I have also ordered a FuriLabs FLX1s [5] which I believe will be better in some ways. I will blog about it when it arrives.

Phone       | Top z640        | Bottom z640     | Monitor         | Ali Charger     | Dell Dock       | Battery         | Best            | Worst
Note9       | 4.8V 1.0A 5.2W  | 4.8V 1.6A 7.5W  | 4.9V 2.0A 9.5W  | 5.1V 1.9A 9.7W  | 4.8V 2.1A 10W   | 5.1V 2.1A 10W   | 5.1V 2.1A 10W   | 4.8V 1.0A 5.2W
Pixel 7 pro | 4.9V 0.80A 4.2W | 4.8V 1.2A 5.9W  | 9.1V 1.3A 12W   | 9.1V 1.2A 11W   | 4.9V 1.8A 8.7W  | 9.0V 1.3A 12W   | 9.1V 1.3A 12W   | 4.9V 0.80A 4.2W
Pixel 8     | 4.7V 1.2A 5.4W  | 4.7V 1.5A 7.2W  | 8.9V 2.1A 19W   | 9.1V 2.7A 24W   | 4.8V 2.3A 11.0W | 9.1V 2.6A 24W   | 9.1V 2.7A 24W   | 4.7V 1.2A 5.4W
PPP         | 4.7V 1.2A 6.0W  | 4.8V 1.3A 6.8W  | 4.9V 1.4A 6.6W  | 5.0V 1.2A 5.8W  | 4.9V 1.4A 5.9W  | 5.1V 1.2A 6.3W  | 4.8V 1.3A 6.8W  | 5.0V 1.2A 5.8W
Librem 5    | 4.4V 1.5A 6.7W  | 4.6V 2.0A 9.2W  | 4.8V 2.4A 11.2W | 12V 0.48A 5.8W  | 5.0V 0.56A 2.7W | 5.1V 2.0A 10W   | 4.8V 2.4A 11.2W | 5.0V 0.56A 2.7W
OnePlus6    | 5.0V 0.51A 2.5W | 5.0V 0.50A 2.5W | 5.0V 0.81A 4.0W | 5.0V 0.75A 3.7W | 5.0V 0.77A 3.7W | 5.0V 0.77A 3.9W | 5.0V 0.81A 4.0W | 5.0V 0.50A 2.5W
Best        | 4.4V 1.5A 6.7W  | 4.6V 2.0A 9.2W  | 8.9V 2.1A 19W   | 9.1V 2.7A 24W   | 4.8V 2.3A 11.0W | 9.1V 2.6A 24W   |                 |

05 January, 2026 07:21AM by etbe

January 03, 2026

Joerg Jaspert

AI Shit, go away; iocaine to the rescue

As a lot of people do, I have some content that is reachable using web browsers. There is the password manager Vaultwarden, an instance of Immich, ForgeJo for some personal git repos, my blog and some other random pages here and there.

All of this had never been a problem; running a webserver is a relatively simple task, no matter if you use apache2, nginx or any of the other possibilities. And the things mentioned above bring their own daemons to serve their users.

AI crap

And then some idiot somewhere had the idea to ignore every law, every copyright and every normal behaviour and run some shit AI bot. And more idiots followed. And now we have more AI bots than humans generating traffic.

And those AI shit crawlers do not respect any limits. robots.txt, slow servers, anything to keep your meager little site up and alive? Those idiots just throw more resources at their crawlers to steal content. No sense at all.

iocaine to the rescue

So those AI bros want to ignore everything and just fetch the whole internet? Without any consideration of whether that’s even wanted? Or legal? There are people who dislike this. I am one of them, but there are also some who got annoyed enough to develop tools to fight the AI craziness. One of those tools is iocaine - it describes itself as “The deadliest poison known to AI”.

Feed AI bots sh*t

So you want content? You do not accept any “Go away”? Then here is content. It is crap, but apparently you don’t care. So have fun.

What iocaine does is (cite from their webpage) “not made for making the Crawlers go away. It is an aggressive defense mechanism that tries its best to take the blunt of the assault, serve them garbage, and keep them off of upstream resources”.

That is, instead of the expensive web app burning a lot of resources that are basically wasted for nothing, iocaine generates a small static page (with some links back to itself, so the crawler shit stays happy). That takes a hell of a lot less resources than any full-blown app.

iocaine setup

The website has documentation at https://iocaine.madhouse-project.org/documentation/, and it is not hard to set up. Still, I had to adjust some things for my setup, as I use Caddy Docker Proxy (https://github.com/lucaslorentz/caddy-docker-proxy) nowadays and wanted to keep the config within the Docker setup, that is, within the labels.

Caddy container

So my container setup for Caddy itself contains the following extra lines:

    labels:
      caddy_0.email: email@example.com
      caddy_1: (iocaine)
      caddy_1.0_@read: method GET HEAD
      caddy_1.1_reverse_proxy: "@read iocaine:42069"
      "caddy_1.1_reverse_proxy.@fallback": "status 421"
      caddy_1.1_reverse_proxy.handle_response: "@fallback"

This will be translated to the following Caddy config snippet:

(iocaine) {
        @read method GET HEAD
        reverse_proxy @read iocaine:42069 {
                @fallback status 421
                handle_response @fallback
        }
}
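
For this to work, the iocaine container itself also has to be part of the Compose setup, so that Caddy can resolve the name iocaine and reach port 42069 (the address used in the snippet above). The following is only a minimal sketch of mine: the image reference, the volume path and the network name are placeholders, so check the iocaine documentation for the actual image name and configuration layout.

      iocaine:
        # image name is a placeholder -- use whatever the iocaine docs currently recommend
        image: <iocaine-image>
        restart: unless-stopped
        volumes:
          # configuration for iocaine; the exact files it expects are described upstream
          - ./iocaine:/data
        networks:
          # assumed shared network so that Caddy can reach "iocaine:42069"
          - caddy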

Any container that should be protected by iocaine

All the containers that are “behind” the Caddy reverse proxy can now get protected by iocaine with just one more line in their docker-compose.yaml. So now we have

   labels:
      caddy: service.example.com
      caddy.reverse_proxy: "{{upstreams 3000}}"
      caddy.import: iocaine

which translates to

service.example.com {
        import iocaine
        reverse_proxy 172.18.0.6:3000
}

So with one simple extra label for the docker container I have iocaine activated.

Result? ByeBye (most) AI Bots

Looking at the services that got hammered most by those crap bots, deploying this iocaine container and telling Caddy about it solved the problem for me. 98% of the requests from the bots now go to iocaine and no longer hog resources in the actual services.

I wish it wasn’t necessary to run such tools. But as long as we have shitheads driving the AI hype there is no hope. I wish they would all end up in jail for all the stealing they do. If someone with a little more brain left would set things up sensibly, then the AI thing could maybe turn into something good and useful.

But currently it is all crap.

03 January, 2026 01:23PM

hackergotchi for Benjamin Mako Hill

Benjamin Mako Hill

Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence from Wikipedia

Note: I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface these for folks who missed them, I will be periodically (re)publishing blog posts about some “older” published projects. This particular post is closely based on a previously published post by Nate TeBlunthuis from the Community Data Science Blog.

Many online platforms are adopting AI and machine learning as a tool to maintain order and high-quality information in the face of massive influxes of user-generated content. Of course, AI algorithms can be inaccurate, biased, or unfair. How do signals from AI predictions shape the fairness of online content moderation? How can we measure an algorithmic flagging system’s effects?

In our paper published at CSCW, Nate TeBlunthuis, together with myself and Aaron Halfaker, analyzed the RCFilters system: an add-on to Wikipedia that highlights and filters edits that a machine learning algorithm called ORES identifies as likely to be damaging to Wikipedia. This system has been deployed on large Wikipedia language editions and is similar to other algorithmic flagging systems that are becoming increasingly widespread. Our work measures the causal effect of being flagged in the RCFilters user interface.

Screenshot of Wikipedia edit metadata on Special:RecentChanges with RCFilters enabled. Highlighted edits with a colored circle to the left side of other metadata are flagged by ORES. Different circle and highlight colors (white, yellow, orange, and red in the figure) correspond to different levels of confidence that the edit is damaging. RCFilters does not specifically flag edits by new accounts or unregistered editors, but does support filtering changes by editor types.

Our work takes advantage of the fact that RCFilters, like many algorithmic flagging systems, create discontinuities in the relationship between the probability that a moderator should take action and whether a moderator actually does. This happens because the output of machine learning systems like ORES is typically a continuous score (in RCFilters, an estimated probability that a Wikipedia edit is damaging), while the flags (in RCFilters, the yellow, orange, or red highlights) are either on or off and are triggered when the score crosses some arbitrary threshold. As a result, edits slightly above the threshold are both more visible to moderators and appear more likely to be damaging than edits slightly below. Even though edits on either side of the threshold have virtually the same likelihood of truly being damaging, the flagged edits are substantially more likely to be reverted. This fact lets us use a method called regression discontinuity to make causal estimates of the effect of being flagged in RCFilters.

Charts showing the probability that an edit will be reverted as a function of ORES scores in the neighborhood of the discontinuous threshold that triggers the RCfilters flag. The jump in the increase in reversion chances is larger for registered editors compared to unregistered editors at both thresholds.

To understand how this system may affect the fairness of Wikipedia moderation, we estimate the effects of flagging on edits by different groups of editors. Comparing the magnitude of these estimates lets us measure how flagging is associated with several different definitions of fairness. Surprisingly, we found evidence that these flags improved fairness for categories of editors that have been widely perceived as troublesome, particularly unregistered (anonymous) editors. This occurred because flagging has a much stronger effect on edits by registered editors than on edits by unregistered editors.

We believe that our results are driven by the fact that algorithmic flags are especially helpful for finding damage that can’t be easily detected otherwise. Wikipedia moderators can see the editor’s registration status in the recent changes, watchlists, and edit history. Because unregistered editors are often troublesome, Wikipedia moderators’ attention is often focused on their contributions, with or without algorithmic flags. Algorithmic flags make damage by registered editors (in addition to unregistered editors) much more detectable to moderators and so help moderators focus on damage overall, not just damage by suspicious editors. As a result, the algorithmic flagging system decreases the bias that moderators have against unregistered editors.

This finding is particularly surprising because the ORES algorithm we analyzed was itself demonstrably biased against unregistered editors (i.e., the algorithm tended to greatly overestimate the probability that edits by these editors were damaging). Despite the fact that the algorithms were biased, their introduction could still lead to less biased outcomes overall.

Our work shows that although it is important to design predictive algorithms to avoid such biases, it is equally important to study fairness at the level of the broader sociotechnical system. Since we first published a preprint of our paper, a follow-up piece by Leijie Wang and Haiyi Zhu replicated much of our work and showed that differences between different Wikipedia communities may be another important factor driving the effect of the system. Overall, this work suggests that social signals and social context can interact with algorithmic signals, and together these can influence behavior in important and unexpected ways.


The full citation for the paper is: TeBlunthuis, Nathan, Benjamin Mako Hill, and Aaron Halfaker. 2021. “Effects of Algorithmic Flagging on Fairness: Quasi-Experimental Evidence from Wikipedia.” Proceedings of the ACM on Human-Computer Interaction 5 (CSCW): 56:1-56:27. https://doi.org/10.1145/3449130.

We have also released replication materials for the paper, including all the data and code used to conduct the analysis and compile the paper itself.

03 January, 2026 12:34PM by Benjamin Mako Hill

Russ Allbery

Review: Challenges of the Deeps

Review: Challenges of the Deeps, by Ryk E. Spoor

Series: Arenaverse #3
Publisher: Baen
Copyright: March 2017
ISBN: 1-62579-564-5
Format: Kindle
Pages: 438

Challenges of the Deeps is the third book in the throwback space opera Arenaverse series. It is a direct sequel to Spheres of Influence, but Spoor provides a substantial recap of the previous volumes for those who did not read the series in close succession (thank you!).

Ariane has stabilized humanity's position in the Arena with yet another improbable victory. (If this is a spoiler for previous volumes, so was telling you the genre of the book.) Now is a good opportunity to fulfill the promise humanity made to their ally Orphan: accompaniment on a journey into the uncharted deeps of the Arena for reasons that Orphan refuses to explain in advance. Her experienced crew provide multiple options to serve as acting Leader of Humanity until she gets back. What can go wrong?

The conceit of this series is that as soon as a species achieves warp drive technology, their ships are instead transported into the vast extradimensional structure of the Arena where a godlike entity controls the laws of nature and enforces a formal conflict resolution process that looks alternatingly like a sporting event, a dueling code, and technology-capped total war. Each inhabitable system in the real universe seems to correspond to an Arena sphere, but the space between them is breathable atmosphere filled with often-massive storms.

In other words, this is an airship adventure as written by E.E. "Doc" Smith. Sort of. There is an adventure, and there are a lot of airships (although they fight mostly like spaceships), but much of the action involves tense mental and physical sparring with a previously unknown Arena power with unclear motives.

My general experience with this series is that I find the Arena concept fascinating and want to read more about it, Spoor finds his much-less-original Hyperion Project in the backstory of the characters more fascinating and wants to write about that, and we reach a sort of indirect, grumbling (on my part) truce where I eagerly wait for more revelations about the Arena and roll my eyes at the Hyperion stuff. Talking about Hyperion in detail is probably a spoiler for at least the first book, but I will say that it's an excuse to embed versions of literary characters into the story and works about as well as most such excuses (not very). The characters in question are an E.E. "Doc" Smith mash-up, a Monkey King mash-up, and a number of other characters that are obviously references to something but for whom I lack enough hints to place (which is frustrating).

Thankfully we get far less human politics and a decent amount of Arena world-building in this installment. Hyperion plays a role, but mostly as foreshadowing for the next volume and the cause of a surprising interaction with Arena rules. One of the interesting wrinkles of this series is that humanity has an odd edge against the other civilizations in part because we're borderline insane sociopaths from the perspective of the established powers. That's an old science fiction trope, but I prefer it to the Campbell-style belief in inherent human superiority.

Old science fiction tropes are what you need to be in the mood for to enjoy this series. This is an unapologetic and intentional throwback to early pulp: individuals who can be trusted with the entire future of humanity because they're just that moral, super-science, psychic warfare, and even coruscating beams that would make E.E. "Doc" Smith proud. It's an occasionally glorious but mostly silly pile of technobabble, but Spoor takes advantage of the weird, constructed nature of the Arena to provide more complex rules than competitive superlatives.

The trick is that while this is certainly science fiction pulp, it's also a sort of isekai novel. There's a lot of anime and manga influence just beneath the surface. I'm not sure why it never occurred to me before reading this series that melodramatic anime and old SF pulps have substantial aesthetic overlap, but of course they do. I loved the Star Blazers translated anime that I watched as a kid precisely because it had the sort of dramatic set pieces that make the Lensman novels so much fun.

There is a bit too much Wu Kong in this book for me (although the character is growing on me a little), and some of the maneuvering around the mysterious new Arena actor drags on longer than was ideal, but the climax is great stuff if you're in the mood for dramatic pulp adventure. The politics do not bear close examination and the writing is serviceable at best, but something about this series is just fun. I liked this book much better than Spheres of Influence, although I wish Spoor would stop being so coy about the nature of the Arena and give us more substantial revelations. I'm also now tempted to re-read Lensman, which is probably a horrible idea. (Spoor leaves the sexism out of his modern pulp.)

If you got through Spheres of Influence with your curiosity about the Arena intact, consider this one when you're in the mood for modern pulp, although don't expect any huge revelations. It's not the best-written book, but it sits squarely in the center of a genre and mood that's otherwise a bit hard to find.

Followed by the Kickstarter-funded Shadows of Hyperion, which sadly looks like it's going to concentrate on the Hyperion Project again. I will probably pick that up... eventually.

Rating: 6 out of 10

03 January, 2026 05:23AM

hackergotchi for Louis-Philippe Véronneau

Louis-Philippe Véronneau

2025 — A Musical Retrospective

2026 already! The winter weather here has really been beautiful and I always enjoy this time of year. Writing this yearly musical retrospective has now become a beloved tradition of mine1 and I enjoy retracing the year's various events through albums I listened to and concerts I went to.

Albums

In 2025, I added 141 new albums to my collection, around 60% more than last year's haul. I think this might have been too much? I feel like I didn't have time to properly enjoy all of them and as such, I decided to slow down my acquisition spree sometime in early December, around the time I normally do the complete opposite.

This year again, I bought the vast majority of my music on Bandcamp. Most of the other albums I bought as CDs and ripped them.

Concerts

In 2025, I went to the following 25 (!!) concerts:

  • January 17th: Uzu, Young Blades, She came to quit, Fever Visions
  • February 1st: Over the Hill, Jail, Mortier, Ain't Right
  • February 7th: Béton Armé, Mulchulation II, Ooz
  • February 15th: The Prowlers, Ultra Razzia, Sistema de Muerte, Trauma Bond
  • February 28th: Francbâtards
  • March 28th: Conflit Majeur, to Even Exist, Crachat
  • April 12th: Jetsam, Mortier, NIIVI, Canette
  • April 26th-27th (Montreal Oi! Fest 2025): The Buzzers, Bad Terms, Sons of Pride, Liberty and Justice, Flafoot 56, The Beltones, Mortier, Street Code, The Stress, Alternate Action
  • May 1st: Bauxite, Atomic threat, the 351's
  • May 30th: Uzu, Tenaz, Extraña Humana, Sistema de muerte
  • June 7th: Ordures Ioniques, Tulaviok, Fucking Raymonds, Voyou
  • June 18th: Tiken Jah Fakoly
  • June 21st: Saïan Supa Celebration
  • June 26th: Taxi Girls, Death Proof, Laura Krieg
  • July 4th: Frente Cumbiero
  • July 12th: Montreal's Big Fiesta DJ Set
  • August 16th: Guerilla Poubelle
  • September 11th: No Suicide Act, Mortier
  • September 20th: Hors Contrôle, Union Thugs, Barricade Mentale
  • October 20th: Ezra Furman, The Golden Dregs
  • October 24th: Overbass, Hommage à Bérurier Noir, Self Control, Vermin Kaos
  • November 6th: Béton Armé, Faze, Slash Need, Chain Block
  • November 28th (Blood Moon Ritual 2025): Bhatt, Channeler, Pyrocene Death Cult, Masse d'Armes
  • December 13th (Stomp Records' 30th Anniversary Bash): The Planet Smashers, The Flatliners, Wine Lips, The Anti-Queens, Crash ton rock

Although I haven't touched metalfinder's code in a good while, my instance still works very well and I get the occasional match when a big-name artist in my collection comes to town. Most of the venues that advertise on Bandsintown are tied to Ticketmaster though, which means most underground artists (i.e. most of the music I listen to) end up playing elsewhere.

As such, shout out again to the Gancio project and to the folks running the Montreal instance. It continues to be a smash hit and most of the interesting concerts end up being advertised there.

See you all in 2026!


  1. see the 2022, 2023 and 2024 entries 

03 January, 2026 05:00AM by Louis-Philippe Véronneau

January 02, 2026

hackergotchi for Joachim Breitner

Joachim Breitner

Seemingly impossible programs in Lean

In 2007, Martin Escardo wrote an often-read blog post about “Seemingly impossible functional programs”. One such seemingly impossible function is find, which takes a predicate on infinite sequences of bits and returns an infinite sequence for which that predicate holds (unless the predicate is just always false, in which case it returns some arbitrary sequence).

Inspired by conversations with and experiments by Massin Guerdi at the dinner of LeaningIn 2025 in Berlin (yes, this blog post has been in my pipeline for far too long), I wanted to play around with these concepts in Lean.

Let’s represent infinite sequences of bits as functions from Nat to Bit, and give them a nice name, and some basic functionality, including a binary operator for consing an element to the front:

import Mathlib.Data.Nat.Find

abbrev Bit := Bool

def Cantor : Type := Nat → Bit

def Cantor.head (a : Cantor) : Bit := a 0

def Cantor.tail (a : Cantor) : Cantor := fun i => a (i + 1)

@[simp, grind] def Cantor.cons (x : Bit) (a : Cantor) : Cantor
  | 0 => x
  | i+1 => a i

infix:60 " # " => Cantor.cons

With this in place, we can write Escardo’s function in Lean. His blog post discusses a few variants; I’ll focus on just one of them:

mutual
  partial def forsome (p : Cantor → Bool) : Bool :=
    p (find p)

  partial def find (p : Cantor → Bool) : Cantor :=
    have b := forsome (fun a => p (true # a))
    (b # find (fun a => p (b # a)))
end

We define find together with forsome, which checks whether the predicate p holds for some sequence. Using forsome, find sets the first element of the result to true if there exists a satisfying sequence starting with true and to false otherwise, and then recursively finds the rest of the sequence.

It is a bit of a brain twister that this code works, but it does:

def fifth_false : Cantor → Bool := fun a => not (a 5)

/-- info: [true, true, true, true, true, false, true, true, true, true] -/
#guard_msgs in
#eval List.ofFn (fun (i : Fin 10) => find fifth_false i)
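
As an extra sanity check of my own (it only uses the definitions above, and forces no more of the sequence than the #eval above already did), we can confirm that the predicate indeed holds on the sequence that find produces:

-- addition of mine: fifth_false only looks at index 5, which find already sets to false
/-- info: true -/
#guard_msgs in
#eval fifth_false (find fifth_false)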

Of course, in Lean we don’t just want to define these functions, but we want to prove that they do what we expect them to do.

Above we defined them as partial functions, even though we hope that they are not actually partial: The partial keyword means that we don’t have to do a termination proof, but also that we cannot prove anything about these functions.

So can we convince Lean that these functions are total after all? We can, but it’s a bit of a puzzle, and we have to adjust the definitions.

First of all, these “seemingly impossible functions” are only possible because we assume that the predicate we pass to it, p, is computable and total. This is where the whole magic comes from, and I recommend reading Escardo’s blog posts and papers for more on this fascinating topic. In particular, you will learn that a predicate on Cantor that is computable and total necessarily only looks at some initial fragment of the sequence. The length of that prefix is called the “modulus”. So if we hope to prove termination of find and forsome, we have to restrict their argument p to such computable predicates.

To that end I introduce HasModulus and the subtype of predicates on Cantor that have such a modulus:

-- Extensional (!) modulus of uniform continuity
def HasModulus (p : Cantor → α) := ∃ n, ∀ a b : Cantor, (∀ i < n, a i = b i) → p a = p b

@[ext] structure CantorPred where
  pred : Cantor → Bool
  hasModulus : HasModulus pred

The modulus of such a predicate is now the least prefix length that determines the predicate. In particular, if the modulus is zero, the predicate is constant:

namespace CantorPred

variable (p : CantorPred)

noncomputable def modulus : Nat :=
  open Classical in Nat.find p.hasModulus

theorem eq_of_modulus : ∀a b : Cantor, (∀ i < p.modulus, a i = b i) → p a = p b := by
  open Classical in
  unfold modulus
  exact Nat.find_spec p.hasModulus

theorem eq_of_modulus_eq_0 (hm : p.modulus = 0) : ∀ a b, p a = p b := by
  intro a b
  apply p.eq_of_modulus
  simp [hm]

Because we want to work with CantorPred and not Cantor → Bool, I have to define some operations on that new type; in particular the “cons element before predicate” operation that we saw above in find:

def comp_cons (b : Bit) : CantorPred where
  pred := fun a => p (b # a)
  hasModulus := by
    obtain ⟨n, h_n⟩ := p.hasModulus
    cases n with
    | zero => exists 0; grind
    | succ m =>
      exists m
      intro a b heq
      simp
      apply h_n
      intro i hi
      cases i
      · rfl
      · grind

@[simp, grind =] theorem comp_cons_pred (x : Bit) (a : Cantor) :
  (p.comp_cons x) a = p (x # a) := rfl

For this operation we know that the modulus decreases (if it wasn’t already zero):

theorem comp_cons_modulus (x : Bit) :
    (p.comp_cons x).modulus ≤ p.modulus - 1 := by
  open Classical in
  apply Nat.find_le
  intro a b hab
  apply p.eq_of_modulus
  cases hh : p.modulus
  · simp
  · intro i hi
    cases i
    · grind
    · grind
grind_pattern comp_cons_modulus => (p.comp_cons x).modulus

We can rewrite the find function above to use these operations:

mutual
  partial def forsome (p : CantorPred) : Bool := p (find p)

  partial def find (p : CantorPred) : Cantor := fun i =>
    have b := forsome (p.comp_cons true)
    (b # find (p.comp_cons b)) i
end

I have also eta-expanded the Cantor function returned by find; there is now a fun i => … i around the body. We’ll shortly see why that is needed.

Now we have everything in place to attempt a termination proof. Before we do that proof, we could step back and try to come up with an informal termination argument.

  • The recursive call from forsome to find doesn’t decrease any argument at all. This is ok if all calls from find to forsome are decreasing.

  • The recursive call from find to find decreases the index i as the recursive call is behind the Cantor.cons operation that shifts the index. Good.

  • The recursive call from find to forsome decreases the modulus of the argument p, if it wasn’t already zero.

    But if it was zero, it does not decrease it! However, if the modulus is zero, then forsome doesn’t actually need to call find at all, because then p doesn’t look at its argument.

We can express all this reasoning as a termination measure in the form of a lexicographic triple. The 0 and 1 in the middle component mean that for zero modulus, we can call forsome from find “for free”.

mutual
  def forsome (p : CantorPred) : Bool := p (find p)
  termination_by (p.modulus, if p.modulus = 0 then 0 else 1, 0)
  decreasing_by grind

  def find (p : CantorPred) : Cantor := fun i =>
    have b := forsome (p.comp_cons true)
    (b # find (p.comp_cons b)) i
  termination_by i => (p.modulus, if p.modulus = 0 then 1 else 0, i)
  decreasing_by all_goals grind
end

The termination proof doesn’t go through just yet: Lean is not able to see that (_ # a) i only calls the tail a at index i - 1, and it does not see that p (find p) only uses find p if the modulus of p is non-zero. We can use the wf_preprocess feature to tell it about that:

The following theorem replaces a call p f, where p is a function parameter, with the slightly more complex but provably equivalent expression on the right, where the call to f is now in the else branch of an if-then-else and thus has ¬p.modulus = 0 in scope:

@[wf_preprocess]
theorem coe_wf (p : CantorPred) :
    (wfParam p) f = p (if _ : p.modulus = 0 then fun _ => false else f) := by
  split
  next h => apply p.eq_of_modulus_eq_0 h
  next => rfl

And similarly we replace (_ # a) i with a variant that extends the context with information on how the tail a is called:

def cantor_cons' (x : Bit) (i : Nat) (a : ∀ j, j + 1 = i → Bit) : Bit :=
  match i with
  | 0 => x
  | j + 1 => a j (by grind)

@[wf_preprocess] theorem cantor_cons_congr (b : Bit) (a : Cantor) (i : Nat) :
  (b # a) i = cantor_cons' b i (fun j _ => a j) := by cases i <;> rfl

After these declarations, the above definitions of forsome and find go through!

It remains to now prove that they do what they should, by a simple induction on the modulus of p:

@[simp, grind =] theorem tail_cons_eq (a : Cantor) : (x # a).tail = a := by
  funext i; simp [Cantor.tail, Cantor.cons]

@[simp, grind =] theorem head_cons_tail_eq (a : Cantor) : a.head # a.tail = a := by
  funext i; cases i <;> rfl

theorem find_correct (p : CantorPred) (h_exists : ∃ a, p a) : p (find p) := by
  by_cases h0 : p.modulus = 0
  · obtain ⟨a, h_a⟩ := h_exists
    rw [← h_a]
    apply p.eq_of_modulus_eq_0 h0
  · rw [find.eq_unfold, forsome.eq_unfold]
    dsimp -zeta
    extract_lets b
    change p (_ # _)
    by_cases htrue : ∃ a, p (true # a)
    next =>
      have := find_correct (p.comp_cons true) htrue
      grind
    next =>
      have : b = false := by grind
      clear_value b; subst b
      have hfalse : ∃ a, p (false # a) := by
        obtain ⟨a, h_a⟩ := h_exists
        cases h : a.head
        · exists Cantor.tail a
          grind
        · exfalso
          apply htrue
          exists Cantor.tail a
          grind
      clear h_exists
      exact find_correct (p.comp_cons false) hfalse
termination_by p.modulus
decreasing_by all_goals grind

theorem forsome_correct (p : CantorPred) :
    forsome p ↔ (∃ a, p a) where
  mp hfind := by unfold forsome at hfind; exists find p
  mpr hex := by unfold forsome; exact find_correct p hex

This is pretty nice! However there is more to do. For example, Escardo has a “massively faster” variant of find that we can implement as a partial function in Lean:

def findBit (p : Bit → Bool) : Bit :=
  if p false then false else true

def branch (x : Bit) (l r : Cantor) : Cantor :=
  fun n =>
    if n = 0      then x
    else if 2 ∣ n then r ((n - 2) / 2)
                  else l ((n - 1) / 2)

mutual
  partial def forsome (p : Cantor -> Bool) : Bool :=
    p (find p)

  partial def find (p : Cantor -> Bool) : Cantor :=
    let x := findBit (fun x => forsome (fun l => forsome (fun r => p (branch x l r))))
    let l := find (fun l => forsome (fun r => p (branch x l r)))
    let r := find (fun r => p (branch x l r))
    branch x l r
end

But can we get this past Lean’s termination checker? In order to prove that the modulus of p is decreasing, we’d have to know that, for example, find (fun r => p (branch x l r)) behaves nicely. Unfortunately, it is rather hard to do a termination proof for a function that relies on the behaviour of the function itself.

So I’ll leave this open as a future exercise.

I have dumped the code for this post at https://github.com/nomeata/lean-cantor.

02 January, 2026 02:30PM by Joachim Breitner (mail@joachim-breitner.de)

hackergotchi for Ben Hutchings

Ben Hutchings