Reproducible Builds in May 2025
Welcome to our 5th report from the Reproducible Builds project in 2025! Our monthly reports outline what we’ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please do visit the Contribute page on our website.
In this report:
- Security audit of Reproducible Builds tools published
- When good pseudorandom numbers go bad
- Academic articles
- Distribution work
- diffoscope and disorderfs
- Website updates
- Reproducibility testing framework
- Upstream patches
Security audit of Reproducible Builds tools published
The Open Technology Fund’s (OTF) security partner Security Research Labs recently conducted an audit of some specific parts of tools developed by Reproducible Builds. This form of security audit, sometimes called a “whitebox” audit, is a form of testing in which auditors have complete knowledge of the item being tested. The auditors assessed the various codebases for resilience against hacking, with key areas including differential report formats in diffoscope, common client web attacks, command injection, privilege management, hidden modifications in the build process and attack vectors that might enable denials of service.
The audit focused on three core Reproducible Builds tools: diffoscope, a Python application that unpacks archives of files and directories and transforms their binary formats into human-readable form in order to compare them; strip-nondeterminism, a Perl program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging; and reprotest, a Python application that builds source code multiple times in various environments in order to test reproducibility.
OTF’s announcement contains more of an overview of the audit, and the full 24-page report is available in PDF form as well.
When good pseudorandom numbers go bad
Danielle Navarro published an interesting and amusing article on their blog on When good pseudorandom numbers go bad. Danielle sets the stage as follows:
[Colleagues] approached me to talk about a reproducibility issue they’d been having with some R code. They’d been running simulations that rely on generating samples from a multivariate normal distribution, and despite doing the prudent thing and using set.seed() to control the state of the random number generator (RNG), the results were not computationally reproducible. The same code, executed on different machines, would produce different random numbers. The numbers weren’t “just a little bit different” in the way that we’ve all wearily learned to expect when you try to force computers to do mathematics. They were painfully, brutally, catastrophically, irreproducible different. Somewhere, somehow, something broke.
Thanks to David Wheeler for posting about this article on our mailing list.
Academic articles
There were two scholarly articles published this month that related to reproducibility:
Daniel Hugenroth and Alastair R. Beresford of the University of Cambridge in the United Kingdom and Mario Lins and René Mayrhofer of Johannes Kepler University in Linz, Austria published an article titled Attestable builds: compiling verifiable binaries on untrusted systems using trusted execution environments. In their paper, they:
present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such it complements existing approaches like reproducible builds which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts.
The authors compare “attestable builds” with reproducible builds by noting an attestable build requires “only minimal changes to an existing project, and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it”, and proceed by determining that “the overhead (42 seconds start-up latency and 14% increase in build duration) is small in comparison to the overall build time.”
Timo Pohl, Pavel Novák, Marc Ohm and Michael Meier have published a paper called Towards Reproducibility for Software Packages in Scripting Language Ecosystems. The authors note that past research into Reproducible Builds has focused primarily on compiled languages and their ecosystems, with a further emphasis on Linux distribution packages:
However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This Systemization of Knowledge (SoK) [paper] provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems.
Ultimately, the four authors find that the literature is “sparse”, focusing on few individual problems and ecosystems, and therefore identify space for more critical research.
Distribution work
In Debian this month:
- Ian Jackson filed a bug against the debian-policy package in order to delve into an issue affecting Debian’s support for cross-architecture compilation, multiple-architecture systems, reproducible builds’ SOURCE_DATE_EPOCH environment variable and the ability to recompile already-uploaded packages to Debian with a new/updated toolchain (binNMUs). Ian identifies a specific case, in the libopts25-dev package, involving a manual page that had interesting downstream effects, potentially affecting backup systems. The bug generated a large number of replies, some of which have references to similar or overlapping issues, such as this one from 2016/2017.
- Chris Hofstaedtler filed a bug against the metasnap.debian.net service to note that some packages are not available in the metasnap API.
- 22 reviews of Debian packages were added, 24 were updated and 11 were removed this month, all adding to our knowledge about identified issues.
Hans-Christoph Steiner of the F-Droid catalogue of open source applications for the Android platform published a blog post on Making reproducible builds visible. Noting that “Reproducible builds are essential in order to have trustworthy software”, Hans also mentions that “F-Droid has been delivering reproducible builds since 2015”. However:
There is now a “Reproducibility Status” link for each app on f-droid.org, listed on every app’s page. Our verification server shows ✔️ or 💔 based on its build results, where ✔️ means our rebuilder reproduced the same APK file and 💔 means it did not. The IzzyOnDroid repository has developed a more elaborate system of badges which displays a ✅ for each rebuilder. Additionally, there is a sketch of a five-level graph to represent some aspects about which processes were run.
Hans compares the approach with projects such as Arch Linux and Debian that “provide developer-facing tools to give feedback about reproducible builds, but do not display information about reproducible builds in the user-facing interfaces like the package management GUIs.”
Arnout Engelen of the NixOS project has been working on reproducing the minimal installation ISO image. This month, Arnout has successfully reproduced the build of the minimal image for the 25.05 release without relying on the binary cache. Work on also reproducing the graphical installer image is ongoing.
In openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.
Lastly, in Fedora news, Jelle van der Waa opened issues tracking reproducibility issues in Haskell documentation, Qt6 recording the host kernel and R packages recording the current date. The R packages can be made reproducible with packaging changes in Fedora.
diffoscope & disorderfs
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 295, 296 and 297 to Debian:
- Don’t rely on zipdetails’ --walk argument being available, and only add that argument on newer versions after we test for that. […]
- Review and merge support for NuGet packages from Omair Majid. […]
- Update copyright years. […]
- Merge support for an lzma comparator from Will Hollywood. […][…]
Chris also merged an impressive changeset from Siva Mahadevan to make disorderfs more portable, especially on FreeBSD. disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues […]. This was then uploaded to Debian as version 0.6.0-1.
Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 296 […][…] and 297 […][…], and disorderfs to version 0.6.0 […][…].
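To illustrate the kind of problem disorderfs is designed to flush out, here is a minimal Python sketch. It does not use disorderfs itself: it simply assumes two filesystems return the same directory entries in different orders (simulated here by reversing a list), and the `manifest_digest` helper and file names are hypothetical.

```python
import hashlib


def manifest_digest(names, sort_entries):
    """Hash a list of directory entries, optionally sorting them first."""
    entries = sorted(names) if sort_entries else list(names)
    digest = hashlib.sha256()
    for name in entries:
        digest.update(name.encode("utf-8") + b"\n")
    return digest.hexdigest()


# The same directory as returned by two filesystems in different orders.
listing_a = ["main.c", "util.c", "README"]
listing_b = list(reversed(listing_a))

# Relying on the order the filesystem happens to return gives different results.
print(manifest_digest(listing_a, False) == manifest_digest(listing_b, False))  # False

# Sorting the entries first makes the result independent of that order.
print(manifest_digest(listing_a, True) == manifest_digest(listing_b, True))  # True
```

A build step that iterates over a directory without sorting behaves like the first case, which is why deliberately disordering the listing makes such bugs visible.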
Website updates
Once again, there were a number of improvements made to our website this month including:
- Chris Lamb:
  - Merged four or five suggestions from Guillem Jover for the GNU Autotools examples on the SOURCE_DATE_EPOCH example page […]
  - Incorporated a number of fixes for the JavaScript SOURCE_DATE_EPOCH snippet from Sebastian Davids, which did not handle non-integer values correctly. […]
- David A. Wheeler:
  - Fix an apostrophe in the README.md file. […]
- Hans-Christoph Steiner:
  - Add the F-Droid “Verification Server” to the Tools page. […]
  - Add the Creative Commons Attribution-ShareAlike 4.0 International as the website’s root LICENSE file. […]
  - Updated the Recording the build environment page to add a section pertaining to how F-Droid handles this. […]
- Jochen Sprickerhof:
  - Add Chris Hofstaedtler to the Who is involved? page. […]
- Sebastian Davids:
  - Fix the CoffeeScript example on the SOURCE_DATE_EPOCH page. […]
  - Remove the JavaScript example that uses a ‘fixed’ timezone on the SOURCE_DATE_EPOCH page. […]
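As a rough illustration of what handling SOURCE_DATE_EPOCH involves, including rejecting non-integer values, here is a minimal Python sketch. It is not the snippet from our website: the `build_timestamp` and `build_date` helpers are hypothetical, and the choice to raise an error on a malformed value (instead of silently falling back to the current time) is an assumption made for this example.

```python
import os
import time


def build_timestamp(environ=None):
    """Return the build time as integer seconds since the Unix epoch.

    Falls back to the current time only when SOURCE_DATE_EPOCH is unset;
    a value that is not a plain non-negative integer is treated as an error.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None:
        return int(time.time())
    # Rejects values such as "", "1.5", "-3" or "12abc".
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"malformed SOURCE_DATE_EPOCH: {raw!r}")
    return int(raw)


def build_date(environ=None):
    """Format the build time as a UTC date, independent of the local timezone."""
    return time.strftime("%Y-%m-%d", time.gmtime(build_timestamp(environ)))


print(build_date({"SOURCE_DATE_EPOCH": "1735689600"}))  # 2025-01-01
```

Formatting with `time.gmtime` rather than local time avoids a second source of variation, as the embedded date would otherwise depend on the timezone of the build machine.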
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility.
However, Holger Levsen posted to our mailing list this month in order to bring a wider awareness to funding issues faced by the Oregon State University (OSU) Open Source Lab (OSL). As mentioned on OSL’s public post, “recent changes in university funding makes our current funding model no longer sustainable [and that] unless we secure $250,000 in committed funds, the OSL will shut down later this year”. As Holger notes in his post to our mailing list, the Reproducible Builds project relies on hardware nodes hosted there. Nevertheless, Lance Albertson of OSL posted an update to the funding situation later in the month with broadly positive news.
Separate to this, there were various changes to the Jenkins setup this month, which is used as the backend driver for both tests.reproducible-builds.org and reproduce.debian.net, including:
- Migrating the central jenkins.debian.net server from AMD Opteron to Intel Haswell CPUs. Thanks to IONOS for hosting this server since 2012.
- After testing it for almost ten years, the i386 architecture has been dropped from tests.reproducible-builds.org. This is because, with the upcoming release of Debian trixie, i386 is no longer supported as a ‘regular’ architecture: there will be no official kernel and no Debian installer for i386 systems. As a result, a large number of nodes hosted by Infomaniak have been retooled from i386 to amd64.
- Another node, ionos17-amd64.debian.net, which is used for verifying packages for all.reproduce.debian.net (hosted by IONOS) has had its memory increased from 40 to 64GB, and the number of cores doubled to 32 as well. In addition, two nodes generously hosted by OSUOSL have had their memory doubled to 16GB.
- Lastly, we have been granted access to more riscv64 architecture boards, so now we have seven such nodes, all with 16GB memory and 4 cores that are verifying packages for riscv64.reproduce.debian.net. Many thanks to PLCT Lab, ISCAS for providing those.
Outside of this, a number of smaller changes were also made by Holger Levsen:
- reproduce.debian.net-related:
  - Only use two workers for the ppc64el architecture due to RAM size. […]
  - Monitor nginx_request and nginx_status with the Munin monitoring system. […][…]
  - Detect various variants of network and memory errors. […][…][…][…]
  - Add a prominent link to reproducible-builds.org. […]
  - Add a rebuilderd-cache-cleanup.service and run it daily via timer. […][…][…][…][…]
  - Be more verbose about what sources are being downloaded. […]
  - Correctly deal with packages with an epoch in their version […] and deal with binNMU versions with an epoch as well […][…].
  - Document how to reschedule all other errors on all archs. […]
  - Misc documentation improvements. […][…][…][…]
  - Include the $HOSTNAME variable in the rebuilderd logfiles. […]
  - Install the equivs package on all worker nodes. […][…]
- Jenkins nodes:
  - Permit the sudo tool to fix up permission issues. […][…]
  - Document how to manage diskspace with OpenStack. […]
  - Ignore a number of spurious monitoring errors on riscv64, FreeBSD, etc. […][…][…][…]
  - Install ntpsec-ntpdate (instead of ntpdate) as the former is available on Debian trixie and bookworm. […][…]
  - Use the same SSH ControlPath for all nodes. […]
  - Make sure the munin user uses the same SSH config as the jenkins user. […]
- tests.reproducible-builds.org-related:
- Misc:
  - Fix a (harmless) typo in the multiarch_versionskew script. […]
In addition, Jochen Sprickerhof made a series of changes related to reproduce.debian.net:
- Add out of memory detection to the statistics page. […]
- Reverse the sorting order on the statistics page. […][…][…][…]
- Improve the spacing between statistics groups. […]
- Update a (hard-coded) line number in error message detection pertaining to a debrebuild line number. […]
- Support Debian unstable in the rebuilder-debian.sh script. […][…]
- Rely on rebuildctl to sync only ‘arch-specific’ packages. […][…]
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - autotrace, ck, cmake/musescore, cmake, crash, cvsps, gexif, gq, gtkam, ibus-table-others, krb5-appl, ktoblzcheck-data, leafnode, lib2geom, libexif-gtk, libmfx-gen, libmfx, liboqs, libyui, linkloop, meson, MozillaFirefox, ncurses, netdiscover, notify-sharp, pcsc-acr38, pcsc-asedriveiiie-serial, pcsc-asedriveiiie-usb, pcsc-asekey, pcsc-eco5000, pcsc-reflex60, perl-Crypt-RC, python-boto3, python-gevent, python-pytest-localserver, qt6-tools, seamonkey, seq24, smictrl, sobby, solfege, urfkill, uwsgi, wsmancli, xine-lib, xkeycaps, xquarto, yast-control-center, yast-ruby-bindings and yast
- Chris Hofstaedtler:
  - #1104578 filed against jabber-muc.
- Chris Lamb:
  - #1105171 filed against golang-github-lucas-clemente-quic-go.
- Jelle van der Waa:
- Jochen Sprickerhof:
- Zhaofeng Li:
  - Add support for --mtime and --clamp-mtime to bsdtar.
- James Addison:
  - #1105119 for python3: requested enabling an LTO-adjacent option that should improve build reproducibility.
  - #1106274 upstream fix merged for freezegun for a timezone issue causing unit tests to fail during testing.
  - Opened a pull request for tutanota in an attempt to resolve a long-standing reproducibility issue.
- Zbigniew Jędrzejewski-Szmek:
  - 0xFFFF: Use SOURCE_DATE_EPOCH for date in manual pages.
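Several of the patches above deal with timestamps, so here is a minimal Python sketch of the idea behind clamping modification times in an archive. It is modelled on the behaviour GNU tar documents for --mtime combined with --clamp-mtime (only timestamps newer than the given time are lowered); it does not call bsdtar, and the `build_archive` helper, file names and timestamps are all hypothetical.

```python
import hashlib
import io
import tarfile


def build_archive(files, mtimes, clamp_to=None):
    """Return the bytes of an uncompressed tar archive.

    files maps member names to bytes and mtimes maps member names to integer
    timestamps. When clamp_to is given, any mtime newer than it is lowered
    to clamp_to; older mtimes are kept unchanged.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            mtime = mtimes[name]
            if clamp_to is not None:
                mtime = min(mtime, clamp_to)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def sha256(data):
    return hashlib.sha256(data).hexdigest()


files = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
epoch = 1735689600  # hypothetical SOURCE_DATE_EPOCH value

# Two "builds": main.c was regenerated at different times, README is old.
first = {"README": 1000000000, "main.c": epoch + 60}
second = {"README": 1000000000, "main.c": epoch + 3600}

# Without clamping, the differing mtime leaks into the archive.
print(sha256(build_archive(files, first)) == sha256(build_archive(files, second)))  # False

# With clamping, both builds produce byte-identical archives.
print(sha256(build_archive(files, first, epoch)) == sha256(build_archive(files, second, epoch)))  # True
```

Clamping, as opposed to overwriting every timestamp, keeps the genuine modification time of files that predate the build, which is why it is often preferred.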
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Alternatively, you can get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Mastodon: @reproducible_builds@fosstodon.org
- Mailing list: rb-general@lists.reproducible-builds.org
Blocking comment spammers on an Ikiwiki blog
Despite comments on my ikiwiki blog being fully moderated, spammers have been increasingly posting link spam comments on my blog. While I used to use the blogspam plugin, the underlying service was likely retired circa 2017 and its public repositories are all archived.
It turns out that there is a relatively simple way to drastically reduce the amount of spam submitted to the moderation queue: ban the datacentre IP addresses that spammers are using.
Looking up AS numbers
It all starts by looking at the IP address of a submitted comment:
From there, we can look it up using whois:
The important bit here is this line:
which refers to Autonomous System 207408, owned by a hosting company in Germany called Servinga.
Alternatively, you can use this WHOIS server with much better output:
Looking up IP blocks
Autonomous Systems are essentially organizations to which IPv4 and IPv6 blocks have been allocated.
These allocations can be looked up easily on the command line either using a third-party service:
or a local database downloaded from IPtoASN.
This is what I ended up with in the case of Servinga:
Preventing comment submission
While I do want to eliminate this source of spam, I don't want to block these datacentre IP addresses outright since legitimate users could be using these servers as VPN endpoints or crawlers.
I therefore added the following to my Apache config to restrict the CGI endpoint (used only for write operations such as commenting):
and then put the following in /etc/apache2/spammers.include:
Finally, I can restart the website and commit my changes:
Future improvements
I will likely automate this process in the future, but at the moment my blog can go for a week without a single spam message (down from dozens every day). It's possible that I've already cut off the worst offenders.
I have published the list I am currently using.
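As a rough illustration of how this process could be automated, here is a minimal Python sketch that checks whether an address falls inside a set of blocks and renders those blocks as Apache 2.4 authorisation directives. This is not the configuration used on this blog: the helper names are hypothetical, the addresses are documentation-only ranges rather than real datacentre blocks, and the `<RequireAll>` / `Require not ip` layout is an assumption about how such an include file might look.

```python
import ipaddress


def parse_blocks(cidrs):
    """Parse CIDR strings into network objects, rejecting malformed entries."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_listed(address, blocks):
    """Return True if address falls inside any of the given blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip.version == net.version and ip in net for net in blocks)


def render_include(blocks):
    """Render the blocks as an Apache 2.4 authorisation fragment."""
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {net}" for net in blocks]
    lines.append("</RequireAll>")
    return "\n".join(lines) + "\n"


# Documentation-only ranges (RFC 5737 / RFC 3849), standing in for real data.
blocks = parse_blocks(["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"])

print(is_listed("192.0.2.77", blocks))   # True
print(is_listed("203.0.113.5", blocks))  # False
print(render_include(blocks), end="")
```

Parsing each entry with the `ipaddress` module before writing the file means a typo in a block is caught early, instead of surfacing as a web server configuration error.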
04 June, 2025 08:28PM