Scaling Fossil Up
Joerg Sonnenberger has done an import of the CVS repository for the NetBSD pkgsrc into Fossil. The result demonstrates some scaling issues with Fossil. (See ticket [6c5471445173a17fffa6aa] for some additional information.) This design note describes the issues and suggests paths toward their solution.
Checkout and commit from pkgsrc are a little sluggish but are workable. Most of the website pages are workable. Where Fossil bogs down is on the "fossil rebuild" command and on cloning. Enhancements to fix those issues, and to improve the speed of checkout and commit, are suggested.
The size of the pkgsrc repository is about 550MB. This is about 4 times larger than anything we have seen before. But the size of the repository seems not to be a problem. Some administrative operations scale linearly with size, but it really is a linear scaling and while those commands take a little longer than on other smaller repositories, the slowdown is not unacceptable.
The pkgsrc repository has history for a little more than 13 years. This is not a problem. The SQLite repository has 10.3 years of history and it works just fine.
The total size of a checkout in pksrc is about 340MB. This is not a problem for Fossil either. The Fossil repository for the SLT (SQL Logic Test) test program for SQLite has a 1.3GB checkout (4x larger than pkgsrc) and the SLT repository works fine. Doing a commit can be a little sluggish on SLT since Fossil verifies the checkin by computing an MD5 checksum over the entire checkin content (1.3GB) and does so twice from different sources to verify that everything is consistent. This can take around 30 seconds or so, which seems a little sluggish, but is manageable. If desired, it would be easy enough to disable the MD5 repository checksum steps as a "setting". The respository MD5 checksum is part of Fossil repository self-checks and it would be unfortunate to have to disable them, but it is easily doable in a fully backwards and forwards compatible way.
The areas where pkgsrc is giving Fossil problems are these: The pkgsrc checkout has a huge number of small files (over 60,000) and there are many, many checkins (190,000 over 13 years, or about 40 checkins per day). Let's consider the latter issue first.
Fossil (like most VCSes) stores the most recent version of each file as full text but then stores all prior versions as a delta from the next version. Storing deltas rather than full text is a storage space optimization. Without the delta compression, a Fossil repository for pkgsrc would be about 500GB rather than 500MB. The deltas move backwards from newer to older versions of each file, since the assumption is that most accesses are for recent versions of the file and so you want to be able to get to the most recent versions using the fewest number of deltas.
Storing backwards deltas works fine for all of the repositories we have seen before. In 10 years of history, there have been 8446 checkins to SQLite, and so the longest delta chain is 8446 links long. Fossil can burn through that relatively quickly. But with pkgsrc, the longest delta chain is 20 times longer - 190,283 links - which is beginning to get very sluggish.
One possible solution would be to have Fossil keep track of the delta chain length and automatically insert a full-text copy of files after 500 or 1000 deltas, thus limiting the maximum delta chain length. I read that Mercurial does this. It will make the repository a little larger, but probably not that much.
The bigger problem is the huge number of files in the checkout: over 60K. When I was designing Fossil, it looked at the Linux kernel and used it as my upper-bound on the number of files in a project. The Linux kernel has about 20K files. So pkgsrc is about 3x larger than the design load for Fossil. Other Fossil repositories typically have on the order of 1000 files per checkout. SQLite has 873 in a checkout. The SLT repository, even though the total size of the checkout is 4 times larger than pkgsrc, is only 645 files. (SLT has a small number of very large files whereas pkgsrc as a large number of very small files.)
The problem with having a huge number of files is that the "manifest" file associated with each checkin becomes huge. The manifest stores the name and SHA1 hash of every file in the checkout. For a repository like SQLite, the total manifest size is about 60K. But for pkgsrc it is about 5MB, or almost 100 times larger.
Manifests, like everything else, are stored in the repository as a chain of deltas. Most checkins only change a handful of files, and so the deltas are very small and the history of all manifests can be stored in very little space, despite the large size of each separate manifest. The problem is that the manifests still have to be expanded to their full 5MB prior to being used for anything and it now takes about 100 times longer to apply each delta (because 100 times more text is being copied) and about 100 times longer to parse each manifest file.
The proposed solution to large manifests is to create a new control file type which I will call a sub-manifest and to add a new card type to the manifest file (the S-card) that links a manifest to a sub-manifest. A sub-manifest contains F-cards and S-cards and maybe a Z-card both nothing else: no date or check-in comment or signature. The idea is to split a large manifest into tree, with the main manifest being the root of the tree and sub-manifests being branches and leaves. We might try to limit the size of each manifest and sub-manifest to about 20 to 50K. We could have separate sub-manifest for each directory in the check-out, as Git does, but there are over 14,000 separate subdirectories in pkgsrc, and having to visit 14,000 artifacts in order to do the equivalent of "fossil list" seems excessive. A better approach would be for Fossil to automatically partition the repository into a few hundred sub-manifests with less than one thousand entries each.
With sub-manifests, most checkins will only modify files in a single sub-manifest. If there is a single sublayer (the main manifest pointing to about 200 sub-manifests of around 300 files each) then a typical checkin would only involve changing the main manifest (about 10K in size) and one sub-manifest (about 20K in size). Thus the size of checkin overhead drops from 5MB to 30K.
With sub-manifests as described above, it should be possible to apply the delta chain to reach a desired checkin about 100 times faster (since there is 100 times less deltaing to do on each step). And when doing a "fossil rebuild" there will be about 100 times less manifest parsing work to do.
To do a "fossil rebuild" on the current pkgsrc repository using the current version of Fossil requires about 8 hours of CPU time on my 4-year-old workstation. Most of this time is spent parsing the 190,000 manifests. If manifest parsing time were reduced by 100 fold, the "fossil rebuild" time could drop from 8 hours to about 5 minutes.
Adding sub-manifests and the S-card is an extension to Fossil. It is fully backwards compatible. Newer versions of Fossil that use sub-manifests would still be able to read and write older repositories that have only flat manifests. But older versions of Fossil would not be able to read newer repositories that used sub-manifests.
The other main area of performance concern with large repositories is cloning. The sync protocol for Fossil is optimized for sharing a small number of changes between two largely identical repositories. The sync protocol will work to synchronize two vastly different repositories, but it is much less efficient for that case.
The problem with clone is that it is currently implemented to use the non-optimal case of the sync protocol. Clone works by first creating a new empty repository, then syncing it to the clone source. That runs ok for smaller repositories such as SQLite. But it falls down badly on pkgsrc.
The solution here is to enhance the sync protocol to work better with the cloning case. If the client could tell the server "I have nothing, send me everything", the server could simply start downloading compressed and deltaed artifacts in linear order. The client could then install the artifacts it its new repository, and then run rebuild at the end to reconstruct the meta-data. This process should be almost as fast as FTP. Of course, rebuild must still be run at the end, but with the sub-manifest fix described above, that should only take about 5 minutes or so.
Fossil is able to handle checkouts of the size of pkgsrc without any problems. A checkout from the SLT repository is four times larger and it works fine. The problem with pkgsrc is the large number of files and subdirectories in each individual checkout and the huge number of checkins over the history of the project. Proposed solutions for these are described above.
There are other areas of concern that will need to be addressed. File browsing on the website is sluggish as each page renders in time proportional to the total number of distinct filenames in the repository (which is 100K for pkgsrc versus 2K for the next biggest repository). And I have not yet tried to run "fossil annotate" on a file with a really huge change history - there could be undiscovered issues there. But for the most part, Fossil will be usable for pkgsrc-scale projects if we can just address the rebuild and cloning issues.