Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Added section "7.0 Collapsing check-ins throws away valuable information" to rebaseharm.md, linked to from the previous throwaway comment about squashing a whole branch down to a single commit during rebase. This section explains an entire class of harms that come from rebase which wasn't previously covered. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
c71fe99f9b21b0b33335b031ca8bae49 |
User & Date: | wyoung 2019-09-13 11:12:17.071 |
Context
2019-09-13
| ||
11:33 | Reworked the final paragraph of the new section 2.8 in fossil-v-git.wiki, adding in a mention of www/branching.wiki to the comparison, to make it fairer. ... (check-in: 8f7576b0 user: wyoung tags: trunk) | |
11:12 | Added section "7.0 Collapsing check-ins throws away valuable information" to rebaseharm.md, linked to from the previous throwaway comment about squashing a whole branch down to a single commit during rebase. This section explains an entire class of harms that come from rebase which wasn't previously covered. ... (check-in: c71fe99f user: wyoung tags: trunk) | |
10:46 | Added section "2.8 Test Before Commit" to the fossil-v-git doc. ... (check-in: e5ba4578 user: wyoung tags: trunk) | |
Changes
Changes to www/rebaseharm.md.
︙ | ︙ | |||
68 69 70 71 72 73 74 | Consider a hypothetical case:  In the above, a feature branch consisting of check-ins C3 and C5 is run concurrently with the main line in check-ins C4 and C6. Advocates for rebase say that you should rebase the feature branch to the tip | | < > > > > | 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 | Consider a hypothetical case:  In the above, a feature branch consisting of check-ins C3 and C5 is run concurrently with the main line in check-ins C4 and C6. Advocates for rebase say that you should rebase the feature branch to the tip of main like the following:  You could choose to collapse C3\' and C5\' into a single check-in as part of this rebase, but that’s a side issue we’ll deal with [separately](#collapsing). If only merge is available, one would do a merge from the concurrent mainline changes into the feature branch as follows:  Check-ins C5\' and C7 check-ins hold identical code. The only |
︙ | ︙ | |||
218 219 220 221 222 223 224 | Unfortunately, Git does not provide the ability to add corrections or clarifications to historical check-ins in its blockchain. Hence, once again, rebase can be seen as an attempt to work around limitations of Git. Wouldn't it be better to fix the tool rather than to lie about the project history? | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | | | > > | | 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 | Unfortunately, Git does not provide the ability to add corrections or clarifications to historical check-ins in its blockchain. Hence, once again, rebase can be seen as an attempt to work around limitations of Git. Wouldn't it be better to fix the tool rather than to lie about the project history? ## <a name="collapsing"></a>7.0 Collapsing check-ins throws away valuable information One of the oft-cited advantages of rebasing in Git is that it lets you collapse multiple check-ins down to a single check-in to make the development history “clean.” The intent is that development appear as though every feature were created in a single step: no multi-step evolution, no back-tracking, no false starts, no mistakes. This ignores actual developer psychology: ideas rarely spring forth from fingers to files in faultless finished form. A wish for collapsed, finalized check-ins is a wish for a counterfactual situation. The common counterargument is that collapsed check-ins represent a better world, the ideal we’re striving for. What that argument overlooks is that we must throw away valuable information to get there. ## <a name="empathy"></a>7.1 Individual check-ins support developer empathy Ideally, future developers of our software can understand every feature in it using only context available in the version of the code they start work with. Prior to widespread version control, developers had no choice but to work that way. Pre-existing codebases could only be understood as-is or not at all. Developers in that world had an incentive to develop software that was easy to understand retrospectively, even if they were selfish people, because they knew they might end up being those future developers! Yet, sometimes we come upon a piece of code that we simply cannot understand. If you have never asked yourself, “What was this code’s developer thinking?” you haven’t been developing software for very long. When a developer can go back to the individual check-ins leading up to the current code, they can work out the answers to such questions using only the level of empathy necessary to be a good developer. To understand such code using only the finished form, you are asking future developers to make intuitive leaps that the original developer was unable to make. In other words, you are asking your future maintenance developers to be smarter than the original developers! That’s a beautiful wish, but there’s a sharp limit to how far you can carry it. Eventually you hit the limits of human brilliance. When the operation of some bit of code is not obvious, both Fossil and Git let you run a [`blame`](/help?cmd=blame) on the code file to get information about each line of code, and from that which check-in last touched a given line of code. If you squash the check-ins on a branch down to a single check-in, you throw away the information leading up to that finished form. Fossil not only preserves the check-ins surrounding the one that included the line of code you’re trying to understand, its [superior data model][sdm] lets you see the surrounding check-ins in both directions; not only what lead up to it, but what came next. Git can’t do that short of crawling the block-chain backwards from the tip of the branch to the check-in you’re looking at, an expensive operation. We believe it is easier to understand a line of code from the 10-line check-in it was a part of — and then to understand the surrounding check-ins as necessary — than it is to understand a 500-line check-in that collapses a whole branch’s worth of changes down to a single finished feature. [sdm]: ./fossil-v-git.wiki#durable ## <a name="bisecting"></a>7.2 Bisecting works better on small check-ins Git lets a developer write a feature in ten check-ins but collapse it down to an eleventh check-in and then deliberately push only that final collapsed check-in to the parent repo. Someone else may then do a bisect that blames the merged check-in as the source of the problem they’re chasing down; they then have to manually work out which of the 10 steps the original developer took to create it to find the source of the actual problem. Fossil pushes all 11 check-ins to the parent repository by default, so that someone doing that bisect sees the complete check-in history, so the bisect will point them at the single original check-in that caused the problem. ## <a name="comments"></a>7.3 Multiple check-ins require multiple check-in comments The more comments you have from a given developer on a given body of code, the more concise documentation you have of that developer’s thought process. To resume the bisecting example, a developer trying to work out what the original developer was thinking with a given change will have more success given a check-in comment that explains what the one check-in out of ten blamed by the “bisect” command was trying to accomplish than if they must work that out from the eleventh check-in’s comment, which only explains the “clean” version of the collapsed feature. ## <a name="cherrypicking"></a>7.4 Cherry-picks work better with small check-ins While working on a new feature in one branch, you may come across a bug in the pre-existing code that you need to fix in order for work on that feature to proceed. You could choose to switch briefly back to the parent branch, develop the fix there, check it in, then merge the parent back up to the feature branch in order to continue work, but that’s distracting. If the fix isn’t for a critical bug, fixing it on the parent branch can wait, so it’s better to maintain your mental working state by fixing the problem in place on the feature branch, then check the fix in on the feature branch, resume work on the feature, and later merge that fix down into the parent branch along with the feature. But now what happens if another branch *also* needs that fix? Let us say our code repository has a branch for the current stable release, a development branch for the next major version, and feature branches off of the development branch. If we rebase each feature branch down into the development branch as a single check-in, pushing only the rebase check-in up to the parent repo, only that fix’s developer has the information locally to perform the cherry-pick of the fix onto the stable branch. Developers working on new features often do not care about old stable versions, yet that stable version may have an end user community that depends on that version, who either cannot wait for the next stable version or who wish to put off upgrading to it for some time. Such users want backported bug fixes, yet the developers creating those fixes have poor incentives to provide those backports. Thus the existence of maintenance and support organizations, who end up doing such work. (There is [a famous company][rh] that built a multi-billion dollar enterprise on such work.) This work is far easier when each cherry-pick transfers completely and cleanly from one branch to another, and we increase the likelihood of achieving that state by working from the smallest check-ins that remain complete. If a support organization must manually disentangle a fix from a feature check-in, they are likely to introduce new bugs on the stable branch. Even if they manage to do their work without error, it takes them more time to do the cherry-pick that way. [rh]: https://en.wikipedia.org/wiki/Red_Hat ## <a name="backouts"></a>7.5 Back-outs also work better with small check-ins The inverse of the cherry-pick merge is the back-out merge. If you push only a collapsed version of a private working branch up to the parent repo, those working from that parent repo cannot automatically back out any of the individual check-ins that went into that private branch. Others must either manually disentangle the problematic part of your merge check-in or back out the entire feature. ## <a name="better-plan"></a>8.0 Cherry-pick merges work better than rebase Perhaps there are some cases where a rebase-like transformation is actually helpful. But those cases are rare. And when they do come up, running a series of cherry-pick merges achieve the same topology, but with advantages: 1. Cherry-pick merges preserve an honest record of history. (They do in Fossil at least. Git's file format does not have a slot to record cherry-pick merge history, unfortunately.) 2. Cherry-picks provide an opportunity to [test each new check-in before it is committed][tbc] to the blockchain 3. Cherry-pick merges are "safe" in the sense that they do not cause problems for collaborators if you do them on public branches. 4. Cherry-picks keep both the original and the revised check-ins, so both timestamps are preserved. [tbc]: ./fossil-v-git.wiki#testing ## <a name="conclusion"></a>9.0 Summary and conclusion Rebasing is an anti-pattern. It is dishonest. It deliberately omits historical information. It causes problems for collaboration. And it has no offsetting benefits. For these reasons, rebase is intentionally and deliberately omitted from the design of Fossil. [golden]: https://www.atlassian.com/git/tutorials/merging-vs-rebasing#the-golden-rule-of-rebasing [gitrebase]: https://git-scm.com/book/en/v2/Git-Branching-Rebasing [nagappan]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-11.pdf [weinberg]: https://books.google.com/books?id=76dIAAAAMAAJ |