We want the theorycrafting we encounter to be correct, but is that all we should want? This post explores another virtue of good theorycrafting: trustworthiness.
It is hard to put faith in a result, no matter how precise or appealing, if we have no idea how a person arrived at it. We want theorycrafters to describe their methods so that we can have some confidence that their results are reasonably obtained rather than the product of chance or mistake. Real-world scientists have the same expectation in their fields and, quite understandably, have put a lot of time into articulating what makes a result trustworthy. They use the following terms:
Validity is a rather expansive concept for an eight-letter word. Broadly speaking, the concern of validity deals with how well a test has been constructed. It can be broken down into internal validity, which concerns whether the proposed cause is really producing the proposed effect, and external validity, which concerns whether the test is representative of the phenomenon in question. There are also the ideas of construct validity and content validity, which relate to whether the test is measuring what the tester thinks it is measuring.
What the concern of validity sums to, then, is the desire for a test to be in all ways a good and reasonable test of the phenomenon in question. Failing on just one dimension of validity can ruin the whole thing. For example, even if you otherwise perform the absolute best target dummy testing possible but fail on construct validity by comparing a Careful Aim build with a non-CA build, the results may have to be thrown out. (Careful Aim is problematic for most testing on target dummies because, owing to the dummies always being over 90% health, it skews results in favor of the CA build.)
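To see the scale of the construct-validity problem, here is a toy calculation loosely modeling Careful Aim's above-90%-health crit bonus. All of the constants (base crit, bonus crit, CA uptime in a raid) are illustrative assumptions, not actual game formulas:

```python
# Toy model of why a target dummy skews Careful Aim comparisons: a dummy
# never drops below 90% health, so a CA build enjoys its crit bonus 100%
# of the time, while in a real fight the bonus applies only briefly.

BASE_CRIT = 0.30       # assumed baseline crit chance
CA_BONUS_CRIT = 0.60   # assumed extra crit chance from Careful Aim
CRIT_MULTIPLIER = 2.0  # crits hit for double damage

def expected_damage(crit_chance: float) -> float:
    """Average damage per shot (normalized to a 1.0 base hit)."""
    return 1.0 + crit_chance * (CRIT_MULTIPLIER - 1)

def ca_damage(uptime: float) -> float:
    """Average damage per shot when CA is active for `uptime` of the fight."""
    buffed = expected_damage(min(1.0, BASE_CRIT + CA_BONUS_CRIT))
    return uptime * buffed + (1 - uptime) * expected_damage(BASE_CRIT)

no_ca = expected_damage(BASE_CRIT)
dummy = ca_damage(1.0)   # dummy: target is always above 90% health
raid = ca_damage(0.10)   # raid: above 90% health only ~10% of the fight

print(f"CA gain on a dummy: {dummy / no_ca - 1:+.1%}")
print(f"CA gain in a raid:  {raid / no_ca - 1:+.1%}")
```

Under these made-up numbers the dummy test overstates Careful Aim's value roughly tenfold, which is why the comparison fails construct validity even if everything else about the test is done well.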
Because validity is such a huge concern, good theorycrafting should include a long enough description of the methods and math used that the reader is assured the test is appropriate and properly executed. This can run long for some tests, but these days we have some significant shortcuts at our disposal. For example, linking to a femaledwarf profile can remove the need for a lot of the descriptions of settings. Bloggers and forum regulars also have the advantage that they can link to previous posts explaining their methods if they are repeating an old test for a new tier.
Reliability is the desire for a result to persist through repeated testing. Put another way, we want a result to be generally true rather than true only of one run of the test where some fluke or trick of RNG produced an outlier. This is why good theorycrafting often involves many iterations on a simulator (such as SimulationCraft) or extended periods of time on a target dummy; the goal is to let any spikes or troughs in the results average out over lengthy testing. Thirty seconds on a target dummy is no way to test something, considering how far your results may vary from the “true” average due to crits and procs.
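The averaging-out argument can be sketched with a toy Monte Carlo simulation. The damage numbers and crit chance below are made up for illustration, not real game mechanics:

```python
# Why a short dummy test is unreliable: per-shot damage is noisy because of
# crits, and the noise only averages out over many shots or iterations.
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

def shot_damage() -> float:
    """One shot: 1000 base damage, 30% chance to crit for double."""
    return 1000 * (2.0 if random.random() < 0.30 else 1.0)

def run_dps(num_shots: int) -> float:
    """Average damage per shot over one test run of `num_shots` shots."""
    return sum(shot_damage() for _ in range(num_shots)) / num_shots

true_mean = 1000 * (1 + 0.30)  # analytic expected damage per shot: 1300

short_runs = [run_dps(20) for _ in range(1000)]    # ~30 seconds on a dummy
long_runs = [run_dps(2000) for _ in range(1000)]   # a long simulator run

print("spread of short runs:", statistics.stdev(short_runs))
print("spread of long runs: ", statistics.stdev(long_runs))
```

The short runs scatter roughly ten times as widely around the true average as the long runs do, so any single short test can easily land far from the truth; that scatter is exactly what lengthy testing or many simulator iterations is meant to squeeze out.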
Replicability is the idea that another tester should be able to reproduce your results. Replicating an experiment gives scientists confidence that a result was not the product of a particular tester’s idiosyncrasies and is instead valid in a general sense. A lack of replicability recently undid the claim that neutrinos can travel faster than light, just as it was the downfall of a famous result in the search for cold fusion.
Replication additionally allows for conversation and advancement within a scientific or theorycrafting community. A test being replicable allows testers to repeat and build off each other’s work, achieving better results than a single person working alone would be able to attain.
Together, these virtues of good testing combine toward this conclusion: good theorycrafting involves explaining your work well enough that other people can have faith in it as being generally true of the phenomenon in question and can replicate the test if they want to.
Unfortunately, thorough explanations of methods and math are not as common as we might like in the WoW community. It takes time to write them out, it makes for a potentially boring TL;DR (too long, didn’t read) wall of text in a post, it detracts from the flow of a post, and it opens up the tester to scrutiny that he or she might not want to be under.

Frostheim is a prominent example of someone who does not overtly show his work. I suspect, given his audience of hunters who probably just want quick answers, that he does not want to bog down his guides and posts with long explanations of methods. It also certainly saves him significant amounts of time, given the number of guides he maintains. But it also deprives the hunter community of some confidence in his results; we have to take him on faith rather than being able to affirm that his methods are sound. It also precludes the community-improving dialogues that might result from explicit methods and the exposure that many hunters might want for starting their own theorycrafting.

A partial solution to the inelegance of explanations of methods and math, at least for bloggers, is the use of appendixes. Testing explanations could be added to the end of a post, offering the details to the curious without getting in the way of the main post’s message.
In advocating for transparency in theorycrafting I am not trying to lecture from on high. I am not a perfect theorycrafter by any means, and I make mistakes with a depressing regularity. Indeed, all theorycrafters make mistakes. That is why theorycrafters explaining themselves with an eye to validity, reliability and replicability is so important. We are all human and prone to error, and it is only through an enterprise of transparency and communal scrutiny that we will achieve the best results.