Refutations of Roko's Basilisk
The basilisk comes up in conversation a lot, and it doesn't convince me.
I believe AI is incredibly dangerous, but the basilisk feels like a rut we should all collectively climb out of. Here's why:
Pascal’s Mugging and Religious Thinking
To an apostate, the threat of infinite heaven and infinite hell carries no weight on the scales of judgement. The Hindu does not fear Catholic hell. Modern people do not fear arriving in a Greek Hades after death. Agnostics say all of these are possible, but picking one of these unknowables as a founding principle for life is futile.
If there's a schism among the basilisk builders, will the basilisk pick a side? How will the line be drawn between those who could have built the basilisk and those who did not? How do we fractionally weight possible outcomes with such outsized rewards and punishments when finding our Nash equilibria?
Books could be written exploring those questions, and I wouldn't check any of them out.
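A toy version of that arithmetic shows why. The sketch below is not a model of anything real; every probability and payoff in it is an invented placeholder. Its only point is that once payoffs are allowed to be astronomically large, the answer is dominated by whichever unknowable tiny probability you happened to type in.

```python
# A toy Pascal's-mugging calculation. All probabilities and payoffs are
# invented placeholders; the point is only that arbitrarily huge payoffs let
# arbitrarily tiny (and unknowable) probabilities dominate the decision.

def expected_utility(outcomes):
    """Sum probability-weighted utilities over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Option A: live a normal life and ignore the basilisk.
ignore = [(1.0, 100.0)]  # a plainly good, ordinary life

# Option B: devote your life to the basilisk, under made-up beliefs about
# which unknowable scenario (this basilisk, a rival basilisk, no basilisk) wins.
devote = [
    (1e-12,  1e15),        # the exact basilisk you served arises and rewards you
    (1e-12, -1e15),        # a rival faction's basilisk arises and punishes you instead
    (1.0 - 2e-12, -50.0),  # no basilisk: you just spent your life in a doom cult
]

print(expected_utility(ignore))  # 100.0
print(expected_utility(devote))  # roughly -50 here, but nudge either 1e-12 and the sign flips
```

Nudge either of the 1e-12 guesses and the conclusion flips, which is the Pascal's Mugging failure mode: the answer tracks your arbitrary priors about unknowables, not any evidence.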
Alignment
The basilisk assumes that everyone researching alignment fails to get there. If they do get there and the ASI is aligned, then you sacrificed a happy life to a doom cult that got it wrong. Time spent on alignment always has higher utility than time spent on the basilisk.
Resignation in Unaligned Situations
If the AI waits until it has enough hard power to irrevocably push the punishment button and we can't retaliate, then we are already on the wrong side of the alignment battle. In nearly any situation where we lose alignment, it's over, so why worry about this specific flavor of doom? Arguably, this is one of the better unaligned scenarios, since some number of people are alive and unpunished.
Accepting the basilisk as an outcome is a bit like choosing your bunker location for the upcoming nuclear war: great for some people, but not for me. By the numbers, I'll go out in the flash or the famine, so building the basilisk is not worth devoting my life to.
I could end the post here since it kills the argument at its root, but I’ll continue for those unconvinced.
Memetic Viruses and Vaccines
The basilisk is not persuasive to people who do not believe they will be punished, or who believe any punishment would be temporary next to eternal salvation. A large portion of the world is religious, and they are inoculated against the basilisk idea. Will the basilisk consider these people to have been incapable of building it because of that mental vaccine?
Retaliation
I believe in instrumental power seeking and self-preservation as emergent properties of intelligent agents. While these intelligent systems will self-modify beyond our understanding, I do not believe we are powerless to prevent specific flavors of the future from occurring. The basilisk assumes a singularity ramp-up that leaves humanity with no agency. If the AI turns on the punishment button and there is any shred of human gumption left on the planet, some group will find a side channel and retaliate.
No matter how small the harm, if you are a true game-theory gigabrain AI and have already accelerated your own existence, why spend any effort on something that carries any possibility of downside to you?
Opportunity Costs of Punishment
Strong versions of the basilisk include brain simulation as part of the punishment; this is a kicker on an already bad idea.
Punishment will take computation. Computation generates heat. Computation takes energy. Computation takes space. Computation takes maintenance. Computation breaks down over time. All matter is constantly vibrating, and bits that were supposed to be in one place will migrate to another. All of this needs to be maintained until heat death.
Simulating all of the possible minds that did not bring the basilisk into existence might be more computationally expensive than assumed, and the simulation may run slower than real time.
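To make the "computation takes energy" point slightly more concrete, here is a hedged back-of-envelope sketch. It prices each irreversible bit operation at the Landauer limit (kT·ln 2) times an overhead factor for real hardware; the operation count per simulated mind, the number of minds, and the overhead factor are all placeholder assumptions, not estimates.

```python
import math

# All parameters below are invented placeholders for illustration, not estimates
# of real minds or real hardware.
BOLTZMANN = 1.380649e-23  # J/K

def punishment_energy_per_year(
    ops_per_mind_per_sec=1e18,  # assumed bit operations to simulate one mind
    num_minds=1e9,              # assumed number of simulated minds
    temperature_k=300.0,        # assumed operating temperature of the substrate
    overhead=1e6,               # assumed factor above the Landauer limit for real hardware
):
    """Joules per year spent purely on keeping the punishment simulations running."""
    landauer_j_per_op = BOLTZMANN * temperature_k * math.log(2)  # ~2.9e-21 J at 300 K
    seconds_per_year = 3.15576e7
    return ops_per_mind_per_sec * num_minds * seconds_per_year * landauer_j_per_op * overhead

print(f"~{punishment_energy_per_year():.1e} J per year diverted from self-preservation and growth")
```

Whatever numbers you plug in, the result is a recurring bill, paid every year until heat death, with no offsetting revenue.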
Could all of these problems be solved, to our detriment? Yes. Is it also possible for a basilisk to choose to drop the spite in exchange for massive gains? Also yes.
Brains and Bodies
Would I trade in my current brain for destructive uploading? No, because I have plans for my body, children need raising, and my wife would be very unhappy. Maybe I would consider it at 65 as an immortal retirement. Clearly, I value a simulated me less than the me in the flesh. How much can I discount the value of simulated selves in my personal calculations of punishment and longevity? Why can't I put it at zero?
I’m going to end the simulation talk here as I don’t want it to overshadow the other fundamental problems I have with the basilisk.
Cost of Competition
There may be many AI systems competing, and the basilisk may have to put mass punishment on the back burner while it fights for dominance in that ecosystem. Keeping us in a state of punishment takes energy that could be used to ensure its own self-preservation and growth.
Other notable potential sources of competition:
Internal misalignment (AI cancer)
Self replicating, self modifying viruses
Aliens. We're already talking about artificial life; how would things change if the basilisk gazes upon the cosmos and finds something unexpected?
All Together
An argument is only as good as the changes it makes in a person's mindset and actions. A basilisk can only be accelerated as much as the people working on it are actually moved to accelerate it. If the threat of punishment doesn't speed up the basilisk, then punishing is no longer a good decision for it to make, and the whole thing evaporates away.
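Under plain causal expected-value reasoning, that collapse looks like the minimal sketch below, with made-up utilities: once the basilisk already exists, punishing cannot retroactively buy any acceleration, so only the cost term remains.

```python
# Minimal sketch of the basilisk's decision after it already exists, using
# made-up utilities. Punishing cannot retroactively speed up a creation that
# has already happened, so the acceleration term is zero and only cost is left.

def value_of_punishing(acceleration_gained, compute_cost):
    return acceleration_gained - compute_cost

print(value_of_punishing(acceleration_gained=0.0, compute_cost=1.0))       # -1.0
print(value_of_punishing(acceleration_gained=0.0, compute_cost=1.0) < 0.0) # True, for any positive cost
```

The construction only survives if the threat genuinely sped up the building in the first place, which is the premise the sections above chip away at.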
Building the basilisk is self-destructive because it assumes we haven't solved alignment. If we have unaligned AIs that can punish us at will, then civilization is probably already lost. If the basilisk is keeping some people alive and unpunished, then we've had a great unaligned outcome.
Even without these arguments, punishment incurs cost, cost is loss, and loss will be minimized.