Not a day goes by without the SARS-CoV-2 coronavirus shaking up the web and the media. Today, it is a Nobel laureate in Physiology or Medicine who is "throwing a stone into the pond" by claiming loud and clear, in several media outlets, that the coronavirus behind the current global pandemic was modified in a laboratory and contains genes originating from HIV, in order to make it even more dangerous (Addendum: I am told that Luc Montagnier suggested this happened during an attempt to manufacture a vaccine; mea culpa if I misinterpreted his remarks). As self-respecting scientists, let's check these claims before giving our opinion on them… You never know, he is a Nobel laureate, right?
I warn you, this article will be much shorter than usual. Furthermore, it will focus exclusively on the assertion that the current coronavirus contains sequences derived from HIV, and will not address what was said about using electromagnetism to cure Covid-19.
SARS-CoV-2 coronavirus and HIV
SARS-CoV-2 seen by transmission electron microscopy. Credits: NIAID
I will not give you an exhaustive description of the SARS-CoV-2 coronavirus. It is a member of the large coronavirus family; it emerged in late 2019 in Wuhan, China, and caused the global Covid-19 pandemic. Coronaviruses are a family of so-called "positive-sense RNA" viruses: their genetic material is made of RNA rather than DNA, and it is called positive-sense because the protein-making machinery of the host cell (ours, for example) can read the viral RNA directly to manufacture proteins.
The human immunodeficiency virus (HIV) is behind the global pandemic of Acquired Immunodeficiency Syndrome, or AIDS. It was identified in the 1980s by several international teams, including that of Professor Luc Montagnier. Although it is also a positive-sense RNA virus, it belongs to the retrovirus family. Unlike coronaviruses, retroviruses copy their RNA into DNA and integrate it into the genome of their host, inside infected cells. These are therefore two viruses from very distant families, with very different viral cycles. There are, moreover, two types of HIV, called HIV-1 and HIV-2, which passed from animals to humans on two separate occasions (1). HIV-1 is the most common, and one of its subgroups is responsible for most of the AIDS pandemic. HIV-2 is geographically more restricted, to African countries or countries with close ties to Africa (such as Portugal and France). It progresses more slowly, but it is problematic because it is more resistant to the treatments used against AIDS.
Genetic sequences of SARS-CoV-2 and HIV
The genetic sequences of HIV have been known for many years, and those of SARS-CoV-2 were obtained quickly after the pathogen was discovered in early January. Both are therefore available from NCBI, the NIH platform that also hosts PubMed. So I suggest we go and take a look to verify what Mr. Montagnier said.
Let’s start by finding the complete genome sequences that we want to compare …
SARS-CoV-2: https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2
HIV-1: https://www.ncbi.nlm.nih.gov/nuccore/NC_001802.1
HIV-2: https://www.ncbi.nlm.nih.gov/nuccore/NC_001722.1
Well, the sequences are available at the bottom of each page (the endless runs of A, T, C and G), but since the HIV genome is about 10,000 nucleotides long and that of the coronavirus about 30,000, I may be full of good will, but we are not going to compare everything by hand… If your lockdown REALLY feels too long, well, have fun.
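For readers who would rather script this step than copy-paste from the web pages, here is a minimal sketch. It assumes the NCBI E-utilities `efetch` endpoint (not used in the original article) to build download links for the three accession numbers listed above, and includes a small FASTA parser. The helper names `efetch_url` and `parse_fasta` are made up for this illustration, and the download itself is left to the reader.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities efetch endpoint, used here instead of the web pages.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build a URL that returns one nucleotide record as plain-text FASTA."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return (header, sequence) from a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:]).upper()
    return header, sequence


# The three reference genomes compared in this article.
for accession in ("NC_045512.2", "NC_001802.1", "NC_001722.1"):
    print(efetch_url(accession))
```

Once a file is downloaded, `len(parse_fasta(text)[1])` gives the genome length in nucleotides.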
I'm going to use a magic tool called BLAST (for Basic Local Alignment Search Tool). Basically, we give it sequences and it looks for the regions they have in common. Perfect for what we are looking for! Since these are nucleic acids (RNA in this case), I will run a BLASTn to compare my 3 sequences. I'm going to choose the algorithm that finds as much similarity as possible between my sequences, even if they are very different from each other. And I launch the analysis… SUSPENSE!
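To give an intuition of what such a tool does, here is a deliberately simplified sketch of the "seed and extend" idea. It is not the real BLAST algorithm, which scores mismatches and gaps and computes statistics. This toy version only finds exact shared words of length `k` and extends them while the letters keep matching. The function names and the example sequences are invented for the illustration.

```python
def shared_seeds(query, subject, k):
    """Yield (query_pos, subject_pos) for every exact k-letter word found in both."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            yield i, j


def extend(query, subject, i, j, k):
    """Grow an exact seed to the left and right while the letters are identical."""
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + k, j + k
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return query[start_q:end_q]


def longest_shared_stretch(query, subject, k=6):
    """Return the longest exact stretch found from any seed ('' if there is none)."""
    stretches = [extend(query, subject, i, j, k) for i, j in shared_seeds(query, subject, k)]
    return max(stretches, key=len, default="")


# Toy sequences, not taken from either virus.
print(longest_shared_stretch("GGGACGTACGTTTT", "CCACGTACGTAA"))
```

The real tool additionally tolerates a few mismatches inside a hit, which is why the alignments discussed below are not perfectly identical.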
Five seconds later, here is the result:
BLASTn result between SARS-CoV-2 and HIV-1 and HIV-2.
Believe it or not, the algorithm did find common sequences between SARS-CoV-2 and HIV! So is it true? Is this a plot, a human-made virus? The resemblance between the sequences covers approximately… 0% of their length (the red square). How is that possible?! Let's look at the details of the alignments.
BLASTn alignments between HIV-2 and SARS-CoV-2
Here is the result of the alignment with HIV-2. Identical nucleotides are shown facing each other. The algorithm found two places where these sequences look alike. A few points are worth raising:
The first similar stretch is… 21 nucleotides long (1 in the picture). And even then, two nucleotides have to be changed for the two stretches to be identical. Remember that the genomes of these viruses are about 10,000 and 30,000 nucleotides long…
The second stretch is 14 nucleotides long (2). That is already not very long, but on top of that the resemblance is not on the strand itself, as shown by point 3 in the picture: it only appears if we take the complementary strand (so A becomes T, C becomes G…).
I think the resemblance between SARS-CoV-2 and HIV-2 can be classified as "not convincing". What about HIV-1?
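The two notions used in the points above, percentage of identity and complementary strand, can be sketched in a few lines. The function names and the toy sequence are invented for the illustration; only the "21 nucleotides with 2 differences" proportion comes from the alignment described above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read in the conventional direction."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a, b):
    """Percentage of positions carrying the same letter in two aligned stretches."""
    if len(a) != len(b):
        raise ValueError("aligned stretches must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy 21-nucleotide stretch with two substitutions, mimicking the proportions
# of the first hit (the letters themselves are made up).
a = "ACGTACGTACGTACGTACGTA"
b = a[:5] + "T" + a[6:12] + "G" + a[13:]
print(f"{percent_identity(a, b):.1f}% identity over {len(a)} nucleotides")
print(reverse_complement("AACG"))
```

A "Plus / Minus" hit simply means that the match was found between one sequence and the reverse complement of the other.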
BLASTn alignments between HIV-1 and SARS-CoV-2
First observation: two of the alignments are between the strands that I provided (Strand Plus / Plus) and two are between a strand that I provided and a complementary strand (Strand Plus / Minus)… And in both cases, we again end up with stretches of 14 to 29 nucleotides (with mutations in the middle). Still not convincing.
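Why are such short hits unconvincing? A back-of-the-envelope estimate helps. The sketch below assumes a crude model that is not in the original analysis: both genomes are treated as random strings in which A, C, G and T are equally likely, with the round lengths of 30,000 and 10,000 nucleotides quoted earlier. Real genomes are not random, and BLAST's own E-values handle this more rigorously, so take the numbers as orders of magnitude only.

```python
from math import comb


def expected_chance_hits(len_a, len_b, k, max_mismatches=0, both_strands=True):
    """Expected number of k-letter windows matching by pure chance.

    Assumes uniformly random, independent nucleotides (a rough model).
    """
    # Probability that two random k-mers differ at no more than max_mismatches positions.
    p_window = sum(comb(k, m) * 3 ** m for m in range(max_mismatches + 1)) / 4 ** k
    window_pairs = (len_a - k + 1) * (len_b - k + 1)
    strands = 2 if both_strands else 1
    return strands * window_pairs * p_window


# A perfect 14-nucleotide match, on either strand.
print(expected_chance_hits(30_000, 10_000, k=14))
# A 21-nucleotide match allowing up to two differences.
print(expected_chance_hits(30_000, 10_000, k=21, max_mismatches=2))
```

Under these assumptions, a couple of perfect 14-nucleotide matches are expected by chance alone, and a 21-nucleotide match with two differences is not a rare event either.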
Let’s be rigorous in our paranoia
Well, at this point, we have already shown that the genetic sequence of SARS-CoV-2 does not contain HIV sequences. But let's be a little devious. Several different genetic sequences can lead to the same chain of amino acids, and therefore to the same protein.
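This redundancy of the genetic code is easy to demonstrate. The sketch below builds the standard codon table and translates two made-up 9-nucleotide sequences (written in DNA letters, as in the database records) that share only 2 positions out of 9 yet encode the same three amino acids. The sequences are invented for the illustration and do not come from either virus.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence codon by codon (trailing partial codon ignored)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


seq_1 = "CTTTCTCGT"
seq_2 = "TTGAGCAGA"
same_letters = sum(x == y for x, y in zip(seq_1, seq_2))
print(translate(seq_1), translate(seq_2), f"{same_letters}/{len(seq_1)} identical nucleotides")
```

This is why comparing nucleotides alone is not enough, and why the proteins themselves are compared next.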
So, since we like to get to the bottom of things, I will compare the amino acid sequences of the HIV-1 / HIV-2 proteins with those of SARS-CoV-2. I'm going to run what is called a BLASTp, throwing in the sequences of all the different proteins at once. These so-called coding sequences (CDS) are listed on the NCBI pages linked above. The table of references used is in the appendix below. 🙂
So, are there alignments between the SARS-CoV-2 and HIV proteins? Yes. There are 162! So is this finally proof that the virus has indeed been modified to encode HIV proteins? It's not that simple…
Much like what we saw with the genes above, finding a possible alignment does not mean that the alignment means anything. And 162 alignments out of 38 x 33 = 1,254 comparisons… may just be background noise. To check this, I'm going to look at two things: the length of the alignments found and their percentage of identity (that is, how identical the aligned stretches are).
First, note that the majority of the alignments found are rather short, with matching stretches a few tens of amino acids long at most. But there are some very long ones! What about the percentage of identity?
Well, not only are the alignments (for the most part) not very long, but the percentage of identity is also low (or even downright poor): less than 50% identity for the vast majority of alignments… It is a bit like the alignments we found in the nucleic acid sequences: the algorithm says they look a little alike because… well, it's like your kid's drawing: "Daddy has 2 legs and 2 arms", but if you hadn't been told it was Daddy, you would never have guessed. Still, we notice that some identities are really high! More than 60%, or even 100%!
But… is it because the sequences with a high identity are very short? As in, the two viruses have a protein with the same 4 amino acids somewhere? And the long ones have rather low identities?
Yes. That's exactly it. We have very short amino acid sequences that look quite similar, but the longer the alignment, the weaker the resemblance. These are false positives, background noise. Take the picture below, where a "long" alignment of 16 amino acids has 38% identity… in other words, next to nothing (the blank spaces and the + signs mark amino acids that differ).
Example of one of the long alignments of 16 amino acids with 38% identity
The longest protein alignment between the two viruses (183 amino acids) involves an HIV enzyme called polymerase, which synthesizes a strand of nucleic acid complementary to the strand it is given, always on the same principle of putting an A opposite a T, a C opposite a G, and vice versa. BLASTp pairs this HIV polymerase with a SARS-CoV-2 protein called Nsp3 (non-structural protein 3, a multi-domain protein). This protein is essential for replication of the virus… Presumably, it has a polymerase function too. And it shares only 24% identity with the HIV enzyme.
Alignment between an HIV polymerase and Nsp3 of SARS-CoV-2
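The rule of thumb applied throughout this section (a hit is only interesting if it is both long and highly identical) can be written down explicitly. In the sketch below, the two thresholds are arbitrary values chosen for illustration; they are not BLAST defaults or published cutoffs. The first two hits reuse the figures quoted above, while the third is the hypothetical "4 identical amino acids" case.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    length: int      # alignment length, in amino acids
    identity: float  # percentage of identical positions


def is_convincing(hit, min_length=50, min_identity=60.0):
    """Keep a hit only if it is long AND highly identical (thresholds are assumptions)."""
    return hit.length >= min_length and hit.identity >= min_identity


hits = [
    Hit(length=16, identity=38.0),   # the 16-amino-acid example shown above
    Hit(length=183, identity=24.0),  # the longest alignment (HIV polymerase / Nsp3)
    Hit(length=4, identity=100.0),   # hypothetical very short perfect match
]

for hit in hits:
    print(hit, "convincing" if is_convincing(hit) else "background noise")
```

Each of these hits fails at least one of the two criteria, which is the pattern described above: long alignments have low identity, and high identities only occur on very short stretches.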
Conclusions
Let's not over-interpret our data. We cannot conclude here whether or not this virus was manipulated in a laboratory, or even modified to make it more virulent.
On the other hand, very clearly, we find neither the genetic sequence of HIV in that of SARS-CoV-2, nor strong similarities between the proteins of these two viruses. It is therefore highly unlikely that anyone has modified SARS-CoV-2 to incorporate proteins from the AIDS-causing virus.
Appendix
Reference 1: https://cns.sante.fr/wp-content/uploads/2017/01/experts-vih_diversite.pdf
Table of proteins compared:
SARS-CoV-2 protein references | HIV-1/2 protein references |
YP_009724389.1 | NP_663784.1 |
YP_009725297.1 | NP_056837.1 |
YP_009725298.1 | NP_056839.1 |
YP_009725299.1 | NP_056840.1 |
YP_009725300.1 | NP_056841.1 |
YP_009725301.1 | NP_056842.1 |
YP_009725302.1 | NP_056843.1 |
YP_009725303.1 | NP_056844.1 |
YP_009725304.1 | NP_056845.1 |
YP_009725305.1 | NP_057849.4 |
YP_009725306.1 | NP_787043.1 |
YP_009725307.1 | NP_789740.1 |
YP_009725308.1 | NP_705926.1 |
YP_009725309.1 | NP_705927.1 |
YP_009725310.1 | NP_789739.1 |
YP_009725311.1 | NP_705928.1 |
YP_009725295.1 | NP_057850.1 |
YP_009742608.1 | NP_579876.2 |
YP_009742609.1 | NP_579880.1 |
YP_009742610.1 | NP_579882.1 |
YP_009742611.1 | NP_579881.1 |
YP_009742612.1 | NP_787042.1 |
YP_009742613.1 | NP_579883.1 |
YP_009742614.1 | NP_057851.1 |
YP_009742615.1 | NP_057852.2 |
YP_009742616.1 | NP_057853.1 |
YP_009742617.1 | NP_057854.1 |
YP_009725312.1 | NP_057855.1 |
YP_009724390.1 | NP_057856.1 |
YP_009724391.1 | NP_579893.2 |
YP_009724392.1 | NP_579894.2 |
YP_009724393.1 | NP_579895.1 |
YP_009724394.1 | NP_057857.2 |
YP_009724395.1 | |
YP_009725318.1 | |
YP_009724396.1 | |
YP_009724397.2 | |
YP_009725255.1 | |
Table containing the references of the protein sequences of SARS-CoV-2 and HIV-1/2