Cetacean science: A new understanding of humpback whale genetics
Humpback whale with calf off Moorea, French Polynesia (Photo: Charles J. Sharp)
How a team of CU Boulder PhD students produced the first chromosome-level reference genome for humpback whales
Humpback whales are striking animals, not only because of their size, but also because of their complex vocalizations, acrobatic swimming and thousand-mile migrations.
Moreover, they hold a vital role in marine ecosystems, as their fecal matter, which is released as floating plumes, fertilizes the upper layer of the ocean and stimulates the growth of the photosynthesizing plankton there. These plankton are the basis of the marine food chain and are major contributors to the global carbon cycle.
PhD student Maria-Vittoria Carminati worked with colleagues to create the first chromosome-level reference genome for humpback whales.
Despite the importance and charisma of humpback whales, research into the species has been limited by the lack of complete genetic information.
Maria-Vittoria Carminati, a PhD student in the University of Colorado Boulder Department of Ecology and Evolutionary Biology, changed this when, along with Associate Professor of Ecology and Evolutionary Biology Nolan Kane and a team of fellow graduate students*, she created the first chromosome-level reference genome for the species.
Moving the needle
Carminati became an attorney in 2008 and worked in that field until recently. “I came to the realization that I wanted to do something more meaningful with my brain power,” she says. “That’s why I switched to science: I thought it would allow me to make greater contributions to society.
“So, three years ago, I went back to college and got my bachelor’s in ecology and evolutionary biology.” After that, she started her PhD at CU Boulder. There remained the question of what she would do to “move the needle forward,” but Carminati knew it would probably involve the ocean.
“I’m a diver, I’m a dive instructor, I like to sail even though I’m not very good at it,” she continues. After seeing a humpback whale in person one day, she started reading about them and found a paper that mentioned they were splitting into different subspecies. “I thought the paper was trying its best, but I don’t think it had the tools it needed to be assertive about what it was saying.”
One of those tools is a reference genome. So, Carminati went to for funding and to for the sequencing. She got a permit to sequence the humpback DNA sample from the and obtained the sample itself from the .
The sample was from the kidney of an orphaned whale calf that was beached and died on the shore of Hawaii Kai.
Cantata Bio’s sequencing yielded half a terabyte of data, which Kane tasked a class with helping Carminati process.
A humpback whale swimming off the coast of Moorea, French Polynesia. (Photo: /Wikimedia Commons)
The basics of genome sequencing
Genome sequencing is the process scientists use to determine the sequence of a large portion, if not the entirety, of an organism’s DNA, which is packaged in threadlike structures called chromosomes. Because the entire length of a chromosome cannot be sequenced at once, many shorter stretches are sequenced and then combined in what is known as a genome assembly.
The product of the researchers’ work is called a reference assembly. According to Carminati, this means that the chromosomes are represented well enough to be used in comparison with the DNA of other organisms. “It’s like having the full book of an organism’s DNA,” she says. “In our case, we are only missing 0.0003% of the entire genome.”
This level of accuracy distinguishes their assembly from others, such as the scaffold-level assembly of the humpback whale genome that already existed. To continue the book analogy, this level of assembly can be compared to a collection of passages that cannot be definitively ordered or associated with a particular “chapter,” or chromosome.
Such uncertainty is partially the result of short read lengths. “Short reads are cheaper, so often, labs will do short reads,” Carminati says. “The problem with a short read is that you are only getting, say, a couple of sentences from each page in the book.” These few sentences are less distinctive than longer passages, which leaves more doubt in the final genome assembly.
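The point about short reads being less distinctive can be illustrated with a toy example. The sketch below is not the researchers' pipeline; the sequence, the read strings and the helper `count_placements` are all made up for illustration. It simply counts how many places a read could fit in a tiny "genome" that contains a repeated stretch.

```python
def count_placements(genome: str, read: str) -> int:
    """Count the positions where the read matches the genome exactly."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        # Continue searching one position later so overlapping matches count too.
        start = genome.find(read, start + 1)
    return count


# Toy "genome" (hypothetical) in which the stretch GATTACA appears twice.
genome = "ACGTTGCA" + "GATTACA" + "CCGGAATT" + "GATTACA" + "TTAGGCAT"

short_read = "GATTACA"           # lies entirely inside the repeated stretch
long_read = "GATTACA" + "CCGG"   # also covers unique sequence next to the repeat

short_hits = count_placements(genome, short_read)
long_hits = count_placements(genome, long_read)

print(f"short read fits in {short_hits} places")
print(f"long read fits in {long_hits} place(s)")
```

In this toy case the short read fits in two places, so an assembler could not tell which copy it came from, while the longer read fits in only one. Real genomes contain far longer and more numerous repeats, which is why read length matters so much.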
The researchers’ assembly was built from long reads, which allows it to be organized into chromosomes. Their assembly also had a high depth, which is to say that each part of the genome was read about 30 times to ensure accuracy, consistent with the platinum standard introduced by Philip Morin of the .
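Depth can be made concrete with a small calculation. The following is a minimal sketch with invented numbers, not the team's data: each read is a hypothetical `(start, length)` pair on a 10-base toy genome, and the depth at a position is the number of reads covering it.

```python
def per_base_depth(reads, genome_length):
    """Return a list giving, for each position, how many reads cover it."""
    depth = [0] * genome_length
    for start, length in reads:
        # Clip reads that would run past the end of the genome.
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def mean_depth(reads, genome_length):
    """Average depth: total bases covered by reads divided by genome length."""
    return sum(per_base_depth(reads, genome_length)) / genome_length


# Hypothetical reads as (start position, read length) on a 10-base toy genome.
toy_reads = [(0, 5), (2, 5), (5, 5)]
toy_genome_length = 10

depths = per_base_depth(toy_reads, toy_genome_length)
average = mean_depth(toy_reads, toy_genome_length)

print(depths)
print(average)
```

Here the average depth is 1.5. A depth of 30 means that, on average, each position is covered by about 30 independent reads, so an occasional sequencing error in one read is outvoted by the others.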
Insight and annotation
While this chromosome-level genome was created too recently for researchers to have made discoveries by using it, Carminati says that the resource can be expected to provide insights into interesting traits of humpback whales, such as their cell regulation, large size and cancer resistance, as well as the formation of subspecies and other elements of genetic variation.
A humpback whale breaches off the coast of Tahiti. (Photo: /Wikimedia Commons)
“We are right at the beginning of this process,” Carminati explains, “but the reason that you can start making those insights is because if you have a platinum-level assembly, you have a far greater degree of certainty of what genes are and are not there.” This will allow scientists to tell with certainty whether a gene exists, does not exist or exists and is expressed multiple times.
“That goes to cell regulation and cancer resistance,” Carminati says, “because, for example, if you have a lot of genes that relate to cell regulation, cell repair and cell control, that indicates a cancer-preventing or cancer-halting mechanism because cancer is the result of the misregulation of cell division.
“So, if you have multiple genes like this, that might be one way that these enormous, 40-ton creatures are able to get so big and have so much cell division but not develop cancer.”
Other insights could be provided by synteny analyses, which are comparisons between sets of chromosomes. According to Carminati, these comparisons can help identify conserved areas: regions of genes that are unlikely to be rearranged between generations. When genes are together in a conserved area, this could indicate that they work together or are necessary for each other’s function.
The researchers performed a synteny analysis between the chromosomes from the humpback whale reference genome and the chromosomes of a blue whale. Synteny analyses can also indicate evolutionary relationships, and their analysis showed that there is a high level of consistency in the evolutionary relationships between the two species.
They also used BUSCOs (benchmarking universal single-copy orthologs), which are genetic reference guides developed in Switzerland, to evaluate genome completeness. BUSCO genes for mammals correspond to common mammalian traits, Carminati says, like lactation, placentas and live births. This analysis showed high completeness, too, but also represents another possible application of the reference genome: comparing whales to other mammals.
“We said, ‘What genes within this mammal BUSCO reference list do both of these creatures [humpback and blue whales] have, but more interestingly, which ones do they not have?’” Spending more time with this sort of analysis in the future could provide information about the evolution of whales, since missing mammalian genes would have either served no purpose to whales or even been counterproductive.
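The question Carminati describes boils down to comparing sets. The sketch below is a simplified illustration, not the BUSCO software itself: the gene names are placeholders, and the real tool also distinguishes complete, duplicated and fragmented genes, which this example ignores.

```python
# Placeholder reference list standing in for a mammalian BUSCO gene set.
reference_genes = {"geneA", "geneB", "geneC", "geneD", "geneE"}

# Hypothetical sets of reference genes found in each assembly.
humpback_found = {"geneA", "geneB", "geneC"}
blue_found = {"geneA", "geneB", "geneD"}


def completeness(found, reference):
    """Percentage of reference genes that were found in an assembly."""
    return 100 * len(found & reference) / len(reference)


# Genes both whales have, and reference genes that neither whale has.
shared_present = humpback_found & blue_found
missing_in_both = reference_genes - (humpback_found | blue_found)

print(f"humpback completeness: {completeness(humpback_found, reference_genes):.1f}%")
print("present in both:", sorted(shared_present))
print("missing in both:", sorted(missing_in_both))
```

With these made-up sets, two genes are present in both whales and one reference gene is missing from both. In a real analysis, genes in that last group would be the candidates for a closer look at what whales lost along the way.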
Finally, the researchers asked Cantata Bio to start annotating the reference genome. “Annotation tells you what genes are where,” Carminati says, and it is a necessary part of genome analysis. The annotation has not been made public yet, since the process is ongoing.
However, the research has already drawn attention, since Carminati presented it at the International Marine Conservation Conference in Cape Town, South Africa, last month. “So,” Carminati says, “I went from seeing a humpback whale in Hawaii to presenting a genome in Cape Town. Four years ago, I was trying cases. It is a very surreal trajectory.”
*Contributing graduate students are Vlonjat Lonnie Gashi, Ruiqi Li, Daniel Jacob Klee, Sara Rose Padula, Ajay Manish Patel, Andy Dick Yee Tan and Jacqueline Mattos.