Genetic Patterns in Oncogenesis and Identifying Gaps in Treatment

Catarina Bettencourt


Understanding where mutations occur on the human nuclear genome

To visualize where mutations occur on the human genome, each mutation was plotted at its respective location on the average human genome. To avoid overstating the prevalence of mutations, each mutation is drawn as a single line at its start position. As a result, point mutations and deletions are the same size on this chart, but the length of a deletion or insertion is stated in the hover data. With this visualization, it can be observed that mutations associated with ‘any type of cancer’ are disproportionately located at the ends of chromosomes, close to the telomeres. Literature states that telomere dysfunction can lead to increased chromosome instability, including random chromosome breakage and formation of dicentric chromosomes [7]. This, in tandem with Figures 1 and 2, could suggest that areas near telomeres may be more susceptible to alterations that can lead to cancer.

In addition, cancerous mutations on the X (23rd) chromosome were investigated to determine whether there was any link between X-chromosome mutations and sex-based predispositions to cancer. According to the NIH National Cancer Institute, leukemia is slightly more common in men than women [8], and Ozga et al. state that acute myeloid leukemia is more common in men than women [9]. Megakaryoblastic leukemia, a type of acute myeloid leukemia [10], has a mutation located on the X chromosome, which is visible in Figure 1.

Figure 1. Visual representation of cancerous mutations on the average human chromosome. The light gray bars represent the average length of each chromosome in base pairs. The chromosomes are ordered from 1-24, with 23 representing the X chromosome and 24 representing the Y chromosome. Each dark gray line represents a mutation on the chromosome, and hovering over the line will return the location of the mutation, the associated cancer type, the associated gene, and the type of mutation. To compare a specific type of cancer to all mutations, use the drop-down menu to select a cancer type. The resulting dark purple lines are the mutations that are associated with the selected cancer.


Finding the most common mutation locations

Although Figure 1 suggested that a large number of mutations are present at the ends of chromosomes, this was difficult to confirm because multiple mutations can occur at the same location and overlap on a single line. As a result, Figure 2 was created to visualize the density of mutation occurrences based on location. Although the X-axis is log-transformed, it can be determined that the highest density of mutations occurs below the 10 million base pair mark. This confirmed the need for the density plot, as it was difficult to see how many mutations overlapped on the same line, even after the ability to zoom was added to the Y-axis of Figure 1.

Although this chart is ideal for confirming that many mutations occur at the top of the chromosome, it is difficult to confirm whether the same is true for the end of the chromosome, since chromosomes differ in length. To investigate a specific chromosome, the data would need to be filtered before the visualization is created.
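One possible way to compare chromosome ends despite their differing lengths is to rescale each mutation's start position by the length of its own chromosome. The sketch below is illustrative only and is not the method used for the figures: the function names, the `(chromosome, start)` record layout, and the 5% margin are all assumptions.

```python
def relative_positions(mutations, chromosome_lengths):
    """Scale each mutation start to the 0-1 range of its own chromosome.

    `mutations` is an iterable of (chromosome, start) pairs and
    `chromosome_lengths` maps chromosome label to length in base pairs.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    scaled = []
    for chromosome, start in mutations:
        length = chromosome_lengths[chromosome]
        scaled.append((chromosome, start / length))
    return scaled


def fraction_near_ends(scaled, margin=0.05):
    """Fraction of mutations within `margin` of either chromosome end."""
    if not scaled:
        return 0.0
    near = sum(1 for _, rel in scaled if rel <= margin or rel >= 1 - margin)
    return near / len(scaled)


# Toy data with made-up lengths, not real chromosome sizes.
toy_lengths = {"A": 1000, "B": 200}
toy_mutations = [("A", 10), ("A", 500), ("B", 198), ("B", 100)]
toy_scaled = relative_positions(toy_mutations, toy_lengths)
toy_fraction = fraction_near_ends(toy_scaled)
```

With positions expressed as a fraction of chromosome length, the top and the end of every chromosome fall at 0 and 1 respectively, so both ends could be inspected on a single shared axis.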

Figure 2. Log-transformed density plot for mutation occurrences based on base pair location. Hover over the graph to see the density of mutations at each location on the chromosome. Please note that the X-axis is log-transformed to account for the high proportion of mutations that occur at the top of the chromosome.
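The effect of a log-transformed axis can be approximated by counting mutation start positions in log10-spaced bins. The following is a minimal sketch under stated assumptions: the function name and the `bins_per_decade` parameter are hypothetical, and the figure itself may have been produced with a different density estimate.

```python
import math
from collections import Counter


def log_binned_counts(positions, bins_per_decade=1):
    """Count start positions in log10-spaced bins.

    Returns a dict mapping bin index to count. A bin with index i covers
    positions from 10 ** (i / bins_per_decade) up to the next bin's edge.
    """
    counts = Counter()
    for pos in positions:
        if pos <= 0:
            continue  # log10 is undefined for non-positive positions
        counts[math.floor(math.log10(pos) * bins_per_decade)] += 1
    return dict(sorted(counts.items()))


# Toy positions in base pairs (illustrative values only).
toy_counts = log_binned_counts([5, 50, 55, 5_000_000, 0])
```

With one bin per decade, bin 6 covers positions from 1 million up to 10 million base pairs, so summing bins 0 through 6 gives the number of mutations below the 10 million base pair mark.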


Heredity of cancerous mutations by type

A major concern that people often have regarding cancer is whether it is hereditary. Somatic mutations are not heritable, but germline mutations are heritable and can be screened for in genetic counseling. Although it cannot be confirmed whether someone’s germline mutations will be passed down, or whether someone with an inherited gene mutation will develop cancer, Figure 3 can at least visualize the differences between germline and somatic mutation types. For example, in the somatic column, many of the mutations were point mutations. One possible environmental cause of somatic point mutations is exposure to carcinogens, which are known to cause DNA damage such as double and single strand breaks [11]. Although there are DNA repair pathways to correct this damage, mutations such as point mutations can arise when the repair does not occur correctly [12]. It should be noted that not all point mutations are cancerous, as the average middle-aged person may have over 10^16 point mutations [13].

For germline mutations, the major type of mutation was split between point mutations and deletions, with the next major type of mutation being duplications. Deletions and duplications likely occur during the recombination step of meiosis, a type of cell division that occurs to produce gametes, also known as sperm and egg cells [14].

Figure 3. Normalized mutation types for somatic and germline mutations. The two columns represent germline and somatic mutations respectively. Both columns are normalized to represent percentages of each type of mutation. Hovering over the chart will show the count of mutation types, the type of mutation, and the heritability. Mutation types can be popped out by clicking the corresponding box in the key.
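The normalization described in the caption can be sketched as follows: mutation types are counted within each heritability group, and each count is divided by its group total. The `(heritability, mutation_type)` record layout and the function name are assumptions made for this example, not the dataset's actual schema.

```python
from collections import Counter, defaultdict


def normalized_type_shares(records):
    """Return the percentage of each mutation type within each group.

    `records` is an iterable of (heritability, mutation_type) pairs,
    a hypothetical layout used only for this sketch.
    """
    counts = defaultdict(Counter)
    for heritability, mutation_type in records:
        counts[heritability][mutation_type] += 1

    shares = {}
    for heritability, type_counts in counts.items():
        total = sum(type_counts.values())
        shares[heritability] = {
            mutation_type: 100 * n / total
            for mutation_type, n in type_counts.items()
        }
    return shares


# Toy records (illustrative, not taken from the dataset).
toy_records = [
    ("somatic", "point"), ("somatic", "point"), ("somatic", "point"),
    ("somatic", "deletion"),
    ("germline", "point"), ("germline", "deletion"),
]
toy_shares = normalized_type_shares(toy_records)
```

Normalizing each column separately keeps the two groups comparable even when one has far more recorded mutations than the other.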


Locating the gap between cancer associated genes and responsive treatments

Following the analysis of the location and types of mutations that are associated with cancer, the focus of the rest of this project was directed toward current cancer treatments. For Figure 4, the list of current therapies was filtered to ‘responsive only’, to avoid misrepresenting treatments that do not work for an associated gene. Following this, any gene that had 150 or more associated mutations was plotted (pink), and any treatments responsive to that gene were also plotted (navy). From this, gaps or inadequacies in drug development can be visualized. For example, there are many drugs that are responsive to EGFR and BRAF mutations, but very few for APC. Upon further review of the literature, it was found that APC mutations are associated with 80-85% of sporadic colorectal cancers [15].
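The filtering steps above could be sketched roughly as shown here. This is a hedged illustration rather than the project's code: the function name, the flat list of gene labels, and the `(gene, drug, response)` tuples are assumptions, and only the ‘Responsive’ label and the 150-mutation threshold come from the text.

```python
from collections import Counter, defaultdict


def genes_with_responsive_drugs(mutation_genes, therapies, threshold=150):
    """List (gene, mutation_count, responsive_drug_count) for frequent genes.

    `mutation_genes` holds one gene label per recorded mutation, and
    `therapies` holds (gene, drug, response) tuples. Both layouts are
    hypothetical and chosen only for this sketch.
    """
    mutation_counts = Counter(mutation_genes)

    # Keep only responsive therapies; a set ignores repeated drug entries.
    responsive = defaultdict(set)
    for gene, drug, response in therapies:
        if response == "Responsive":
            responsive[gene].add(drug)

    rows = [
        (gene, count, len(responsive.get(gene, ())))
        for gene, count in mutation_counts.items()
        if count >= threshold
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy data with placeholder names and a low threshold for readability.
toy_genes = ["G1", "G1", "G1", "G2", "G2", "G3"]
toy_therapies = [
    ("G1", "drugA", "Responsive"),
    ("G1", "drugB", "Resistant"),
    ("G1", "drugA", "Responsive"),
    ("G3", "drugC", "Responsive"),
]
toy_rows = genes_with_responsive_drugs(toy_genes, toy_therapies, threshold=2)
```

A gene that passes the threshold but has a responsive drug count of zero or close to it is the kind of gap the chart is meant to expose.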

It should be noted that the number of mutations associated with a gene does not correspond to the frequency of the gene being the cause for cancer development. For example, among the genes most commonly mutated in cancer, TP53 is the most frequently mutated, occurring in 36.6% of tumors, and it appears in Figure 4. However, MUC16, which is mutated in 18.9% of tumors, did not pass the 150-mutation threshold [16]. Further analysis would ideally recreate this visualization using the counts of each gene’s mutation occurrence in tumors.

Figure 4. Responsive drugs for genes with 150+ cancerous mutation occurrences. The stacked bar chart is organized by count of mutations for each gene, with the lower cut-off occurring at 150 mutations per gene. The navy bar represents the number of drugs that are responsive to the gene on the left, while the pink bar represents the number of cancerous mutations that occur on the given gene. Hovering over the bar will show the respective count of occurrences.


Gene to drug connection: Response of cancerous mutations to therapies

Since Figure 4 focused on responsive drugs, it was necessary to provide a visualization of all remaining drugs and their associated response to genes. The possible responses were:

  • Responsive: The treatment improves a patient’s condition
  • Resistant: The condition does not respond to the treatment as expected
  • No response: Treatment provides no change in the patient’s condition
  • Increased toxicity: Treatment causes adverse effects to the patient
  • Increased toxicity via myelosuppression: Treatment reduces or stops the bone marrow’s ability to produce blood cells [17]
  • Increased toxicity via haemolytic anemia: Treatment causes the destruction of red blood cells [18]
  • Increased toxicity via ototoxicity: Treatment causes damage to the inner ear [19]
  • Increased toxicity via hyperbilirubinemia: Treatment results in high levels of bilirubin in the blood, which is associated with liver damage or dysfunction [20]


Depending on the validity of the dataset used, this type of visualization can allow the user to quickly find drug families that are responsive to a specific mutation, or to determine if the presence of a specific mutation may be associated with an adverse reaction.

Figure 5. Sunburst diagram detailing response type, associated gene, and drug family for each drug. The order of the rings is stated above the diagram. Clicking on a section of a ring will show all sub-contents of the selected section. Hovering over a section will show its corresponding parent rings. Clicking in the center of the ring will return the user to the previous ring. Derived from Mike Bostock's zoomable sunburst [21].
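A sunburst of this kind is typically driven by nested data in which each ring is one level of the tree. The sketch below shows one way flat rows might be folded into a `name`/`children`/`value` structure of the sort commonly fed to D3 hierarchy layouts. The ring order (response, gene, drug family, drug), the function name, and the root label are assumptions for illustration and may differ from the figure's actual data preparation.

```python
def build_hierarchy(rows, root_name="therapies"):
    """Fold flat rows into a nested name/children tree.

    Each row is a tuple whose last item is the leaf (the drug) and whose
    earlier items are ring labels from the innermost ring outward. The
    assumed order is (response, gene, drug_family, drug).
    """
    root = {"name": root_name, "children": []}
    for row in rows:
        *ring_path, drug = row
        node = root
        for label in ring_path:
            # Reuse the child for this label if it already exists.
            child = next(
                (c for c in node["children"] if c["name"] == label), None
            )
            if child is None:
                child = {"name": label, "children": []}
                node["children"].append(child)
            node = child
        # Each drug is a leaf with equal weight in this sketch.
        node["children"].append({"name": drug, "value": 1})
    return root


# Toy rows with placeholder gene, family, and drug names.
toy_tree = build_hierarchy([
    ("Responsive", "GENE1", "family1", "drugA"),
    ("Responsive", "GENE1", "family1", "drugB"),
    ("Resistant", "GENE1", "family2", "drugC"),
])
```

Giving every leaf a value of 1 sizes each section by the number of drugs beneath it; a different weighting would change the proportions of the rings.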

Contact Me

Preferred email: catarina.bettencourt.a@gmail.com | School email: bettencourt.c@northeastern.edu
