Conclusions
This project found that no specific chromosomes carry a disproportionate share of oncogenic mutations, but that the ends of chromosomes, and the beginning in particular, have a higher density of cancer-associated mutations than the rest of the chromosome (Figure 1, Figure 2). Regarding the distribution of mutation types across somatic and germline DNA, the somatic group was composed primarily of point mutations, while the germline group had high percentages of both point mutations and deletions (Figure 3). Following this, the count of mutations for each cancer-associated gene was calculated and compared to the count of responsive therapies for the corresponding gene (Figure 4). Although this visualization is not comprehensive, as it only shows genes with over 150 associated mutations, it does reveal genes whose number of responsive therapeutics is disproportionate to their number of associated mutations. For example, adenomatous polyposis coli (APC) has 258 mutation occurrences and is found in 80% of sporadic colorectal tumors [15], but only one responsive drug is available. On the other hand, Erb-B2 receptor tyrosine kinase (ERBB2) has 177 mutations, is found in 30% of breast cancers [22], and has 75 responsive therapeutics. Lastly, the distribution of responses to cancer treatments was promising, and the visualization gives users a simple way to investigate a gene's associated response to a therapeutic or its corresponding drug family (Figure 5).
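To make the APC and ERBB2 comparison concrete, the short Python sketch below normalizes the therapy counts by the mutation counts quoted above. The helper name `therapies_per_100_mutations` and the dictionary layout are assumptions made for illustration only; they are not part of the project's code, and the only inputs are the two pairs of counts stated in the text.

```python
# Counts quoted in the paragraph above (Figure 4); no other data is assumed.
gene_counts = {
    "APC": {"mutations": 258, "therapies": 1},
    "ERBB2": {"mutations": 177, "therapies": 75},
}


def therapies_per_100_mutations(mutations: int, therapies: int) -> float:
    """Hypothetical helper: responsive therapies per 100 recorded mutations."""
    if mutations <= 0:
        raise ValueError("mutations must be a positive count")
    return 100 * therapies / mutations


# Compute the normalized ratio for each gene, rounded to one decimal place.
ratios = {
    gene: round(therapies_per_100_mutations(**counts), 1)
    for gene, counts in gene_counts.items()
}

for gene, ratio in ratios.items():
    print(f"{gene}: {ratio} responsive therapies per 100 mutations")
```

On these figures, APC has roughly 0.4 responsive therapies per 100 recorded mutations, while ERBB2 has roughly 42.4, which is the disproportion the comparison points to.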
Reflection, disclaimers, and additional considerations
By completing this project, I have created three components using Python and JavaScript that I plan to reuse in future research projects. The HGVS converter will be extremely useful, as I often work with genomic datasets, and I am excited to use this new package in future projects. The visual representation of mutations on the human genome (Figure 1) will be reused to understand the mutation distribution of other diseases. Lastly, the zoomable sunburst chart (Figure 5) will be reused when I need to visualize multiple layers of data.
Despite these results, mutations should not be interpreted in isolation; the mechanisms and patterns of cancer can be better understood by interpreting mutations alongside additional biomarkers, symptoms, demographics, and environmental factors. Although the data sources are stated, there is a risk of outdated or inaccurate data reporting. The clinical space is constantly changing, and since the validated cancer-associated mutations dataset has not been updated since 2018, it is likely that new mutations have become relevant. In addition, this project is for informational purposes only, and any conclusions derived from it are not a substitute for professional medical advice.
Contact Me
Preferred email: catarina.bettencourt.a@gmail.com | School email: bettencourt.c@northeastern.edu
LinkedIn | Bluesky