Session 29 – Bioinformatics Literature

29.1 Peer-reviewed bioinformatics journals

More than 25 journals focus on bioinformatics research and tools. Many of them report the development of new tools and software with applications in computational medicine and biology. Most articles in these journals undergo peer review, in which one or more researchers with competencies similar to those of the authors (i.e., peers) determine whether the presented research is valid and reproducible, and whether the interpretation of the results is accurate and consistent with current knowledge on the subject. This process improves the quality of published science and helps prevent flawed or misinterpreted research from becoming public. Scientists, industry, and policymakers pay the most attention to these articles.

Here is a list of some of the most widely cited journals in bioinformatics or with a strong emphasis on research derived from bioinformatics. All of these journals publish articles after peer review:

  1. Bioinformatics (2019, impact factor IF: 5.610). Mostly brief notes on new software and some reviews.
  2. Briefings in Bioinformatics (2019, impact factor IF: 8.990). This journal publishes reviews for the users of databases and analytical tools of contemporary genetics, molecular and systems biology and is unique in providing practical help and guidance to the non-specialist in computerized methodology.
  3. PLOS Computational Biology (does not provide an impact factor). This journal publishes articles focused on the application of computational methods from molecules to ecosystems.
  4. BMC Bioinformatics (2019, impact factor IF: 3.169). This is an open access journal that considers articles describing novel computational algorithms and software, models and tools, including statistical methods, machine learning and artificial intelligence, as well as systems biology.
  5. GigaScience (2020, impact factor IF: 6.524). This is an open access journal focusing on ‘big data’ research from the life and biomedical sciences.
  6. Molecular Biology and Evolution (2020, impact factor IF: 16.240). This is an open access journal focusing on patterns and processes that impact the evolution of life at the molecular level; it usually includes novel applications of bioinformatic tools.
  7. Genome Biology and Evolution (2020, impact factor IF: 3.416). This journal focusses on research at the interface between evolutionary biology and genomics.
  8. Nucleic Acids Research (2020, impact factor IF: 16.971). This journal publishes research into physical, chemical, biochemical and biological aspects of nucleic acids and proteins involved in nucleic acid metabolism and/or interactions.
  9. Methods in Ecology and Evolution (2020, impact factor IF: 7.780). This journal publishes articles that promote development of new methods (usually computational) in ecology & evolution.
  10. Nature Methods (2020, impact factor IF: 28.550). This journal publishes detailed protocols for diverse fields of the biological, chemical, and clinical sciences.
  11. Molecular Systems Biology (2020, impact factor IF: 11.400). This is an open access journal focusing on the fields of systems biology, synthetic biology and systems medicine.
  12. Genome Research (2020, impact factor IF: 9.043). This journal publishes studies on the structure, function, biology, and evolution of genomes.
  13. Genome Biology (2020, impact factor IF: 13.583). This journal publishes research in all areas of biology and biomedicine from a genomic and post-genomic perspective.
  14. Systematic Biology (2020, impact factor IF: 15.683). This journal publishes research about the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things.

Note: The impact factor (IF, or journal impact factor, JIF) is a relative index of the overall quality of a journal. For a given year, it is calculated as the mean number of citations received that year by the articles the journal published in the previous two years. IF is considered a proxy for the importance and prestige of a journal within its field of study. Low-IF journals (e.g., those below 2.0) are considered less impactful than those with a higher IF. However, this index is not without controversy.
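
The calculation described in the note can be illustrated with a minimal R sketch. The function name `journal_impact_factor` and the counts below are hypothetical and chosen only for illustration; they are not data from any real journal.

```r
## Minimal sketch of the two-year impact factor calculation.
## 'journal_impact_factor' is a hypothetical helper, not part of any package.
journal_impact_factor <- function(citations, citable_items) {
  # citations:     citations received in year Y by items published in Y-1 and Y-2
  # citable_items: number of citable items published in Y-1 and Y-2
  stopifnot(citable_items > 0)
  citations / citable_items
}

## Made-up counts for an imaginary journal:
citations_this_year <- 1200  # citations to articles from the previous two years
items_last_two_years <- 400  # articles published in the previous two years

journal_impact_factor(citations_this_year, items_last_two_years)
```

With these made-up counts, each article from the previous two years was cited three times on average during the year of interest.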

29.2 Access to journal articles

Most published research is now accessible online (i.e., print forms are becoming obsolete). However, many of the journals above require an institutional or personal subscription, which can be costly, to access their articles. Fortunately, many of these papers are available to readers for free through public databases.

Here is a list of some of the most comprehensive collections of free papers that are accessible to the public:

  1. PubMed. PubMed is a free search engine accessing primarily the MEDLINE database of references, abstracts and online books on life sciences and biomedical topics. Citations may include links to full text content from PubMed Central and publisher web sites.
  2. Zenodo. This is an open-access repository that allows researchers to deposit research papers, data sets, research software, reports, and any other research-related digital artefacts.
  3. bioRxiv. This is an open access preprint repository for the biological sciences. It provides preprints and papers that are not peer-reviewed.
  4. ResearchGate. This is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators.
  5. Academia. This is a for-profit social networking website for academics. You might find full papers, yet it requires a bit of navigating or searching.
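
As an illustration of how PubMed (the first entry above) can also be queried programmatically, the sketch below builds a search URL for the NCBI E-utilities `esearch` endpoint. It only constructs the URL and does not send any request. The helper name `build_pubmed_search_url` and the example search term are assumptions made for illustration.

```r
## Minimal sketch: build (but do not send) a PubMed search URL.
## 'build_pubmed_search_url' is a hypothetical helper written for this example.
build_pubmed_search_url <- function(term, retmax = 20) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  # Percent-encode the search term so spaces and symbols are URL-safe
  query <- utils::URLencode(term, reserved = TRUE)
  # db = database to search, term = query, retmax = maximum number of IDs returned
  paste0(base, "?db=pubmed&term=", query, "&retmax=", retmax)
}

## Example search term (an arbitrary choice for illustration):
url <- build_pubmed_search_url("DESeq2 differential expression", retmax = 5)
print(url)
```

Pasting the printed URL into a web browser should return a list of matching PubMed identifiers, which can then be looked up on the PubMed website.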

Unfortunately, many recently published papers are not available from the sources above, and you might get stuck trying to get a PDF of a paper of interest. However, most universities can provide a copy through the Interlibrary Loan Service (ILS), through which a researcher whose institution lacks access to an article, book, or other library resource can receive a PDF or the actual document from another library that has it. The user submits a request through ILS at their home library. The home library then acts as an intermediary: it identifies libraries with the desired item, places the request, receives the item, makes it available to the user, and arranges for its return (if necessary). For books, the lending library usually sets a due date and overdue fees for the borrowed material.

29.3 Informal access to code: Preprints and vignettes

The top bioinformatic journals (and other publications) provide a formal outlet for bioinformatics research, yet most research in biology now involves some level of development and implementation of bioinformatic tools. This trend is a consequence of the rapid expansion, generation, and accessibility of big datasets that require some level of automation and complex computational analyses. In most multidisciplinary research articles, it has become customary to provide bioinformatics code as annexes, appendices, or supplementary materials, or to host it on dedicated online platforms such as GitHub. Likewise, many authors prefer to publish manuals and descriptions of their bioinformatic methods as preprints, which are draft versions of a research article that precede formal peer review and publication in a journal. These preprints are usually hosted in specialized repositories such as bioRxiv. For introductions, manuals, application examples, and guidelines on the use of specific bioinformatic tools (like those written in the R language), you might find vignettes associated with a specific R package in the CRAN or Bioconductor repositories.

Here is an example of how to access the vignette of DESeq2, an R package for differential gene expression analysis.

## If you need to install this package on your computer:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DESeq2")

## If you have already installed the DESeq2 package, load it into your R session:

library(DESeq2)

## To view documentation and vignette associated with DESeq2:

browseVignettes("DESeq2")
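
If you prefer to stay in the R console instead of opening a browser, the vignettes of an installed package can also be listed with the base function `vignette()`. The sketch below is one possible way to do this; the helper name `list_vignettes` is hypothetical, and the function returns an empty result if the package is not installed.

```r
## Minimal sketch: list vignette names of an installed package in the console.
## 'list_vignettes' is a hypothetical helper written for this example.
list_vignettes <- function(pkg) {
  # Return nothing (instead of failing) if the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  # vignette(package = ...) returns an object whose 'results' matrix
  # has one row per vignette and an "Item" column with the vignette names
  info <- vignette(package = pkg)
  unname(info$results[, "Item"])
}

## Names of the DESeq2 vignettes (empty if DESeq2 is not installed):
list_vignettes("DESeq2")

## A vignette can then be opened by name, for example:
## vignette("DESeq2", package = "DESeq2")   # assumes a vignette named "DESeq2"
```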

29.4 Online bioinformatics courses

Nowadays, most students and researchers can find free bioinformatics classes that range in depth, quality, specificity, and usefulness. It all depends on your needs and the time that you are willing to dedicate to such courses. Some are short (a few hours) and others last months or a full semester. In my experience, short courses are not really useful unless the topic is something very specific that expands your previous knowledge (e.g., data mining for genomics, if you already know basic data mining). The courses that are most useful for students starting in bioinformatics are those that take more time, require you to do lots of coding exercises, and provide you with worked examples. Nevertheless, regular practice and goal-oriented use of your basic coding skills are, in my opinion, the best way to improve your ability to code in a meaningful way.

Here is a list of some free online courses that you can access, although I have not tried or reviewed most of them.

  1. https://www.coursera.org/learn/r-programming. This is a very popular online class for beginners focused on how to program in R and how to use it for effective data analysis. The course covers practical issues that include programming, reading data, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. It includes examples and exercises.
  2. https://www.coursera.org/learn/data-analysis-r. This course introduces data analysis using R. It provides an overview of RStudio with R packages relevant to organizing, analyzing, visualizing, and reporting data.
  3. https://www.coursera.org/learn/introducton-r-programming-data-science. This is a basic course in R programming for data analysis.