Session 27 – Python Subprocess

27.1 Introduction and Installation

For our last application of Python, we will use the subprocess module, which makes it easy to run outside software, including programs written in other languages, from within a Python script.

More information on the subprocess module can be found in the official Python documentation.
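As a quick illustration of the idea, the sketch below runs a second Python interpreter as a child process and captures what it prints. The helper name run_and_capture is only an assumption for this example; subprocess.run and its capture_output, text, and check arguments are part of the standard library.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command given as a list of arguments and return its stdout as text."""
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the captured bytes into str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # sys.executable is the Python interpreter that is running this script.
    print(run_and_capture([sys.executable, "-c", "print('hello from a child process')"]))
```

Any other command-line program can be substituted for the interpreter call in the same way.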

We will be working with software from the Department of Energy's Joint Genome Institute (JGI) to demonstrate how subprocess works with outside software.

More specifically, we will work with BBTools, a suite of tools for analyzing and manipulating DNA and RNA sequence data. Note that this is just an example: you can use subprocess with any software you wish, as long as it can be run from the command line. Let's go to the BBTools installation page and install it for our systems by following its installation guide.

If you are working on a Windows system, you will need 7-Zip to uncompress the software once it is downloaded.

27.2 Creating Your First Python Script with BBTools

Once you have downloaded and uncompressed BBTools, it should be ready to use. We can now focus on creating a Python pipeline that connects several BBTools features. Note that this is just an example.

Very important: be sure to place the bbmap folder inside your class work folder. This makes it easier for Python to find BBTools without you stating the full path every time.
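A small check can confirm the folder is where Python expects it before any tool is run. This is only a sketch: find_tool is a hypothetical helper, and it assumes the BBTools scripts sit directly inside a bbmap folder under the work folder.

```python
from pathlib import Path


def find_tool(script_name, work_dir=None):
    """Return the path to a script inside <work_dir>/bbmap, or raise if it is missing."""
    # Default to the current working directory, i.e. the class work folder.
    base = Path(work_dir) if work_dir is not None else Path.cwd()
    tool = base / "bbmap" / script_name
    if not tool.is_file():
        raise FileNotFoundError(f"Expected {tool}; is the bbmap folder in your work folder?")
    return tool
```

Calling find_tool("reformat.sh") from the work folder either returns the full path or stops with a message that names the location it looked in.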

Before we begin, copy the following FASTA data into a new file named ‘data.fasta’:

>SRR123456789.001 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTTCGTAGCTAGATCGACTGACTGCTGCTACTACGATCGACTGCTGCGGG
>SRR123456789.002 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTTCGTAGCTAGATCGACTGACTGCTGCTACTACGATCGACTGCTGCGGG
>SRR123456789.003 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTTCGTAGCTAGATCGACTGACTGCTGCTACTACGATCGACTGCTGCTTT
>SRR123456789.004 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTTCGTAGCTAGATCGACTGACTGCTGCTACTACGATCGACTGCTGCGGG
>SRR123456789.005 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
ACTGATCGATCGTCGATCGATCGCTCGTACGTGATCGATCGATCGTACGG
>SRR123456789.006 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
ACTGATCGATCGTCGATCGATCGCTCGTACGTGATCGATCGATCGTACGG
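To get a feel for this data before handing it to BBTools, the file can be inspected with a few lines of plain Python. The sketch below is only an illustration: read_fasta is a hypothetical helper, not part of BBTools, and comparing sequences for exact equality may differ from the criteria dedupe.sh applies.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_distinct(records):
    """Count how many different sequences appear among the records."""
    return len({sequence for _, sequence in records})
```

Reading data.fasta into a string and passing it to read_fasta should give six records, and count_distinct should report three different sequences among them.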

We will now construct our own script. Save the script below as ‘my_first_script.py’:

#!/usr/bin/env python3
import os
import subprocess
import tkinter as tk
from tkinter import filedialog

def function_one():
    # reformat: trim headers
    cwd = os.getcwd()
    root = tk.Tk()
    root.withdraw()  # hide the empty tkinter window so only the file dialog appears
    in_file = filedialog.askopenfilename()
    # pass the command as a list so that paths containing spaces are handled correctly;
    # check=True stops the script if the tool reports an error
    subprocess.run([f"{cwd}/bbmap/reformat.sh", f"in={in_file}", "out=reformat_result.fasta", "trd"], check=True)

def function_two():
    # dedupe: remove duplicate contigs
    cwd = os.getcwd()
    subprocess.run([f"{cwd}/bbmap/dedupe.sh", "in=reformat_result.fasta", "out=dedupe_result.fasta"], check=True)

def main(): # this can be the main backbone to call the rest of your functions
    function_one() # this is one of many of the functions you can create to fuel your experiment
    function_two()

if __name__ == '__main__': # only run main() when this file is executed directly, not when it is imported
    main()  # run main script
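As more steps are added, it can help to build each command in one place and to check whether it succeeded before moving on. The following is a sketch under assumptions: build_command and run_step are hypothetical helper names, and it assumes, as in the script above, that the tool takes key=value arguments followed by bare flags.

```python
import subprocess


def build_command(tool_path, params, flags=()):
    """Build an argument list such as [tool, 'in=a.fasta', 'out=b.fasta', 'trd']."""
    # params is a dict because 'in' is a Python keyword and cannot be a keyword argument.
    return [str(tool_path)] + [f"{key}={value}" for key, value in params.items()] + list(flags)


def run_step(args):
    """Run one pipeline step; return True on success, print stderr and return False otherwise."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"Step failed with exit code {result.returncode}: {' '.join(args)}")
        print(result.stderr)
        return False
    return True
```

For example, build_command("bbmap/reformat.sh", {"in": "data.fasta", "out": "reformat_result.fasta"}, flags=["trd"]) produces the same arguments that function_one passes, and main() could stop early when run_step returns False.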

Now we are ready to use our new script with our outside software.

  1. Open a terminal and change the working directory to the Python work folder we created for this class.

  2. Drag your script (the .py file we just made) from the Python work folder into the terminal.

  3. Press enter to run the script.

The script will prompt you to open a file; select the FASTA file we just saved to the Python work folder. The script will then perform the listed processes on that file, and the output will be written to your Python work folder as ‘reformat_result.fasta’ and ‘dedupe_result.fasta’.

If it works, congrats! Feel free to use this script as a skeleton for your own workflow.