Biopython: extract CDS from modified GenBank records

When working with GenBank records in Biopython, you often need to extract the coding sequences (CDS) from them. This gets harder once the records have been modified, because the edited annotations still have to be read correctly. In this article, we explore three ways to extract CDS from modified GenBank records: two that use Biopython and one that uses plain regular expressions.

Option 1: Using SeqIO

The first option is to use the Bio.SeqIO module. It provides a simple interface for reading and writing sequence files in many formats, including GenBank. Here is sample code that collects the CDS features with SeqIO:


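from Bio import SeqIO

def extract_cds_from_genbank(genbank_file):
    """Return every CDS feature (a SeqFeature) from all records in the file."""
    cds_list = []
    # SeqIO.parse yields one SeqRecord per record in a multi-record file
    for record in SeqIO.parse(genbank_file, "genbank"):
        for feature in record.features:
            if feature.type == "CDS":
                cds_list.append(feature)
    return cds_list

genbank_file = "modified_genbank.gb"
cds_list = extract_cds_from_genbank(genbank_file)
for cds in cds_list:
    # Prints the feature's type, location and qualifiers, not its sequence
    print(cds)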

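This code uses SeqIO.parse() to read the GenBank file and iterate over each record. It then iterates over each feature in the record and checks whether the feature type is “CDS”. If it is, the feature is added to cds_list. Finally, each CDS feature is printed, showing its type, location and qualifiers. Note that these are feature annotations, not sequences: to get the nucleotides, call feature.extract(record.seq) while the parent record is still available.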
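The features collected above only describe where each CDS lies. Here is a minimal sketch of the next step, assuming you also want the nucleotide and protein sequences. feature.extract(record.seq) applies the strand and every part of a join(...) location, and Seq.translate(cds=True) rejects a CDS whose reading frame an edit has broken. The helper name cds_sequences and the demo record are hypothetical; the record is written to GenBank format in memory and parsed back so the example runs without a file.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Data.CodonTable import TranslationError
from Bio.Seq import Seq
from Bio.SeqFeature import CompoundLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

try:
    from Bio.SeqFeature import SimpleLocation
except ImportError:  # Biopython < 1.80 only has the older name
    from Bio.SeqFeature import FeatureLocation as SimpleLocation


def cds_sequences(handle):
    """Return (record id, CDS nucleotides, protein or None) for every CDS (hypothetical helper)."""
    results = []
    for record in SeqIO.parse(handle, "genbank"):
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # extract() follows the strand and every part of a join(...) location
            nucleotides = feature.extract(record.seq)
            # GenBank's /transl_table qualifier picks the genetic code; 1 when absent
            table = int(feature.qualifiers.get("transl_table", ["1"])[0])
            try:
                # cds=True checks the start codon, length, final stop and internal stops
                protein = str(nucleotides.translate(table=table, cds=True))
            except TranslationError:
                protein = None  # e.g. an edit shifted the reading frame
            results.append((record.id, str(nucleotides), protein))
    return results


# Hypothetical demo record: one spliced CDS and one CDS broken by an edit.
demo = SeqRecord(
    Seq("ATGAAACCCTTTGGGTAA"),
    id="demo",
    name="demo",
    description="demo record",
    annotations={"molecule_type": "DNA"},
    features=[
        SeqFeature(
            CompoundLocation([SimpleLocation(0, 6, strand=1), SimpleLocation(9, 18, strand=1)]),
            type="CDS",
        ),
        SeqFeature(SimpleLocation(0, 10, strand=1), type="CDS"),
    ],
)
handle = StringIO()
SeqIO.write(demo, handle, "genbank")
handle.seek(0)
for row in cds_sequences(handle):
    print(row)
```

Returning None instead of raising keeps one damaged CDS from stopping the whole run, which seems the more useful default for edited files.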

Option 2: Using SeqFeature

The second option still parses the file with SeqIO, but uses the Bio.SeqFeature module to build a new SeqFeature object for each CDS instead of keeping the parsed one. Here is sample code that demonstrates this:


from Bio import SeqIO
from Bio.SeqFeature import SeqFeature

def extract_cds_from_genbank(genbank_file):
    cds_list = []
    for record in SeqIO.parse(genbank_file, "genbank"):
        for feature in record.features:
            if feature.type == "CDS":
                # Reuse the whole location object: FeatureLocation(start, end)
                # would silently drop the strand and any join(...) parts.
                cds_feature = SeqFeature(
                    feature.location,
                    type="CDS",
                    qualifiers=dict(feature.qualifiers),  # shallow copy of gene, product, ...
                )
                cds_list.append(cds_feature)
    return cds_list

genbank_file = "modified_genbank.gb"
cds_list = extract_cds_from_genbank(genbank_file)
for cds in cds_list:
    print(cds)

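This code follows the same approach as Option 1, but instead of adding the parsed feature to cds_list directly, it builds a new SeqFeature for each CDS, which gives you separate feature objects to manipulate. The new feature must carry the complete location, strand and joined parts included: a location rebuilt from only the start and end positions turns a minus-strand or spliced CDS into a plain forward-strand span.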
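To see why the full location matters, here is a small sketch with a made-up minus-strand CDS (the sequence and coordinates are hypothetical). It uses SimpleLocation, the Biopython 1.80+ name for FeatureLocation, and falls back to the old name on older versions.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature

try:
    from Bio.SeqFeature import SimpleLocation
except ImportError:  # Biopython < 1.80 only has the older name
    from Bio.SeqFeature import FeatureLocation as SimpleLocation

# Hypothetical minus-strand CDS: the gene reads ATGAAATTTGGGTAA on the other strand.
genome = Seq("TTACCCAAATTTCAT")
original = SeqFeature(SimpleLocation(0, 15, strand=-1), type="CDS")

# Rebuilding the location from start/end alone loses the strand.
naive_copy = SeqFeature(
    SimpleLocation(original.location.start, original.location.end), type="CDS"
)
# Reusing the location object keeps the strand (and any join(...) parts).
safe_copy = SeqFeature(original.location, type="CDS", qualifiers=dict(original.qualifiers))

print(naive_copy.extract(genome))  # TTACCCAAATTTCAT
print(safe_copy.extract(genome))   # ATGAAATTTGGGTAA
```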

Option 3: Using regular expressions

The third option is to search the raw GenBank text with regular expressions. Because it bypasses Biopython's parser, this approach can help when modifications have left a record in a state that SeqIO cannot read. Here is sample code that finds the CDS entries with a regular expression:


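import re

# In the GenBank feature table the key ("CDS") starts in column 6 and its
# location in column 22; long locations wrap onto lines indented by 21 spaces.
CDS_PATTERN = re.compile(r"^ {5}CDS +(\S+(?:\n {21}[^/\s]\S*)*)", re.MULTILINE)

def extract_cds_from_genbank(genbank_file):
    cds_list = []
    with open(genbank_file, "r") as file:
        genbank_data = file.read()
    for match in CDS_PATTERN.finditer(genbank_data):
        # Join wrapped lines into one location string, e.g. "complement(12..2189)"
        cds_list.append(re.sub(r"\s+", "", match.group(1)))
    return cds_list

genbank_file = "modified_genbank.gb"
cds_list = extract_cds_from_genbank(genbank_file)
for cds in cds_list:
    print(cds)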

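This code uses a regular expression to find the CDS feature lines in the raw file text and collects what the pattern captures. Because a regex sees only text, the result is at most the location string of each CDS (for example complement(12..2189)); turning it into a sequence means parsing the location and the ORIGIN block yourself. In exchange, this approach does not depend on Biopython's parser, so it may still recover information from a hand-edited record that SeqIO refuses to read.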
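Even once a regex has captured each CDS location string, it still has to be turned into a sequence by hand. The sketch below uses only the standard library and shows roughly what that involves for a single-record file: reading the ORIGIN block, converting 1-based inclusive locations to 0-based slices, and reverse-complementing minus-strand CDS. The names parse_location and extract_cds_sequences are hypothetical, and it deliberately handles only simple, complement(...), join(...) and complement(join(...)) locations.

```python
import re

# GenBank feature table layout: key starts in column 6, location in column 22,
# and a long location wraps onto lines indented by 21 spaces.
CDS_PATTERN = re.compile(r"^ {5}CDS +(\S+(?:\n {21}[^/\s]\S*)*)", re.MULTILINE)
ORIGIN_PATTERN = re.compile(r"^ORIGIN[^\n]*\n(.*?)^//", re.MULTILINE | re.DOTALL)
# A single range such as "12..2189", "<1..>300" or a lone base "7"
RANGE_PATTERN = re.compile(r"^<?(\d+)(?:\.\.>?(\d+))?$")
# IUPAC complements (S and W map to themselves)
COMPLEMENT = str.maketrans("ACGTRYKMBVDHN", "TGCAYRMKVBHDN")


def parse_location(location):
    """Split a location into 0-based (start, end) parts plus a strand.

    Anything other than the forms listed in the text above raises ValueError
    (order(...), per-part complements, remote entries such as AB123:1..10).
    """
    strand = 1
    if location.startswith("complement(") and location.endswith(")"):
        strand, location = -1, location[len("complement("):-1]
    if location.startswith("join(") and location.endswith(")"):
        location = location[len("join("):-1]
    parts = []
    for piece in location.split(","):
        match = RANGE_PATTERN.match(piece)
        if match is None:
            raise ValueError(f"unsupported location piece: {piece!r}")
        start = int(match.group(1)) - 1  # GenBank positions are 1-based, inclusive
        parts.append((start, int(match.group(2) or match.group(1))))
    return parts, strand


def extract_cds_sequences(genbank_text):
    """Return (location, nucleotide sequence) for each CDS in a single-record text."""
    origin = ORIGIN_PATTERN.search(genbank_text)
    if origin is None:
        raise ValueError("no ORIGIN block found")
    # Drop the position numbers and spacing from the ORIGIN lines
    sequence = re.sub(r"[\d\s]", "", origin.group(1)).upper()
    results = []
    for match in CDS_PATTERN.finditer(genbank_text):
        location = re.sub(r"\s+", "", match.group(1))
        parts, strand = parse_location(location)
        cds = "".join(sequence[start:end] for start, end in parts)
        if strand == -1:
            cds = cds.translate(COMPLEMENT)[::-1]  # reverse complement
        results.append((location, cds))
    return results
```

To use it on a file, read the whole text and pass it in, e.g. extract_cds_sequences(open("modified_genbank.gb").read()). The amount of code needed for even this subset of the location syntax is a good argument for SeqIO whenever it can read the file.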

After exploring these three options, Option 1 (SeqIO) is the best general choice for extracting CDS from modified GenBank records: Biopython does the parsing, and you get feature objects whose locations, strands and qualifiers are ready to use. However, depending on the task, Option 2 may suit you better when you need separate copies of the CDS features to edit, and Option 3 when a file is too damaged for SeqIO to read.
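As one example of working with the extracted CDS, here is a sketch that writes every CDS to FASTA with SeqIO.write. Naming each entry by its locus_tag qualifier, with a hypothetical "<record id>_cds<n>" fallback, is an assumption, and the demo record is made up.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature
from Bio.SeqRecord import SeqRecord

try:
    from Bio.SeqFeature import SimpleLocation
except ImportError:  # Biopython < 1.80 only has the older name
    from Bio.SeqFeature import FeatureLocation as SimpleLocation


def cds_records(records):
    """Yield one SeqRecord per CDS, named by locus_tag when present (assumed scheme)."""
    for record in records:
        for i, feature in enumerate(f for f in record.features if f.type == "CDS"):
            # Fall back to "<record id>_cds<n>" when a modified record has no locus_tag
            name = feature.qualifiers.get("locus_tag", [f"{record.id}_cds{i + 1}"])[0]
            yield SeqRecord(feature.extract(record.seq), id=name, description="")


# Hypothetical demo record with one forward-strand CDS
record = SeqRecord(
    Seq("ATGAAATTTGGGTAACCC"),
    id="demo",
    features=[SeqFeature(SimpleLocation(0, 15, strand=1), type="CDS")],
)
out = StringIO()
count = SeqIO.write(cds_records([record]), out, "fasta")
print(out.getvalue())
```

With real data you would pass SeqIO.parse("modified_genbank.gb", "genbank") to cds_records and a file name to SeqIO.write.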
