Definition Herpetosiphon aurantiacus ATCC 23779 chromosome, complete genome.
Accession NC_009972
Length 6,346,587

The map label for this gene is celB [H]

Identifier: 159899339

GI number: 159899339

Start: 3586898

End: 3588799

Strand: Direct

Name: celB [H]

Synonym: Haur_2820

Alternate gene names: 159899339

Gene position: 3586898-3588799 (Clockwise)

Preceding gene: 159899336

Following gene: 159899345

Centisome position: 56.52
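
The position fields above are linked by simple arithmetic. The following is a minimal sketch, assuming 1-based inclusive coordinates and assuming that the centisome position is the start coordinate expressed as a percentage of the chromosome length (this definition is not stated in the record, but it reproduces the reported value). The function names are illustrative only.

```python
GENOME_LENGTH = 6_346_587           # "Length" field for NC_009972
START, END = 3_586_898, 3_588_799   # "Start" and "End" fields


def gene_length(start: int, end: int) -> int:
    """Length in bases, assuming 1-based inclusive coordinates."""
    return end - start + 1


def centisome(start: int, genome_length: int) -> float:
    """Start position as a percentage of the chromosome (assumed definition)."""
    return round(100 * start / genome_length, 2)


print(gene_length(START, END))          # 1902
print(centisome(START, GENOME_LENGTH))  # 56.52
```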

GC content: 48.0

Gene sequence:

>1902_bases
ATGTCGCAGAAACAACGTTCGTTTCGCACCCGCCTAGCCATGATCGGCGGGCTAGTTACCCTACTCTTGGCTGGTCAGCC
CGCTACTACCAAGCCAACCGCTGCCGCCGCTATCTGCGAGGTCACTTACACAATCTCGAATCAGTGGTCTACTGGCTTCA
CTGCTAATGTGAGTGTTAAGAATCTTGGGATCGGTCTCAATAATTGGCAGGTTGGCTGGACATTCGCGGGCAATCAGGCA
ATTACCAATCTTTGGAATGGGGTGCTCACCCAAACTGGCGCTCAGGTCAGTGTGTCAAATCCAGCATGGGCCGCCAGTTT
ACCCAGCAATGGTACTGCCAGCTTTGGTTTCCAAGCTTCGTATACTGGTAGTAATGCGATTCCAAACGCATTTACCTTGA
ATGGCGTTAGTTGCAACGGCGATCAGCCAAGCCCAATGCCTACAAATACTGCTATTCCGAGCATTCCACCTGCCACCAAC
ACCCCTAATCCGCCAACCAATACGCCAATTGCTACCACAACCGGAACCCCTCGCCCAACCAATACGCCAACCAGCGTCAT
TCCAACGGTCACGAATACACCTCGCCCAACCAATACTCCAGTTCCAACCACGGTCAATCCAACGGCTACGAGTACCCCAA
CTGGTAATAACAATAATGATGATTGGCTCCACACCAATGGCAATCAAATTGTTGATAGCGCAGGTCGCCCAGTTTGGTTA
ACTGGAGTCAATTGGTTTGGCTTCAATGCAACTGAGCGGGTGTTTCATGGCTTGTGGTCGGCCAATTTGACCAGCATGAT
GCAAAGCATTTCGCAACGTGGATTGAACATTATTCGCGTACCAATCTCAACTGAATTGATTTTGGAGTGGAAAGCCGGGG
TTTTCAAAACACCAAATGTCAACACTTACGCCAATCCTGAATTAGAAGGCTTAACCTCGTTGCAAATATTTGATCGCTTC
GTGATGCTTTCAAAGCAATTTGGCATTAAGGTGATGATCGATGTGCATAGCGCCGAAGCCGATAATTCAGGCCATTATGC
GCCACTCTGGTACAAAGGTTCGTTTACCAGCGAGCAGTTTTATCAGGCTTGGGAGTGGATTACTGATCGTTACAAAAATG
ACGATACGGTGATCGCAATGGATATTAAGAATGAGCCACACGGCACGGCCCACGATAATCAAACCAGCAGTCAATTTGCC
AAATGGGATAACTCGACCGATATCAACAACTGGAAATACGTTTGCGAAACTGCTAGCAAACGAATTTTGGCGATTAACCC
TAATGTCTTGGTGCTATGCGAAGGCAACGAGGTTTATCCAAAGGCCGGCGCAAGCTATACCTCAAGCAACAAAAATGATT
ACTACTTTACCTGGTGGGGCGGAAATTTACGTGGCGTGCGTGATTATCCGGTCAATCTTGGCAGCAACCAAGATCAATTG
GTCTACTCGCCACACGATTACGGCCCGTTGGTCTTCAATCAATCGTGGTTCTACCCTGGTTTTACCAAAGAAACGCTTTA
CAACGATGTTTGGTATCCTAATTGGTTTTTTATCCATGAAGAAAATATTGCGCCATTGTTTATTGGCGAATGGGGTGGCT
TTTTGGATGGTGGCGCAAATGAACAATGGATGAAGGCGTTGCGCGATTTGATCAAAGAGCACTATCTACACCATACCTTC
TGGGTACTCAACCCCAATTCTGGCGACACTGGCGGTTTGCTCGGATACGATTGGGCCACTTGGGATGAGGCTAAATATGC
CTTGCTCAAGCCAGCCTTGTGGGCAGATCGCAATGGTAAATTTGTCAGCCTCGATCATCAAATTCCGCTCGGTGGCACAG
CTACTGGCACAACCATTACCCAATATTATCAACAGGGCAACCAAGCTCCAAGCAATCCCTAA
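
The GC content field can in principle be recomputed from the gene sequence above. The sketch below assumes GC content is the percentage of G and C bases over the full gene, rounded to one decimal place; the helper name `gc_content` is hypothetical, and the toy input is not the real gene.

```python
def gc_content(fasta_text: str) -> float:
    """Percentage of G + C bases, skipping FASTA header lines and whitespace."""
    bases = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if not line.lstrip().startswith(">")
    ).upper()
    if not bases:
        raise ValueError("empty sequence")
    gc = bases.count("G") + bases.count("C")
    return round(100 * gc / len(bases), 1)


# Toy input only; paste the 1902-base record here to check the reported figure.
print(gc_content(">toy\nATGC\nGGCC"))  # 75.0
```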

Upstream 100 bases:

>100_bases
TTAGAATTGAATTGATTTGGGATGGCCTTTCTTAGGGATCGGTCTCAATCAAGATTGATCTAATCAGTTCTGCTATCATT
TCAAATACGAGGGAGTATCT

Downstream 100 bases:

>100_bases
TTGATCGATGAGCAACAAAACAAAAGCCTTGACAATTCAAATTGTCAAGGCTTTTCGATTTATGTCAATAAAAATCTAGC
GACAACCACAGCTGCTCGCT
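
The coordinates of the two 100-base flanking windows are not given in the record. As a sketch only, assuming the windows lie immediately adjacent to the gene on the direct strand and use 1-based inclusive coordinates, they could be derived as follows (the function name is illustrative).

```python
def flanking_windows(start: int, end: int, flank: int = 100):
    """Return (upstream, downstream) windows as inclusive coordinate pairs.

    Assumes a direct-strand gene and windows that abut the gene boundaries.
    """
    upstream = (start - flank, start - 1)
    downstream = (end + 1, end + flank)
    return upstream, downstream


up, down = flanking_windows(3_586_898, 3_588_799)
print(up)    # (3586798, 3586897)
print(down)  # (3588800, 3588899)
```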

Product: glycoside hydrolase family protein

Products: NA

Alternate protein names: Endoglucanase; Cellobiohydrolase; Cellulase; Endo-1,4-beta-glucanase; Exoglucanase; 1,4-beta-cellobiohydrolase; Exocellobiohydrolase [H]

Number of amino acids: Translated: 633; Mature: 632
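
The two residue counts follow from the 1902-base gene length. The sketch below assumes that the final codon is a stop codon and that the mature form differs from the translated form only by removal of the initiator methionine, which matches the sequences listed in this record; the function name is hypothetical.

```python
def residue_counts(cds_length: int) -> tuple[int, int]:
    """Return (translated, mature) residue counts for a coding sequence."""
    if cds_length % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    translated = cds_length // 3 - 1   # drop the stop codon
    mature = translated - 1            # assumes only the initiator Met is removed
    return translated, mature


print(residue_counts(1902))  # (633, 632)
```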

Protein sequence:

>633_residues
MSQKQRSFRTRLAMIGGLVTLLLAGQPATTKPTAAAAICEVTYTISNQWSTGFTANVSVKNLGIGLNNWQVGWTFAGNQA
ITNLWNGVLTQTGAQVSVSNPAWAASLPSNGTASFGFQASYTGSNAIPNAFTLNGVSCNGDQPSPMPTNTAIPSIPPATN
TPNPPTNTPIATTTGTPRPTNTPTSVIPTVTNTPRPTNTPVPTTVNPTATSTPTGNNNNDDWLHTNGNQIVDSAGRPVWL
TGVNWFGFNATERVFHGLWSANLTSMMQSISQRGLNIIRVPISTELILEWKAGVFKTPNVNTYANPELEGLTSLQIFDRF
VMLSKQFGIKVMIDVHSAEADNSGHYAPLWYKGSFTSEQFYQAWEWITDRYKNDDTVIAMDIKNEPHGTAHDNQTSSQFA
KWDNSTDINNWKYVCETASKRILAINPNVLVLCEGNEVYPKAGASYTSSNKNDYYFTWWGGNLRGVRDYPVNLGSNQDQL
VYSPHDYGPLVFNQSWFYPGFTKETLYNDVWYPNWFFIHEENIAPLFIGEWGGFLDGGANEQWMKALRDLIKEHYLHHTF
WVLNPNSGDTGGLLGYDWATWDEAKYALLKPALWADRNGKFVSLDHQIPLGGTATGTTITQYYQQGNQAPSNP
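
The protein sequence above corresponds to a frame-1 translation of the gene sequence. The following sketch assumes the standard codon assignments and stops at the first stop codon; it is a minimal illustration rather than the pipeline that produced this record. Only the first and last few codons of the gene are used as inputs here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGTCGCAGAAACAACGT"))     # MSQKQR  (first six codons)
print(translate("CAAGCTCCAAGCAATCCCTAA"))  # QAPSNP  (last six codons plus stop)
```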

Sequences:

>Translated_633_residues
MSQKQRSFRTRLAMIGGLVTLLLAGQPATTKPTAAAAICEVTYTISNQWSTGFTANVSVKNLGIGLNNWQVGWTFAGNQA
ITNLWNGVLTQTGAQVSVSNPAWAASLPSNGTASFGFQASYTGSNAIPNAFTLNGVSCNGDQPSPMPTNTAIPSIPPATN
TPNPPTNTPIATTTGTPRPTNTPTSVIPTVTNTPRPTNTPVPTTVNPTATSTPTGNNNNDDWLHTNGNQIVDSAGRPVWL
TGVNWFGFNATERVFHGLWSANLTSMMQSISQRGLNIIRVPISTELILEWKAGVFKTPNVNTYANPELEGLTSLQIFDRF
VMLSKQFGIKVMIDVHSAEADNSGHYAPLWYKGSFTSEQFYQAWEWITDRYKNDDTVIAMDIKNEPHGTAHDNQTSSQFA
KWDNSTDINNWKYVCETASKRILAINPNVLVLCEGNEVYPKAGASYTSSNKNDYYFTWWGGNLRGVRDYPVNLGSNQDQL
VYSPHDYGPLVFNQSWFYPGFTKETLYNDVWYPNWFFIHEENIAPLFIGEWGGFLDGGANEQWMKALRDLIKEHYLHHTF
WVLNPNSGDTGGLLGYDWATWDEAKYALLKPALWADRNGKFVSLDHQIPLGGTATGTTITQYYQQGNQAPSNP
>Mature_632_residues
SQKQRSFRTRLAMIGGLVTLLLAGQPATTKPTAAAAICEVTYTISNQWSTGFTANVSVKNLGIGLNNWQVGWTFAGNQAI
TNLWNGVLTQTGAQVSVSNPAWAASLPSNGTASFGFQASYTGSNAIPNAFTLNGVSCNGDQPSPMPTNTAIPSIPPATNT
PNPPTNTPIATTTGTPRPTNTPTSVIPTVTNTPRPTNTPVPTTVNPTATSTPTGNNNNDDWLHTNGNQIVDSAGRPVWLT
GVNWFGFNATERVFHGLWSANLTSMMQSISQRGLNIIRVPISTELILEWKAGVFKTPNVNTYANPELEGLTSLQIFDRFV
MLSKQFGIKVMIDVHSAEADNSGHYAPLWYKGSFTSEQFYQAWEWITDRYKNDDTVIAMDIKNEPHGTAHDNQTSSQFAK
WDNSTDINNWKYVCETASKRILAINPNVLVLCEGNEVYPKAGASYTSSNKNDYYFTWWGGNLRGVRDYPVNLGSNQDQLV
YSPHDYGPLVFNQSWFYPGFTKETLYNDVWYPNWFFIHEENIAPLFIGEWGGFLDGGANEQWMKALRDLIKEHYLHHTFW
VLNPNSGDTGGLLGYDWATWDEAKYALLKPALWADRNGKFVSLDHQIPLGGTATGTTITQYYQQGNQAPSNP

Specific function: This protein is made up of two domains: the N-terminal domain has exoglucanase activity, while the C-terminal domain is an endoglucanase [H]

COG id: COG2730

COG function: function code G; Endoglucanase

Gene ontology:

Cell location: Cytoplasmic

Metabolic importance: NA

Operon status: Not Known

Operon components: None

Similarity: Contains 1 CBM3 (carbohydrate binding type-3) domain [H]

Homologues:

None

Paralogues:

None

Copy number: NA

Swissprot (AC and ID): NA

Other databases:

- InterPro:   IPR008965
- InterPro:   IPR001956
- InterPro:   IPR001000
- InterPro:   IPR001547
- InterPro:   IPR018087
- InterPro:   IPR017853
- InterPro:   IPR013781 [H]

Pfam domain/function: PF00942 CBM_3; PF00150 Cellulase; PF00331 Glyco_hydro_10 [H]

EC number: 3.2.1.4; 3.2.1.91 [H]

Molecular weight: Translated: 69860; Mature: 69729
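
The translated and mature weights differ by 131, which is consistent with the loss of one methionine residue. The sketch below estimates an average molecular weight by summing average residue masses and adding one water; this is a common approach but is an assumption about how the record's values were derived, and the function name is hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(protein: str) -> float:
    """Average molecular weight of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


translated = "MSQKQRSFRTR"   # first residues of the translated protein
mature = translated[1:]      # initiator Met removed
print(round(molecular_weight(translated) - molecular_weight(mature)))  # 131
```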

Theoretical pI: Translated: 6.09; Mature: 6.09

Prosite motif: PS00659 GLYCOSYL_HYDROL_F5

Important sites: NA

Signals:

None

Transmembrane regions:

None

Cys/Met content:

0.6 %Cys     (Translated Protein)
1.4 %Met     (Translated Protein)
2.1 %Cys+Met (Translated Protein)
0.6 %Cys     (Mature Protein)
1.3 %Met     (Mature Protein)
1.9 %Cys+Met (Mature Protein)
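
The percentages above can be reproduced from residue counts. The sketch below uses a stand-in sequence with an assumed composition (4 Cys and 9 Met in 633 residues, one Met being the initiator), chosen to be consistent with the reported figures; substitute the real protein sequence to check it directly. Note that the combined value is rounded after summing, which is why 0.6 and 1.4 can add up to 2.1.

```python
def cys_met_content(protein: str) -> tuple[float, float, float]:
    """Return (%Cys, %Met, %Cys+Met), each rounded to one decimal place."""
    n = len(protein)
    cys = 100 * protein.count("C") / n
    met = 100 * protein.count("M") / n
    return round(cys, 1), round(met, 1), round(cys + met, 1)


# Stand-in sequence with an assumed composition, not the real protein.
translated = "M" + "C" * 4 + "M" * 8 + "A" * 620
mature = translated[1:]  # initiator Met removed

print(cys_met_content(translated))  # (0.6, 1.4, 2.1)
print(cys_met_content(mature))      # (0.6, 1.3, 1.9)
```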

Secondary structure:

>Translated Secondary Structure
MSQKQRSFRTRLAMIGGLVTLLLAGQPATTKPTAAAAICEVTYTISNQWSTGFTANVSVK
CCHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHEEEEEEEEEECCCCCCCEEEEEEEE
NLGIGLNNWQVGWTFAGNQAITNLWNGVLTQTGAQVSVSNPAWAASLPSNGTASFGFQAS
EEECCCCCCEEEEEEECCHHHHHHHHHHHHCCCCEEEECCCCEEEECCCCCCCCCCEEEE
YTGSNAIPNAFTLNGVSCNGDQPSPMPTNTAIPSIPPATNTPNPPTNTPIATTTGTPRPT
CCCCCCCCCCEEECCEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEECCCCCCCC
NTPTSVIPTVTNTPRPTNTPVPTTVNPTATSTPTGNNNNDDWLHTNGNQIVDSAGRPVWL
CCCCEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEECCCCEEECCCCCEEEE
TGVNWFGFNATERVFHGLWSANLTSMMQSISQRGLNIIRVPISTELILEWKAGVFKTPNV
EECCEECCCHHHHHHHHHHHHHHHHHHHHHHHCCCEEEEECCCEEEEEEECCCEEECCCC
NTYANPELEGLTSLQIFDRFVMLSKQFGIKVMIDVHSAEADNSGHYAPLWYKGSFTSEQF
CCCCCCCCCCCHHHHHHHHHHHHHHCCCEEEEEEEECCCCCCCCCEEEEEEECCCCHHHH
YQAWEWITDRYKNDDTVIAMDIKNEPHGTAHDNQTSSQFAKWDNSTDINNWKYVCETASK
HHHHHHHHHHCCCCCEEEEEEECCCCCCCCCCCCCHHHHHHCCCCCCCCCHHHHEECCCC
RILAINPNVLVLCEGNEVYPKAGASYTSSNKNDYYFTWWGGNLRGVRDYPVNLGSNQDQL
EEEEECCCEEEEECCCEECCCCCCCCCCCCCCCEEEEEECCCCCCCEECCCCCCCCCCEE
VYSPHDYGPLVFNQSWFYPGFTKETLYNDVWYPNWFFIHEENIAPLFIGEWGGFLDGGAN
EECCCCCCCEEECCCCCCCCCCHHHHHCCCCCCCEEEEECCCCCEEEEECCCCCCCCCCC
EQWMKALRDLIKEHYLHHTFWVLNPNSGDTGGLLGYDWATWDEAKYALLKPALWADRNGK
HHHHHHHHHHHHHHCCEEEEEEEECCCCCCCCEEECCCCCCCCCCCEEECCCCEECCCCC
FVSLDHQIPLGGTATGTTITQYYQQGNQAPSNP
EEEECCCCCCCCCCCCHHHHHHHHCCCCCCCCC
>Mature Secondary Structure 
SQKQRSFRTRLAMIGGLVTLLLAGQPATTKPTAAAAICEVTYTISNQWSTGFTANVSVK
CHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHEEEEEEEEEECCCCCCCEEEEEEEE
NLGIGLNNWQVGWTFAGNQAITNLWNGVLTQTGAQVSVSNPAWAASLPSNGTASFGFQAS
EEECCCCCCEEEEEEECCHHHHHHHHHHHHCCCCEEEECCCCEEEECCCCCCCCCCEEEE
YTGSNAIPNAFTLNGVSCNGDQPSPMPTNTAIPSIPPATNTPNPPTNTPIATTTGTPRPT
CCCCCCCCCCEEECCEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEECCCCCCCC
NTPTSVIPTVTNTPRPTNTPVPTTVNPTATSTPTGNNNNDDWLHTNGNQIVDSAGRPVWL
CCCCEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEECCCCEEECCCCCEEEE
TGVNWFGFNATERVFHGLWSANLTSMMQSISQRGLNIIRVPISTELILEWKAGVFKTPNV
EECCEECCCHHHHHHHHHHHHHHHHHHHHHHHCCCEEEEECCCEEEEEEECCCEEECCCC
NTYANPELEGLTSLQIFDRFVMLSKQFGIKVMIDVHSAEADNSGHYAPLWYKGSFTSEQF
CCCCCCCCCCCHHHHHHHHHHHHHHCCCEEEEEEEECCCCCCCCCEEEEEEECCCCHHHH
YQAWEWITDRYKNDDTVIAMDIKNEPHGTAHDNQTSSQFAKWDNSTDINNWKYVCETASK
HHHHHHHHHHCCCCCEEEEEEECCCCCCCCCCCCCHHHHHHCCCCCCCCCHHHHEECCCC
RILAINPNVLVLCEGNEVYPKAGASYTSSNKNDYYFTWWGGNLRGVRDYPVNLGSNQDQL
EEEEECCCEEEEECCCEECCCCCCCCCCCCCCCEEEEEECCCCCCCEECCCCCCCCCCEE
VYSPHDYGPLVFNQSWFYPGFTKETLYNDVWYPNWFFIHEENIAPLFIGEWGGFLDGGAN
EECCCCCCCEEECCCCCCCCCCHHHHHCCCCCCCEEEEECCCCCEEEEECCCCCCCCCCC
EQWMKALRDLIKEHYLHHTFWVLNPNSGDTGGLLGYDWATWDEAKYALLKPALWADRNGK
HHHHHHHHHHHHHHCCEEEEEEEECCCCCCCCEEECCCCCCCCCCCEEECCCCEECCCCC
FVSLDHQIPLGGTATGTTITQYYQQGNQAPSNP
EEEECCCCCCCCCCCCHHHHHHHHCCCCCCCCC
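
The secondary structure blocks above interleave residue lines with prediction lines. The sketch below pairs them up and summarizes the state composition, assuming the usual convention that H is helix, E is strand and C is coil (the record does not define the letters). The function names and the toy block are illustrative.

```python
from collections import Counter


def parse_interleaved(block: str) -> tuple[str, str]:
    """Split an interleaved block into (residues, states) strings."""
    lines = [ln.strip() for ln in block.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith(">")]
    residues = "".join(lines[0::2])  # odd lines: amino acids
    states = "".join(lines[1::2])    # even lines: predicted states
    if len(residues) != len(states):
        raise ValueError("residue and state strings differ in length")
    return residues, states


def state_percentages(states: str) -> dict[str, float]:
    """Percentage of each state (H, E, C) in the prediction string."""
    counts = Counter(states)
    return {s: round(100 * counts[s] / len(states), 1) for s in "HEC"}


toy = ">Toy Secondary Structure\nMSQK\nCCHH\nQR\nEE\n"
residues, states = parse_interleaved(toy)
print(residues, states)           # MSQKQR CCHHEE
print(state_percentages(states))  # {'H': 33.3, 'E': 33.3, 'C': 33.3}
```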

PDB accession: NA

Resolution: NA

Structure class: Alpha Beta

Cofactors: NA

Metal ions: NA

Kcat value (1/min): NA

Specific activity: NA

Km value (mM): NA

Substrates: NA

Specific reaction: NA

General reaction: NA

Inhibitor: NA

Structure determination priority: 9.0

TargetDB status: NA

Availability: NA

References: 2789517 [H]