Search results 1401 to 1500 out of 1733 for Was

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Type: Domain
Description: A carbohydrate-binding module (CBM) is defined as a contiguous amino acid sequence within a carbohydrate-active enzyme with a discrete fold having carbohydrate-binding activity. A few exceptions are CBMs in cellulosomal scaffolding proteins and rare instances of independent putative CBMs. The requirement of CBMs existing as modules within larger enzymes sets this class of carbohydrate-binding protein apart from other non-catalytic sugar-binding proteins such as lectins and sugar transport proteins. CBMs were previously classified as cellulose-binding domains (CBDs) based on the initial discovery of several modules that bound cellulose. However, additional modules in carbohydrate-active enzymes are continually being found that bind carbohydrates other than cellulose yet otherwise meet the CBM criteria, hence the need to reclassify these polypeptides using more inclusive terminology. Previous classifications of cellulose-binding domains were based on amino acid similarity. Groupings of CBDs were called "Types" and numbered with Roman numerals (e.g. Type I or Type II CBDs). In keeping with the glycoside hydrolase classification, these groupings are now called families and numbered with Arabic numerals. Families 1 to 13 are the same as Types I to XIII. This entry represents CBM6, which was previously known as cellulose-binding domain family VI (CBD VI). CBM6 binds to amorphous cellulose, xylan, mixed beta-(1,3)(1,4)-glucan and beta-1,3-glucan. CBM6 adopts a classic lectin-like β-jelly roll fold, predominantly consisting of five antiparallel β-strands on one face and four antiparallel β-strands on the other face. It contains two potential ligand-binding sites, named cleft A and cleft B. These clefts include aromatic residues which are probably involved in substrate binding. Cleft B is located on the concave surface of one β-sheet, and cleft A on one edge of the protein between the loop that connects the inner and outer β-sheets of the jelly roll fold. The multiple binding clefts confer the extensive range of specificities displayed by the domain.
Protein Domain
Type: Conserved_site
Description: Lectins occur in plants, animals, bacteria and viruses. Initially described for their carbohydrate-binding activity, they are now recognised as a more diverse group of proteins, some of which are involved in protein-protein, protein-lipid or protein-nucleic acid interactions. There are at least twelve structural families of lectins:
  • C-type lectins, which are Ca2+-dependent.
  • S-type (galectins), a widespread family of glycan-binding proteins.
  • I-type, which have an immunoglobulin-like fold and can recognise sialic acids, other sugars and glycosaminoglycans.
  • P-type, which bind phosphomannosyl receptors.
  • Pentraxins.
  • (Trout) egg lectins.
  • Calreticulin and calnexin, which act as molecular chaperones of the endoplasmic reticulum.
  • ERGIC-53 and VIP-36.
  • Discoidins.
  • Eel agglutinins (fucolectins).
  • Annexin lectins.
  • Fibrinogen-type lectins, which include ficolins, tachylectins 5A and 5B, and Limax flavus (Spotted garden slug) agglutinin (these proteins have clear distinctions from one another, but they share a homologous fibrinogen-like domain used for carbohydrate binding).
There are also unclassified orphan lectins, including amphoterin, Cel-II, complement factor H, thrombospondin, sialic acid-binding lectins, adherence lectin, and cytokines (such as tumour necrosis factor and several interleukins). C-type lectins can be further divided into seven subgroups based on additional non-lectin domains and gene structure: (I) hyalectans, (II) asialoglycoprotein receptors, (III) collectins, (IV) selectins, (V) NK group transmembrane receptors, (VI) macrophage mannose receptors, and (VII) simple (single domain) lectins. Lectins are therefore a diverse group of proteins, both in terms of structure and activity. Carbohydrate-binding ability may have evolved independently and sporadically in numerous unrelated families, where each evolved a structure that was conserved to fulfil some other activity and function. In general, animal lectins act as recognition molecules within the immune system, their functions involving defence against pathogens, cell trafficking, immune regulation and the prevention of autoimmunity.
Protein Domain
Type: Family
Description: This entry represents the SKI/SnoN family of proteins, which are the products of the oncogenic sno gene. This gene was identified based on its homology to v-ski, the transforming component of the Sloan-Kettering virus. Both Ski and SnoN are potent negative regulators of TGF-beta. Overexpression of Ski or SnoN results in oncogenic transformation of avian fibroblasts; however, it may also result in terminal differentiation, and therefore the Ski/SnoN mechanism of action is thought to be complex. These proteins do not have catalytic or DNA-binding activity and therefore function primarily through interaction with other proteins, acting as transcriptional cofactors. Despite their lack of DNA-binding ability, their primary function is related to transcriptional regulation, in particular the negative regulation of TGF-beta signalling. Ski/SnoN interact concurrently with co-Smad and R-Smad and in doing so block the ability of the Smad complexes to activate transcription of the TGF-beta target genes. Binding of Ski/SnoN may additionally stabilise the Smad heteromer on DNA, thereby preventing further binding of active Smad complexes. As Smad complexes critically mediate the inhibitory signals of TGF-beta in epithelial cells, high levels of SKI/SnoN may promote cell proliferation. They repress gene transcription by recruiting diverse corepressors and histone deacetylases, and establish cross-regulatory mechanisms with the TGF-beta/Smad pathway that control the magnitude and duration of TGF-beta signals. Alteration of these regulatory processes may lead to disease development. High levels of SnoN have been shown to stabilise p53 with a resultant increase in premature senescence. SnoN interacts with the PML protein and is then recruited to the PML nuclear bodies, resulting in stabilisation of p53 and premature senescence.
Protein Domain
Type: Family
Description: These proteins belong to MEROPS peptidase family S1 (chymotrypsin family, clan PA(S)), subfamily S1A. This family contains two mammalian proteins, complement C2 and complement factor B, which, respectively, have analogous roles in the classical and alternative pathways of complement activation. These proteins are composed of three regions: an N-terminal three-module complement control protein domain, a von Willebrand factor A domain, and a C-terminal serine protease domain. Briefly, they are activated by cleavage and function as the serine protease components of the C3/C5 convertases, which play similar roles in these pathways although composed of different proteins. Homologs in non-mammalian species are often more or less equally related to mammalian C2 and B and may be designated as complement B/C2. Strongylocentrotus purpuratus (Purple sea urchin) has an atypical factor B with a five-module complement control protein domain. The structures of the von Willebrand factor A and serine protease domains from human complement factor B have been analysed. The A domain forms the classical vWF A domain fold, which consists of a central β-sheet flanked on both sides by amphipathic α-helices. It contains an integrin-like MIDAS (metal ion-dependent adhesion site) motif that adopts the open conformation typical of integrin-ligand complexes, with an acidic residue from another A domain (provided by a fortuitous crystal contact) completing the coordination of the metal ion. Although a closed conformation was not observed, modelling studies suggest that the A domain could adopt this conformation, implying that, as with integrins, ligand binding may induce conformational changes which transduce a signal to other domains in the protein. The serine protease domain forms a chymotrypsin fold with several novel features. Like other serine proteases it forms two β-sheets, composed of six β-strands each, surrounded by surface helices and loops. However, several novel deletions and insertions occur within these surface helices and loops, and differences in active site conformation also exist.
Protein Domain
Type: Domain
Description: Phenylalanine-tRNA ligase (also known as phenylalanyl-tRNA synthetase) from Thermus thermophilus has an alpha 2 beta 2 type quaternary structure and is one of the most complicated members of the ligase family. Identification of phenylalanine-tRNA ligase as a member of class II aaRSs was based only on sequence alignment of the small alpha-subunit with other ligases. This is the N-terminal domain of phenylalanine-tRNA ligase. The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine, and some lysine synthetases (non-eukaryotic group), belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine and threonine, and some lysine synthetases (non-archaeal group), belong to class II. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c.
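The class assignments listed in this description can be captured as a small lookup table. The following is only an illustrative sketch: the names `CLASS_I`, `CLASS_II`, `BOTH_CLASSES`, `PREFERRED_HYDROXYL` and `synthetase_classes` are hypothetical and not part of any database API, and the table encodes nothing beyond what the description states.

```python
# Class assignments exactly as listed in the description (hypothetical names).
CLASS_I = frozenset({
    "arginine", "cysteine", "glutamic acid", "glutamine", "isoleucine",
    "leucine", "methionine", "tyrosine", "tryptophan", "valine",
})
CLASS_II = frozenset({
    "alanine", "asparagine", "aspartic acid", "glycine", "histidine",
    "phenylalanine", "proline", "serine", "threonine",
})
# Lysine appears in both lists ("some lysine synthetases" in each class).
BOTH_CLASSES = frozenset({"lysine"})

# tRNA hydroxyl to which the aminoacyl group is preferentially coupled.
PREFERRED_HYDROXYL = {"I": "2'", "II": "3'"}


def synthetase_classes(amino_acid: str) -> tuple:
    """Return the synthetase class(es) reported for an amino acid."""
    name = amino_acid.strip().lower()
    if name in BOTH_CLASSES:
        return ("I", "II")
    if name in CLASS_I:
        return ("I",)
    if name in CLASS_II:
        return ("II",)
    raise ValueError(f"unknown amino acid: {amino_acid!r}")
```

For example, `synthetase_classes("phenylalanine")` returns `("II",)`, consistent with the class II assignment of phenylalanine-tRNA ligase described in this entry.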
Protein Domain
Type: Family
Description: The UGA (TGA) codon is normally a termination codon; however, it is also used as a selenocysteine (Sec) codon by numerous organisms. Sec is the 21st amino acid and is inserted into selenoproteins (proteins that include a selenocysteine (Se-Cys) amino acid residue). The synthesis of Sec and its incorporation into proteins requires the activity of a number of proteins, one of which is selenophosphate synthetase (SPS), also known as the SelD gene product. SPS catalyses the production of the selenium donor compound monoselenophosphate (MSP) from selenide and ATP. MSP is then used to synthesize Sec from seryl-tRNAs. SPS was initially identified in E. coli as the product of the gene selD, one of four essential selenoprotein synthesis genes (selA-D). SelC is the tRNA itself, SelD acts as a donor of reduced selenium, SelA modifies a serine residue on SelC into selenocysteine, and SelB is a selenocysteine-specific translation elongation factor. 3' or 5' non-coding elements of mRNA have been found as probable structures for directing selenocysteine incorporation. Later, the selD homologues from eukaryotes, bacteria, and archaea were identified. In mammals, two gene products, SPS1 and SPS2, are proposed to be selenophosphate synthetases. SPS1 may be involved in Sec recycling via a selenium salvage pathway, whereas SPS2 may play a role in the synthesis of selenophosphate. SPS2 is a selenoprotein and could serve as an autoregulator of selenoprotein synthesis. Drosophila SPS1 (UniProt: O18373) lacks selenide-dependent SPS activity when expressed in E. coli, due to an arginine substitution of the critical Cys (or Sec) residue in the catalytic domain of the enzyme. Drosophila SPS2 (also known as Dsps2) is a selenoprotein that contains a UGA stop codon in the catalytic centre of the enzyme; nevertheless, the read-through activity can be provided by a mammalian-like SECIS element in its 3'UTR.
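Because UGA can be read either as a stop codon or as Sec, a first step when inspecting a coding sequence is often to list in-frame UGA codons that occur before the final codon. The sketch below does only that; the function name `in_frame_uga_positions` and the example sequence are hypothetical, and a hit is merely a candidate, since actual Sec insertion depends on elements such as the SECIS structure mentioned in the description.

```python
def in_frame_uga_positions(mrna: str) -> list:
    """Return 0-based codon indices of in-frame UGA codons.

    The final codon is excluded, since a UGA there is an ordinary
    terminator. DNA input (TGA) is accepted and converted to RNA.
    """
    rna = mrna.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon == "UGA"]


# Made-up reading frame: AUG GCU UGA GGC UAA
print(in_frame_uga_positions("AUGGCUUGAGGCUAA"))  # [2]
```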
Protein Domain
Type: Family
Description: Pyridoxal phosphate is the active form of vitamin B6 (pyridoxine or pyridoxal). Pyridoxal 5'-phosphate (PLP) is a versatile catalyst, acting as a coenzyme in a multitude of reactions, including decarboxylation, deamination and transamination. PLP-dependent enzymes are primarily involved in the biosynthesis of amino acids and amino acid-derived metabolites, but they are also found in the biosynthetic pathways of amino sugars and in the synthesis or catabolism of neurotransmitters; pyridoxal phosphate can also inhibit DNA polymerases and several steroid receptors. Inadequate levels of pyridoxal phosphate in the brain can cause neurological dysfunction, particularly epilepsy. PLP enzymes exist in their resting state as a Schiff base, the aldehyde group of PLP forming a linkage with the ε-amino group of an active site lysine residue on the enzyme. The α-amino group of the substrate displaces the lysine ε-amino group, in the process forming a new aldimine with the substrate. This aldimine is the common central intermediate for all PLP-catalysed reactions, enzymatic and non-enzymatic. In Escherichia coli, the pdx genes involved in vitamin B6 biosynthesis have been characterised. This entry represents PdxJ (also called PNP synthase), which catalyses the condensation of 4-hydroxy-L-threonine and 1-deoxy-D-xylulose-5-phosphate to form pyridoxine-5'-phosphate (PNP). The product of the PdxJ reaction is then oxidized by PdxH to form pyridoxal 5'-phosphate (PLP). PNP synthase (PdxJ) adopts a TIM barrel topology. Intersubunit contacts are mediated by three "extra" helices, generating a tetramer of symmetric dimers with shared active sites. The open state has been proposed to accept substrates and to release products, while most of the catalytic events are likely to occur in the closed state. A hydrophilic channel running through the centre of the barrel was identified as the essential structural feature that enables PNP synthase to release water molecules produced during the reaction from the closed, solvent-shielded active site.
Protein Domain
Type: Family
Description: There are two distinct classes of hydroxymethylglutaryl-coenzyme A (HMG-CoA) reductase enzymes: class I consists of eukaryotic and most archaeal enzymes, while class II consists of prokaryotic enzymes. Class I HMG-CoA reductases catalyse the NADP-dependent synthesis of mevalonate from 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA). In vertebrates, membrane-bound HMG-CoA reductase is the rate-limiting enzyme in the biosynthesis of cholesterol and other isoprenoids. In plants, mevalonate is the precursor of all isoprenoid compounds. The reduction of HMG-CoA to mevalonate is regulated by feedback inhibition by sterols and non-sterol metabolites derived from mevalonate, including cholesterol. In archaea, HMG-CoA reductase is a cytoplasmic enzyme involved in the biosynthesis of the isoprenoid side chains of lipids. Class I HMG-CoA reductases consist of an N-terminal membrane domain (lacking in archaeal enzymes) and a C-terminal catalytic region. The catalytic region can be subdivided into three domains: an N-domain (N-terminal), a large L-domain, and a small S-domain (inserted within the L-domain). The L-domain binds the substrate, while the S-domain binds NADP. Class II HMG-CoA reductases catalyse the reverse reaction of class I enzymes, namely the NAD-dependent synthesis of HMG-CoA from mevalonate and CoA. Some bacteria, such as Pseudomonas mevalonii, can use mevalonate as the sole carbon source. Class II enzymes lack a membrane domain. Their catalytic region is structurally related to that of class I enzymes, but it consists of only two domains: a large L-domain and a small S-domain (inserted within the L-domain). As with class I enzymes, the L-domain binds substrate, but the S-domain binds NAD (instead of NADP in class I). This entry represents class II HMG-CoA reductases, as well as some class I enzymes from archaea. This family was built from two class II NAD-dependent enzymes from organisms closely related to Pseudomonas mevalonii. Some archaeal HMG-CoA reductases were found to be of bacterial origin. This family is occasionally found together with a thiolase to form a putative bifunctional acetyl-CoA acetyltransferase/HMG-CoA reductase protein.
Protein Domain
Type: Family
Description: This entry represents a group of LIM domain-containing proteins, some of which have been shown to bind actin, such as LIMA1/MICAL1/MICAL3 from humans. LIM domain and actin-binding protein 1 (LIMA1, also known as EPLIN) is a cytoskeleton-associated protein that regulates actin dynamics by cross-linking and stabilising filaments. It was first identified as the product of a gene that is transcriptionally down-regulated or lost in a number of human epithelial tumor cells. In humans, there are two EPLIN isoforms, EPLIN alpha and EPLIN beta; both have a centrally located LIM domain that may mediate self-dimerisation. EPLIN inhibits Arp2/3 complex-mediated branching nucleation of actin filaments and stabilises actin filament networks. EPLIN can be regulated through phosphorylation by extracellular signal-regulated kinase (ERK). The MICAL (Molecule Interacting with CasL) family is a group of multifunctional proteins that contain calponin homology (CH), LIM and coiled-coil (CC) domains. They interact with receptors on the target cells, help recruit other proteins, and promote the modulation of their activity with respect to the downstream events. There is only one MICAL protein found in Drosophila, while there are 5 MICAL (MICAL1/2/3, MICAL-like1/2) isoforms found in vertebrates. Drosophila MICAL and vertebrate MICAL1/2/3 contain an extra N-terminal FAD (flavin adenine dinucleotide binding monooxygenase) domain, whose structure resembles that of a flavoenzyme, p-hydroxybenzoate hydroxylase. Drosophila MICAL has an NADPH-dependent actin depolymerising activity. Vertebrate MICALs have also been shown to be effectors of small Rab GTPases, which play important roles in vesicular trafficking. MICALs play roles in neural development and plasticity.
Protein Domain
Type: Family
Description: Cytochrome c oxidase is a key enzyme in aerobic metabolism. Proton-pumping haem-copper oxidases represent the terminal, energy-transfer enzymes of respiratory chains in prokaryotes and eukaryotes. The CuB-haem a3 (or haem o) binuclear centre, associated with the largest subunit I of cytochrome c and ubiquinol oxidases, is directly involved in the coupling between dioxygen reduction and proton pumping. Some terminal oxidases generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner membrane (eukaryotes). The enzyme complex consists of 3-4 subunits (prokaryotes) and up to 13 polypeptides (mammals), of which only the catalytic subunit (equivalent to mammalian subunit I (CO I)) is found in all haem-copper respiratory oxidases. The presence of a bimetallic centre (formed by a high-spin haem and copper B) as well as a low-spin haem, both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common to all family members. In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The enzyme complexes vary in haem and copper composition, substrate type and substrate affinity. The different respiratory oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions. It has been shown that eubacterial quinol oxidase was derived from cytochrome c oxidase in Gram-positive bacteria and that archaebacterial quinol oxidase has an independent origin. A considerable amount of evidence suggests that proteobacteria (Purple bacteria) acquired quinol oxidase through a lateral gene transfer from Gram-positive bacteria. Please note, this entry also identifies a number of proteins that are cleaved into two chains: a truncated non-functional cytochrome oxidase 1 and an intron-encoded endonuclease.
Protein Domain
Type: Family
Description: The sirtuin (also known as Sir2) family is broadly conserved from bacteria to human. Yeast Sir2 (silent mating-type information regulation 2), the founding member, was first isolated as part of the SIR complex required for maintaining a modified chromatin structure at telomeres. Sir2 functions in transcriptional silencing, cell cycle progression, and chromosome stability. Although most sirtuins in eukaryotic cells are located in the nucleus, others are cytoplasmic or mitochondrial. This family is divided into five classes (I-IV and U) on the basis of a phylogenetic analysis of 60 sirtuins from a wide array of organisms. Class I and class IV are further divided into three and two subgroups, respectively. The U-class sirtuins are found only in Gram-positive bacteria. The S. cerevisiae genome encodes five sirtuins, Sir2 and four additional proteins termed 'homologues of sir two' (Hst1p-Hst4p). The human genome encodes seven sirtuins, with representatives from classes I-IV. Sirtuins are responsible for a newly classified chemical reaction, NAD-dependent protein deacetylation. The final products of the reaction are the deacetylated peptide and an acetyl ADP-ribose. In nuclear sirtuins this deacetylation reaction is mainly directed against acetylated lysines of histones. Sirtuins typically consist of two optional and highly variable N- and C-terminal domains (50-300 aa) and a conserved catalytic core domain (~250 aa). Mutagenesis experiments suggest that the N- and C-terminal regions help direct the catalytic core domain to different targets. The 3D structure of an archaeal sirtuin in complex with NAD reveals that the protein consists of a large domain having a Rossmann fold and a small domain containing a three-stranded zinc ribbon motif. NAD is bound in a pocket between the two domains. This entry represents the class U sirtuins.
Protein Domain
Type: Family
Description: The sirtuin (also known as Sir2) family is broadly conserved from bacteria to human. Yeast Sir2 (silent mating-type information regulation 2), the founding member, was first isolated as part of the SIR complex required for maintaining a modified chromatin structure at telomeres. Sir2 functions in transcriptional silencing, cell cycle progression, and chromosome stability. Although most sirtuins in eukaryotic cells are located in the nucleus, others are cytoplasmic or mitochondrial. This family is divided into five classes (I-IV and U) on the basis of a phylogenetic analysis of 60 sirtuins from a wide array of organisms. Class I and class IV are further divided into three and two subgroups, respectively. The U-class sirtuins are found only in Gram-positive bacteria. The S. cerevisiae genome encodes five sirtuins, Sir2 and four additional proteins termed 'homologues of sir two' (Hst1p-Hst4p). The human genome encodes seven sirtuins, with representatives from classes I-IV. Sirtuins are responsible for a newly classified chemical reaction, NAD-dependent protein deacetylation. The final products of the reaction are the deacetylated peptide and an acetyl ADP-ribose. In nuclear sirtuins this deacetylation reaction is mainly directed against acetylated lysines of histones. Sirtuins typically consist of two optional and highly variable N- and C-terminal domains (50-300 aa) and a conserved catalytic core domain (~250 aa). Mutagenesis experiments suggest that the N- and C-terminal regions help direct the catalytic core domain to different targets. The 3D structure of an archaeal sirtuin in complex with NAD reveals that the protein consists of a large domain having a Rossmann fold and a small domain containing a three-stranded zinc ribbon motif. NAD is bound in a pocket between the two domains.
Protein Domain
Type: Family
Description: Transcription factors of the T-box family are required both for early cell-fate decisions, such as those necessary for formation of the basic vertebrate body plan, and for differentiation and organogenesis. They have also been associated with multiple aspects of development and with adult terminal cell-type differentiation in different animal lineages. The T-box is defined as the minimal region within the T-box protein that is both necessary and sufficient for sequence-specific DNA binding; all members of the family so far examined bind to the DNA consensus sequence TCACACCT and function as transcriptional repressors and/or activators. The T-box is a relatively large DNA-binding domain, generally comprising about a third of the entire protein (17-26 kDa). These genes were uncovered on the basis of similarity to the DNA-binding domain of the Mus musculus (Mouse) Brachyury (T) gene product; this similarity is the defining feature of the family. The Brachyury gene is named for its phenotype, which was identified 70 years ago as a mutant mouse strain with a short blunted tail. The gene, and its paralogues, have become a well-studied model for the family, and hence much of what is known about the T-box family is derived from the murine Brachyury gene. Consistent with its nuclear location, Brachyury protein has a sequence-specific DNA-binding activity and can act as a transcriptional regulator. Homozygous mutants for the gene undergo extensive developmental anomalies, thus rendering the mutation lethal. The postulated role of Brachyury is as a transcription factor, regulating the specification and differentiation of posterior mesoderm during gastrulation in a dose-dependent manner. T-box proteins tend to be expressed in specific organs or cell types, especially during development, and they are generally required for the development of those tissues. For example, Brachyury is expressed in posterior mesoderm and in the developing notochord, and it is required for the formation of these cells in mice. The T-box family is an ancient group that appears to play a critical role in development in all animal species.
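A simple way to look for candidate T-box binding sites is to scan a DNA sequence for exact matches to the consensus TCACACCT on both strands. The sketch below is illustrative only: the function names and the test sequence are hypothetical, and real binding-site prediction would normally tolerate mismatches rather than require an exact match.

```python
CONSENSUS = "TCACACCT"  # T-box consensus quoted in the description
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_consensus_sites(dna: str, motif: str = CONSENSUS) -> list:
    """Return sorted (start, strand) pairs for exact motif matches.

    start is the 0-based position on the given (forward) sequence;
    strand is "+" for a forward match and "-" for a match to the
    reverse complement of the motif.
    """
    dna = dna.upper()
    hits = []
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        start = dna.find(query)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(query, start + 1)  # allow overlapping hits
    return sorted(hits)


# Made-up sequence with one forward and one reverse-strand site.
print(find_consensus_sites("GGTCACACCTAAAGGTGTGACC"))  # [(2, '+'), (12, '-')]
```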
Protein Domain
Type: Family
Description: Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, nickel, etc. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds. An empirical classification into three classes has been proposed by Fowler and coworkers and Kojima. Members of class I are defined to include polypeptides related in the positions of their cysteines to equine MT-1B, and include mammalian MTs as well as MTs from crustaceans and molluscs. Class II groups MTs from a variety of species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units. This original classification system has been found to be limited, in the sense that it does not allow clear differentiation of patterns of structural similarities, either between or within classes. Subsequently, a new classification was proposed on the basis of sequence similarity derived from phylogenetic relationships, which basically proposes an MT family for each main taxonomic group of organisms. Crustacean MTs belong to family 3. They are small proteins, with 18 totally conserved cysteines. The members of this family are recognised by the sequence pattern P-[GD]-P-C-C-x(3,4)-C-x-C located at the N-terminus. The taxonomic range of the members extends to crustaceans. Known characteristics of this family are: 58 to 60 amino acids; variants exist with and without the N-terminal Met. The protein sequence is divided into two structural domains, each containing 9 Cys residues binding 3 bivalent metal ions. Family 3 includes the subfamilies c1, c2 and c. All sequences are very similar. c1 and c2 form two distinct monophyletic groups in the amino acid phylogenetic tree. c comprises crustacean MTs that differ from c1 and c2 based on phylogenetic analyses.
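The sequence pattern quoted in this description uses PROSITE-style notation, which maps directly onto a regular expression. The following sketch handles only the elements that appear in this pattern (single residues, bracketed residue sets, and `x` with an optional repeat count); `prosite_to_regex` and `find_motif` are hypothetical helper names, and the example peptide is made up for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        # x -> "."; x(3) -> ".{3}"; x(3,4) -> ".{3,4}"
        wildcard = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if wildcard is None:
            # Single residues (P) and residue sets ([GD]) are valid regex.
            parts.append(element)
            continue
        low, high = wildcard.group(1), wildcard.group(2)
        if low is None:
            parts.append(".")
        elif high is None:
            parts.append(".{%s}" % low)
        else:
            parts.append(".{%s,%s}" % (low, high))
    return "".join(parts)


MT_FAMILY3_PATTERN = "P-[GD]-P-C-C-x(3,4)-C-x-C"
MT_FAMILY3_REGEX = re.compile(prosite_to_regex(MT_FAMILY3_PATTERN))


def find_motif(sequence: str):
    """Return (start, matched_text) for the first match, or None."""
    match = MT_FAMILY3_REGEX.search(sequence.upper())
    return None if match is None else (match.start(), match.group(0))


print(prosite_to_regex(MT_FAMILY3_PATTERN))  # P[GD]PCC.{3,4}C.C
print(find_motif("MPGPCCKDKCECA"))           # (1, 'PGPCCKDKCEC')
```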
Protein Domain
Type: Domain
Description: DNA carries the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the cell life cycle. This function is mediated by DNA-directed DNA polymerases, which add nucleotide triphosphate (dNTP) residues to the 3'-end of the growing DNA chain, using a complementary DNA as template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used. Three motifs, A, B and C, are seen to be conserved across all DNA polymerases, with motifs A and C also seen in RNA polymerases. They are centred on invariant residues, and their structural significance was implied from the Klenow (Escherichia coli) structure: motif A contains a strictly conserved aspartate at the junction of a β-strand and an α-helix; motif B contains an α-helix with positive charges; and motif C has a doublet of negative charges, located in a β-turn-β secondary structure. DNA polymerases can be classified, on the basis of sequence similarity, into at least four different groups: A, B, C and X. X family polymerases fill in short gaps during DNA repair, and are small (about 40 kDa) compared with other polymerases. They are relatively inaccurate enzymes and play roles in base excision repair, in non-homologous end joining (NHEJ), which acts mainly to repair damage due to ionizing radiation, and in V(D)J recombination. X family polymerases include eukaryotic Pol beta, Pol lambda, Pol mu and terminal deoxynucleotidyl-transferase (TdT). Pol beta and Pol lambda are primarily DNA template-dependent polymerases, whereas TdT is a DNA template-independent polymerase. Pol mu has both template-dependent and template-independent activities. These enzymes catalyse addition of nucleotides in a distributive manner, i.e. they dissociate from the template-primer after addition of each nucleotide. DNA polymerases show a degree of structural similarity with RNA polymerases. This domain is found either at the extreme N or C termini of DNA polymerase X proteins.
Protein Domain
Type: Family
Description: Amidase signature (AS) enzymes are a large group of hydrolytic enzymes that contain a conserved stretch of approximately 130 amino acids known as the AS sequence. They are widespread, being found in both prokaryotes and eukaryotes. AS enzymes catalyse the hydrolysis of amide bonds (CO-NH2), although the family has diverged widely with regard to substrate specificity and function. Nonetheless, these enzymes maintain a core alpha/beta/alpha structure, where the topologies of the N- and C-terminal halves are similar. AS enzymes characteristically have a highly conserved C-terminal region rich in serine and glycine residues, but devoid of aspartic acid and histidine residues; they therefore differ from classical serine hydrolases. These enzymes possess a unique, highly conserved Ser-Ser-Lys catalytic triad used for amide hydrolysis, although the catalytic mechanism for acyl-enzyme intermediate formation can differ between enzymes. Examples of AS enzymes include:
  • Peptide amidase (Pam), which catalyses the hydrolysis of the C-terminal amide bond of peptides.
  • Fatty acid amide hydrolases, which hydrolyse fatty acid amide substrates (e.g. cannabinoid anandamide and sleep-inducing oleamide), thereby controlling the level and duration of signalling induced by this diverse class of lipid transmitters.
  • Malonamidase E2, which catalyses the hydrolysis of malonamate into malonate and ammonia, and which is involved in the transport of fixed nitrogen from bacteroids to plant cells in symbiotic nitrogen metabolism.
  • Subunit A of Glu-tRNA(Gln) amidotransferase, a heterotrimeric enzyme that catalyses the formation of Gln-tRNA(Gln) by the transamidation of misacylated Glu-tRNA(Gln) via amidolysis of glutamine.
This family refers to 1-carboxybiuret hydrolase subunit AtzE, which hydrolyzes 1-carboxybiuret to urea-1,3-dicarboxylate and NH3. As 1-carboxybiuret hydrolyzes spontaneously to biuret, AtzE was previously thought to use biuret as substrate, but this has since been corrected.
Protein Domain
Type: Family
Description: This entry represents the 3-amino-5-hydroxybenzoic acid synthase family (AHBA_syn), whose members are probably all pyridoxal-phosphate-dependent aminotransferase enzymes with a variety of molecular functions. Members of the family have the same structural fold as members of the pyridoxal phosphate (PLP)-dependent aspartate aminotransferase superfamily []. Members of the AHBA_syn family are involved in various biosynthetic pathways for secondary metabolites. The AHBA_syn family includes StsA, StsC and StsS []. The aminotransferase activity was demonstrated for purified StsC protein as the L-glutamine:scyllo-inosose aminotransferase, which catalyses the first amino transfer in the biosynthesis of the streptidine subunit of streptomycin []. Some other well-studied proteins in this family are AHBA_synthase, the protein product of the pleiotropic regulatory gene degT, ArnB aminotransferase and pilin glycosylation protein. The prototype of this family, the AHBA_synthase, is a dimeric PLP-dependent enzyme. AHBA_syn is the terminal enzyme of 3-amino-5-hydroxybenzoic acid (AHBA) formation, which is involved in the biosynthesis of ansamycin antibiotics, including rifamycin B. Some members of this family are involved in 4-amino-6-deoxy-monosaccharide D-perosamine synthesis. Perosamine is an important element in the glycosylation of several cell products, such as antibiotics and lipopolysaccharides of Gram-positive and Gram-negative bacteria. The pilin glycosylation protein, encoded by the gene pglA, is a galactosyltransferase involved in pilin glycosylation. Additionally, this family includes ArnB (PmrH) aminotransferase, a 4-amino-4-deoxy-L-arabinose lipopolysaccharide-modifying enzyme. This family also includes several predicted pyridoxal phosphate-dependent enzymes apparently involved in regulation of cell wall biogenesis. 
The catalytic lysine, which is present in all characterized PLP-dependent enzymes, is replaced by histidine in some members of this family [, , , , , , , , , , , , ].
Protein Domain
Type: Family
Description: The bacterial core RNA polymerase complex, which consists of five subunits, is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme []. RNA polymerase recruits alternative sigma factors as a means of switching on specific regulons. Most bacteria express a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary sigma factor, and sigma-54 (gene rpoN or ntrA), direct the transcription of a wide variety of genes. The other sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard to sequence similarity, sigma factors can be grouped into two classes, the sigma-54 and sigma-70 families. Sequence alignments of the sigma-70 family members reveal four conserved regions that can be further divided into sub-regions, e.g. sub-region 2.2, which may be involved in the binding of the sigma factor to the core RNA polymerase; and sub-region 4.2, which seems to harbor a DNA-binding 'helix-turn-helix' motif involved in binding the conserved -35 region of promoters recognised by the major sigma factors [, ]. The plastids of higher plants, originating from an ancestral cyanobacterial endosymbiont, also contain sigma factors that are encoded by a small family of nuclear genes. All plastid sigma factors belong to the superfamily of sigmaA/sigma70 and have sequences homologous to the conserved regions 1.2, 2, 3, and 4 of bacterial sigma factors []. This entry represents the transcription factor Sigma-I. This protein is found in endospore-forming species in the Firmicutes lineage of bacteria, such as Bacillus subtilis, but is not universally present among such species. Sigma-I was shown to be induced by heat shock [, ] in B. 
subtilis and is suggested by its phylogenetic profile to be connected to the program of sporulation [].
Protein Domain
Type: Family
Description: Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease []. These mutations have been demonstrated to reduce or abolish CLC function.
Protein Domain
Type: Family
Description: Amyloid-beta precursor protein, also known as amyloid beta A4 protein (APP or A4), consists of a large N-terminal extracellular region containing heparin-binding and copper-binding sites, a Kunitz domain, an E2 domain, a short hydrophobic transmembrane domain, and a short C-terminal intracellular domain. The N-terminal region is similar in structure to cysteine-rich growth factors and appears to function as a cell surface receptor, contributing to neurite growth, neuronal adhesion, axonogenesis and cell mobility []. There are several alternative splicing isoforms of APP in humans. Two of the main isoforms, amyloid-β40 (Abeta40) and amyloid-β42 (Abeta42), are found predominantly in the extracellular brain deposits associated with Alzheimer's disease (AD) []. The ratio of Abeta42 to Abeta40 affects the pathogenesis of AD []. The Abeta peptide is mostly unstructured. Molecular dynamics simulations, confirmed by amyloid-oligomer-specific antibodies, revealed that the Abeta monomer acquires an atypical alpha-sheet secondary structure: it adopts an alpha-strand structure, which proceeds to an alpha-sheet between adjacent alpha-strands in oligomers with opposite charges on both edges, inducing self-assembly/aggregation to form soluble oligomeric amyloid protofibrils and, finally, insoluble highly ordered amyloid fibrils with a cross β-sheet structure [, , ]. APP can be processed by different sets of enzymes: In the non-amyloidogenic (non-plaque-forming) pathway, APP is cleaved by alpha-secretase to yield a soluble N-terminal sAPP-alpha (neuroprotective) and a membrane-bound CTF-alpha. CTF-alpha is broken down by presenilin-containing gamma-secretase to yield soluble p3 and membrane-bound AICD (nuclear signalling). In the amyloidogenic (plaque-forming) pathway, APP is broken down by beta-secretase to yield soluble sAPP-beta and membrane-bound CTF-beta. CTF-beta is broken down by gamma-secretase to yield soluble amyloid-beta and membrane-bound AICD. 
Amyloid-beta is required for neuronal function, but can aggregate to form amyloid plaques that seem to disrupt brain cells by clogging points of cell-cell contact.
Protein Domain
Type: Domain
Description: This entry represents the complete catalytic core domain of sirtuin proteins. The sirtuin (also known as Sir2) family is broadly conserved from bacteria to human. Yeast Sir2 (silent mating-type information regulation 2), the founding member, was first isolated as part of the SIR complex required for maintaining a modified chromatin structure at telomeres. Sir2 functions in transcriptional silencing, cell cycle progression, and chromosome stability []. Although most sirtuins in eukaryotic cells are located in the nucleus, others are cytoplasmic or mitochondrial. This family is divided into five classes (I-IV and U) on the basis of a phylogenetic analysis of 60 sirtuins from a wide array of organisms []. Class I and class IV are further divided into three and two subgroups, respectively. The U-class sirtuins are found only in Gram-positive bacteria []. The S. cerevisiae genome encodes five sirtuins, Sir2 and four additional proteins termed 'homologues of sir two' (Hst1p-Hst4p) []. The human genome encodes seven sirtuins, with representatives from classes I-IV [, ]. Sirtuins are responsible for a newly classified chemical reaction, NAD-dependent protein deacetylation. The final products of the reaction are the deacetylated peptide and an acetyl ADP-ribose []. In nuclear sirtuins this deacetylation reaction is mainly directed against the acetylated lysines of histones []. Sirtuins typically consist of two optional and highly variable N- and C-terminal domains (50-300 aa) and a conserved catalytic core domain (~250 aa). Mutagenesis experiments suggest that the N- and C-terminal regions help direct the catalytic core domain to different targets [, ]. The 3D-structure of an archaeal sirtuin in complex with NAD reveals that the protein consists of a large domain having a Rossmann fold and a small domain containing a three-stranded zinc ribbon motif. NAD is bound in a pocket between the two domains [].
Protein Domain
Type: Family
Description: The sirtuin (also known as Sir2) family is broadly conserved from bacteria to human. Yeast Sir2 (silent mating-type information regulation 2), the founding member, was first isolated as part of the SIR complex required for maintaining a modified chromatin structure at telomeres. Sir2 functions in transcriptional silencing, cell cycle progression, and chromosome stability []. Although most sirtuins in eukaryotic cells are located in the nucleus, others are cytoplasmic or mitochondrial. This family is divided into five classes (I-IV and U) on the basis of a phylogenetic analysis of 60 sirtuins from a wide array of organisms []. Class I and class IV are further divided into three and two subgroups, respectively. The U-class sirtuins are found only in Gram-positive bacteria []. The S. cerevisiae genome encodes five sirtuins, Sir2 and four additional proteins termed 'homologues of sir two' (Hst1p-Hst4p) []. The human genome encodes seven sirtuins, with representatives from classes I-IV [, ]. Sirtuins are responsible for a newly classified chemical reaction, NAD-dependent protein deacetylation. The final products of the reaction are the deacetylated peptide and an acetyl ADP-ribose []. In nuclear sirtuins this deacetylation reaction is mainly directed against the acetylated lysines of histones []. Sirtuins typically consist of two optional and highly variable N- and C-terminal domains (50-300 aa) and a conserved catalytic core domain (~250 aa). Mutagenesis experiments suggest that the N- and C-terminal regions help direct the catalytic core domain to different targets [, ]. The 3D-structure of an archaeal sirtuin in complex with NAD reveals that the protein consists of a large domain having a Rossmann fold and a small domain containing a three-stranded zinc ribbon motif. NAD is bound in a pocket between the two domains [].
Protein Domain
Type: Homologous_superfamily
Description: Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease []. 
These mutations have been demonstrated to reduce or abolish CLC function. This superfamily represents the core domain of the chloride ion channel.
Protein Domain
Type: Family
Description: Neurotransmitter transport systems are integral to the release, re-uptake and recycling of neurotransmitters at synapses. High affinity transport proteins found in the plasma membrane of presynaptic nerve terminals and glial cells are responsible for the removal from the extracellular space of released transmitters, thereby terminating their actions []. Plasma membrane neurotransmitter transporters fall into two structurally and mechanistically distinct families. The majority of the transporters constitute an extensive family of homologous proteins that derive energy from the co-transport of Na+ and Cl-, in order to transport neurotransmitter molecules into the cell against their concentration gradient. The family has a common structure of 12 presumed transmembrane helices and includes carriers for gamma-aminobutyric acid (GABA), noradrenaline/adrenaline, dopamine, serotonin, proline, glycine, choline, betaine and taurine. They are structurally distinct from the second more-restricted family of plasma membrane transporters, which are responsible for excitatory amino acid transport. The latter couple glutamate and aspartate uptake to the cotransport of Na+ and the counter-transport of K+, with no apparent dependence on Cl- []. In addition, both of these transporter families are distinct from the vesicular neurotransmitter transporters [, ]. GABA is the major inhibitory transmitter in the mammalian brain, and is widely distributed throughout the nervous system. Molecular cloning studies have resulted in the cloning of three Na+ and Cl- -coupled GABA transporters (known as GAT-1, GAT-2, GAT-3) and a betaine/GABA transporter (BGT-1). Each transporter shows varying affinities for GABA, different substrate and blocker pharmacologies, and different tissue localisation []. Brain regions containing GAT-3 mRNA transcripts include the retina, olfactory bulb, subfornical organ, hypothalamus, midline thalamus and brainstem. 
GAT-3 mRNA was found to be absent from the neocortex and cerebellar cortex, and very weak in the hippocampus []. Furthermore, immunocytological studies have demonstrated that this transporter may be localised solely to glial (non-neuronal) cells, suggesting that glial GABA uptake may function to limit the spread of GABA from the synapse, as well as to regulate overall GABA levels.
Protein Domain
Type: Family
Description: Neurotransmitter transport systems are integral to the release, re-uptake and recycling of neurotransmitters at synapses. High affinity transport proteins found in the plasma membrane of presynaptic nerve terminals and glial cells are responsible for the removal from the extracellular space of released transmitters, thereby terminating their actions []. Plasma membrane neurotransmitter transporters fall into two structurally and mechanistically distinct families. The majority of the transporters constitute an extensive family of homologous proteins that derive energy from the co-transport of Na+ and Cl-, in order to transport neurotransmitter molecules into the cell against their concentration gradient. The family has a common structure of 12 presumed transmembrane helices and includes carriers for gamma-aminobutyric acid (GABA), noradrenaline/adrenaline, dopamine, serotonin, proline, glycine, choline, betaine and taurine. They are structurally distinct from the second more-restricted family of plasma membrane transporters, which are responsible for excitatory amino acid transport. The latter couple glutamate and aspartate uptake to the cotransport of Na+ and the counter-transport of K+, with no apparent dependence on Cl- []. In addition, both of these transporter families are distinct from the vesicular neurotransmitter transporters [, ]. A Na+ and Cl- -coupled creatine transporter has been cloned from human and rodent tissues. Initially it was mistaken for a choline transporter [, ]. The creatine transporter species homologues are near identical (98% identity, human vs. rat and rabbit) and they are most closely related to the transporters reported for taurine, GABA and betaine. Northern blot analysis of creatine transporter distribution reveals that the highest levels of mRNA expression are in skeletal muscle, kidney and heart, with lower levels in brain and other tissues. Within the brain, the highest levels were detected in the cerebellum and hippocampus. 
This expression pattern correlates well with those tissues known to possess a high creatine uptake capacity [].
Protein Domain
Type: Homologous_superfamily
Description: Pyridoxal phosphate is the active form of vitamin B6 (pyridoxine or pyridoxal). Pyridoxal 5'-phosphate (PLP) is a versatile catalyst, acting as a coenzyme in a multitude of reactions, including decarboxylation, deamination and transamination [, , ]. PLP-dependent enzymes are primarily involved in the biosynthesis of amino acids and amino acid-derived metabolites, but they are also found in the biosynthetic pathways of amino sugars and in the synthesis or catabolism of neurotransmitters; pyridoxal phosphate can also inhibit DNA polymerases and several steroid receptors []. Inadequate levels of pyridoxal phosphate in the brain can cause neurological dysfunction, particularly epilepsy []. PLP enzymes exist in their resting state as a Schiff base, the aldehyde group of PLP forming a linkage with the ε-amino group of an active site lysine residue on the enzyme. The α-amino group of the substrate displaces the lysine ε-amino group, in the process forming a new aldimine with the substrate. This aldimine is the common central intermediate for all PLP-catalysed reactions, enzymatic and non-enzymatic []. In Escherichia coli, the pdx genes involved in vitamin B6 biosynthesis have been characterised [, , ]. This entry represents PdxJ (also called PNP synthase), which catalyses the condensation of 4-hydroxy-L-threonine and 1-deoxy-D-xylulose-5-phosphate to form pyridoxine-5'-phosphate (PNP) []. The product of the PdxJ reaction is then oxidized by PdxH to form pyridoxal 5'-phosphate (PLP). PNP synthase (PdxJ) adopts a TIM barrel topology. Intersubunit contacts are mediated by three 'extra' helices, generating a tetramer of symmetric dimers with shared active sites. The open state has been proposed to accept substrates and to release products, while most of the catalytic events are likely to occur in the closed state. 
A hydrophilic channel running through the centre of the barrel was identified as the essential structural feature that enables PNP synthase to release water molecules produced during the reaction from the closed, solvent-shielded active site [].
Protein Domain
Type: Domain
Description: The ~200 amino acid TBC/rab GTPase-activating protein (GAP) domain is well conserved across species and has been found in a wide range of different proteins from plant adhesion molecules to mammalian oncogenes. The name TBC derives from the name of the murine protein Tbc1 in which this domain was first identified based on its similarity to sequences in the tre-2 oncogene, and the yeast regulators of mitosis, BUB2 and cdc16 []. The connection of this domain with rab GTPase activation stems from subsequent in-depth sequence analyses and alignments [] and recent work demonstrating that it appears to contain the catalytic activities of the yeast rab GAPs GYP1 and GYP7 []. The TBC/rab GAP domain has also been named PTM after three proteins known to contain it: the Drosophila pollux, the human oncoprotein TRE17 (oncoTRE17), and a myeloid cell line-expressed protein []. The TBC/rab GAP domain contains six conserved motifs named A to F []. A conserved arginine residue in the sequence motif B has been shown to be critical for the full GAP activity []. Resolution of the 3D structure of the TBC/rab GAP domain of GYP1 has shown that it is a fully α-helical V-shaped molecule. The conserved arginine residue is positioned at the side of the narrow cleft on the concave site of the V-shaped molecule. It has been proposed that this cleft is the binding site for the GTPase. The conserved arginine residue probably functions as a catalytic arginine finger analogous to that seen in ras and Rho-GAPs. The two key features of the arginine finger activation mechanism appear to be (i) the positioning of the catalytically essential GTPase glutamine side chain via a hydrogen bonding interaction between the glutamine carbamoyl-NH2 group and the main chain carbonyl group of the GAP arginine, and (ii) the polarization of the gamma-phosphate group or the stabilization of charge on it via the interaction of the positively charged side chain guanidinoyl group of the GAP arginine [].
Protein Domain
Type: Domain
Description: Protein kinase C (PKC) is a member of a family of Ser/Thr phosphotransferases that are involved in many cellular signaling pathways. Fungi have only one or two PKCs in contrast to mammals, which have at least 9 []. Saccharomyces cerevisiae contains a single PKC isozyme, Pkc1p, which contains all of the regulatory motifs found in mammalian PKCs []. In addition to their main function in maintaining cell integrity, fungal PKCs have been implicated in the regulation of diverse processes such as the organization of the actin cytoskeleton, autophagy and apoptosis, cell cycle control, cytokinesis and genetic stability [, ]. PKC has two antiparallel coiled-coil regions (ACC finger domain) known as HR1 (PKC homology region 1/Rho binding domain) upstream of the C2 domain and two C1 domains downstream. The C2 domain was first identified in PKC. C2 domains fold into an 8-stranded β-sandwich that can adopt 2 structural arrangements: Type I and Type II, distinguished by a circular permutation involving their N- and C-terminal beta strands. Many C2 domains, like those of PKC, are Ca2+-dependent membrane-targeting modules that bind a wide variety of substances including phospholipids, inositol polyphosphates, and intracellular proteins. Most C2 domain proteins are either signal transduction enzymes that contain a single C2 domain, such as protein kinase C, or membrane trafficking proteins which contain at least two C2 domains, such as synaptotagmin 1. However, there are a few exceptions to this, including RIM isoforms and some splice variants of piccolo/aczonin and intersectin, which only have a single C2 domain. C2 domains with a calcium binding region have negatively charged residues, primarily aspartates, that serve as ligands for calcium ions [, , , , ]. This entry represents the C2 domain of fungal PKC-like proteins.
Protein Domain
Type: Domain
Description: PKN is a lipid-activated serine/threonine kinase. It is a member of the protein kinase C (PKC) superfamily, but lacks a C1 domain. There are at least 3 different isoforms of PKN (PRK1/PKNalpha/PAK1; PKNbeta, and PRK2/PAK2/PKNgamma). The C-terminal region contains the Ser/Thr type protein kinase domain, while the N-terminal region of PKN contains three antiparallel coiled-coil (ACC) finger domains which are relatively rich in charged residues and contain a leucine zipper-like sequence. These domains bind to the small GTPase RhoA. Following these domains is a C2-like domain. Its C-terminal part functions as an auto-inhibitory region. PKNs are not activated by classical PKC activators such as diacylglycerol, phorbol ester or Ca2+, but instead are activated by phospholipids and unsaturated fatty acids []. The C2 domain was first identified in PKC. C2 domains fold into an 8-stranded β-sandwich that can adopt 2 structural arrangements: Type I and Type II, distinguished by a circular permutation involving their N- and C-terminal beta strands. Many C2 domains are Ca2+-dependent membrane-targeting modules that bind a wide variety of substances including phospholipids, inositol polyphosphates, and intracellular proteins. Most C2 domain proteins are either signal transduction enzymes that contain a single C2 domain, such as protein kinase C, or membrane trafficking proteins which contain at least two C2 domains, such as synaptotagmin 1. However, there are a few exceptions to this, including RIM isoforms and some splice variants of piccolo/aczonin and intersectin, which only have a single C2 domain. C2 domains with a calcium binding region have negatively charged residues, primarily aspartates, that serve as ligands for calcium ions [, , , ].
Protein Domain
Type: Family
Description: Tom1 (target of Myb 1) and its related proteins (Tom1L1 and Tom1L2) constitute a protein family and share an N-terminal VHS (Vps27p/Hrs/Stam) domain followed by a GAT (GGA and Tom1) domain. VHS domains are found at the N termini of select proteins involved in intracellular membrane trafficking and are often localized to membranes. The three dimensional structure of human TOM1 VHS domain reveals eight helices arranged in a superhelix. The surface of the domain has two main features: (1) a basic patch on one side due to several conserved positively charged residues on helix 3 and (2) a negatively charged ridge on the opposite side, formed by residues on helix 2 []. The basic patch is thought to mediate membrane binding. It was demonstrated that the GAT domain of both Tom1 and Tom1L1 binds ubiquitin, suggesting that these proteins might participate in the sorting of ubiquitinated proteins into multivesicular bodies (MVB) []. Moreover, Tom1L1 interacts with members of the MVB sorting machinery. Specifically, the VHS domain of Tom1L1 interacts with Hrs (hepatocyte growth factor-regulated tyrosine kinase substrate), whereas a PTAP motif, located between the VHS and GAT domains of Tom1L1, is responsible for binding to TSG101 (tumour susceptibility gene 101). Myc epitope-tagged Tom1L1 is recruited to endosomes following Hrs expression. In addition, Tom1L1 possesses several tyrosine motifs at the C-terminal region that mediate interactions with members of the Src family kinases and other signalling proteins such as Grb2 and p85. Expression of a constitutively active form of Fyn kinase promotes the recruitment of Tom1L1 to enlarged endosomes. 
It is proposed that Tom1L1 could act as an intermediary between the signalling and degradative pathways []. Overexpression of Tom1 suppresses activation of the transcription factors NF-kappaB and AP-1, induced by either IL-1beta or tumour necrosis factor (TNF)-alpha, and the VHS domain of Tom1 is indispensable for this suppressive activity. This suggests that Tom1 is a common negative regulator of signalling pathways induced by IL-1beta and TNF-alpha [].
Protein Domain
Type: Homologous_superfamily
Description: The alternative oxidase (AOX) is an enzyme that forms part of the electron transport chain in mitochondria of different organisms [, ]. Proteins homologous to the mitochondrial oxidase have also been identified in bacterial genomes [, ]. The oxidase provides an alternative route for electrons passing through the electron transport chain to reduce oxygen. However, as several proton-pumping steps are bypassed in this alternative pathway, activation of the oxidase reduces ATP generation. This enzyme was first identified as an oxidase pathway distinct from cytochrome c oxidase, because the alternative oxidase is resistant to inhibition by the poison cyanide []. The alternative oxidase (also known as ubiquinol oxidase) is used as a second terminal oxidase in the mitochondria; electrons are transferred directly from reduced ubiquinol to oxygen, forming water []. This pathway is not coupled to ATP synthesis, is not inhibited by cyanide, and is a single-step process []. In Oryza sativa (rice) the transcript levels of the alternative oxidase are increased by low temperature []. It has been predicted to contain a coupled diiron centre on the basis of a conserved sequence motif consisting of the proposed iron ligands, four Glu and two His residues []. The EPR study of Arabidopsis thaliana (mouse-ear cress) alternative oxidase AOX1a shows that the enzyme contains a hydroxo-bridged mixed-valent Fe(II)/Fe(III) binuclear iron centre []. A catalytic cycle has been proposed that involves a di-iron centre and at least one transient protein-derived radical, most probably an invariant Tyr residue []. The structure of alternative oxidase from Trypanosoma brucei has been solved. The enzyme is a homodimer with the nonhaem di-iron carboxylate active site buried within a four-helix bundle. In the inhibitor-free state, the di-iron carboxylate is ligated by four glutamate residues, but on binding of an inhibitor, a histidine is also induced to act as a ligand. 
A highly conserved tyrosine is close to the active site and required for activity []. This entry represents proteins with a structure similar to that of alternative oxidase.
Protein Domain
Type: Family
Description: The large (alpha, GltB) subunit of bacterial glutamate synthase (GOGAT) consists of three domains: N-terminal domain (amidotransferase domain ), central (consisting of and the FMN-binding domain ), and C-terminal domain. This family of sequences represents a fusion of the N-terminal (amidotransferase) domain and the C-terminal structural domain. The stand-alone forms of the three domains (and for domains 1 and 2), as well as partial fusions, occur in the archaeal type of GOGAT, where the large subunit is represented by three separate proteins, corresponding to the three domains of the "standard" bacterial enzyme []. Originally, only the ORF encoding the central domain of GOGAT was recognised and annotated as GltB in archaea, and the rest of the large subunit was thought to be missing, which may have led to some misannotations []. This led to speculation that the archaeal form of the GOGAT large subunit is the ancestral minimum form of the enzyme. Later analysis showed, however, that in all archaea where the large subunit has been found, its entire sequence is represented by three separate ORFs []. Glutamate synthase is a complex iron-sulphur flavoprotein that catalyses the reductive synthesis of L-glutamate from 2-oxoglutarate and L-glutamine via intramolecular channelling of ammonia, a reaction in the bacterial, yeast and plant pathways for ammonia assimilation []. GOGAT is a multifunctional enzyme that performs L-glutamine hydrolysis, conversion of 2-oxoglutarate into L-glutamate, and electron uptake from an electron donor []. There are four classes of GOGAT [, ]: 1. Bacterial NADPH-dependent GOGAT (NADPH-GOGAT, ). This standard bacterial NADPH-GOGAT is composed of a large subunit and a small subunit. 2. Ferredoxin-dependent form in cyanobacteria and plants (Fd-GOGAT, ) displays a single-subunit structure corresponding to the large bacterial subunit. 3. 
Pyridine-linked form in both photosynthetic and nonphotosynthetic eukaryotes (eukaryotic GOGAT or NADH-GOGAT, ) displays a single-subunit structure corresponding to the fusion of the small and the large bacterial subunits (). 4. The archaeal type with stand-alone proteins corresponding to the N-terminal, FMN-binding, and the C-terminal domains of the large subunit [, ] (, ), and to the small subunit.
Protein Domain
Type: Domain
Description: The Drosophila Tudor protein, the founder of the Tudor domain family, is encoded by a 'posterior group' gene, which when mutated disrupts normal abdominal segmentation and pole cell formation. Another Drosophila gene, homeless, is required for RNA localization during oogenesis. The tudor protein contains multiple repeats of a domain which is also found in homeless [, ]. The tudor domain is found in many proteins that colocalise with ribonucleoprotein or single-strand DNA-associated complexes in the nucleus, in the mitochondrial membrane, or at kinetochores. At first it was not clear whether the domain binds directly to RNA and ssDNA or controls interactions with the nucleoprotein complexes, but it is now known that this domain recognises and binds to methylated arginine and lysine residues, playing important roles in epigenetics, gene expression and the regulation of various small RNAs [, , ]. The tudor-containing protein homeless also contains a zinc finger typical of RNA-binding proteins []. This domain has been implicated in protein-protein interactions in which methylated protein substrates bind to these domains. One example is the Tudor domain of Survival of Motor Neuron (SMN), linked to spinal muscular atrophy, which binds to symmetrically dimethylated arginines of arginine-glycine (RG) rich sequences found in the C-terminal tails of Sm proteins. The solution structure of the Tudor domain of human SMN revealed that the domain forms a strongly bent antiparallel β-sheet with five strands forming a barrel-like fold. The structure exhibits a conserved negatively charged surface that interacts with the C-terminal Arg and Gly-rich tails of the spliceosomal Sm D1 and D3 proteins [, ].
Protein Domain
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups []. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [, , , , ]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [, , ]. Several 7TM receptors have been cloned but their endogenous ligands are unknown; these have been termed orphan receptors. G10d was isolated from a rat genomic library and a liver cDNA library []. It is widely distributed, being found at high levels in the lung, liver and adrenal gland, and also in the kidney, aorta, heart, spinal cord, gut and testis [].
Protein Domain
Type: Homologous_superfamily
Description: This entry represents the pacifastin domain superfamily. Proteins containing this domain are proteinase inhibitors belonging to MEROPS inhibitor family I19 (clan IW) and sharing a pacifastin domain of ~35 residues, which contains a characteristic pattern of six conserved cysteine residues (C-x(9,12)-C-N-x-C-x-C-x(2,3)-G-x(3,6)-C-T-x(3)-C). The pacifastin domain consists of a twisted β-sheet composed of three antiparallel strands and stabilised by an identical pattern (C1-C4, C2-C6, C3-C5) of disulfide bridges [, , , , , ]. Proteins containing this domain were first isolated from Locusta migratoria migratoria (migratory locust). These were HI, LMCI-1 (PMP-D2) and LMCI-2 (PMP-C) [, , ]; five additional members, SGPI-1 to 5, were identified in Schistocerca gregaria (desert locust) [, ], and a heterodimeric serine protease inhibitor (pacifastin) was isolated from the hemolymph of Pacifastacus leniusculus (signal crayfish) []. Pacifastin is a 155 kDa protein composed of two covalently linked subunits, which are separately encoded. The heavy chain of pacifastin (105 kDa) is related to transferrins, containing three transferrin lobes, two of which seem to be active for iron binding []. A number of the members of the transferrin family are also serine peptidases belonging to MEROPS peptidase family S60 (). The light chain of pacifastin (44 kDa) is the proteinase inhibitory subunit, and has nine cysteine-rich inhibitory domains that are homologous to each other. The locust inhibitors share a conserved array of six cysteine residues with the pacifastin light chain. The structures of members of this family reveal a triple-stranded antiparallel β-sheet connected by three disulphide bridges []. The biological function(s) of the locust inhibitors is (are) not fully understood. LMCI-1 and LMCI-2 were shown to inhibit the endogenous proteolytic activating cascade of prophenoloxidase []. Expression analysis shows that the genes encoding the SGPI precursors are differentially expressed in a time-, stage- and hormone-dependent manner. This entry also includes SCO-spondin, a multidomain protein. The SCO-spondin protein is a special feature of the chordate phylum. This protein is expressed in the central nervous system (CNS) from the time a dorsal neural tube appears in the course of phylogenetic evolution [].
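The six-cysteine pattern quoted in the description above is written in a PROSITE-like notation, in which x stands for any residue and x(m,n) for a run of m to n residues. As a rough illustration only, the Python sketch below translates that notation into a regular expression and scans a sequence with it. The function names and the toy sequence are hypothetical and not part of the database entry, and a regular-expression hit is not by itself evidence of a genuine pacifastin domain.

```python
import re

# Pattern copied from the entry description (PROSITE-like notation).
PACIFASTIN_PATTERN = "C-x(9,12)-C-N-x-C-x-C-x(2,3)-G-x(3,6)-C-T-x(3)-C"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-like pattern into a Python regex.

    Only the elements used in the pattern above are handled:
    single residues, 'x', 'x(n)' and 'x(m,n)'.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"([A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_pattern(pattern: str, sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping matches in the sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.span() for m in regex.finditer(sequence)]


# Hypothetical toy sequence built by hand to satisfy the pattern; not a real protein.
TOY_SEQUENCE = "MK" + "C" + "A" * 9 + "CNACAC" + "AAG" + "AAA" + "CT" + "AAA" + "C" + "LE"

if __name__ == "__main__":
    print(prosite_to_regex(PACIFASTIN_PATTERN))
    print(find_pattern(PACIFASTIN_PATTERN, TOY_SEQUENCE))
```

In practice, domain assignments of this kind are made with profile-based methods rather than a bare regular expression, so the sketch is best read as a way of unpacking the notation.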
Protein Domain
Type: Family
Description: Haem-containing catalase-peroxidases are bifunctional antioxidant enzymes that exhibit both catalase () and peroxidase () activity, and which are present predominantly in bacterial species []. Several evolutionary lineages are also present in archaeal, fungal, and protistan species. These enzymes provide protection against oxidative stress by dismutating hydrogen peroxide to oxygen and water []. Phylogenetically they are closely related to ascorbate peroxidases and cytochrome c peroxidases [] and can be divided into two distinct clades []. They do not share sequence similarity with mono-functional, haem-containing catalases () that are ubiquitous in aerobic organisms, nor with non-haem manganese-containing catalases found in bacteria (). Catalases perform a unique two-step reaction cycle that cleaves two hydrogen peroxide molecules heterolytically to alternately oxidise and reduce the haem iron, thus releasing water and molecular oxygen []. In contrast, peroxidases use hydrogen peroxide only to oxidise the haem iron, and use different electron donors such as NADH or ascorbate to then reduce the haem. The structure of the catalase-peroxidase from the archaeon Haloarcula marismortui (Halobacterium marismortui) reveals a dimer of two identical subunits [], although some catalase-peroxidases can also exist as homotetramers. The general topology, as well as the arrangement of the catalytic residues and haem in the active site, is similar to that of other class I peroxidases. However, the location of the haem group deeply buried inside the domain is typical of a catalase. The primary structure of the subunit can be divided into two similar halves, which very probably arose from a gene duplication event [, ]. A similar structure was also obtained for a catalase-peroxidase from the proteobacterium Burkholderia pseudomallei [].
Protein Domain
Type: Family
Description: Sea anemones are a rich source of lethal pore-forming peptides and proteins, known collectively as cytolysins or actinoporins. There are several different groups of cytolysins based on their structure and function. They share conserved regions, such as a surface-exposed lipid/carbohydrate-binding module involved in toxin binding to cell membranes, which provides non-specific membrane binding to target a wide range of species, and protein-protein binding surfaces that contribute to the oligomerization of membrane-bound actinoporin monomers [, ]. This entry represents the most numerous group, the 20 kDa highly basic peptides. These cytolysins form cation-selective pores in sphingomyelin-containing membranes. Examples include equinatoxins (from Actinia equina), sticholysins (from Stichodactyla helianthus), magnificalysins (from Heteractis magnifica), and tenebrosins (from Actinia tenebrosa), which exhibit pore-forming, haemolytic, cytotoxic, and heart stimulatory activities. This entry also includes related proteins from fish. Cytolysins adopt a stable soluble structure, which undergoes a conformational change when brought in contact with a membrane, leading to an active, membrane-bound form that inserts spontaneously into the membrane. They often oligomerize on the membrane surface before puncturing the lipid bilayer, causing the cell to lyse. The 20 kDa sea anemone cytolysins require a phosphocholine lipid headgroup for binding; however, sphingomyelin is required for the toxin to promote membrane permeability []. The crystal structures of equinatoxin II [] and sticholysin II [] both revealed a compact β-sandwich consisting of ten strands in two sheets flanked on each side by two short α-helices, which is a similar topology to osmotin. It is believed that the β-sandwich structure attaches to the membrane, while a three-turn α-helix lying on the surface of the β-sheet may be involved in membrane pore formation, possibly by the penetration of the membrane by the helix. Interestingly, this entry also includes bryoporin from the moss Physcomitrella patens, which shares structural similarity with sea anemone actinoporins. The bryoporin gene was upregulated by various abiotic stresses, most strongly by dehydration stress. Overexpression of the bryoporin gene significantly heightens drought tolerance in P. patens [].
Protein Domain
Type: Family
Description: Wnt proteins constitute a large family of secreted molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Drosophila) []. It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals []. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease [, ]. Wnt-mediated signalling is believed to proceed initially through binding to cell surface receptors of the frizzled family; the signal is subsequently transduced through several cytoplasmic components to β-catenin, which enters the nucleus and activates the transcription of several genes important in development []. Several non-canonical Wnt signalling pathways have also been elucidated that act independently of β-catenin. Canonical and noncanonical Wnt signalling branches are highly interconnected, and cross-regulate each other []. Members of the Wnt gene family are defined by their sequence similarity to mouse Wnt-1 and Wingless in Drosophila. They encode proteins of ~350-400 residues in length, with orthologues identified in several, mostly vertebrate, species. Very little is known about the structure of Wnts as they are notoriously insoluble, but they share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines [] that are probably involved in disulphide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters. Fifteen major Wnt gene families have been identified in vertebrates, with multiple subtypes within some classes. Wnt-4 cDNA was isolated from mouse using a PCR-based strategy, where it was found to be expressed in adult tissues, particularly in brain and lung []. Wnt-4 is believed to act downstream of progesterone signalling to play an important role in mammary gland development. Furthermore, mutations in the Wnt-4 gene have been linked to kidney defects [].
Protein Domain
Type: Family
Description: Wnt proteins constitute a large family of secreted molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Drosophila) []. It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals []. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease [, ]. Wnt-mediated signalling is believed to proceed initially through binding to cell surface receptors of the frizzled family; the signal is subsequently transduced through several cytoplasmic components to β-catenin, which enters the nucleus and activates the transcription of several genes important in development []. Several non-canonical Wnt signalling pathways have also been elucidated that act independently of β-catenin. Canonical and noncanonical Wnt signalling branches are highly interconnected, and cross-regulate each other []. Members of the Wnt gene family are defined by their sequence similarity to mouse Wnt-1 and Wingless in Drosophila. They encode proteins of ~350-400 residues in length, with orthologues identified in several, mostly vertebrate, species. Very little is known about the structure of Wnts as they are notoriously insoluble, but they share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines [] that are probably involved in disulphide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters. Fifteen major Wnt gene families have been identified in vertebrates, with multiple subtypes within some classes. Wnt-6 cDNA was isolated from mouse using a PCR-based strategy, where it was found to be expressed in adult tissues, particularly in brain and lung []. Furthermore, Wnt-6 is expressed in the ureter bud, where it is believed to activate kidney tubule development [].
Protein Domain
Type: Family
Description: The annexins (or lipocortins) are a family of proteins that bind to phospholipids in a calcium-dependent manner []. The 12 annexins common to vertebrates are classified in the annexin A family and named as annexins A1-A13 (or ANXA1-ANXA13), leaving A12 unassigned in the official nomenclature. Annexins outside vertebrates are classified into families B (in invertebrates), C (in fungi and some groups of unicellular eukaryotes), D (in plants), and E (in protists) []. Annexins are absent from yeasts and prokaryotes []. Most eukaryotic species have 1-20 annexin (ANX) genes. All annexins share a core domain made up of four similar repeats, each approximately 70 amino acids long []. Each annexin repeat (sometimes referred to as an endonexin fold) is folded into five α-helices, which in turn are wound into a right-handed super-helix; the repeats usually contain a characteristic 'type 2' motif for binding calcium ions with the sequence 'GxGT-[38 residues]-D/E'. Animal and fungal annexins also have variable amino-terminal domains. The core domains of most vertebrate annexins have been analysed by X-ray crystallography, revealing conservation of their secondary and tertiary structures despite only 45-55% amino-acid identity among individual members. The four repeats pack into a structure that resembles a flattened disc, with a slightly convex surface on which the Ca2+-binding loops are located and a concave surface at which the amino and carboxyl termini come into close apposition. Annexins are traditionally thought of as calcium-dependent phospholipid-binding proteins, but recent work suggests a more complex set of functions. The family has been linked with inhibition of phospholipase activity, exocytosis and endocytosis, signal transduction, organisation of the extracellular matrix, resistance to reactive oxygen species and DNA replication []. This entry represents a fungal annexin. The first fungal annexin was reported in 1998, encoded by the anx14 gene of the filamentous ascomycete Neurospora crassa. Phylogenetic analyses clustered the fungal annexin with homologous proteins from a number of animal species; this is consistent with the existence of an animal-fungal clade.
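The 'type 2' calcium-binding motif quoted above, 'GxGT-[38 residues]-D/E', can be read as a simple sequence pattern. The short Python sketch below is one possible reading, assuming that x means any single residue and that exactly 38 arbitrary residues separate the GxGT segment from the acidic residue. The function name and the toy sequence are hypothetical, and a match only flags a candidate site.

```python
import re

# One reading of 'GxGT-[38 residues]-D/E': G, any residue, G, T,
# then exactly 38 residues of any kind, then Asp (D) or Glu (E).
TYPE2_MOTIF = re.compile(r"G.GT.{38}[DE]")


def find_type2_motifs(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping candidate type 2 motifs."""
    return [m.span() for m in TYPE2_MOTIF.finditer(sequence)]


# Hypothetical toy sequence constructed to contain one candidate motif.
TOY_REPEAT = "AA" + "GAGT" + "S" * 38 + "D" + "AA"

if __name__ == "__main__":
    print(find_type2_motifs(TOY_REPEAT))
```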
Protein Domain
Type: Family
Description: One of the major neuropathological hallmarks of Alzheimer's disease (AD) is the progressive formation in the brain of insoluble amyloid plaques and vascular deposits consisting of beta-amyloid protein (beta-APP) []. Production of beta-APP requires proteolytic cleavage of the large type-1 transmembrane (TM) protein amyloid precursor protein (APP) []. This process is performed by a variety of enzymes known as secretases. To initiate beta-APP formation, beta-secretase cleaves APP to release a soluble N-terminal fragment (APPsBeta) and a C-terminal fragment that remains membrane bound. This fragment is subsequently cleaved by gamma-secretase to liberate beta-APP. Several independent studies identified a novel TM aspartic protease as the major beta-secretase [, , ]. This protein, termed memapsin 2 or beta-site APP cleaving enzyme 1 (BACE1), shares 64% amino acid sequence similarity with a second enzyme, termed BACE2. Together, BACE1 and BACE2 define a novel family of aspartyl proteases []. Both enzymes share significant sequence similarity with other members of the pepsin family of aspartyl proteases and contain the two characteristic D(T/S)G(T/S) motifs that form the catalytic site. However, in contrast to other aspartyl proteases, BACE1 and BACE2 are type I TM proteins. Each protein comprises a large lumenal domain containing the active centre, a single TM domain and a small cytoplasmic tail. BACE2, also termed Asp1 and memapsin 1, was initially identified through Expressed Sequence Tag (EST) database searching. In vitro enzymatic assays with peptide substrates have demonstrated that BACE2 cleaves beta-secretase substrates in a similar fashion to BACE1 []. The BACE2 mRNA is expressed in the central nervous system and many peripheral tissues, although its expression level in neurons is substantially lower than that of BACE1 [].
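The D(T/S)G(T/S) motif mentioned above means an aspartate, then threonine or serine, then glycine, then threonine or serine. As an illustrative sketch only, the Python snippet below locates such motifs in a sequence and checks whether two are present, as the description states for BACE1 and BACE2. The function names and the toy sequence are hypothetical; real catalytic-site annotation relies on alignment and structural context, not on a four-residue match.

```python
import re

# D(T/S)G(T/S): Asp, then Thr or Ser, then Gly, then Thr or Ser.
ASPARTYL_MOTIF = re.compile(r"D[TS]G[TS]")


def find_catalytic_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each motif occurrence."""
    return [(m.start(), m.group()) for m in ASPARTYL_MOTIF.finditer(sequence)]


def has_two_motifs(sequence: str) -> bool:
    """True if at least two motif occurrences are found in the sequence."""
    return len(find_catalytic_motifs(sequence)) >= 2


# Hypothetical toy sequence with two motif occurrences; not a real BACE sequence.
TOY_PROTEASE = "AAADTGSAAAADSGTAA"

if __name__ == "__main__":
    print(find_catalytic_motifs(TOY_PROTEASE))
    print(has_two_motifs(TOY_PROTEASE))
```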
Protein Domain
Type: Family
Description: This superfamily consists of prokaryotic Ku domain-containing proteins. In the eukaryotes it has been shown that the Ku protein is involved in repairing DNA double-strand breaks by non-homologous end-joining [, ]. The Ku protein is a heterodimer of approximately 70 kDa and 80 kDa subunits []. Both these subunits have strong sequence similarity and it has been suggested that they may have evolved by gene duplication from a homodimeric ancestor in eukaryotes []. The prokaryotic Ku members are homodimers and they have been predicted to be involved in the DNA repair system, which is mechanistically similar to eukaryotic non-homologous end joining [, ]. Recent findings have implicated yeast Ku in telomeric structure maintenance in addition to non-homologous end-joining. Some of the phenotypes of the Ku-knockout mice may indicate a similar role for Ku at mammalian telomeres []. Evolutionary notes: With currently available phyletic information it is difficult to determine the correct evolutionary trajectory of the Ku domain. It is possible that the core Ku domain was present in bacteria and archaea even before the presence of the eukaryotes. Eukaryotes might have vertically inherited the Ku-core protein from a common ancestor shared with a certain archaeal lineage or through horizontal transfer from bacteria. Alternatively, the core Ku domain could have evolved in the eukaryotic lineage and then been horizontally transferred to the prokaryotes. Sequencing of additional archaeal genomes and those of early-branching eukaryotes may help resolve the evolutionary history of the Ku domain. Structure notes: The eukaryotic Ku heterodimer comprises an alpha/beta N-terminal domain, a central β-barrel domain and a helical C-terminal arm []. Structural analysis of the Ku70/80 heterodimer bound to DNA indicates that subunit contacts lead to the formation of a highly charged channel through which the DNA passes without making any contacts with the DNA bases []. For additional information please see [].
Protein Domain
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes []. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [, , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability []. This entry represents a family of Cas proteins including CRISPR system endoribonuclease Csx1. This family was previously known as CRISPR-associated protein, MJ1666 family.
Protein Domain
Type: Family
Description: Calreticulin is a ubiquitous protein found in a wide range of species and in all nucleated cell types. It is an ancient and highly conserved protein with an exceptionally wide scope and variety of functions. Initially known as the high-affinity calcium-binding endoplasmic reticulum (ER) and sarcoplasmic reticulum (SR) protein "calregulin", calreticulin is now known to associate with proteins in the cytoplasm, nucleus and extracellular compartment. Calreticulin is a major Ca2+-binding/storage chaperone residing in the ER lumen []. Molecular chaperones residing in the ER facilitate the folding and prevent the aggregation of newly synthesized proteins. Interaction between the molecular chaperone and the misfolded protein leads to the retention, retranslocation and eventual degradation of the misfolded protein by the proteasome after ubiquitination []. Calreticulin binds (buffers) Ca2+ with high capacity and participates in folding newly synthesized proteins and glycoproteins. It is an important component of the calreticulin/calnexin cycle and quality control pathways in the ER []. Studies on calreticulin-deficient and calreticulin-transgenic mice revealed that calreticulin is a new cardiac embryonic gene and is essential during cardiac development [, ]. Calreticulin has also been characterised as an extracellular lectin, an intracellular mediator of integrin-mediated cell adhesion, an inhibitor of steroid hormone-regulated gene expression and a C1q-binding protein []. A proposed model of calreticulin domains includes a globular N-domain, a central proline-rich P-domain and an acidic C-domain. A detailed structure of the central P-domain was revealed by NMR studies, while a model of the globular N-domain of calreticulin is based on crystallographic data reported for the highly similar calnexin []. Calreticulin is also known as calregulin, Erp60, CRP55, CAB-63 and CaBP3 [].
Protein Domain
Type: Domain
Description: Tumor necrosis factor receptor superfamily member 6 (TNFRSF6), also known as Fas cell surface death receptor (FasR) or Fas, APT1, CD95, FAS1, APO-1, FASTM, ALPS1A, contains a death domain and plays a central role in the physiological regulation of programmed cell death [, ]. It has been implicated in the pathogenesis of various malignancies and diseases of the immune system [, ]. The receptor interacts with the Fas ligand (FasL), allowing the formation of a death-inducing signaling complex that includes Fas-associated death domain protein (FADD), caspase 8, and caspase 10; autoproteolytic processing of the caspases in the complex triggers a downstream caspase cascade, leading to apoptosis. This receptor has also been shown to activate NF-kappaB, MAPK3/ERK1, and MAPK8/JNK, and is involved in transducing the proliferating signals in normal diploid fibroblast and T cells [, , ]. In channel catfish and the Japanese rice fish medaka, homologues of Fas receptor (FasR), as well as FADD and caspase 8, have been identified and characterized, and likely constitute the teleost equivalent of the death-inducing signaling complex (DISC) [, ]. The involvement of FasL/FasR in the initiation of apoptosis suggests that mechanisms of cell-mediated cytotoxicity in teleosts are similar to those used by mammals; presumably, the mechanism of apoptosis induction via death receptors was evolutionarily established during the appearance of vertebrates. This entry represents the N-terminal domain of TNFRSF6/Fas from teleosts. TNF-receptors are modular proteins. The N-terminal extracellular part contains a cysteine-rich region responsible for ligand-binding. This region is composed of small modules of about 40 residues containing 6 conserved cysteines; the number and type of modules can vary in different members of the family [, , ].
Protein Domain
Type: Family
Description: DNA carries the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the cell life cycle. This function is mediated by DNA-directed DNA-polymerases, which add nucleotide triphosphate (dNTP) residues to the 3'-end of the growing DNA chain, using a complementary DNA as template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used. Three motifs, A, B and C [], are seen to be conserved across all DNA-polymerases, with motifs A and C also seen in RNA-polymerases. They are centred on invariant residues, and their structural significance was implied from the Klenow (Escherichia coli) structure: motif A contains a strictly-conserved aspartate at the junction of a β-strand and an α-helix; motif B contains an α-helix with positive charges; and motif C has a doublet of negative charges, located in a β-turn-β secondary structure []. DNA polymerases () can be classified, on the basis of sequence similarity [, ], into at least four different groups: A, B, C and X. Members of family X are small (about 40 kDa) compared with other polymerases and encompass two distinct polymerase enzymes that have similar functionality: vertebrate polymerase beta (same as Saccharomyces cerevisiae pol 4), and terminal deoxynucleotidyl-transferase (TdT) (). The former functions in DNA repair, while the latter terminally adds single nucleotides to polydeoxynucleotide chains. Both enzymes catalyse addition of nucleotides in a distributive manner, i.e. they dissociate from the template-primer after addition of each nucleotide. DNA-polymerases show a degree of structural similarity with RNA-polymerases. This entry represents terminal deoxynucleotidyl-transferase (TdT) and the DNA-directed DNA/RNA polymerase mu. The latter is a gap-filling polymerase involved in repair of DNA double-strand breaks by non-homologous end joining [, ].
Protein Domain
Type: Family
Description: The tumour necrosis factor (TNF) receptor (TNFR) superfamily comprises more than 20 type-I transmembrane proteins. Family members are defined based on similarity in their extracellular domain, a region that contains many cysteine residues arranged in a specific repetitive pattern []. The cysteines allow formation of an extended rod-like structure, responsible for ligand binding []. Upon receptor activation, different intracellular signalling complexes are assembled for different members of the TNFR superfamily, depending on their intracellular domains and sequences []. Activation of TNFRs can therefore induce a range of disparate effects, including cell proliferation, differentiation, survival, or apoptotic cell death, depending upon the receptor involved [, ]. TNFRs are widely distributed and play important roles in many crucial biological processes, such as lymphoid and neuronal development, innate and adaptive immunity, and maintenance of cellular homeostasis []. Drugs that manipulate their signalling have potential roles in the prevention and treatment of many diseases, such as viral infections, coronary heart disease, transplant rejection, and immune disease []. TNF receptor 27 (also known as ectodysplasin A2 receptor (EDA2R) and ectodysplasin receptor, X-linked (XEDAR)) is highly expressed during embryogenesis [], and has been implicated in the development of ectodermal appendages, such as hair follicles, teeth and sweat glands []. Although it lacks a death domain, the receptor can nevertheless induce cell death via activation of caspase 8, and may play a role in the induction of apoptosis during embryonic development and adult life []. A single partial match was also found, , a translated human cDNA sequence that fails to match motif 1.
Protein Domain
Type: Domain
Description: STAT6 mediates signals from the IL-4 receptor. Unlike the other STAT proteins, which bind an IFNgamma Activating Sequence (GAS), STAT6 stands out as having a unique binding site preference. This site consists of a palindromic sequence separated by a 3 bp spacer (TTCNNNG-AA) (N3 site). STAT6 is able to bind the GAS site but only at a low affinity upon IL-4-induced activation []. There is speculation that the inappropriate activation of STAT6 is involved in uncontrolled cell growth in an oncogenic state []. IL-4 signaling via STAT6 initially occurs unopposed, but is then dampened by a negative feedback mechanism through the IL-4/STAT6-dependent induction of SOCS1 expression. The IL-4 dependent aspect of Th2 differentiation requires the activation of STAT6. IL-4 signaling and STAT6 appear to play an important role in the immune response. It was shown that the large-scale chromatin remodeling of the IL-4 gene that occurs as cells differentiate into Th2 effectors is STAT6 dependent []. This entry represents the SH2 domain of STAT6. STAT proteins have a dual function: signal transduction and activation of transcription. When cytokines are bound to cell surface receptors, the associated Janus kinases (JAKs) are activated, leading to tyrosine phosphorylation of the given STAT proteins []. Phosphorylated STATs form dimers, translocate to the nucleus, and bind specific response elements to activate transcription of target genes []. STAT proteins contain an N-terminal domain (NTD), a coiled-coil domain (CCD), a DNA-binding domain (DBD), an α-helical linker domain (LD), an SH2 domain, and a transactivation domain (TAD). The SH2 domain is necessary for receptor association and tyrosine phosphodimer formation. Seven mammalian STAT family members have been identified: STAT1, STAT2, STAT3, STAT4, STAT5 (STAT5A and STAT5B), and STAT6 [].
Protein Domain
Type: Homologous_superfamily
Description: Cytochrome c oxidase is a key enzyme in aerobic metabolism. Proton pumping haem-copper oxidases represent the terminal, energy-transfer enzymes of respiratory chains in prokaryotes and eukaryotes. The CuB-haem a3 (or haem o) binuclear centre, associated with the largest subunit I of cytochrome c and ubiquinol oxidases, is directly involved in the coupling between dioxygen reduction and proton pumping [, ]. Some terminal oxidases generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner membrane (eukaryotes). The enzyme complex consists of from 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals), of which only the catalytic subunit (equivalent to mammalian subunit I, CO I) is found in all haem-copper respiratory oxidases. The presence of a bimetallic centre (formed by a high-spin haem and copper B) as well as a low-spin haem, both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common to all family members [, , ]. In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The enzyme complexes vary in haem and copper composition, substrate type and substrate affinity. The different respiratory oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions []. It has been shown that eubacterial quinol oxidase was derived from cytochrome c oxidase in Gram-positive bacteria and that archaebacterial quinol oxidase has an independent origin. A considerable amount of evidence suggests that proteobacteria (purple bacteria) acquired quinol oxidase through a lateral gene transfer from Gram-positive bacteria []. This entry represents a structural domain superfamily found in subunit I of cytochrome c oxidase as well as related proteins, including quinol oxidase. 
Structurally, it is composed of twelve transmembrane helices in an approximate threefold rotational symmetric arrangement.
Protein Domain
Type: Family
Description: Wnt proteins constitute a large family of secreted molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Drosophila) []. It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals []. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease [, ]. Wnt-mediated signalling is believed to proceed initially through binding to cell surface receptors of the frizzled family; the signal is subsequently transduced through several cytoplasmic components to β-catenin, which enters the nucleus and activates the transcription of several genes important in development []. Several non-canonical Wnt signalling pathways have also been elucidated that act independently of β-catenin. Canonical and non-canonical Wnt signalling branches are highly interconnected, and cross-regulate each other []. Members of the Wnt gene family are defined by their sequence similarity to mouse Wnt-1 and Wingless in Drosophila. They encode proteins of ~350-400 residues in length, with orthologues identified in several, mostly vertebrate, species. Very little is known about the structure of Wnts as they are notoriously insoluble, but they share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines [] that are probably involved in disulphide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters. Fifteen major Wnt gene families have been identified in vertebrates, with multiple subtypes within some classes. This entry represents Wnt-8. 
It was first isolated from Xenopus, with orthologues later being identified in several other species including human, mouse and zebrafish. Alternative exon splicing gives rise to several Wnt-8 isoforms [].
Protein Domain
Type: Homologous_superfamily
Description: Transcription factors of the T-box family are required both for early cell-fate decisions, such as those necessary for formation of the basic vertebrate body plan, and for differentiation and organogenesis []. The T-box is defined as the minimal region within the T-box protein that is both necessary and sufficient for sequence-specific DNA binding; all members of the family so far examined bind to the DNA consensus sequence TCACACCT. The T-box is a relatively large DNA-binding domain, generally comprising about a third of the entire protein (17-26 kDa) []. These genes were uncovered on the basis of similarity to the DNA-binding domain [] of the Mus musculus (Mouse) Brachyury (T) gene product, and this similarity is the defining feature of the family. The Brachyury gene is named for its phenotype, which was identified 70 years ago as a mutant mouse strain with a short blunted tail. The gene, and its paralogues, have become a well-studied model for the family, and hence much of what is known about the T-box family is derived from the murine Brachyury gene. Consistent with its nuclear location, Brachyury protein has a sequence-specific DNA-binding activity and can act as a transcriptional regulator []. Homozygous mutants for the gene undergo extensive developmental anomalies, thus rendering the mutation lethal []. The postulated role of Brachyury is as a transcription factor, regulating the specification and differentiation of posterior mesoderm during gastrulation in a dose-dependent manner []. T-box proteins tend to be expressed in specific organs or cell types, especially during development, and they are generally required for the development of those tissues. For example, Brachyury is expressed in posterior mesoderm and in the developing notochord, and it is required for the formation of these cells in mice []. The T-box superfamily is an ancient group that appears to play a critical role in development in all animal species [].
Protein Domain
Type: Homologous_superfamily
Description: The SET domain is a 130 to 140 amino acid, evolutionarily well-conserved sequence motif that was initially characterised in the Drosophila proteins Su(var)3-9, Enhancer-of-zeste and Trithorax [, ]. In eukaryotic organisms, it appears in proteins with an important role in regulating chromatin-mediated gene transcriptional activation and silencing. In viruses, bacteria and archaea, its function is not yet clear []. This superfamily includes eukaryotic proteins with histone methyltransferase activity, which requires the combination of the SET domain with the adjacent cysteine-rich regions, one located N-terminally (pre-SET) and the other posterior to the SET domain (post-SET). The pre-SET and post-SET regions therefore seem to play a crucial role in substrate recognition and enzymatic activity [, ]. Other SET domain-containing proteins function as transcription factors (such as PR domain zinc finger protein 1 from humans []). The structures of the SET domain and the two adjacent regions, pre-SET and post-SET, have been solved [, , ]. The SET domain structure is all-β, but consists only of sets of a few short strands that form no more than a couple of small sheets. Consequently the SET structure is mostly defined by turns and loops. An unusual feature is that the SET core is made up of two discontinuous segments of the primary sequence forming an approximate L-shape [, , ]. Two of the most conserved motifs in the SET domain are a stretch at the C terminus containing a strictly conserved tyrosine residue, and a preceding loop through which the C-terminal segment passes, forming a knot-like structure that is not quite a true knot. These two regions have been proven to be essential for SAM binding and catalysis, particularly the invariant tyrosine, where in all likelihood catalysis takes place [, ].
Protein Domain
Type: Conserved_site
Description: The lipocalins are a diverse, interesting, yet poorly understood family of proteins composed, in the main, of extracellular ligand-binding proteins displaying high specificity for small hydrophobic molecules []. Functions of these proteins include transport of nutrients, control of cell regulation, pheromone transport, cryptic colouration, and the enzymatic synthesis of prostaglandins. For example, retinol-binding protein 4 transfers retinol from the stores in the liver to peripheral tissues []. The crystal structures of several lipocalins have been solved and show a novel 8-stranded anti-parallel β-barrel fold well conserved within the family. Sequence similarity within the family is at a much lower level and would seem to be restricted to conserved disulphides and 3 motifs, which form a juxtaposed cluster that may act as a common cell surface receptor site [, ]. By contrast, at the more variable end of the fold are found an internal ligand binding site and a putative surface for the formation of macromolecular complexes []. The anti-parallel β-barrel fold is also exploited by the fatty acid-binding proteins, which function similarly by binding small hydrophobic molecules. Similarity at the sequence level, however, is less obvious, being confined to a single short N-terminal motif. This entry represents the lipocalin conserved site. The sequences of most members of the family, the core or kernel lipocalins, are characterised by three short conserved stretches of residues []. Others, the outlier lipocalin group, share only one or two of these. This signature pattern was built around the first, common to all outlier and kernel lipocalins, which occurs near the start of the first β-strand.
Protein Domain
Type: Domain
Description: This domain is named after Bet v 1, the major birch pollen allergen. Bet v 1 belongs to family 10 of plant pathogenesis-related proteins (PR-10), cytoplasmic proteins of 15-17 kDa that are widespread among dicotyledonous plants []. In recent years, a number of diverse plant proteins with low sequence similarity to Bet v 1 were identified. A classification by sequence similarity yielded several subfamilies related to PR-10 []: Pathogenesis-related proteins PR-10: These proteins were identified as major tree pollen allergens in birch and related species (hazel, alder), as plant food allergens expressed at high levels in fruits, vegetables and seeds (apple, celery, hazelnut), and as pathogenesis-related proteins whose expression is induced by pathogen infection, wounding, or abiotic stress. Hyp-1, an enzyme involved in the synthesis of the bioactive naphthodianthrone hypericin in St. John's wort (Hypericum perforatum), also belongs to this family. Most of these proteins were found in dicotyledonous plants. In addition, related sequences were identified in monocots and conifers. Cytokinin-specific binding proteins: These legume proteins bind cytokinin plant hormones []. (S)-Norcoclaurine synthases are enzymes catalysing the condensation of dopamine and 4-hydroxyphenylacetaldehyde to (S)-norcoclaurine, the first committed step in the biosynthesis of benzylisoquinoline alkaloids such as morphine []. Major latex proteins and ripening-related proteins are proteins of unknown biological function that were first discovered in the latex of opium poppy (Papaver somniferum) and later found to be upregulated during ripening of fruits such as strawberry and cucumber []. The occurrence of Bet v 1-related proteins is confined to seed plants, with the exception of a cytokinin-binding protein from the moss Physcomitrella patens.
Protein Domain
Type: Domain
Description: This entry represents a six transmembrane helix rhomboid domain. This domain is found in serine peptidases belonging to the MEROPS peptidase family S54 (Rhomboid, clan ST). They are integral membrane proteins related to the Drosophila melanogaster (Fruit fly) rhomboid protein. Members of this family are found in archaea, bacteria and eukaryotes. The rhomboid protease cleaves type-1 transmembrane domains using a catalytic dyad composed of serine and histidine. The active site is embedded within the membrane and the active site residues are on different transmembrane regions. From the tertiary structure of the Escherichia coli homologue GlpG [], it was shown that hydrolysis occurs in a fluid-filled cavity within the membrane. Initially, a catalytic triad including a highly conserved asparagine had been proposed, but this residue has been shown not to be essential []. Drosophila rhomboid cleaves the transmembrane proteins Spitz, Gurken and Keren within their transmembrane domains to release a soluble TGFalpha-like growth factor. Cleavage occurs in the Golgi, following translocation of the substrates from the endoplasmic reticulum membrane by Star, another transmembrane protein. The growth factors are then able to activate the epidermal growth factor receptor [, ]. Few substrates of mammalian rhomboid homologues have been determined, but rhomboid-like protein 2 has been shown to cleave ephrin B3 []. Parasite-encoded rhomboid enzymes are also important for invasion of host cells by Toxoplasma and the malaria parasite. Invasion of host cells first requires their recognition, and this is achieved by parasite transmembrane adhesins interacting with host cell receptors. Before the parasite can enter a host cell, the adhesins must be released by cleavage. In Toxoplasma, rhomboid TgROM5 cleaves the adhesins, and in Plasmodium, which lacks a TgROM5 orthologue, PfROMs 1 and 4 cleave the diverse array of malaria parasite adhesins [].
Protein Domain
Type: Family
Description: Animal lectins are proteins that bind to sugars and are involved in most non-structural roles of sugars, including trafficking of glycoconjugates and cell-cell recognition. Although they may be structurally very complex, they typically bind through the activity of a specific compact protein domain known as a carbohydrate-recognition domain (CRD) []. Animal lectins display a wide variety of architectures, and are classified according to the CRD, of which there are two main types, S-type and C-type. Galectins (previously S-lectins) bind exclusively to beta-galactosides such as lactose. They do not require metal ions for activity. Galectins are found predominantly, but not exclusively, in mammals []. Their function is unclear. They are developmentally regulated and may be involved in differentiation, cellular regulation and tissue construction. Mammalian galectins typically bind beta-galactosides. Their CRD shows a structural similarity to L-type lectin CRDs, but no strong sequence similarity, suggesting convergent evolution []. They are classified into three subgroups: proto type, which contain one CRD (Gal-1, -2, -5, -7, -10, -11, -13, -14 and -15); tandem repeat type, which contain two CRDs in tandem (Gal-4, -6, -8, -9 and -12); and chimera type, which contain one CRD and an additional non-lectin domain (Gal-3) [, ]. Galectin-13, also known as placental protein 13, is a placenta-specific galectin that induces the apoptosis of T lymphocytes, which may reduce the danger of maternal immune attacks on the fetal semiallograft during the long gestation of anthropoid primates []. It has been suggested that it may have special haemostatic and immunobiological functions at the lining of the common feto-maternal blood-spaces or a developmental role in the placenta []. A lack of Gal-13 may contribute to gestational diabetes mellitus (GDM) []. Galectin-13 was shown to have lysophospholipase activity [].
Protein Domain
Type: Family
Description: Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component []. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) []. The chaperone protein SecB [] is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion []. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both proton motive forces and ATP-driven secretion. The latter is mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains []. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices. The SecD and SecF equivalents of the Gram-positive bacterium Bacillus subtilis are jointly present in one polypeptide, denoted SecDF, that is required to maintain a high capacity for protein secretion. Unlike the SecD subunit of the preprotein translocase of E. coli, SecDF of B. subtilis was not required for the release of a mature secretory protein from the membrane, indicating that SecDF is involved in earlier translocation steps []. Comparison with SecD and SecF proteins from other organisms revealed the presence of 10 conserved regions in SecDF, some of which appear to be important for SecDF function. Interestingly, the SecDF protein of B. subtilis has 12 putative transmembrane domains. Thus, SecDF shows not only sequence similarity but also structural similarity to secondary solute transporters []. This entry represents the SecF protein found in bacteria.
Protein Domain
Type: Family
Description: Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts, plants and animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology; initial hydropathy analysis suggested 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease []. 
These mutations have been demonstrated to reduce or abolish CLC function. This entry represents chloride channel proteins found in bacteria.
Protein Domain
Type: Family
Description: During the development of the vertebrate nervous system, many neurons become redundant (because they have died, failed to connect to target cells, etc.) and are eliminated. At the same time, developing neurons send out axon outgrowths that contact their target cells []. Such cells control their degree of innervation (the number of axon connections) by the secretion of various specific neurotrophic factors that are essential for neuron survival. One of these is nerve growth factor (NGF), which is involved in the survival of some classes of embryonic neuron (e.g., peripheral sympathetic neurons) []. NGF is mostly found outside the central nervous system (CNS), but slight traces have been detected in adult CNS tissues, although a physiological role for this is unknown []; it has also been found in several snake venoms [, ]. Proteins similar to NGF include brain-derived neurotrophic factor (BDNF) and neurotrophins 3 to 7, all of which demonstrate neuron survival and outgrowth activities. Although NGF was originally identified in snake venom, its most abundant and best studied source is the submaxillary gland of adult male mice []. Mouse NGF is a high molecular weight hexamer, composed of 2 subunits each of alpha, beta and gamma polypeptides. The beta subunit (NGF-beta) is responsible for the physiological activity of the complex []. NGF-beta induces its cell survival effects through activation of neurotrophic tyrosine kinase receptor type 1 (NTRK1; also called TrkA), and can induce cell death by binding to the low affinity nerve growth factor receptor, p75NTR []. The neurotrophin has been shown to be involved in sympathetic axon growth and innervation of target fields [].
Protein Domain
Type: Domain
Description: The SET domain is a 130 to 140 amino acid, evolutionarily well-conserved sequence motif that was initially characterised in the Drosophila proteins Su(var)3-9, Enhancer-of-zeste and Trithorax. In addition to these chromosomal proteins modulating gene activities and/or chromatin structure, the SET domain is found in proteins of diverse functions ranging from yeast to mammals, but also including some bacteria and viruses [, ]. The SET domains of mammalian SUV39H1 and 2 and fission yeast clr4 have been shown to be necessary for the methylation of lysine-9 in the histone H3 N terminus []. However, this histone methyltransferase (HMTase) activity is probably restricted to a subset of SET domain proteins, as it requires the combination of the SET domain with the adjacent cysteine-rich regions, one located N-terminally (pre-SET) and the other posterior to the SET domain (post-SET). The pre-SET and post-SET regions therefore seem to play a crucial role in substrate recognition and enzymatic activity [, ]. The structures of the SET domain and the two adjacent regions, pre-SET and post-SET, have been solved [, , ]. The SET structure is all beta, but consists only of sets of a few short strands that form no more than a couple of small sheets. Consequently the SET structure is mostly defined by turns and loops. An unusual feature is that the SET core is made up of two discontinuous segments of the primary sequence forming an approximate L shape [, , ]. Two of the most conserved motifs in the SET domain are (1) a stretch at the C terminus containing a strictly conserved tyrosine residue and (2) a preceding loop through which the C-terminal segment passes, forming a knot-like structure that is not quite a true knot. These two regions have been proven to be essential for SAM binding and catalysis, particularly the invariant tyrosine, where in all likelihood catalysis takes place [, ].
Protein Domain
Type: Family
Description: During the development of the vertebrate nervous system, many neurons become redundant (because they have died, failed to connect to target cells, etc.) and are eliminated. At the same time, developing neurons send out axon outgrowths that contact their target cells []. Such cells control their degree of innervation (the number of axon connections) by the secretion of various specific neurotrophic factors that are essential for neuron survival. One of these is nerve growth factor (NGF), which is involved in the survival of some classes of embryonic neuron (e.g., peripheral sympathetic neurons) []. NGF is mostly found outside the central nervous system (CNS), but slight traces have been detected in adult CNS tissues, although a physiological role for this is unknown []; it has also been found in several snake venoms [, ]. Proteins similar to NGF include brain-derived neurotrophic factor (BDNF) and neurotrophins 3 to 7, all of which demonstrate neuron survival and outgrowth activities. Although NGF was originally identified in snake venom, its most abundant and best studied source is the submaxillary gland of adult male mice []. Mouse NGF is a high molecular weight hexamer, composed of 2 subunits each of alpha, beta and gamma polypeptides. The beta subunit (NGF-beta) is responsible for the physiological activity of the complex []. NGF-beta induces its cell survival effects through activation of neurotrophic tyrosine kinase receptor type 1 (NTRK1; also called TrkA), and can induce cell death by binding to the low affinity nerve growth factor receptor, p75NTR []. The neurotrophin has been shown to be involved in sympathetic axon growth and innervation of target fields []. Mammalian NGF-beta proteins tend to be higher-potency NTRK1 agonists than their snake venom counterparts []. In humans, NGF-beta gene mutations can cause a loss of pain perception [].
Protein Domain
Type: Domain
Description: Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [, , ] (also known as TGGCA-binding proteins) are a family of vertebrate nuclear proteins which recognise and bind, as dimers, the palindromic DNA sequence 5'-TGGCANNNTGCCA-3'. This family was first described for its role in stimulating the initiation of adenovirus DNA replication []. In vertebrates there are four members, NFIA, NFIB, NFIC and NFIX, and an orthologue from Caenorhabditis elegans has been described, called Nuclear factor I family protein (NFI-I) []. The CTF/NF-I proteins are individually capable of activating transcription and DNA replication; thus they function by regulating cell proliferation and differentiation. They are involved in normal development and have been associated with developmental abnormalities and cancer in humans []. In a given species, there are a large number of different CTF/NF-I proteins, generated both by alternative splicing and by the occurrence of four different genes. CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200 amino-acid sequence, almost perfectly conserved in all species and genes sequenced, mediates site-specific DNA recognition, protein dimerisation and adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain. This activation domain is the target of gene expression regulatory pathways elicited by growth factors and it interacts with basal transcription factors and with histone H3 []. This entry represents the N terminus, of which 200 residues contain the DNA-binding and dimerisation domain, but also has an 8-47 residue highly conserved region 5' of this, whose function is not known. Deletion of the N-terminal 200 amino acids removes the DNA-binding activity, dimerisation ability and the stimulation of adenovirus DNA replication [].
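As an illustration of the palindromic recognition sequence 5'-TGGCANNNTGCCA-3' quoted above, the following is a minimal Python sketch that scans a DNA string for that motif and checks that the motif equals its own reverse complement. The function names (`motif_to_regex`, `find_sites`, `reverse_complement`) and the test sequence are hypothetical, chosen only for this example; treating N as "any of A, C, G or T" is the usual IUPAC reading and is an assumption here, not something this entry specifies.

```python
import re

# Recognition sequence quoted in the description; N is read as "any base" (assumption).
NFI_SITE = "TGGCANNNTGCCA"


def motif_to_regex(motif):
    """Translate a motif containing N wildcards into a regular-expression string."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def find_sites(seq, motif=NFI_SITE):
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


def reverse_complement(seq):
    """Reverse-complement a DNA string; N stays N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


# Hypothetical example sequence with one site starting at index 2.
hits = find_sites("aaTGGCAcgtTGCCAtt")
is_palindromic = reverse_complement(NFI_SITE) == NFI_SITE
```

Because the motif is its own reverse complement, scanning one strand is enough to find sites on both strands, which fits with the statement that the proteins bind the site as dimers.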
Protein Domain
Type: Binding_site
Description: Synonym(s): Serine hydroxymethyltransferase, Serine aldolase, Threonine aldolase. Serine hydroxymethyltransferase (SHMT) is a pyridoxal phosphate (PLP) dependent enzyme and belongs to the aspartate aminotransferase superfamily (fold type I) []. The pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved in all forms of the enzyme []. The enzyme carries out interconversion of serine and glycine using PLP as the cofactor. SHMT catalyses the transfer of a hydroxymethyl group from N5,N10-methylene tetrahydrofolate to glycine, resulting in the formation of serine and tetrahydrofolate. Both eukaryotic and prokaryotic SHMT enzymes form tight obligate homodimers and the mammalian enzyme forms a homotetramer [, ]. PLP dependent enzymes were previously classified into alpha, beta and gamma classes, based on the chemical characteristics (carbon atom involved) of the reaction they catalysed. The availability of several structures allowed a comprehensive analysis of the evolutionary classification of PLP dependent enzymes, and it was found that the functional classification did not always agree with the evolutionary history of these enzymes. Structure and sequence analysis has revealed that the PLP dependent enzymes can be classified into five major groups of different evolutionary origin: aspartate aminotransferase superfamily (fold type I), tryptophan synthase beta superfamily (fold type II), alanine racemase superfamily (fold type III), D-amino acid superfamily (fold type IV) and glycogen phosphorylase family (fold type V) [, ]. In vertebrates, glycine hydroxymethyltransferase exists in a cytoplasmic and a mitochondrial form, whereas only one form is found in prokaryotes. The signature of this entry contains the lysine residue to which the pyridoxal phosphate group is attached. The region surrounding this lysine residue is highly conserved in all forms of the enzyme.
Protein Domain
Type: Binding_site
Description: DNA carries the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the cell life cycle. This function is mediated by DNA-directed DNA-polymerases, which add nucleotide triphosphate (dNTP) residues to the 3'-end of the growing DNA chain, using a complementary DNA as template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used. Three motifs, A, B and C [], are seen to be conserved across all DNA-polymerases, with motifs A and C also seen in RNA-polymerases. They are centred on invariant residues, and their structural significance was implied from the Klenow (Escherichia coli) structure: motif A contains a strictly-conserved aspartate at the junction of a β-strand and an α-helix; motif B contains an α-helix with positive charges; and motif C has a doublet of negative charges, located in a β-turn-β secondary structure []. DNA polymerases can be classified, on the basis of sequence similarity [, ], into at least four different groups: A, B, C and X. Members of family X are small (about 40 kDa) compared with other polymerases and encompass two distinct polymerase enzymes that have similar functionality: vertebrate polymerase beta (same as yeast pol 4), and terminal deoxynucleotidyl-transferase (TdT). The former functions in DNA repair, while the latter terminally adds single nucleotides to polydeoxynucleotide chains. Both enzymes catalyse addition of nucleotides in a distributive manner, i.e. they dissociate from the template-primer after addition of each nucleotide. DNA-polymerases show a degree of structural similarity with RNA-polymerases. This entry includes a highly conserved region that contains a conserved arginine and two conserved aspartic acid residues. These residues have been shown to be involved in primer binding in polymerase beta [].
Protein Domain
Type: Family
Description: Nucleosides are hydrophilic molecules and require specialised transport proteins for permeation of cell membranes. There are two types of nucleoside transport processes: equilibrative bidirectional processes driven by chemical gradients, and inwardly directed concentrative processes driven by an electrochemical gradient []. The two types of nucleoside transporters are classified into two families: the solute carrier (SLC) 29 and SLC28 families, corresponding to equilibrative and concentrative nucleoside transporters, respectively []. The microbial proteins include broad specificity transporters, such as the Escherichia coli NupC protein, which transports all nucleosides (both ribo- and deoxyribonucleosides) except hypoxanthine and guanine nucleosides []. Bacillus subtilis NupC transporter has been shown to be involved in transport of the pyrimidine nucleoside uridine []. A recently characterised fungal protein, the first transporter of this type to be described in eukaryotes, exhibited transport activity for adenosine, uridine, inosine and guanosine but not cytidine, thymidine or the nucleobase hypoxanthine []. The characterised mammalian proteins can be divided into three subgroups: CNT1, CNT2 and CNT3 []. CNT1 preferentially transports pyrimidines and weakly transports adenosine. Several antiviral and anticancer nucleoside analogues, including AZT and dFdC, are also substrates for CNT1. CNT2 selectively transports purines, and the human form has also been shown to facilitate the uptake of some antiviral compounds including ddI and ribavirin. CNT3 has a broader specificity, transporting both purines and pyrimidines. Several anticancer nucleoside analogues such as CdA, dFdC and FdU are also transported by CNT3. Substrate specificity appears to depend on a region containing transmembrane regions 7, 8 and 9. Mutation of just four residues in this region was sufficient to convert the activity of human CNT1 to that of CNT2. 
At least three other concentrative nucleoside transport activities have been described in mammalian cells, but the proteins responsible for these activities have not yet been identified. This entry represents a family of Concentrative Nucleoside Transporter (CNT) proteins found in bacteria and animals.
Protein Domain
Type: Conserved_site
Description: Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [, , ] (also known as TGGCA-binding proteins) are a family of vertebrate nuclear proteins which recognise and bind, as dimers, the palindromic DNA sequence 5'-TGGCANNNTGCCA-3'. This family was first described for its role in stimulating the initiation of adenovirus DNA replication []. In vertebrates there are four members, NFIA, NFIB, NFIC and NFIX, and an orthologue from Caenorhabditis elegans has been described, called Nuclear factor I family protein (NFI-I) []. The CTF/NF-I proteins are individually capable of activating transcription and DNA replication, and thus function in regulating cell proliferation and differentiation. They are involved in normal development and have been associated with developmental abnormalities and cancer in humans []. In a given species, there are a large number of different CTF/NF-I proteins, generated both by alternative splicing and by the occurrence of four different genes. CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200 amino-acid sequence, almost perfectly conserved in all species and genes sequenced, mediates site-specific DNA recognition, protein dimerisation and adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain. This activation domain is the target of gene expression regulatory pathways elicited by growth factors, and it interacts with basal transcription factors and with histone H3 []. This entry represents a specific signature for this family of proteins, which includes the four vertebrate members NFIA, NFIB, NFIC and NFIX. The signature is a perfectly conserved, highly charged 12-residue peptide located in the DNA-binding domain of CTF/NF-I. It does not contain the four conserved Cys residues, which are required for its DNA-binding activity [].
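The palindromic recognition sequence quoted above, 5'-TGGCANNNTGCCA-3', can be turned into a simple text search. The following is a minimal sketch only: the function name find_nfi_sites is hypothetical, N is read as "any of A, C, G or T", and a textual match says nothing about whether NF-I actually binds at that position.

```python
import re

# Consensus site from the description above; N is treated as any single base.
# A lookahead is used so that overlapping candidate sites are all reported.
NFI_SITE = re.compile(r"(?=(TGGCA[ACGT]{3}TGCCA))")


def find_nfi_sites(seq: str) -> list[int]:
    """Return 0-based start positions of candidate NF-I sites in seq."""
    return [m.start() for m in NFI_SITE.finditer(seq.upper())]


if __name__ == "__main__":
    # Illustrative sequence, not taken from any real genome.
    print(find_nfi_sites("aaTGGCAcgtTGCCAtt"))
```

Only one strand needs to be scanned here, since the consensus is its own reverse complement.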
Protein Domain
Type: Conserved_site
Description: DNA carries the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the cell life cycle. This function is mediated by DNA-directed DNA-polymerases, which add nucleotide triphosphate (dNTP) residues to the 3'-end of the growing DNA chain, using a complementary DNA as template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used. Three motifs, A, B and C [], are seen to be conserved across all DNA-polymerases, with motifs A and C also seen in RNA-polymerases. They are centred on invariant residues, and their structural significance was implied from the Klenow (Escherichia coli) structure: motif A contains a strictly-conserved aspartate at the junction of a β-strand and an α-helix; motif B contains an α-helix with positive charges; and motif C has a doublet of negative charges, located in a β-turn-β secondary structure []. DNA polymerases can be classified, on the basis of sequence similarity [, ], into at least four different groups: A, B, C and X. Members of family X are small (about 40 kDa) compared with other polymerases and encompass two distinct polymerase enzymes that have similar functionality: vertebrate polymerase beta (same as yeast pol 4), and terminal deoxynucleotidyl-transferase (TdT). The former functions in DNA repair, while the latter terminally adds single nucleotides to polydeoxynucleotide chains. Both enzymes catalyse addition of nucleotides in a distributive manner, i.e. they dissociate from the template-primer after addition of each nucleotide. DNA-polymerases show a degree of structural similarity with RNA-polymerases. Five regions of similarity are found in all the polymerases of this entry. The signature of this entry corresponds to the conserved region known as 'motif B' []. Motif B is located in a domain which, in E. coli polA, has been shown to bind deoxynucleotide triphosphate substrates. It contains a conserved tyrosine which has been shown, by photo-affinity labelling, to be in the active site; a conserved lysine, also part of this motif, can be chemically labelled using pyridoxal phosphate.
Protein Domain
Type: PTM
Description: Translation initiation factor 5A (IF-5A) was previously reported to be involved in the first step of peptide bond formation in translation; however, more recent work implicates it as a universally conserved translation elongation factor []. eIF5A is a cofactor for the Rev and Rex transactivator proteins of human immunodeficiency virus-1 and T-cell leukaemia virus I, respectively [, , ]. IF-5A is the sole protein in eukaryotes and archaea to contain the unusual amino acid hypusine (Nε-(4-amino-2-hydroxybutyl)lysine), which is an absolute functional requirement. The first step in the post-translational modification of lysine to hypusine is catalysed by the enzyme deoxyhypusine synthase, the structure of which has been reported. Hypusine is derived from lysine by the post-translational addition of a butylamino group (from spermidine) to the ε-amino group of lysine. The hypusine group is essential to the function of eIF-5A. A hypusine-containing protein has been found in archaebacteria such as Sulfolobus acidocaldarius or Methanocaldococcus jannaschii (Methanococcus jannaschii); this protein is highly similar to eIF-5A and could play a similar role in protein biosynthesis. The signature for eIF-5A is centred on the hypusine residue. The crystal structure of IF-5A from the archaeon Pyrobaculum aerophilum has been determined to 1.75 Å. Unmodified P. aerophilum IF-5A is found to be a beta structure with two domains and three separate hydrophobic cores. The lysine (Lys42) that is post-translationally modified by deoxyhypusine synthase is found at one end of the IF-5A molecule in a turn between beta strands beta4 and beta5; this lysine residue is freely solvent accessible. The C-terminal domain is found to be homologous to the cold-shock protein CspA of Escherichia coli, which has a well characterised RNA-binding fold, suggesting that IF-5A is involved in RNA binding [].
Protein Domain
Type: Family
Description: The zonula occludens proteins (ZO-1, ZO-2 and ZO-3) are a family of tight junction (TJ) associated proteins that function as cross-linkers, anchoring the TJ strand proteins to the actin-based cytoskeleton []. Each protein contains three PDZ (postsynaptic density, disc-large, ZO-1) domains, a single SH3 (Src Homology-3) domain and a catalytically inactive GK (guanylate kinase) domain, the presence of which identifies them as members of the membrane-associated guanylate kinase (MAGUK) protein family. The signature PDZ-SH3-GuK tandem of MAGUKs may form a structural supramodule with three domains interacting with each other to assemble into an integral structural unit [, ]. They also share an acidic domain at the C-terminal region of the molecules not found in other MAGUK proteins. It has been demonstrated that the first PDZ domain is involved in binding the C-terminal -Y-V motif of claudins []. By contrast, the occludin-binding domain of ZO-1 has been shown to lie in the GK and acidic domains []. Although the precise location of the actin-binding motif has not been elucidated, it appears to be within the C-terminal half of the molecules, since transfection of this region into fibroblasts induces co-localisation of ZO-1 and ZO-2 with actin fibres. This entry represents ZO-1, which was first identified as a 220 kDa antigen for a monoclonal antibody raised to junction-enriched cell fractions []. The protein shares ~65% overall similarity with ZO-2 and ZO-3 proteins, with highest levels of similarity in the MAGUK and acidic domains. The structure of ZO-1 is distinct from the other ZO protein family members in that it contains a ZU5 domain at the C-terminal end of the molecule, although the function of this domain is unknown. Binding and transfection studies indicate that ZO-1 is capable of associating with ZO-2 and ZO-3 through binding of the second PDZ domains [].
Protein Domain
Type: Family
Description: Peptidase family M49 contains exopeptidases that remove dipeptides from the N terminus of peptides and proteins and are known as dipeptidyl-peptidases (DPP). The best characterized of these is dipeptidyl-peptidase III, which represents the prototype for the M49 family of metallopeptidases. It consists of two domains that form a wide cleft containing the catalytic metal ion (DPPIII; MEROPS identifier M49.001) []. The exopeptidases in M49 are metal-dependent, and bind a single zinc ion via the histidines in an HEXXXH motif, in which the distance between the histidines is one residue longer than in the HEXXH zinc-binding motif found in endopeptidases of clan MA. The importance of the histidines and the glutamic acid was identified by site-directed mutagenesis []. Some members of family M49, notably from bacteria such as Colwellia and from plants, possess the more usual HEXXH motif []. A third zinc ligand occurs within a motif that has been described as EECRAE []. DPPIII releases N-terminal dipeptides sequentially from peptides such as angiotensins II and III, Leu-enkephalin, prolactin and alpha-melanocyte-stimulating hormone, but tripeptides are poor substrates and polypeptides of more than ten residues are not cleaved [, ]. DPPIII is a soluble, cytosolic enzyme with a housekeeping role, but it is elevated in retroplacental serum and may participate in the increased angiotensin hydrolysis seen during pregnancy []. This family also includes Nudix hydrolase 3 (NUDT3) from plants, which is thought to hydrolyse nucleoside diphosphate derivatives because of the presence of a Nudix box. Isopentenyl diphosphate (IPP), a universal precursor for the biosynthesis of isoprenoid compounds, is hydrolysed; purine nucleotides such as 8-oxo-dATP are dephosphorylated; and the enzyme acts as a dipeptidyl-peptidase against dipeptidyl-2-arylamide substrates [].
Protein Domain
Type: Family
Description: This entry represents the Sirtuin family, class III subfamily. Proteins in this subfamily include the NAD-dependent protein deacylase sirtuin-5. Sirtuin-5 is the NAD-dependent lysine demalonylase and desuccinylase that specifically removes malonyl and succinyl groups on target proteins []. The sirtuin (also known as Sir2) family is broadly conserved from bacteria to human. Yeast Sir2 (silent mating-type information regulation 2), the founding member, was first isolated as part of the SIR complex required for maintaining a modified chromatin structure at telomeres. Sir2 functions in transcriptional silencing, cell cycle progression, and chromosome stability []. Although most sirtuins in eukaryotic cells are located in the nucleus, others are cytoplasmic or mitochondrial. This family is divided into five classes (I-IV and U) on the basis of a phylogenetic analysis of 60 sirtuins from a wide array of organisms []. Class I and class IV are further divided into three and two subgroups, respectively. The U-class sirtuins are found only in Gram-positive bacteria []. The S. cerevisiae genome encodes five sirtuins, Sir2 and four additional proteins termed 'homologues of sir two' (Hst1p-Hst4p) []. The human genome encodes seven sirtuins, with representatives from classes I-IV [, ]. Sirtuins are responsible for a newly classified chemical reaction, NAD-dependent protein deacetylation. The final products of the reaction are the deacetylated peptide and an acetyl ADP-ribose []. In nuclear sirtuins this deacetylation reaction is mainly directed against acetylated lysines of histones []. Sirtuins typically consist of two optional and highly variable N- and C-terminal domains (50-300 aa) and a conserved catalytic core domain (~250 aa). 
Mutagenesis experiments suggest that the N- and C-terminal regions help direct the catalytic core domain to different targets [, ]. The 3D-structure of an archaeal sirtuin in complex with NAD reveals that the protein consists of a large domain having a Rossmann fold and a small domain containing a three-stranded zinc ribbon motif. NAD is bound in a pocket between the two domains [].
Protein Domain
Type: Family
Description: Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH in animal cells. Such transport also plays a vital role in acid-base movements in the stomach, pancreas, intestine, kidney, reproductive organs and the central nervous system. Functional studies have suggested four different HCO3- transport modes. Anion exchanger proteins exchange HCO3- for Cl- in a reversible, electroneutral manner []. Na+/HCO3- co-transport proteins mediate the coupled movement of Na+ and HCO3- across plasma membranes, often in an electrogenic manner []. Na+-driven Cl-/HCO3- exchange and K+/HCO3- exchange activities have also been detected in certain cell types, although the molecular identities of the proteins responsible remain to be determined. Sequence analysis of the two families of HCO3- transporters that have been cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals that they are homologous. This is not entirely unexpected, given that they both transport HCO3- and are inhibited by a class of pharmacological agents called disulphonic stilbenes []. They share ~25-30% sequence identity, which is distributed along their entire sequence length, and have similar predicted membrane topologies, suggesting they have ~10 transmembrane (TM) domains. Na+/HCO3- co-transport proteins are involved in cellular HCO3- absorption and secretion, and also with intracellular pH regulation. They mediate the coupled movement of Na+ and HCO3- across plasma membranes in most of the cell types so far investigated. A single HCO3- is transported together with one to three Na+; this transport mode is therefore often electrogenic. In the kidney, an electrogenic Na+/HCO3- co-transporter is the principal HCO3- transporter of the renal proximal tubule, and is responsible for reabsorption of more than 85% of the filtered load of HCO3- []. 
Until recently, the molecular nature of these Na+/HCO3- co-transporters had remained undiscovered, as initial attempts to clone them based on presumed homology to Cl-/HCO3- (anion) exchangers had proved unsuccessful. Instead, an expression cloning strategy was successfully utilised to identify the Na+/HCO3- co-transporter from salamander kidney, an organ previously found to possess electrogenic Na+/HCO3- co-transport activity []. At least 3 mammalian Na+/HCO3- co-transporters have since been cloned, with similar primary sequence lengths and putative membrane topologies. One of these has been found to be a kidney-specific isoform [], which is near-identical (except for a varying N-terminal region) to a more widely-distributed co-transporter cloned from pancreatic tissue [].
Protein Domain
Type: Family
Description: Wnt proteins constitute a large family of secreted molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Drosophila) []. It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals []. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease [, ]. Wnt-mediated signalling is believed to proceed initially through binding to cell surface receptors of the frizzled family; the signal is subsequently transduced through several cytoplasmic components to β-catenin, which enters the nucleus and activates the transcription of several genes important in development []. Several non-canonical Wnt signalling pathways have also been elucidated that act independently of β-catenin. Canonical and non-canonical Wnt signalling branches are highly interconnected, and cross-regulate each other []. Members of the Wnt gene family are defined by their sequence similarity to mouse Wnt-1 and Wingless in Drosophila. They encode proteins of ~350-400 residues in length, with orthologues identified in several, mostly vertebrate, species. Very little is known about the structure of Wnts as they are notoriously insoluble, but they share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines [] that are probably involved in disulphide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters. 
Fifteen major Wnt gene families have been identified in vertebrates, with multiple subtypes within some classes. Expression of Wnt-16 is activated by E2A-Pbx1, a fusion protein resulting from a t(1;19) translocation that occurs in a large proportion of pediatric pre-B acute lymphoblastoid leukaemias (pre-B ALL) []. The Wnt-16 transcript, normally absent in bone marrow, was found to be highly expressed in the bone marrow of pre-B ALL patients.
Protein Domain
Type: Domain
Description: All eukaryotic cells are surrounded by a plasma membrane, and they also contain multiple membrane-based organelles and structures inside cells. Thus membrane remodeling is likely to be important for most cellular activities and development. The Bin-Amphiphysin-Rvs (BAR) domain superfamily of proteins has been found to play a major role in remodeling cellular membranes linked with organelle biogenesis, membrane trafficking, cell division, cell morphology and cell migration. The BAR domain superfamily of proteins is evolutionarily conserved with representative members present from yeast to man. Currently there are three distinct families of BAR domain proteins: classical BAR, F-BAR (FCH-BAR, or Fes/CIP4 homology BAR; e.g., Toca-1) and I-BAR (inverse-BAR; e.g., IRSp53). The classical BAR, F-BAR, and I-BAR domains are structurally similar homodimeric modules with antiparallel arrangement of monomers [, ]. The F-BAR domain is emerging as an important player in membrane remodeling pathways. F-BAR domain proteins couple membrane remodeling with actin dynamics associated with endocytic pathways and filopodium formation. F-BAR domain containing proteins can be categorized into five sub-families based on their phylogeny, which is consistent with the additional protein domains they possess, for example, RhoGAP domains, Cdc42 binding sites, SH2 domains, SH3 domains and tyrosine kinase domains []. The N-terminal part (about one third) of the F-BAR domain was previously characterised as an FCH (FER-CIP4 homology) domain. However, the region of sequence similarity extends to an adjacent region with a coiled-coil (CC) structure. Hence, the F-BAR domain (FCH+CC, ~300 amino acids) has also been called extended FC (EFC) domain. The F-BAR domain plays a role in dimerization and membrane phospholipid binding. It binds specifically to certain kinds of lipids and acts as a dimeric membrane-binding curvature effector [, , ]. The F-BAR domain is composed of five helices. 
Its structure is composed of a short N-terminal helix, three long α-helices, and a short C-terminal helix followed by an extended peptide of 17 amino acids [, ].
Protein Domain
Type: Family
Description: Integrins are the major metazoan receptors for cell adhesion to extracellular matrix proteins and, in vertebrates, also play important roles in certain cell-cell adhesions, make transmembrane connections to the cytoskeleton and activate many intracellular signalling pathways [, ]. An integrin receptor is a heterodimer composed of alpha and beta subunits. Each subunit crosses the membrane once, with most of the polypeptide residing in the extracellular space, and has two short cytoplasmic domains. Some members of this family have EGF repeats at the C terminus and also have a vWA domain inserted within the integrin domain at the N terminus. Most integrins recognise relatively short peptide motifs, and in general require an acidic amino acid to be present. Ligand specificity depends upon both the alpha and beta subunits []. There are at least 18 types of alpha and 8 types of beta subunits recognised in humans []. Each alpha subunit tends to associate only with one type of beta subunit, but there are exceptions to this rule []. Each association of alpha and beta subunits has its own binding specificity and signalling properties. Many integrins require activation on the cell surface before they can bind ligands. Integrins frequently intercommunicate, and binding at one integrin receptor can activate or inhibit another. Integrin beta-7 was originally identified in leukocytes. It is 32 to 46% homologous to integrins 1 through 6, which are also found in leukocytes []. Like the other integrins, integrin beta-7 is involved in cell adhesion. With integrin alpha-4, beta-7 forms a receptor that is essential for intestinal homing of effector/memory T cells [].
Protein Domain
Type: Family
Description: DNA carries the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the cell life cycle. This function is mediated by DNA-directed DNA-polymerases, which add nucleotide triphosphate (dNTP) residues to the 3'-end of the growing DNA chain, using a complementary DNA as template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used. Three motifs, A, B and C [], are seen to be conserved across all DNA-polymerases, with motifs A and C also seen in RNA-polymerases. They are centred on invariant residues, and their structural significance was implied from the Klenow (Escherichia coli) structure: motif A contains a strictly-conserved aspartate at the junction of a β-strand and an α-helix; motif B contains an α-helix with positive charges; and motif C has a doublet of negative charges, located in a β-turn-β secondary structure []. DNA polymerases can be classified, on the basis of sequence similarity [, ], into at least four different groups: A, B, C and X. Members of family X are small (about 40 kDa) compared with other polymerases and encompass two distinct polymerase enzymes that have similar functionality: vertebrate polymerase beta (same as Saccharomyces cerevisiae pol 4), and terminal deoxynucleotidyl-transferase (TdT). The former functions in DNA repair, while the latter terminally adds single nucleotides to polydeoxynucleotide chains. Both enzymes catalyse addition of nucleotides in a distributive manner, i.e. they dissociate from the template-primer after addition of each nucleotide. DNA-polymerases show a degree of structural similarity with RNA-polymerases. This entry represents terminal deoxynucleotidyl-transferase (TdT).
Protein Domain
Type: Domain
Description: This entry represents the anthranilate synthase/para-aminobenzoate synthase domain, which shares sequence similarity with the glutamine amidotransferase domain. Anthranilate synthase (ASase) plays a role in the tryptophan biosynthetic pathway, while para-aminobenzoate synthase is involved in the folate biosynthetic pathway. In at least one case, a single polypeptide from Bacillus subtilis was shown to have both functions. This entry contains proteins similar to para-aminobenzoate (PABA) synthase and ASase. These enzymes catalyse similar reactions and produce similar products, PABA and ortho-aminobenzoate (anthranilate). Each enzyme is composed of non-identical subunits: a glutamine amidotransferase subunit (component II) and a subunit that produces the aminobenzoate product (component I). ASase catalyses the synthesis of anthranilate from chorismate and glutamine and is a tetrameric protein comprising two copies each of components I and II. Component II of ASase belongs to the family of triad GTases, which hydrolyse glutamine and transfer nascent ammonia between the active sites. In some bacteria, such as Escherichia coli, component II can be much larger than in other organisms, due to the presence of phosphoribosyl-anthranilate transferase (PRTase) activity. PRTase catalyses the second step in tryptophan biosynthesis and results in the addition of 5-phosphoribosyl-1-pyrophosphate to anthranilate to create N-5'-phosphoribosyl-anthranilate. In E. coli, the first step in the conversion of chorismate to PABA involves two proteins, PabA and PabB, which co-operate to transfer the amide nitrogen of glutamine to chorismate, forming 4-amino-4-deoxychorismate (ADC). PabA acts as a glutamine amidotransferase, supplying an amino group to PabB, which carries out the amination reaction. A third protein, PabC, then mediates elimination of pyruvate and aromatization to give PABA. 
Several organisms have bipartite proteins, commonly called PABA synthases, that contain fused domains homologous to PabA and PabB. These hybrid PABA synthases may produce ADC and not PABA [, , , , , ].
Protein Domain
Type: Homologous_superfamily
Description: This superfamily represents a structural domain consisting of segregated alpha and beta regions in 3-layers. Homologous domains with this structure are found in: (1) 3,4-dihydroxy-2-butanone 4-phosphate synthase (DHBP synthase) (RibB); and (2) a family of eukaryotic and prokaryotic hypothetical proteins that includes YrdC and YciO from Escherichia coli and MTH1692 from the archaeon Methanothermobacter thermautotrophicus (Methanobacterium thermoformicicum). DHBP synthase RibB catalyses the conversion of D-ribulose 5-phosphate to formate and 3,4-dihydroxy-2-butanone 4-phosphate, the latter serving as the biosynthetic precursor for the xylene ring of riboflavin []. In Photobacterium leiognathi, the riboflavin synthesis genes ribB (DHBP synthase), ribE (riboflavin synthase), ribH (lumazine synthase) and ribA (GTP cyclohydrolase II) all reside in the lux operon []. RibB is sometimes found as a bifunctional enzyme with GTP cyclohydrolase II, which catalyses the first committed step in the biosynthesis of riboflavin. No sequences with significant homology to DHBP synthase are found in the metazoa. The YrdC family of hypothetical proteins is widely distributed in eukaryotes and prokaryotes, and its members occur as: (i) independent proteins, (ii) proteins with C-terminal extensions, and (iii) domains in larger proteins, some of which are implicated in regulation []. YrdC from Escherichia coli preferentially binds to double-stranded RNA and DNA. YrdC is predicted to be an rRNA maturation factor, as deletions in its gene lead to immature ribosomal 30S subunits and, consequently, fewer translating ribosomes []. Therefore, YrdC may function by keeping an rRNA structure needed for proper processing of 16S rRNA, especially at lower temperatures. Threonylcarbamoyl-AMP synthase (Sua5) is an example of a multi-domain protein that contains an N-terminal YrdC-like domain and a C-terminal Sua5 domain. 
Sua5 was identified in Saccharomyces cerevisiae (Baker's yeast) as a suppressor of a translation initiation defect in the cytochrome c gene and is required for formation of a threonylcarbamoyl group on adenosine at position 37 in tRNAs [, ]. HypF is involved in the synthesis of the active site of [NiFe]-hydrogenases [].
Protein Domain
Type: Family
Description: The homeobox is a 60-residue motif first identified in a number of Drosophila homeotic and segmentation proteins, but now known to be well-conserved in many other animals, including vertebrates [, ]. Proteins containing homeobox domains are likely to play an important role in development - most are known to be sequence-specific DNA-binding transcription factors. The domain binds DNA through a helix-turn-helix (HTH) structure. Many homeodomain-containing proteins have now been sequenced and, while the homeodomain flanking regions vary, characteristic conserved sequences upstream of the domain allow the proteins to be grouped into 3 subfamilies: the so-called antennapedia, engrailed and 'paired box' proteins. Antennapedia, which regulates the formation of leg structures in Drosophila, was one of the first homeotic genes studied and led to the discovery of the homeobox domain. Overexpression of this gene in the wrong segment of the fruit fly can lead to the formation of leg structures in these segments. For example, overexpression in the head segment can lead to the formation of legs instead of antennae (hence the name antennapedia). The sequences of the antennapedia proteins contain a conserved hexapeptide 5-16 residues upstream of the homeobox, the specific function of which is unclear. The six Drosophila proteins that belong to this group are antennapedia (Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and ultrabithorax (ubx) and are collectively known as the 'antennapedia' subfamily. In vertebrates the corresponding Hox genes are known [] as Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and D8. Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. Arg and Lys are most frequently found in the last position of the hexapeptide; other amino acids are found in only a few cases.
Protein Domain
Type: Homologous_superfamily
Description: This superfamily, previously known as ApbE (thought to be involved in thiamine biosynthesis, and alternatively known as Mg2+-dependent flavin transferase, with a role in catalysing the covalent attachment of FMN to a threonine residue in bacterial flavoproteins []), was renamed Ftp (flavin-trafficking protein) in 2013 following the characterisation of TP0796, a lipoprotein from T. pallidum that is classified as a putative member of the ApbE superfamily. FAD pyrophosphatase catalyses the hydrolysis of FAD, forming AMP and FMN. To date, the Ftp (TP0796) of T. pallidum is the first bacterial FAD pyrophosphatase shown to have a strict requirement for Mg2+ for its catalytic activity. Other Ftp homologs (formerly known as ApbE proteins) are present in the genomes of numerous bacteria and in lower eukaryotes, such as Trypanosoma spp. (agents of sleeping sickness and Chagas disease) and Leishmania spp. (agent of leishmaniasis), but the eukaryotic homologs appear to be fused with a multidomain fumarate reductase. Other members of the ApbE superfamily are related to the periplasmic ApbE lipoprotein of Salmonella typhimurium. In S. typhimurium, ApbE has been shown to be involved in thiamine biosynthesis and may serve in the conversion of aminoimidazole ribotide to 4-amino-5-hydroxymethyl-2-methylpyrimidine. T. pallidum is predicted to lack the thiamine pathway as well as the enzymes involved in aminoimidazole ribotide metabolism. The periplasmic location of ApbE prompts questions concerning how it could participate in a cytoplasmic pathway. ApbE proteins are relatively understudied biochemically, and representative crystal structures (PDB entries and ) have failed to definitively elucidate their functions. Some studies have shown that some of the Ftp family proteins bind FAD and that the Ftp protein from Vibrio harveyi transfers the FMN portion of FAD to a subunit of the integral inner membrane Nqr redox pump. The crystal structure of Ftp from T. pallidum displays a highly conserved Ftp fold and an active site/FAD-binding site of all known Ftp-like proteins.
Protein Domain
Type: Family
Description: Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, nickel, etc. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds [, ]. An empirical classification into three classes has been proposed by Fowler and coworkers [] and Kojima []. Members of class I are defined to include polypeptides related in the positions of their cysteines to equine MT-1B, and include mammalian MTs as well as MTs from crustaceans and molluscs. Class II groups MTs from a variety of species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units []. This original classification system has been found to be limited, in the sense that it does not allow clear differentiation of patterns of structural similarities, either between or within classes. Subsequently, a new classification was proposed on the basis of sequence similarity derived from phylogenetic relationships, which basically proposes an MT family for each main taxonomic group of organisms []. Mollusc MTs are 64-75 residue proteins. They usually contain 18-23 Cys, of which at least 13 are totally conserved. The protein sequence is divided into two structural domains. The Cys residues are arranged in C-X-C groups, and a C-X-X-C grouping is also observed. In particular, the consensus pattern C-x-C-x(3)-C-T-G-x(3)-C-x-C-x(3)-C-x-C-K has been shown to be diagnostic of family 2 metallothioneins. MTs locate at the C terminus of the sequence. These proteins show more similarity to the vertebrate metallothioneins than to those from other invertebrate phyla [], and on this basis they are classified as class I metallothioneins. The protein is induced by cadmium and binds divalent cations of several transition elements, including cadmium, zinc and copper. Family 2 includes subfamilies: mo1, mo2, mog, mo, which hit the same entry, except the subfamily mog.
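The consensus pattern quoted above is written in PROSITE-style notation, where "x" stands for any residue and "x(3)" for three of them. A minimal sketch of how such a pattern could be translated into a regular expression follows. The helper name prosite_to_regex is hypothetical, and it handles only the subset of the notation that appears in this description (single residues, x, and fixed repeat counts), not the full PROSITE syntax.

```python
import re

# Pattern exactly as quoted in the family 2 metallothionein description.
FAMILY2_PATTERN = "C-x-C-x(3)-C-T-G-x(3)-C-x-C-x(3)-C-x-C-K"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Supports only single residues, 'x' (any residue) and fixed counts such
    as 'x(3)'. Ranges, alternatives and anchors are deliberately not handled.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"([A-Zx])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = m.groups()
        token = "." if residue == "x" else residue
        parts.append(token + (f"{{{count}}}" if count else ""))
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex(FAMILY2_PATTERN)
    print(regex)
    # Made-up peptide built to satisfy the pattern, for illustration only.
    print(bool(re.search(regex, "MMCACAAACTGAAACACAAACACK")))
```

A regex hit indicates only that a sequence contains the textual pattern; it does not by itself establish family membership.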
Protein Domain
Type: Family
Description: Neurotransmitter ligand-gated ion channels are transmembrane receptor-ion channel complexes that open transiently upon binding of specific ligands, allowing rapid transmission of signals at chemical synapses [, ]. Five of these ion channel receptor families have been shown to form a sequence-related superfamily: Nicotinic acetylcholine receptor (AchR), an excitatory cation channel in vertebrates and invertebrates; in vertebrate motor endplates it is composed of alpha, beta, gamma and delta/epsilon subunits; in neurons it is composed of alpha and non-alpha (or beta) subunits []. Glycine receptor, an inhibitory chloride ion channel composed of alpha and beta subunits []. Gamma-aminobutyric acid (GABA) receptor, an inhibitory chloride ion channel; at least four types of subunits (alpha, beta, gamma and delta) are known []. Serotonin 5HT3 receptor, of which there are seven major types (5HT3-5HT7) []. Glutamate receptor, an excitatory cation channel of which at least three types have been described (kainate, N-methyl-D-aspartate (NMDA) and quisqualate) []. These receptors possess a pentameric structure (made up of varying subunits), surrounding a central pore. All known sequences of subunits from neurotransmitter-gated ion-channels are structurally related. They are composed of a large extracellular glycosylated N-terminal ligand-binding domain, followed by three hydrophobic transmembrane regions which form the ionic channel, followed by an intracellular region of variable length. A fourth hydrophobic region is found at the C-terminal of the sequence [, ]. Glutamate is classically thought to be a stimulatory neurotransmitter; however, studies in invertebrates have proven that glutamate also functions as an inhibitory ligand. The bulk of studies conducted in vivo have been on insects and crustaceans, where glutamate was first postulated to act on H-receptors resulting in a hyperpolarizing response to glutamate. Glutamate-gated chloride channels have been cloned from several nematodes and Drosophila [].
Protein Domain
Type: Homologous_superfamily
Description: This domain occurs in a family of phage (and bacteriocin) proteins related to the phage P2 V gene product, which forms the small spike at the tip of the tail []. Homologs in general are annotated as baseplate assembly protein V. At least one member is encoded within a region of Pectobacterium carotovorum (Erwinia carotovora) described as a bacteriocin, a phage tail-derived module able to kill bacteria closely related to the host strain. It is also found in Vgr-related proteins. Genes encoding type VI secretion systems (T6SS) are widely distributed in pathogenic Gram-negative bacterial species. In Vibrio cholerae, T6SS have been found to secrete three related proteins extracellularly, VgrG-1, VgrG-2, and VgrG-3. VgrG-1 can covalently cross-link actin in vitro, and this activity was used to demonstrate that V. cholerae can translocate VgrG-1 into macrophages by a T6SS-dependent mechanism. VgrG-related proteins likely assemble into a trimeric complex that is analogous to that formed by the two trimeric proteins gp27 and gp5 that make up the baseplate "tail spike" of Escherichia coli bacteriophage T4. The VgrG components of the T6SS apparatus might assemble a "cell-puncturing device" analogous to phage tail spikes to deliver effector protein domains through membranes of target host cells []. Gp5 is an integral component of the virion baseplate of bacteriophage T4. T4 Gp5 consists of 3 domains connected via long linkers: the N-terminal oligosaccharide/oligonucleotide-binding (OB)-fold domain, the middle lysozyme domain, and the C-terminal triple-stranded β-helix. The equivalent of the Gp5 OB-fold domain in the structure of VgrG is the domain of unknown function comprising residues 380-470 and conserved in all known VgrGs. This entry represents the OB-fold domain, which consists of a 5-stranded antiparallel β-barrel with a Greek-key topology [].
Protein Domain
Type: Family
Description: A number of polypeptidic hormones, mainly expressed in the intestine or the pancreas, belong to a group of structurally related peptides [, ]. One such hormone, glucagon, is widely distributed and produced in the alpha-cells of pancreatic islets []. It affects glucose metabolism in the liver [] by inhibiting glycogen synthesis, stimulating glycogenolysis and enhancing gluconeogenesis. It also increases mobilisation of glucose, free fatty acids and ketone bodies, which are metabolites produced in excess in diabetes mellitus. Glucagon is produced, like other peptide hormones, as part of a larger precursor (preproglucagon) which is cleaved to produce glucagon, glucagon-like protein I and glucagon-like protein II []. The structure of glucagon itself is fully conserved in all known mammalian species []. Other members of the structurally similar group include glicentin precursor, secretin, gastric inhibitory protein, vasoactive intestinal peptide (VIP), prealbumin, peptide HI-27 and growth hormone releasing factor. Pituitary adenylate cyclase-activating polypeptide (PACAP) is a bioactive peptide that was originally isolated from ovine hypothalamus on the basis of its ability to stimulate adenylate cyclase in rat anterior pituitary cell cultures. It is a neuropeptide of the vasoactive intestinal peptide/secretin/glucagon superfamily. Studies in two related patients with a partial trisomy 18p revealed three copies of the PACAP gene and elevated PACAP concentrations in plasma []. PACAP appears to function as an emergency response co-transmitter in the sympathoadrenal axis, where the primary secretory response is controlled by a classical neurotransmitter but sustained under paraphysiological conditions by a neuropeptide []. Vasoactive intestinal peptide (VIP), a 28-amino acid peptide originally isolated from porcine duodenum, is present not only in gastrointestinal tissues but also in neural tissues, possibly as a neurotransmitter, and exhibits a wide variety of biologic actions []. Two principal groups of receptors orthologous with human PAC1R and VPAC1R were identified and characterised at the genomic level in the fish Fugu rubripes (Japanese pufferfish).
Protein Domain
Type: Family
Description: This entry represents the ADAM-TS1 family of metallopeptidases that belong to MEROPS peptidase family M12, subfamily M12B: adamalysin (clan MA). Proteolysis of the extracellular matrix plays a critical role in establishing tissue architecture during development and in tissue degradation in diseases such as cancer, arthritis, Alzheimer's disease and a variety of inflammatory conditions []. The proteolytic enzymes responsible for this process are members of diverse protease families, including the secreted zinc metalloproteases (MPs) []. Recently, a new MP family, ADAM-TS (a disintegrin-like and metalloprotease domain with thrombospondin type I modules) has been identified. The family consists of at least 20 members that share a high degree of sequence similarity and conserved domain organisation [, ]. The defining domains of the ADAM-TS family are (from N- to C-termini) a pre-pro metalloprotease domain of the reprolysin type, a snake venom disintegrin-like domain, a thrombospondin type-I (TS) module, a cysteine-rich region, and a cysteine-free (spacer) domain []. Domain organisation following the spacer domain C terminus shows some variability in certain ADAM-TS members, principally in the number of additional TS domains. Members of the ADAM-TS family have been implicated in a range of diseases. ADAM-TS1, for example, is reported to be involved in inflammation and cancer cachexia [], whilst recessively inherited ADAM-TS2 mutations cause Ehlers-Danlos syndrome type VIIC, a disorder characterised clinically by severe skin fragility []. ADAM-TS4 is an aggrecanase involved in arthritic destruction of cartilage []. ADAM-TS1 was originally cloned in mice []. Human and rat orthologues have also been identified. Expression of ADAM-TS1 is closely associated with acute inflammation [].
Protein Domain
Type: Family
Description: This entry represents members of the ADAM-TS5 family of metallopeptidases that belong to MEROPS peptidase family M12, subfamily M12B: adamalysin (clan MA), M12.225. Proteolysis of the extracellular matrix plays a critical role in establishing tissue architecture during development and in tissue degradation in diseases such as cancer, arthritis, Alzheimer's disease and a variety of inflammatory conditions []. The proteolytic enzymes responsible for this process are members of diverse protease families, including the secreted zinc metalloproteases (MPs) []. Recently, a new MP family, ADAM-TS (a disintegrin-like and metalloprotease domain with thrombospondin type I modules) has been identified. The family consists of at least 20 members that share a high degree of sequence similarity and conserved domain organisation [, ]. The defining domains of the ADAM-TS family are (from N- to C-termini) a pre-pro metalloprotease domain of the reprolysin type, a snake venom disintegrin-like domain, a thrombospondin type-I (TS) module, a cysteine-rich region, and a cysteine-free (spacer) domain []. Domain organisation following the spacer domain C terminus shows some variability in certain ADAM-TS members, principally in the number of additional TS domains. Members of the ADAM-TS family have been implicated in a range of diseases. ADAM-TS1, for example, is reported to be involved in inflammation and cancer cachexia [], whilst recessively inherited ADAM-TS2 mutations cause Ehlers-Danlos syndrome type VIIC, a disorder characterised clinically by severe skin fragility []. ADAM-TS4 is an aggrecanase involved in arthritic destruction of cartilage []. ADAM-TS5, also termed aggrecanase 2, was identified through expressed sequence tag database searching, pursuing sequences similar to ADAM-TS1-4 []. In vitro studies have shown that ADAM-TS5, like ADAM-TS4, is an aggrecanase able to cleave cartilage aggrecan [].
Protein Domain
Type: Family
Description: This entry contains members of the ADAM-TS8 family of metallopeptidases that belong to MEROPS peptidase family M12, subfamily M12B: adamalysin (clan MA). Proteolysis of the extracellular matrix plays a critical role in establishing tissue architecture during development and in tissue degradation in diseases such as cancer, arthritis, Alzheimer's disease and a variety of inflammatory conditions []. The proteolytic enzymes responsible for this process are members of diverse protease families, including the secreted zinc metalloproteases (MPs) []. Recently, a new MP family, ADAM-TS (a disintegrin-like and metalloprotease domain with thrombospondin type I modules) has been identified. The family consists of at least 20 members that share a high degree of sequence similarity and conserved domain organisation [, ]. The defining domains of the ADAM-TS family are (from N- to C-termini) a pre-pro metalloprotease domain of the reprolysin type, a snake venom disintegrin-like domain, a thrombospondin type-I (TS) module, a cysteine-rich region, and a cysteine-free (spacer) domain []. Domain organisation following the spacer domain C terminus shows some variability in certain ADAM-TS members, principally in the number of additional TS domains. Members of the ADAM-TS family have been implicated in a range of diseases. ADAM-TS1, for example, is reported to be involved in inflammation and cancer cachexia [], whilst recessively inherited ADAM-TS2 mutations cause Ehlers-Danlos syndrome type VIIC, a disorder characterised clinically by severe skin fragility []. ADAM-TS4 is an aggrecanase involved in arthritic destruction of cartilage []. ADAM-TS8, also termed METH2, was identified by searching expressed sequence tag databases for sequences that contained TS modules []. In vitro studies have shown recombinant ADAM-TS8 to be effective in blocking angiogenesis, and to inhibit endothelial cell growth [].
Protein Domain
Type: Family
Description: Transcriptional activation and repression are required for control of cell proliferation and differentiation during embryonic development and homeostasis in the adult organism. Perturbations of these processes can lead to the development of cancer []. The Eight-Twenty-One (ETO) gene product is able to form complexes with corepressors and deacetylases, such as nuclear receptor corepressor (N-CoR), which repress transcription when recruited by transcription factors []. The ETO gene derives its name from its association with many cases of acute myelogenous leukaemia (AML), in which a reciprocal translocation, t(8;21), brings together a large portion of the ETO gene from chromosome eight and part of the AML1 gene from chromosome 21. The human ETO gene family currently comprises three major subfamilies: ETO/myeloid transforming gene on chromosome 8 (MTG8); myeloid transforming gene related protein-1 (MTGR1) and myeloid transforming gene on chromosome 16 (MTG16). ETO proteins are composed of four evolutionarily conserved domains termed nervy homology regions (NHR) 1-4. NHR1 is thought to stabilise the formation of high molecular weight complexes, but is not directly responsible for repressor activity. NHR2 and its flanking sequence comprise the core repressor domain, which mediates 50% of the wild type repressor activity. Furthermore, there is evidence that the amphipathic helical structure of NHR2 promotes the formation of ETO/AML1 homodimers []. NHR3 and NHR4 have been shown to act in concert to bind N-CoR. NHR4 contains two zinc finger motifs, which are thought to play a role in protein interactions rather than DNA binding []. Screening of dbEST with the entire ETO cDNA sequence revealed a number of ESTs showing significant similarity to the query sequence. Of those identified, two overlapping clones were sequenced, revealing an ORF coding for a putative 575 amino acid protein. This was subsequently mapped to chromosome 20 and named EHT (ETO Homologous on chromosome Twenty), and later renamed MTGR1 [].
Protein Domain
Type: Family
Description: Two closely related neuropeptide precursors, which share no significant sequence similarity with other known neuropeptides, have recently been identified and named preproneuropeptide B and preproneuropeptide W [, , ]. In humans, each precursor contains a signal sequence and two dibasic cleavage sites. Alternative cleavage of these sites results in long (29 or 30 amino acid) and short (23 amino acid) forms of the resultant neuropeptides [, ]. Murine, rat and bovine versions of preproneuropeptide B, however, contain only the second cleavage site, resulting in only the long form of neuropeptide B [, ]. Neuropeptide B is expressed in both the central nervous system (CNS) and in the periphery. In the CNS, the highest levels of the peptide are found in the substantia nigra and hypothalamus, suggesting a possible role in locomotor control and the release of pituitary hormones []. In the periphery, the peptide is most abundant in testis, ovary, uterus, placenta, spleen, lymph nodes and peripheral blood leukocytes, indicating potential roles in the reproductive and immune systems []. Unusually, neuropeptide B purified from bovine hypothalamus was found to be brominated at its N terminus []. Neuropeptide W has a more limited distribution in the brain than neuropeptide B, and is found at highest levels in the substantia nigra, again suggesting an involvement in locomotor control []. In the periphery, neuropeptide W is more widespread than neuropeptide B. In addition to the reproductive and immune tissues in which neuropeptide B is expressed, neuropeptide W has been found at high levels in the liver, stomach and trachea []. Intracerebroventricular administration of neuropeptide W in rats has been reported to produce an acute increase in food intake and to stimulate prolactin release []. This entry represents the neuropeptide B/W precursor family.
Protein Domain
Type: Conserved_site
Description: Gastrin and cholecystokinin (CCK) are structurally and functionally related peptide hormones that function as hormonal regulators of various digestive processes and feeding behaviours. They are known to induce gastric secretion, stimulate pancreatic secretion, increase blood circulation and water secretion in the stomach and intestine, and stimulate smooth muscle contraction. Originally found in the gut, these hormones have since been shown to be present in various parts of the nervous system. Like many other active peptides they are synthesized as larger protein precursors that are enzymatically converted to their mature forms. They are found in several molecular forms due to tissue-specific post-translational processing. A number of other peptides are known to belong to the same family: Caerulein, an amphibian skin peptide, with a biological activity similar to that of CCK or gastrin. There are different types of caerulein [] in which a single or up to four copies of the peptide are present. Leukosulfakinin I and II (LSK) [, ] are peptides, isolated from cockroach, that change the frequency and amplitude of contractions of the hindgut. Drosulfakinins I and II [] are putative CCK-homologues from Drosophila. Those two peptides are part of a precursor sequence that was isolated using a probe based on the sequence of CCK and LSK. A chicken antrum peptide [] which is a potent stimulus of avian gastric acid but not of pancreatic secretion. Cionin [], a neuropeptide from the protochordate Ciona intestinalis (Transparent sea squirt). The biological activity of gastrin and CCK is associated with the last five C-terminal residues. One or two positions downstream, there is a conserved sulphated tyrosine residue.
Protein Domain
Type: Family
Description: This group of metallopeptidases belongs to MEROPS peptidase family M32 (carboxypeptidase Taq family, clan MA(E)). The predicted active site residues for members of this family and thermolysin, the type example for clan MA, occur in the motif HEXXH. Carboxypeptidase Taq (TaqCP) is a zinc-containing thermostable metallopeptidase [, ]. It was originally discovered and purified from Thermus aquaticus; optimal enzymatic activity occurs at 80 °C []. This family also includes Pyrococcus furiosus thermostable carboxypeptidase (PfuCP) [] and carboxypeptidase 1 from Bacillus subtilis []. Over 70 metallopeptidase families have been identified to date. In these enzymes a divalent cation which is usually zinc, but may be cobalt, manganese or copper, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. In some families of co-catalytic metallopeptidases, two metal ions are observed in crystal structures ligated by five amino acids, with one amino acid ligating both metal ions. The known metal ligands are His, Glu, Asp or Lys. At least one other residue is required for catalysis, which may play an electrophilic role. Many metalloproteases contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site []. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases [].
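Because the HEXXH motif is short and, as the entry notes, relatively common, it can be useful to see every position where it occurs, including overlapping ones. The sketch below is a minimal illustration that uses a regular-expression lookahead for this; the function name and example sequences are hypothetical. The stricter 'abXHEbbHbc' form is deliberately not encoded, since the entry does not spell out the exact residue sets for 'b' and 'c'.

```python
import re

# Lookahead with a capturing group, so overlapping occurrences are all reported.
# 'X' in HEXXH is any residue, written as '.' in the regex.
HEXXH = re.compile(r"(?=(HE..H))")

def find_hexxh(sequence: str):
    """Return a list of (start, matched_text) for every HEXXH occurrence."""
    return [(m.start(), m.group(1)) for m in HEXXH.finditer(sequence.upper())]

# Hypothetical usage on a made-up peptide:
# find_hexxh("AAHEAAHAA") returns [(2, "HEAAH")]
```

Given how often five-residue motifs arise by chance, a hit is only a candidate site and says nothing by itself about metal binding.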
Protein Domain
Type: Family
Description: Haptoglobin is a plasma protein that binds haemoglobin. The resulting complex is too large to be excreted by the kidney, thereby preventing loss of iron and damage to the kidney. The haptoglobin-haemoglobin complex is degraded in the liver, which is also the site of haptoglobin synthesis. The mature haptoglobin molecule is a tetramer, consisting of two alpha and two beta chains. Alpha and beta chains arise from proteolytic processing of the same precursor. Each beta chain can bind an α-β heterodimer of haemoglobin so that each haptoglobin tetramer binds one haemoglobin tetramer. In Pan troglodytes (Chimpanzee), haptoglobin genes form a small multigene family of three genes: HP, HPR (haptoglobin-related protein, which may be non-functional), and HPP (haptoglobin-primate) []. In contrast, most humans have a two-gene cluster due to an unequal homologous crossover event between HPR and HPP in the human lineage []. Such events may be common among these closely related genes, as Macaca mulatta (Rhesus macaque) was found to have haplotypes of one- or two-gene clusters that appear to have formed from unequal crossover among an ancestral three-gene cluster []. The haptoglobin precursor contains a signal sequence, a sushi domain (in mature alpha chain), and a trypsin domain (in mature beta chain) which belongs to the MEROPS peptidase family S1 (clan PA(S)). Haptoglobins have no enzymatic activity, as the active site residues typical of trypsin-related proteases are not conserved; they are therefore classed as non-peptidase homologues. A common allelic variant in humans contains two sushi domains. Uncleaved haptoglobin is also known as zonulin, and plays a role in intestinal permeability [].
Protein Domain
Type: Family
Description: Tropoelastin is the precursor to the elastin molecule. Elastin aggregates are responsible for the stretch properties of skin, arterial walls and ligaments, and elastin is implicated in several hereditary diseases, including cutis laxa (where the elasticity of the skin is lost) and elastoderma (similar to cutis laxa but with grape-like accumulations of elastin in the dermis). The unusual and highly characteristic amino acid composition of this protein accounts for its great hydrophobicity. It contains one-third glycine amino acids and several lysine derivatives that serve as covalent cross-links between protein monomers. Elastin is thus a three-dimensional network with 60-70 amino acids between two cross-linking points. This molecular architecture is determinant for its elastic properties, insolubility and resistance to proteolysis. Normally, the elastin gene contains 36 exons, and this structure allows the formation of stable isoforms by alternative splicing. The 3-dimensional structure of elastin is currently unknown and was originally thought to be an amorphous polymer. This is consistent with the theory of rubber elasticity, which requires the resting state of the protein to be of higher disorder (entropy) than the extended state []. More recent studies show the presence of helical and other secondary structures [], and the elasticity theory has been amended to involve, in the resting state, secondary structure elements in chaotic motion. In the extended state of the protein, the secondary structures align to form an ordered structure together with neighbouring molecules []. Tropoelastin consists mainly of repetitive elements of four, five, six and nine hydrophobic residues []. The five, six and nine residue repeats function as binding sites for fibroblasts during chemotaxis (the hexapeptide and nonapeptide repeats competing for the same receptor) []. The hexapeptide repeat is also known to bind calcium ions. The formation of the elastin fibre is a complicated process, involving the binding of a chaperone to the precursor to prevent aggregation in the cell, followed by migration out of the cell, whereupon the chaperone disassociates. The tropoelastin molecules then cross-link to each other using deaminated lysine residues, the microfibril structures functioning as a scaffold [].
Protein Domain
Type: Family
Description: Uridylate kinases (also known as UMP kinases) are key enzymes in the synthesis of nucleoside triphosphates. They catalyse the reversible transfer of the gamma-phosphoryl group from an ATP donor to UMP, yielding UDP, which is the starting point for the synthesis of all other pyrimidine nucleotides. The eukaryotic enzyme has a dual specificity, phosphorylating both UMP and CMP, while the bacterial enzyme is specific to UMP. The bacterial enzyme shows no sequence similarity to the eukaryotic enzyme or other nucleoside monophosphate kinases, but rather appears to be part of the amino acid kinase family. It is dependent on magnesium for activity and is activated by GTP and repressed by UTP [, ]. In many bacterial genomes, the gene tends to be located immediately downstream of elongation factor T and upstream of ribosome recycling factor. A related protein family, believed to be equivalent in function, is found in the archaea and in spirochetes. Structurally, the bacterial and archaeal proteins are homohexamers centred around a hollow nucleus and organised as a trimer of dimers [, ]. Each monomer within the protein forms the amino acid kinase fold and can be divided into an N-terminal region which binds UMP and mediates intersubunit interactions within the dimer, and a C-terminal region which binds ATP and contains a mobile loop covering the active site. Inhibition of enzyme activity by UTP appears to be due to competition for the binding site for UMP, not allosteric inhibition as was previously suspected. Uridylate kinase PUMPKIN, chloroplastic from Arabidopsis thaliana is essential for retaining photosynthetic activity in chloroplasts as it is required for specific post-transcriptional processes of many plastid transcripts [, ]. This entry represents uridine monophosphate kinase predominantly found in bacteria and plant chloroplasts.
Protein Domain
Type: Conserved_site
Description: The homeobox is a 60-residue motif first identified in a number of Drosophila homeotic and segmentation proteins, but now known to be well-conserved in many other animals, including vertebrates [, ]. Proteins containing homeobox domains are likely to play an important role in development - most are known to be sequence-specific DNA-binding transcription factors. The domain binds DNA through a helix-turn-helix (HTH) structure. Many homeodomain-containing proteins have now been sequenced and, while the homeodomain flanking regions vary, characteristic conserved sequences upstream of the domain allow the proteins to be grouped into 3 subfamilies: the so-called antennapedia, engrailed and 'paired box' proteins. Antennapedia, which regulates the formation of leg structures in Drosophila, was one of the first homeotic genes studied and led to the discovery of the homeobox domain. Overexpression of this gene in the wrong segment of the fruit fly can lead to the formation of leg structures in these segments. For example, overexpression in the head segment can lead to the formation of legs instead of antennae (hence the name antennapedia). The sequences of the antennapedia proteins contain a conserved hexapeptide 5-16 residues upstream of the homeobox, the specific function of which is unclear. The six Drosophila proteins that belong to this group are antennapedia (Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and ultrabithorax (ubx) and are collectively known as the 'antennapedia' subfamily. In vertebrates the corresponding Hox genes are known [] as Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and D8. Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. Arg and Lys are most frequently found in the last position of the hexapeptide; other amino acids are found in only a few cases.
Protein Domain
Type: Family
Description: Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. This entry represents bacterial voltage-gated chloride channels of the ClcB type. ClcB probably acts as an electrical shunt for an outwardly-directed proton pump that is linked to amino acid decarboxylation, as part of the extreme acid resistance (XAR) response [].
Protein Domain
Type: Domain
Description: This entry represents the N-terminal domain of family members such as the Matrix (Mx) and Matrix protein long (ML) proteins. They are found in Thogoto virus (THOV), a tick-transmitted orthomyxovirus with a genome consisting of six single-stranded RNA segments that encode seven structural proteins []. Matrix proteins of the family Orthomyxoviridae are major structural components of the viral capsid that are located below the viral lipid membrane and provide protection for viral ribonucleoproteins (vRNPs) []. They serve as a major participant during the processes of virus invasion and budding. Furthermore, they play specific roles throughout the viral life cycle, usually by interacting with other viral components or host cellular proteins []. ML protein, an extended version of the viral M protein, is a viral IFN antagonist. ML is essential for virus growth and pathogenesis in an IFN-competent host. In the presence of ML the activation and/or action of the interferon regulatory factor-3 (IRF-3) is severely affected. This effect depends on direct interaction of ML with the transcription factor IIB (TFIIB). ML suppresses IRF-7 in a similar manner as it suppresses IRF-3. Studies have revealed that ML associates with IRF-7 and prevents IRF-7 dimerization and interaction with TRAF6 []. Structural analysis revealed that the N-terminal fragment of M protein (MN) undergoes conformational changes that result in specific, pH-dependent inter-molecular interactions. Comparison of the THOV MN and influenza A virus (IAV) MN regions showed low sequence identity. However, superimposition of the two structures in neutral condition showed that both matrix proteins contain nine helices connected with the same topology. Since the matrix layer of IAV disassembles in the acidic endosome at the beginning of infection and repacks in the neutral cytoplasm, a change of pH might be a key regulator for the capsid assembly/disassembly transition during these processes. Hence, a pH-dependent conformational transition model was studied in THOV MN, where interactions such as hydrogen bonds and hydrophobic interactions are suggested to be involved in THOV matrix assembly [].
Protein Domain
Type: Family
Description: Pro-adrenomedullin (Pro-ADM) is a 185 amino acid long protein. It consists of a pro-ADM N-terminal 20 peptide (PAMP), a midregional pro-ADM (MR-proADM), an adrenotensin and a glycine-extended 53-amino acid peptide. The latter is subsequently converted to the mature ADM, consisting of 52 amino acids, by enzymatic amidation []. Adrenomedullin (ADM) is a hypotensive peptide that was first discovered in human pheochromocytoma; it belongs to the calcitonin gene-related peptide family. The first described effects were vasodilation and blood pressure lowering, but later other actions were discovered in health and disease, among them stabilisation and development of the endothelial barrier and immunoregulation. ADM is widely expressed in virtually all human tissues, with highest levels in adrenal medulla, cardiac atria, and lungs. Many cells are capable of producing ADM, including ECs, vascular smooth muscle cells (VSMCs), monocytes, renal parenchymal cells, and macrophages. ADM exerts its effects by interaction of its C-terminal moiety with ADM1 and ADM2 receptors, which are complexes consisting of the calcitonin receptor-like receptor (CRLR) combined with a specific receptor activity-modifying protein 2 and 3 (RAMP2 and RAMP3), respectively, mostly referred to as "ADM receptors". These receptors have been detected in various tissues and organs, such as blood vessels, skeletal muscles, heart, lungs, and nerve tissue []. Recently, ADM has been characterised as a pronociceptive mediator, acting as an upstream factor in the transmission of noxious information for various types of pathological pain including acute and chronic inflammatory pain, cancer pain, neuropathic pain induced by spinal nerve injury and diabetic neuropathy. It may also have a role in nerve regeneration in pathological conditions [].
Protein Domain
Type: Conserved_site
Description: 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase (also known as 3-phosphoshikimate 1-carboxyvinyltransferase) catalyses the sixth step in the biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroA), plants and fungi (where it is part of a multifunctional enzyme which catalyses five consecutive steps in this pathway) []. The sixth step is the formation of EPSP and inorganic phosphate from shikimate-3-phosphate (S3P) and phosphoenolpyruvate (PEP). EPSP synthase can use shikimate or shikimate-3-phosphate as a substrate. Binding of shikimate changes the backbone of the active site, which affects the binding of glyphosate and renders the reaction insensitive to inhibition by glyphosate []. When the discontinuous C-terminal domain was isolated, it was found to bind neither its substrate nor its inhibitor while maintaining structural integrity []. Earlier studies suggested that the active site of the enzyme is in the cleft between its two globular domains. When the enzyme binds S3P, there is a conformational change in the isolated N-terminal domain []. The sequences of EPSP synthase from various biological sources show that the structure of the enzyme has been well conserved throughout evolution. Since the shikimate pathway is not present in vertebrates but is essential for the life of plants, fungi and bacteria, the enzyme is commonly viewed as a target for weed killers and antimicrobial drug development. This entry represents two conserved regions as signature patterns. The first pattern corresponds to a region that is part of the active site and is also important for resistance to glyphosate []. The second pattern is located in the C-terminal part of the protein and contains a conserved lysine which seems to be important for the activity of the enzyme.
Protein Domain
Type: Homologous_superfamily
Description: The glycine-tyrosine-phenylalanine (GYF) domain is a domain of around 60 amino acids which contains a conserved GP[YF]xxxx[MV]xxWxxx[GN]YF motif. It was identified in the human intracellular protein termed CD2 binding protein 2 (CD2BP2), which binds to a site containing two tandem PPPGHR segments within the cytoplasmic region of CD2. Binding experiments and mutational analyses have demonstrated the critical importance of the GYF tripeptide in ligand binding. A GYF domain is also found in several other eukaryotic proteins of unknown function []. It has been proposed that the GYF domain found in these proteins could also be involved in proline-rich sequence recognition []. Resolution of the structure of the CD2BP2 GYF domain by NMR spectroscopy revealed a compact domain with a β-β-α-β-β topology, where the single α-helix is tilted away from the twisted, anti-parallel β-sheet. The conserved residues of the GYF domain create a contiguous patch of predominantly hydrophobic nature which forms an integral part of the ligand-binding site []. There is limited homology within the C-terminal 20-30 amino acids of various GYF domains, supporting the idea that this part of the domain is structurally but not functionally important []. This entry also matches Arabidopsis histone methyltransferases ATXR3/SDG2 and ATXR7/SDG25, which contain two partial GYF domains towards the N terminus []. Histone methyltransferase ATXR7 is involved in regulation of flowering time []. It is specifically required for the trimethylation of 'Lys-4' of histone H3 (H3K4me3) at the FLC locus and prevents the trimethylation of 'Lys-27' (H3K27me3) at the same locus. ATXR3 is also required for H3K4 trimethylation and is crucial for both sporophyte and gametophyte development in plants [, ].
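Example: the conserved GYF motif quoted above can be read as a simple sequence pattern, where x stands for any residue and brackets list the alternatives allowed at one position. The following is only a minimal sketch of that reading in Python. The function names and the test sequence are hypothetical illustrations, not part of this entry or of any database tool, and a plain pattern scan of this kind is not how domain membership is actually assigned.

```python
import re

# Motif exactly as quoted in the description; "x" means any residue.
GYF_MOTIF = "GP[YF]xxxx[MV]xxWxxx[GN]YF"


def motif_to_regex(motif):
    """Turn each run of x's into a fixed-length wildcard, e.g. xxxx -> .{4}."""
    pattern = re.sub(r"x+", lambda m: ".{%d}" % len(m.group()), motif)
    return re.compile(pattern)


def find_gyf_motif(sequence):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    regex = motif_to_regex(GYF_MOTIF)
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Synthetic sequence built to contain the motif; NOT a real protein.
    synthetic = "AAAGPYAAAAMAAWAAAGYFCC"
    for start, hit in find_gyf_motif(synthetic):
        print(start, hit)
```

The scan reports only non-overlapping matches and treats every x position as unconstrained, which is the most literal interpretation of the motif string.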