The EMI domain, first named after its presence in proteins of the EMILIN family, is a small cysteine-rich module of around 75 amino acids. It is most often found at the N terminus of metazoan extracellular proteins that form, or are compatible with forming, multimers []. It is found in association with other domains, such as C1q, laminin-type EGF-like, collagen-like, FN3, WAP, ZP or FAS1 []. It has been suggested that the EMI domain could be a protein-protein interaction module, as the EMI domain of EMILIN-1 was found to interact with the C1q domain of EMILIN-2 []. The EMI domain possesses six highly conserved cysteine residues, which likely form disulphide bonds. Other key features of the EMI domain are the C-C-x-G-[WYFH] pattern, a hydrophobic position just preceding the first cysteine (Cys1) of the domain and a cluster of hydrophobic residues between Cys3 and Cys4. The EMI domain could be made of two sub-domains, the fold of the second one sharing similarities with the C-terminal sub-module characteristic of EGF-like domains []. Proteins known to contain an EMI domain include: vertebrate Emilins, extracellular matrix glycoproteins; vertebrate Multimerins, extracellular matrix glycoproteins; vertebrate Emu proteins, which could interact with several different extracellular matrix components and serve to connect and integrate the function of multiple partner molecules; vertebrate beta-IG-H3; vertebrate osteoblast-specific factor 2 (OSF-2); mammalian NEU1/NG3 proteins; Drosophila midline fasciclin; and Caenorhabditis elegans ced-1, a transmembrane receptor that mediates cell corpse engulfment. The Pfam alignment for this domain is truncated at the C terminus and does not include the final cysteine []. This is to stop the family overlapping with other domains.
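The C-C-x-G-[WYFH] pattern described in the entry above can be searched for with a regular expression. The following is a minimal sketch, assuming a plain one-letter amino acid string as input; the function name `find_emi_motif` and the toy sequence are hypothetical and are not taken from any real protein or tool.

```python
import re

# PROSITE-style pattern C-C-x-G-[WYFH] written as a regular expression:
# two cysteines, any residue, a glycine, then one of W, Y, F or H.
EMI_MOTIF = re.compile(r"CC.G[WYFH]")


def find_emi_motif(sequence):
    """Return the 0-based start position of every motif match."""
    return [m.start() for m in EMI_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence, for illustration only.
toy = "MKTLCCAGWPLLCCSGHAA"
print(find_emi_motif(toy))  # [4, 12]
```

A match on this short motif alone is not evidence of an EMI domain; it only flags candidate positions for closer inspection.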
Cytochrome c oxidase is a key enzyme in aerobic metabolism. Proton-pumping haem-copper oxidases represent the terminal, energy-transfer enzymes of respiratory chains in prokaryotes and eukaryotes. The CuB-haem a3 (or haem o) binuclear centre, associated with the largest subunit I of cytochrome c and ubiquinol oxidases, is directly involved in the coupling between dioxygen reduction and proton pumping [, ]. Some terminal oxidases generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner membrane (eukaryotes). The enzyme complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals), of which only the catalytic subunit (equivalent to mammalian subunit I, CO I) is found in all haem-copper respiratory oxidases. The presence of a bimetallic centre (formed by a high-spin haem and copper B) as well as a low-spin haem, both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common to all family members [, , ]. In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The enzyme complexes vary in haem and copper composition, substrate type and substrate affinity. The different respiratory oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions []. It has been shown that eubacterial quinol oxidase was derived from cytochrome c oxidase in Gram-positive bacteria and that archaebacterial quinol oxidase has an independent origin. A considerable amount of evidence suggests that proteobacteria (purple bacteria) acquired quinol oxidase through a lateral gene transfer from Gram-positive bacteria []. This entry represents a structural domain found in subunit I of cytochrome c oxidase as well as related proteins, including quinol oxidase.
Cytochrome c oxidase is a key enzyme in aerobic metabolism. Proton-pumping haem-copper oxidases represent the terminal, energy-transfer enzymes of respiratory chains in prokaryotes and eukaryotes. The CuB-haem a3 (or haem o) binuclear centre, associated with the largest subunit I of cytochrome c and ubiquinol oxidases, is directly involved in the coupling between dioxygen reduction and proton pumping [, ]. Some terminal oxidases generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner membrane (eukaryotes). The enzyme complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals), of which only the catalytic subunit (equivalent to mammalian subunit I, CO I) is found in all haem-copper respiratory oxidases. The presence of a bimetallic centre (formed by a high-spin haem and copper B) as well as a low-spin haem, both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common to all family members [, , ]. In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The enzyme complexes vary in haem and copper composition, substrate type and substrate affinity. The different respiratory oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions []. It has been shown that eubacterial quinol oxidase was derived from cytochrome c oxidase in Gram-positive bacteria and that archaebacterial quinol oxidase has an independent origin. A considerable amount of evidence suggests that proteobacteria (purple bacteria) acquired quinol oxidase through a lateral gene transfer from Gram-positive bacteria []. This entry represents a domain found in cytochrome c oxidase subunit I.
Cytochrome c oxidase is a key enzyme in aerobic metabolism. Proton-pumping haem-copper oxidases represent the terminal, energy-transfer enzymes of respiratory chains in prokaryotes and eukaryotes. The CuB-haem a3 (or haem o) binuclear centre, associated with the largest subunit I of cytochrome c and ubiquinol oxidases, is directly involved in the coupling between dioxygen reduction and proton pumping [, ]. Some terminal oxidases generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner membrane (eukaryotes). The enzyme complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals), of which only the catalytic subunit (equivalent to mammalian subunit I, CO I) is found in all haem-copper respiratory oxidases. The presence of a bimetallic centre (formed by a high-spin haem and copper B) as well as a low-spin haem, both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common to all family members [, , ]. In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The enzyme complexes vary in haem and copper composition, substrate type and substrate affinity. The different respiratory oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions []. It has been shown that eubacterial quinol oxidase was derived from cytochrome c oxidase in Gram-positive bacteria and that archaebacterial quinol oxidase has an independent origin. A considerable amount of evidence suggests that proteobacteria (purple bacteria) acquired quinol oxidase through a lateral gene transfer from Gram-positive bacteria []. This entry represents the copper-binding site of the cytochrome c oxidase subunit I. In particular, copper is ligated to three conserved histidine residues contained in this site [].
Proteins containing this domain are proteinase inhibitors belonging to MEROPS inhibitor family I19 (clan IW) and sharing a pacifastin domain of ~35 residues, which contains a characteristic pattern of six conserved cysteine residues (C-x(9,12)-C-N-x-C-x-C-x(2,3)-G-x(3,6)-C-T-x(3)-C). The pacifastin domain consists of a twisted β-sheet composed of three antiparallel strands and stabilised by an identical pattern (C1-C4, C2-C6, C3-C5) of disulphide bridges [, , , , , ]. Proteins containing this domain were first isolated from Locusta migratoria migratoria (migratory locust). These were HI, LMCI-1 (PMP-D2) and LMCI-2 (PMP-C) [, , ]; five additional members, SGPI-1 to 5, were identified in Schistocerca gregaria (desert locust) [, ], and a heterodimeric serine protease inhibitor (pacifastin) was isolated from the hemolymph of Pacifastacus leniusculus (signal crayfish) []. Pacifastin is a 155 kDa protein composed of two covalently linked subunits, which are separately encoded. The heavy chain of pacifastin (105 kDa) is related to transferrins, containing three transferrin lobes, two of which seem to be active for iron binding []. A number of the members of the transferrin family are also serine peptidases belonging to MEROPS peptidase family S60. The light chain of pacifastin (44 kDa) is the proteinase inhibitory subunit, and has nine cysteine-rich inhibitory domains that are homologous to each other. The locust inhibitors share a conserved array of six cysteine residues with the pacifastin light chain. The structures of members of this family reveal that they comprise a triple-stranded antiparallel β-sheet connected by three disulphide bridges []. The biological functions of the locust inhibitors are not fully understood. LMCI-1 and LMCI-2 were shown to inhibit the endogenous proteolytic activating cascade of prophenoloxidase []. Expression analysis shows that the genes encoding the SGPI precursors are differentially expressed in a time-, stage- and hormone-dependent manner.
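The six-cysteine pattern and the C1-C4, C2-C6, C3-C5 bridge arrangement given in the entry above can be made concrete in code. The sketch below converts the PROSITE-style pattern into a regular expression and maps the stated pairing onto cysteine positions. It is illustrative only: the helper names `prosite_to_regex` and `disulphide_positions` are hypothetical, the converter handles only the pattern elements that occur in this entry, and the toy domain is an artificial string, not a real pacifastin sequence.

```python
import re

PACIFASTIN = "C-x(9,12)-C-N-x-C-x-C-x(2,3)-G-x(3,6)-C-T-x(3)-C"
# Pairing stated in the entry: C1-C4, C2-C6, C3-C5 (1-based cysteine ranks).
DISULPHIDE_PAIRS = [(1, 4), (2, 6), (3, 5)]

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a regex string.

    Supports literal residues, 'x' wildcards, bracketed residue sets
    and repeat counts such as x(3) or x(9,12); nothing else.
    """
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def disulphide_positions(domain):
    """Map the C1-C4, C2-C6, C3-C5 pairing onto 0-based cysteine indices."""
    cys = [i for i, aa in enumerate(domain) if aa == "C"]
    if len(cys) != 6:
        raise ValueError("expected exactly six cysteines")
    return [(cys[a - 1], cys[b - 1]) for a, b in DISULPHIDE_PAIRS]


# Artificial toy domain built to satisfy the pattern (alanine as filler).
toy = "C" + "A" * 10 + "CNACAC" + "AA" + "G" + "AAAA" + "CT" + "AAA" + "C"

regex = prosite_to_regex(PACIFASTIN)
print(regex)
print(re.search(regex, toy) is not None)
print(disulphide_positions(toy))
```

The pairing function assumes its input is a single domain containing exactly six cysteines; for the pacifastin light chain, with its nine tandem domains, each domain would need to be extracted first.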
Isochorismate pyruvate-lyase (IPL; PchB) catalyses the second reaction in the pyochelin biosynthetic pathway of Pseudomonas aeruginosa, conversion of isochorismate to salicylate plus pyruvate (following the initial PchA-dependent conversion of chorismate to isochorismate) []. This enzyme can also carry out the chorismate mutase (CM) reaction, but with a low catalytic efficiency. It is unlikely that PchB plays a significant role in aromatic amino acid biosynthesis. This enzyme is a stand-alone version of a chorismate mutase domain of the AroQ class. The three types of CM are AroQ class, prokaryotic type; AroQ class, eukaryotic type; and AroH class. They fall into two structural folds (AroQ class and AroH class) which are completely unrelated []. The two types of the AroQ structural class (the Escherichia coli CM dimer and the yeast CM monomer) can be structurally superimposed, and the topology of the four-helix bundle forming the active site is conserved []. The PchB-type of chorismate mutase domain, while sharing conserved residues and the predicted secondary structure with the other subgroups of the AroQ class (prokaryotic type), has isochorismate pyruvate-lyase (IPL) as a primary catalytic activity. PchB can still use the same active site either for the IPL or for the CM reaction. It has been suggested that PchB was derived from an AroQ-class CM by a gene duplication event followed by selection for efficient IPL function in the course of the evolution of the pyochelin siderophore pathway, with only residual CM activity remaining. It can be further speculated that contemporary CMs may already possess (weak) IPL activity []. For additional information please see [, , ].
Activator protein-2 (AP-2) transcription factors constitute a family of closely related and evolutionarily conserved proteins that bind to the DNA consensus sequence 5'-GCCNNNGGC-3' and stimulate target gene transcription [, ]. Five different isoforms of AP-2 have been identified in mammals, termed AP-2 alpha, beta, gamma, delta and epsilon. Each family member shares a common structure, possessing a proline/glutamine-rich domain in the N-terminal region, which is responsible for transcriptional activation [], and a helix-span-helix domain in the C-terminal region, which mediates dimerisation and site-specific DNA binding []. The AP-2 family have been shown to be critical regulators of gene expression during embryogenesis. They regulate the development of facial prominence and limb buds, and are essential for cranial closure and development of the lens [, ]; they have also been implicated in tumorigenesis. AP-2 protein expression levels have been found to affect cell transformation, tumour growth and metastasis, and may predict survival in some types of cancer [, ]. Mutations in human AP-2 have been linked with branchio-oculo-facial syndrome and Char syndrome, congenital birth defects characterised by craniofacial deformities and patent ductus arteriosus, respectively []. AP-2 alpha was initially isolated from human HeLa cells []. The protein was shown to bind to enhancer regions of the SV40 and human metallothionein IIA promoters, and to stimulate RNA synthesis []. AP-2 alpha gene knockout in mice causes neural-tube defects during embryogenesis, leading to craniofacial abnormalities and anencephaly [].
In humans, deletion of chromosome 6 region 6p24-p25, which includes the AP-2 alpha gene, is associated with microphthalmia, corneal clouding and a number of other dysmorphic features, including hypertelorism, micrognathia, dysplastic ears, thin limbs, and congenital cardiac defects. This entry represents the N-terminal region of these proteins, including the transcriptional activation domain.
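The AP-2 consensus sequence 5'-GCCNNNGGC-3' quoted in this entry uses the IUPAC symbol N for any base. A minimal sketch of scanning a DNA string for such sites follows; the function names and the toy sequence are hypothetical, and only the IUPAC symbols needed for this consensus are mapped.

```python
import re

# Only the symbols used by the consensus are mapped; N means any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a consensus written with A, C, G, T and N into a regex."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(dna, consensus="GCCNNNGGC"):
    """Return (0-based start, matched site) pairs, including overlaps."""
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]


# Hypothetical toy sequence, for illustration only.
toy = "ATGCCTGAGGCTTGCCAAAGGCA"
print(find_sites(toy))  # [(2, 'GCCTGAGGC'), (13, 'GCCAAAGGC')]
```

Because GCCNNNGGC is its own reverse complement, scanning one strand is enough to find sites on both. A sequence match only indicates a candidate site and says nothing about actual binding in vivo.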
The ~200 amino acid TBC/rab GTPase-activating protein (GAP) domain is well conserved across species and has been found in a wide range of different proteins from plant adhesion molecules to mammalian oncogenes. The name TBC derives from the name of the murine protein Tbc1 in which this domain was first identified based on its similarity to sequences in the tre-2 oncogene, and the yeast regulators of mitosis, BUB2 and cdc16 []. The connection of this domain with rab GTPase activation stems from subsequent in-depth sequence analyses and alignments [] and recent work demonstrating that it appears to contain the catalytic activities of the yeast rab GAPs, GYP1 and GYP7 []. The TBC/rab GAP domain has also been named PTM after three proteins known to contain it: the Drosophila pollux, the human oncoprotein TRE17 (oncoTRE17), and a myeloid cell line-expressed protein []. The TBC/rab GAP domain contains six conserved motifs named A to F []. A conserved arginine residue in the sequence motif B has been shown to be critical for the full GAP activity []. Resolution of the 3D structure of the TBC/rab GAP domain of GYP1 has shown that it is a fully α-helical V-shaped molecule. The conserved arginine residue is positioned at the side of the narrow cleft on the concave side of the V-shaped molecule. It has been proposed that this cleft is the binding site for the GTPase. The conserved arginine residue probably functions as a catalytic arginine finger analogous to that seen in ras and Rho-GAPs. The two key features of the arginine finger activation mechanism appear to be (i) the positioning of the catalytically essential GTPase glutamine side chain via a hydrogen bonding interaction between the glutamine carbamoyl-NH2 group and the main chain carbonyl group of the GAP arginine, and (ii) the polarization of the gamma-phosphate group or the stabilization of charge on it via the interaction of the positively charged side chain guanidinoyl group of the GAP arginine [].
This entry represents a six transmembrane helix rhomboid domain. This entry also includes derlins, inactive members of the rhomboid family of intramembrane proteases which lack an active site Ser-His dyad but retain the overall rhomboid architecture []. This domain is found in serine peptidases belonging to the MEROPS peptidase family S54 (Rhomboid, clan ST). They are integral membrane proteins related to the Drosophila melanogaster (fruit fly) rhomboid protein. Members of this family are found in archaea, bacteria and eukaryotes. The rhomboid protease cleaves type-1 transmembrane domains using a catalytic dyad composed of serine and histidine. The active site is embedded within the membrane and the active site residues are on different transmembrane regions. The tertiary structure of the Escherichia coli homologue GlpG [] showed that hydrolysis occurs in a fluid-filled cavity within the membrane. Initially, a catalytic triad including a highly conserved asparagine had been proposed, but this residue has been shown not to be essential []. Drosophila rhomboid cleaves the transmembrane proteins Spitz, Gurken and Keren within their transmembrane domains to release a soluble TGFalpha-like growth factor. Cleavage occurs in the Golgi, following translocation of the substrates from the endoplasmic reticulum membrane by Star, another transmembrane protein. The growth factors are then able to activate the epidermal growth factor receptor [, ]. Few substrates of mammalian rhomboid homologues have been determined, but rhomboid-like protein 2 has been shown to cleave ephrin B3 []. Parasite-encoded rhomboid enzymes are also important for invasion of host cells by Toxoplasma and the malaria parasite. Invasion of host cells first requires their recognition, and this is achieved by parasite transmembrane adhesins interacting with host cell receptors. Before the parasite can enter a host cell, the adhesins must be released by cleavage.
In Toxoplasma, the rhomboid TgROM5 cleaves the adhesins, and in Plasmodium, which lacks a TgROM5 orthologue, PfROMs 1 and 4 cleave the diverse array of malaria parasite adhesins [].
The Ras association domain family (RASSF) proteins are named for the presence of a Ras association (RA) domain at their N or C terminus that can potentially interact with the Ras GTPase family of proteins. These GTPases control a variety of cellular processes, such as membrane trafficking, apoptosis, and proliferation. RASSF proteins contain several other functional domains that modulate associations with other proteins. RASSF proteins with the RA domain at the C terminus (which are termed C-terminal or classical RASSF) usually also include a Salvador-RASSF-Hippo (SARAH) domain involved in several protein-protein interactions and in homo- and heterodimerisation of RASSF isoforms. N-terminal RASSF proteins (with the RA domain at the N terminus) do not usually contain a SARAH domain []. At least 10 RASSF family members have been characterised (with multiple splice variants), many of which have been shown to play a role in tumour suppression. RASSF proteins also act as scaffolding agents in microtubule stability, regulate mitotic cell division, control cell migration and cell adhesion, and modulate NF-κB activity and the duration of inflammation. Loss of RASSF expression through promoter methylation has been shown in numerous types of cancer, including leukaemia, melanoma, breast and prostate cancer []. RASSF7 is one of the N-terminal RASSF proteins, characterised by an RA domain at the N terminus. It was previously known as HRC1 (HRAS1-related cluster protein 1), and is predicted to exist as at least three isoforms as a result of alternative splicing []. RASSF7 has been shown to promote mitosis through the regulation of spindle formation []. There is conflicting evidence on the methylation of the RASSF7 promoter, and consequently on the status of RASSF7 as a tumour suppressor [, ].
Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium and nickel. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds [, ]. An empirical classification into three classes has been proposed by Fowler and coworkers [] and Kojima []. Members of class I are defined to include polypeptides related in the positions of their cysteines to equine MT-1B, and include mammalian MTs as well as MTs from crustaceans and molluscs. Class II groups MTs from a variety of species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units []. This original classification system has been found to be limited, in the sense that it does not allow clear differentiation of patterns of structural similarities, either between or within classes. Subsequently, a new classification was proposed on the basis of sequence similarity derived from phylogenetic relationships, which basically proposes an MT family for each main taxonomic group of organisms []. The members of family 1 are recognised by the sequence pattern K-x(1,2)-C-C-x-C-C-P-x(2)-C located at the beginning of the third exon. The taxonomic range of the members extends to vertebrates. Known characteristics: 60 to 68 amino acids; 20 Cys (21 in one case), 19 of which are totally conserved; the protein sequence is divided into two structural domains, containing 9 and 11 Cys and binding 3 and 4 bivalent metal ions, respectively. The gene is composed of 3 exons and 2 introns, and the splicing sites are conserved. Family 1 includes the subfamilies m1, m2, m3, m4, m, a, a1, a2, b, ba and t, all of which match the same InterPro entry. This entry represents a conserved region containing seven of the metal-binding cysteines, located in the N-terminal section of family 1 MTs.
This entry represents 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase (also known as 3-phosphoshikimate 1-carboxyvinyltransferase), which catalyses the sixth step of the shikimate pathway, the route to chorismate and hence to the aromatic amino acids, in bacteria (gene aroA), plants and fungi (where it is part of a multifunctional enzyme which catalyses five consecutive steps in this pathway) []. The sixth step is the formation of EPSP and inorganic phosphate from shikimate-3-phosphate (S3P) and phosphoenolpyruvate (PEP). EPSP synthase can use shikimate or shikimate-3-phosphate as a substrate. By binding shikimate, the backbone of the active site is changed, which affects the binding of glyphosate and renders the reaction insensitive to inhibition by glyphosate []. On isolation of the discontinuous C-terminal domain, it was found that it binds neither its substrate nor its inhibitor but maintains structural integrity []. Earlier studies suggested that the active site of the enzyme is in the cleft between its two globular domains. When the enzyme binds S3P, there is a conformational change in the isolated N-terminal domain []. The sequence of EPSP synthase from various biological sources shows that the structure of the enzyme has been well conserved throughout evolution. Two strongly conserved regions are well defined. The first one corresponds to a region that is part of the active site and which is also important for the resistance to glyphosate []. The second one is located in the C-terminal part of the protein and contains a conserved lysine which seems to be important for the activity of the enzyme. Since the shikimate pathway is not present in vertebrates but is essential for the life of plants, fungi and bacteria, it is commonly viewed as a target for antimicrobial drug development.
Heme peroxidases were originally divided into two superfamilies, namely, the animal peroxidases and the plant peroxidases (class I, II and III), which include fungal (class II) and bacterial peroxidases. The DyP (for dye-decolorizing peroxidase) family constitutes a novel class of heme peroxidase. Because these enzymes were derived from fungal sources, the DyP family was thought to be structurally related to the class II secretory fungal peroxidases. However, the DyP family exhibits only low sequence similarity to classical fungal peroxidases, such as LiP and MnP, and does not contain the conserved proximal and distal histidines and an essential arginine found in other plant peroxidase superfamily members. DyP proteins have several characteristics that distinguish them from all other peroxidases, including a particularly wide substrate specificity, a lack of homology to most other peroxidases, and the ability to function well under much lower pH conditions compared with the other plant peroxidases []. In terms of substrate specificity, DyP degrades the typical peroxidase substrates, but also degrades hydroxyl-free anthraquinone (many dyes are derived from anthraquinone compounds). Crystal structures of DyP family members reveal two domains, each one adopting a ferredoxin-like fold [, , ]. The proteins consist of an N-terminal domain and a C-terminal domain likely to be related by a duplication of an ancestral gene, as inferred from the conserved topology of the domains. The heme iron is penta-coordinated, with the protein contributing a conserved histidine ligand to the iron centre. A conserved Asp most likely acts as a proton donor/acceptor and takes the place of the catalytic histidine used by plant peroxidases. This Asp substitution helps explain why the DyP family is active at low pH [, ].
Ku70 (also known as XRCC6) is a eukaryotic protein that is involved in the repair of DNA double-strand breaks by non-homologous end-joining [, ]. Ku is a heterodimer of approximately 70 kDa and 80 kDa subunits []. The two subunits share strong sequence similarity and it has been suggested that they may have evolved by gene duplication from a homodimeric ancestor in eukaryotes []. Homologues of the eukaryotic DNA-end-binding protein Ku were identified in several bacterial and one archaeal genome using iterative database searches; these prokaryotic Ku members are homodimers that have been predicted to be involved in a DNA repair system that is mechanistically similar to eukaryotic non-homologous end-joining [, ], though they are not members of this family. Recent findings have implicated yeast Ku in telomeric structure maintenance in addition to non-homologous end-joining. Some of the phenotypes of Ku-knockout mice may indicate a similar role for Ku at mammalian telomeres []. Evolutionary notes: With the currently available phyletic information it is difficult to determine the correct evolutionary trajectory of the Ku domain. It is possible that the core Ku domain was present in bacteria and archaea even before the presence of the eukaryotes. Eukaryotes might have vertically inherited the Ku core protein from a common ancestor shared with a certain archaeal lineage, or through horizontal transfer from bacteria. Alternatively, the core Ku domain could have evolved in the eukaryotic lineage and then been horizontally transferred to the prokaryotes. Sequencing of additional archaeal genomes and those of early-branching eukaryotes may help resolve the evolutionary history of the Ku domain. Structure notes: The eukaryotic Ku heterodimer comprises an alpha/beta N-terminal domain, a central β-barrel domain and a helical C-terminal arm [].
Structural analysis of the Ku70/80 heterodimer bound to DNA indicates that subunit contacts lead to the formation of a highly charged channel through which the DNA passes without making any contacts with the DNA bases [].
Transcription factors of the T-box family are required both for early cell-fate decisions, such as those necessary for formation of the basic vertebrate body plan, and for differentiation and organogenesis []; they have also been associated with multiple aspects of development and with adult terminal cell-type differentiation in different animal lineages []. The T-box is defined as the minimal region within the T-box protein that is both necessary and sufficient for sequence-specific DNA binding. All members of the family so far examined bind to the DNA consensus sequence TCACACCT and function as transcriptional repressors and/or activators []. The T-box is a relatively large DNA-binding domain, generally comprising about a third of the entire protein (17-26 kDa) []. These genes were uncovered on the basis of similarity to the DNA-binding domain [] of the Mus musculus (mouse) Brachyury (T) gene product; this similarity is the defining feature of the family. The Brachyury gene is named for its phenotype, which was identified 70 years ago as a mutant mouse strain with a short blunted tail. The gene, and its paralogues, have become a well-studied model for the family, and hence much of what is known about the T-box family is derived from the murine Brachyury gene. Consistent with its nuclear location, Brachyury protein has a sequence-specific DNA-binding activity and can act as a transcriptional regulator []. Homozygous mutants for the gene undergo extensive developmental anomalies, thus rendering the mutation lethal [].
The postulated role of Brachyury is as a transcription factor, regulating the specification and differentiation of posterior mesoderm during gastrulation in a dose-dependent manner []. T-box proteins tend to be expressed in specific organs or cell types, especially during development, and they are generally required for the development of those tissues; for example, Brachyury is expressed in posterior mesoderm and in the developing notochord, and it is required for the formation of these cells in mice []. The T-box family is an ancient group that appears to play a critical role in development in all animal species [].
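The T-box consensus sequence TCACACCT given in this entry is not palindromic, so a search for candidate sites in double-stranded DNA has to consider the reverse complement as well. A minimal sketch follows, assuming an input made only of A, C, G and T; the function names and the toy sequence are hypothetical.

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_tbox_sites(dna, site="TCACACCT"):
    """Return sorted (0-based start, strand) pairs for exact site matches.

    '+' marks a match to the site as written, '-' a match to its
    reverse complement, both reported in forward-strand coordinates.
    """
    dna = dna.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = dna.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical toy sequence, for illustration only.
toy = "GGTCACACCTAAAGGTGTGACC"
print(reverse_complement("TCACACCT"))  # AGGTGTGA
print(find_tbox_sites(toy))            # [(2, '+'), (12, '-')]
```

Exact matching is the simplest possible choice here; real binding sites usually tolerate some mismatches, which this sketch does not model.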
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups []. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [, , , , ]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins.
Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [, , ]. Lysophospholipids (LPs), such as lysophosphatidic acid (LPA), sphingosine 1-phosphate (S1P) and sphingosylphosphorylcholine (SPC), have long been known to act as signalling molecules in addition to their roles as intermediates in membrane biosynthesis []. They have roles in the regulation of cell growth, differentiation, apoptosis and development, and have been implicated in a wide range of pathophysiological conditions, including blood clotting, corneal wounding, subarachnoid haemorrhage, inflammation and colitis []. A number of G protein-coupled receptors bind members of the lysophospholipid family. These include: the cannabinoid receptors; platelet activating factor receptor; OGR1, an SPC receptor identified in ovarian cancer cell lines; PSP24, an orphan receptor that has been proposed to bind LPA; and at least 8 closely related receptors, the EDG family, that bind LPA and S1P []. S1P is released from activated platelets and is also produced by a number of other cell types in response to growth factors and cytokines []. It is proposed to act both as an extracellular mediator and as an intracellular second messenger. The cellular effects of S1P include growth-related effects, such as proliferation, differentiation, cell survival and apoptosis, and cytoskeletal effects, such as chemotaxis, aggregation, adhesion, morphological change and secretion. The molecule has been implicated in control of angiogenesis, inflammation, heart rate and tumour progression, and may play an important role in a number of disease states, such as atherosclerosis, and breast and ovarian cancer [].
Recently, 5 G protein-coupled receptors have been identified that act as high affinity receptors for S1P, and also as low affinity receptors for the related lysophospholipid, SPC []. EDG-1, EDG-3, EDG-5 and EDG-8 share a high degree of similarity, and are also referred to as lpB1, lpB3, lpB2 and lpB4, respectively. EDG-6 is referred to as lpC1, reflecting its more distant relationship to the other S1P receptors. EDG-1 was the first member of the family to be cloned (from phorbol-ester-differentiated human endothelial cells); its ligand, however, was unknown, so it was named endothelial differentiation gene (EDG) 1, reflecting its potential function []. EDG-1 is expressed widely, with highest levels in the brain, heart, lung, liver and spleen. Moderate levels are also found in the thymus, kidney and muscle []. Within these regions, EDG-1 is expressed in endothelial cells, vascular smooth muscle, fibroblasts, melanocytes and cells of epithelioid origin []. Upon binding of S1P, the receptor can couple to Gi1, Gi2, Gi3, Go and Gz type G proteins, leading to inhibition of adenylyl cyclase, phospholipase C activation and MAP kinase activation [, ].
This superfamily represents the N-terminal region of the SUD domain (SUD-N or Mac2) found in non-structural protein NSP3, the product of ORF1a in group 2 (beta) coronaviruses. It is found in human SARS-CoV and SARS-CoV-2 polyprotein 1a and 1ab, and in related coronavirus polyproteins []. Non-structural protein Nsp3 contains at least seven different functional modules within its 1922-amino-acid polypeptide chain. One of these is the so-called SARS (severe acute respiratory syndrome)-unique domain (SUD), a stretch of about 338 residues that was originally described as completely absent from any other coronavirus. The SUD domain may be responsible for the high pathogenicity of the SARS coronavirus, compared to other viruses of this family [, ]. Later, the NSP3 of MHV was shown by X-ray crystallography to contain a SUD-C-like fold, so it is no longer appropriate to call this domain "SARS-unique". This region has been renamed "Domain Preceding Ubl2 and PL2pro" (DPUP) []. NSP3 has been shown to bind to viral RNA, nucleocapsid protein, as well as other viral proteins, and participates in polyprotein processing. It is a multifunctional protein comprising up to 16 different domains and regions []. SUDcore exhibits a two-domain architecture comprising the N-terminal subdomain (SUD-N) and the C-terminal subdomain of SUDcore, also named the middle SUD subdomain, or SUD-M [, ]. SUD-N has been shown to be dispensable for the SARS-CoV replication/transcription complex within the context of a SARS-CoV replicon []. SUD consists of three globular domains separated by short linker peptide segments: SUD-N, SUD-M, and SUD-C []. Among these, SUD-N and SUD-M are macrodomains. The SUD-N domain is a related macrodomain which also binds G-quadruplexes []. While SUD-N is specific to the NSP3 of SARS and betacoronaviruses of the sarbecovirus subgenus (B lineage), SUD-M is present in most NSP3 proteins except the NSP3 from betacoronaviruses of the embecovirus subgenus (A lineage). 
SUD-M, despite its name, is not specific to SARS. SUD-C adopts a frataxin-like fold, has structural similarity to DNA-binding domains of DNA-modifying enzymes, binds single-stranded RNA, and regulates the RNA binding behavior of the SUD-M macrodomain. SARS-CoV Nsp3 contains a third macrodomain (the X-domain). The X-domain may function as a module binding poly(ADP-ribose); however, SUD-N and SUD-M do not bind ADP-ribose, as the triple glycine sequence involved in its binding is not conserved in these [].
Pleckstrin homology (PH) domains are small modular domains that occur in a large variety of proteins. The domains can bind phosphatidylinositol within biological membranes and proteins such as the beta/gamma subunits of heterotrimeric G proteins [] and protein kinase C []. Through these interactions, PH domains play a role in recruiting proteins to different membranes, thus targeting them to appropriate cellular compartments or enabling them to interact with other components of the signal transduction pathways. PH domains have been found to possess inserted domains (such as in PLC gamma, syntrophins) and to be inserted within other domains. Mutations in Bruton's tyrosine kinase (Btk) within its PH domain cause X-linked agammaglobulinaemia (XLA) in patients. Point mutations cluster into the positively charged end of the molecule around the predicted binding site for phosphatidylinositol lipids. The 3D structure of several PH domains has been determined []. All known cases have a common structure consisting of two perpendicular anti-parallel β-sheets, followed by a C-terminal amphipathic helix. The loops connecting the β-strands differ greatly in length, making the PH domain relatively difficult to detect. There are no totally invariant residues within the PH domain. Proteins reported to contain one or more PH domains belong to the following families: Pleckstrin, the protein where this domain was first detected, is the major substrate of protein kinase C in platelets. 
Pleckstrin is one of the rare proteins to contain two PH domains. Ser/Thr protein kinases such as the Akt/Rac family, the beta-adrenergic receptor kinases, the mu isoform of PKC and the trypanosomal NrkA family. Tyrosine protein kinases belonging to the Btk/Itk/Tec subfamily. Insulin Receptor Substrate 1 (IRS-1). Regulators of small G-proteins like guanine nucleotide releasing factor GNRP (Ras-GRF) (which contains 2 PH domains), guanine nucleotide exchange proteins like vav, dbl, SoS and Saccharomyces cerevisiae CDC24, GTPase activating proteins like rasGAP and BEM2/IPL2, and the human break point cluster protein bcr. Cytoskeletal proteins such as dynamin, Caenorhabditis elegans kinesin-like protein unc-104, spectrin beta-chain, syntrophin (2 PH domains) and S. cerevisiae nuclear migration protein NUM1. Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) isoforms gamma and delta. Isoform gamma contains two PH domains, the second of which is split into two parts separated by about 400 residues. Oxysterol binding proteins OSBP, S. cerevisiae OSH1 and YHR073w. Mouse protein citron, a putative rho/rac effector that binds to the GTP-bound forms of rho and rac. Several S. cerevisiae proteins involved in cell cycle regulation and bud formation like BEM2, BEM3, BUD4 and the BEM1-binding proteins BOI2 (BEB1) and BOI1 (BOB1). C. elegans protein MIG-10. C. elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12. S. cerevisiae hypothetical proteins YBR129c and YHR155w.
Thioredoxins [, , , ] are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of two cysteine thiol groups to a disulphide, accompanied by the transfer of two electrons and two protons. The net result is the covalent interconversion of a disulphide and a dithiol. In the NADPH-dependent protein disulphide reduction, thioredoxin reductase (TR) catalyses the reduction of oxidised thioredoxin (trx) by NADPH using FAD and its redox-active disulphide; reduced thioredoxin then directly reduces the disulphide in the substrate protein []. Thioredoxin is present in prokaryotes and eukaryotes and the sequence around the redox-active disulphide bond is well conserved. All thioredoxins contain a cis-proline located in a loop preceding β-strand 4, which makes contact with the active site cysteines, and is important for stability and function []. Thioredoxin belongs to a structural family that includes glutaredoxin, glutathione peroxidase, bacterial protein disulphide isomerase DsbA, and the N-terminal domain of glutathione transferase []. Thioredoxins have a beta-alpha unit preceding the motif common to all these proteins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; most of them are protein disulphide isomerases (PDI). PDI [, , ] is an endoplasmic reticulum multi-functional enzyme that catalyses the formation and rearrangement of disulphide bonds during protein folding []. All PDIs contain two or three (ERp72) copies of the thioredoxin domain, each of which contributes to disulphide isomerase activity, but which are functionally non-equivalent []. Moreover, PDI exhibits chaperone-like activity towards proteins that contain no disulphide bonds, i.e. behaving independently of its disulphide isomerase activity []. 
The various forms of PDI which are currently known are: PDI major isozyme, a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase, as a component of oligosaccharyl transferase, as thyroxine deiodinase, as glutathione-insulin transhydrogenase and as a thyroid hormone-binding protein. ERp60 (ER-60; 58 Kd microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific phospholipase C isozyme and later to be a protease. ERp72. ERp5. Bacterial proteins that act as thiol:disulphide interchange proteins, allowing disulphide bond formation in some periplasmic proteins, also contain a thioredoxin domain. These proteins include: Escherichia coli DsbA (or PrfA) and its orthologues in Vibrio cholerae (TcpG) and Haemophilus influenzae (Por). E. coli DsbC (or XpRA) and its orthologues in Erwinia chrysanthemi and H. influenzae. E. coli DsbD (or DipZ) and its H. influenzae orthologue. E. coli DsbE (or CcmG) and orthologues in H. influenzae. Rhodobacter capsulatus (Rhodopseudomonas capsulata) (HelX), Rhizobiaceae (CycY and TlpA). This entry represents the thioredoxin domain.
Chemokines (chemotactic cytokines) are a family of chemoattractant molecules. They attract leukocytes to areas of inflammation and lesions, and play a key role in leukocyte activation. Originally defined as host defense proteins, chemokines are now known to play a much broader biological role []. They have a wide range of effects in many different cell types beyond the immune system, including, for example, various cells of the central nervous system [], and endothelial cells, where they may act as either angiogenic or angiostatic factors []. The chemokine family is divided into four classes based on the number and spacing of their conserved cysteines: the first two Cys residues may be adjacent (the CC family) or separated by one intervening residue (the CXC family); only one of the first two Cys residues may be present (C chemokines); or both cysteines may be present, separated by three intervening residues (CX3C chemokines). Chemokines exert their effects by binding to rhodopsin-like G protein-coupled receptors on the surface of cells. Following interaction with their specific chemokine ligands, chemokine receptors trigger a flux in intracellular calcium ions, which causes a cellular response, including the onset of chemotaxis. There are over fifty distinct chemokines and at least 18 human chemokine receptors []. Although the receptors bind only a single class of chemokines, they often bind several members of the same class with high affinity. Chemokine receptors are preferentially expressed on important functional subsets of dendritic cells, monocytes and lymphocytes, including Langerhans cells and T helper cells [, ]. Chemokines and their receptors can also be subclassified into homeostatic leukocyte homing molecules (CXCR4, CXCR5, CCR7, CCR9) versus inflammatory/inducible molecules (CXCR1, CXCR2, CXCR3, CCR1-6, CX3CR1). CC chemokine receptors are a subfamily of the chemokine receptors that specifically bind and respond to cytokines of the CC chemokine family. 
There are currently ten members of the CC chemokine receptor subfamily, named CCR1 to 10. The receptors are found in monocytes, lymphocytes, basophils and eosinophils. This entry represents CC chemokine receptor 9 (CCR9), which was previously designated as the orphan receptors GPR28 and GPR 9-6. CCR9 is expressed predominantly in the thymus, in both mature and immature T cells, and is also found in the lymph nodes, spleen, glomerular podocytes, bone marrow stromal cells and the small intestine [, , , , ]. Transfected cells expressing CCR9 receptor bind specifically to CCL25 (also known as Thymus-Expressed Chemokine) []. This interaction may play a pivotal role in T-cell migration in the thymus []. CCR9 activation has also been shown to influence cancer cell migration, invasion and matrix metallopeptidase expression, which together may affect prostate cancer metastasis [].
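The cysteine-spacing rule used above to divide chemokines into CC, CXC, CX3C and C classes can be illustrated with a minimal sketch. The function name and the toy sequences are hypothetical, and the code assumes the first cysteine in the supplied string is the first conserved cysteine of the mature protein, which is a simplification that real annotation pipelines do not rely on.

```python
from typing import Optional


def classify_chemokine_motif(seq: str) -> Optional[str]:
    """Toy classifier based on the spacing after the first cysteine.

    Assumption (not from the source): the first 'C' in `seq` is the first
    conserved cysteine. Spacings other than CC, CXC and CX3C fall through
    to the single-cysteine "C" class.
    """
    seq = seq.upper()
    first = seq.find("C")
    if first == -1:
        return None  # no cysteine at all, so no class can be assigned
    # Up to four residues following the first cysteine.
    window = seq[first + 1:first + 5]
    if window[:1] == "C":
        return "CC"    # the two cysteines are adjacent
    if window[1:2] == "C":
        return "CXC"   # one intervening residue
    if window[3:4] == "C":
        return "CX3C"  # three intervening residues
    return "C"         # only one of the first two cysteines present


# Hypothetical toy sequences, not real chemokines.
for toy in ("MKCCLA", "MKCACLA", "MKCAAACLA", "MKCAAAAAL"):
    print(toy, classify_chemokine_motif(toy))
```

A real classifier would work from an alignment against known family members rather than from raw spacing alone.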
Protein phosphorylation, which plays a key role in most cellular activities, is a reversible process mediated by protein kinases and phosphoprotein phosphatases. Protein kinases catalyse the transfer of the gamma phosphate from nucleoside triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. Phosphoprotein phosphatases catalyse the reverse process. Protein kinases fall into three broad classes, characterised with respect to substrate specificity []: serine/threonine-protein kinases; tyrosine-protein kinases; and dual specificity protein kinases (e.g. MEK, which phosphorylates both Thr and Tyr on target proteins). Protein kinase function is evolutionarily conserved from Escherichia coli to human []. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation []. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins. The catalytic subunits of protein kinases are highly conserved, and several structures have been solved [], leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases []. Tyrosine-protein kinases can transfer a phosphate group from ATP to a tyrosine residue in a protein. These enzymes can be divided into two main groups []: Receptor tyrosine kinases (RTK), which are transmembrane proteins involved in signal transduction; they play key roles in growth, differentiation, metabolism, adhesion, motility, death and oncogenesis []. RTKs are composed of 3 domains: an extracellular domain (binds ligand), a transmembrane (TM) domain, and an intracellular catalytic domain (phosphorylates substrate). The TM domain plays an important role in the dimerisation process necessary for signal transduction []. 
Cytoplasmic / non-receptor tyrosine kinases, which act as regulatory proteins, playing key roles in cell differentiation, motility, proliferation, and survival; an example is the Src family of protein-tyrosine kinases []. TYK2 was first identified by low-stringency hybridisation screening of a human lymphoid cDNA library with the catalytic domain of proto-oncogene c-fms []. Mouse and puffer fish orthologues have also been identified. In common with JAK1 and JAK2, and by contrast with JAK3, TYK2 appears to be ubiquitously expressed. This entry represents the N-terminal region of TYK2.
This short bi-helical repeat is related to HEAT repeats and is present in phycocyanobilin lyases and other proteins. Cyanobacteria and red algae harvest light energy using macromolecular complexes known as phycobilisomes (PBS), peripherally attached to the photosynthetic membrane. The major components of PBS are the phycobiliproteins. These heterodimeric proteins are covalently attached to phycobilins: open-chain tetrapyrrole chromophores, which function as the photosynthetic light-harvesting pigments. Phycobiliproteins differ in sequence and in the nature and number of phycobilins attached to each of their subunits. Proteins containing this repeat include the lyase enzymes that specifically attach particular phycobilins to apophycobiliprotein subunits. The most comprehensively studied of these is the CpcE/F lyase, which attaches phycocyanobilin (PCB) to the alpha subunit of apophycocyanin []. Similarly, MpeU/V attaches phycoerythrobilin to phycoerythrin II, while CpeY/Z is thought to be involved in phycoerythrobilin (PEB) attachment to phycoerythrin (PE) I (PEs I and II differ in sequence and in the number of attached molecules of PEB: PE I has five, PE II has six) []. All the reactions of the above lyases involve an apoprotein cysteine SH addition to a terminal delta 3,3'-double bond. Such a reaction is not possible in the case of phycoviolobilin (PVB), the phycobilin of alpha-phycoerythrocyanin (alpha-PEC). It is thought that in this case, PCB, not PVB, is first added to apo-alpha-PEC, and is then isomerized to PVB. The addition reaction has been shown to occur in the presence of either of the components of alpha-PEC-PVB lyase PecE or PecF (or both). The isomerisation reaction occurs only when both PecE and PecF components are present, i.e. the PecE/F phycobiliprotein lyase is also a phycobilin isomerase []. Another member of this family is the NblB protein, whose similarity to the phycobiliprotein lyases was previously noted []. 
This constitutively expressed protein is not known to have any lyase activity. It is thought to be involved in the coordination of PBS degradation with environmental nutrient limitation. It has been suggested that the similarity of NblB to the phycobiliprotein lyases is due to the ability to bind tetrapyrrole phycobilins via the common repeated motif []. This repeat is also found in proteins not related to the phycobilisomes, such as archaeal proteins that are essential for chemotaxis and phototaxis [], epoxyqueuosine reductases [] and deoxyhypusine hydroxylases [].
Fibroblast growth factors (FGFs) [, ] are a family of multifunctional proteins, often referred to as 'promiscuous growth factors' due to their diverse actions on multiple cell types [, ]. FGFs are mitogens, which stimulate growth or differentiation of cells of mesodermal or neuroectodermal origin. The functions of FGFs in developmental processes include mesoderm induction, anterior-posterior patterning, limb development, and neural induction and development. In mature tissues, they are involved in diverse processes including keratinocyte organisation and wound healing [, , , , , ]. FGF involvement is critical during normal development of both vertebrates and invertebrates, and irregularities in their function lead to a range of developmental defects [, , , ]. Fibroblast growth factors are heparin-binding proteins and interactions with cell-surface-associated heparan sulfate proteoglycans have been shown to be essential for FGF signal transduction. FGFs have internal pseudo-threefold symmetry (β-trefoil topology) []. There are currently over 20 different FGF family members that have been identified in mammals, all of which are structurally related signaling molecules [, ]. They exert their effects through four distinct membrane fibroblast growth factor receptors (FGFRs), FGFR1 to FGFR4 [], which belong to the tyrosine kinase superfamily. Upon binding to FGF, the receptors dimerize and their intracellular tyrosine kinase domains become active []. The FGFRs consist of an extracellular ligand-binding domain composed of three immunoglobulin-like domains (D1-D3), a single transmembrane helix domain, and an intracellular domain with tyrosine kinase activity []. The three immunoglobulin (Ig)-like domains, D1, D2, and D3, present a stretch of acidic amino acids (known as the acid box) between D1 and D2. This acid box can participate in the regulation of FGF binding to the FGFR. Immunoglobulin-like domains D2 and D3 are sufficient for FGF binding. 
FGFR family members differ from one another in their ligand affinities and tissue distribution [, ]. Most FGFs can bind to several different FGFR subtypes. Indeed, FGF1 is sometimes referred to as the universal ligand, as it is capable of activating all of the different FGFRs []. However, there are some exceptions. For example, FGF7 only interacts with FGFR2 [] and FGF18 was recently shown to only activate FGFR3 []. Fibroblast growth factor receptor 1 (FGFR1) binds both acidic and basic fibroblast growth factors and is involved in limb induction []. FGFR1 has been shown to be associated with Pfeiffer syndrome [], and cleft lip and/or palate [, ]. Fibroblast growth factor receptor 1 has been shown to interact with growth factor receptor-bound protein 14 (GRB14) [], Src homology 2 domain containing adaptor protein B (SHB) [], fibroblast growth factor receptor substrate 2 (FRS2) [] and fibroblast growth factor 1 (FGF1) [, ]. This entry represents the catalytic domain of FGFR1.
Fibroblast growth factors (FGFs) [, ] are a family of multifunctional proteins, often referred to as 'promiscuous growth factors' due to their diverse actions on multiple cell types [, ]. FGFs are mitogens, which stimulate growth or differentiation of cells of mesodermal or neuroectodermal origin. The functions of FGFs in developmental processes include mesoderm induction, anterior-posterior patterning, limb development, and neural induction and development. In mature tissues, they are involved in diverse processes including keratinocyte organisation and wound healing [, , , , , ]. FGF involvement is critical during normal development of both vertebrates and invertebrates, and irregularities in their function lead to a range of developmental defects [, , , ]. Fibroblast growth factors are heparin-binding proteins and interactions with cell-surface-associated heparan sulfate proteoglycans have been shown to be essential for FGF signal transduction. FGFs have internal pseudo-threefold symmetry (β-trefoil topology) []. There are currently over 20 different FGF family members that have been identified in mammals, all of which are structurally related signaling molecules [, ]. They exert their effects through four distinct membrane fibroblast growth factor receptors (FGFRs), FGFR1 to FGFR4 [], which belong to the tyrosine kinase superfamily. Upon binding to FGF, the receptors dimerize and their intracellular tyrosine kinase domains become active []. The FGFRs consist of an extracellular ligand-binding domain composed of three immunoglobulin-like domains (D1-D3), a single transmembrane helix domain, and an intracellular domain with tyrosine kinase activity []. The three immunoglobulin (Ig)-like domains, D1, D2, and D3, present a stretch of acidic amino acids (known as the acid box) between D1 and D2. This acid box can participate in the regulation of FGF binding to the FGFR. Immunoglobulin-like domains D2 and D3 are sufficient for FGF binding. 
FGFR family members differ from one another in their ligand affinities and tissue distribution [, ]. Most FGFs can bind to several different FGFR subtypes. Indeed, FGF1 is sometimes referred to as the universal ligand, as it is capable of activating all of the different FGFRs []. However, there are some exceptions. For example, FGF7 only interacts with FGFR2 [] and FGF18 was recently shown to only activate FGFR3 []. This entry represents the fibroblast growth factor receptor family.
Janus kinases (JAKs) are tyrosine kinases that function in membrane-proximal signalling events initiated by a variety of extracellular factors binding to cell surface receptors []. Many type I and II cytokine receptors lack a protein tyrosine kinase domain and rely on JAKs to initiate the cytoplasmic signal transduction cascade. Ligand binding induces oligomerisation of the receptors, which then activates the cytoplasmic receptor-associated JAKs. These subsequently phosphorylate tyrosine residues along the receptor chains with which they are associated. The phosphotyrosine residues are a target for a variety of SH2 domain-containing transducer proteins. Amongst these are the signal transducers and activators of transcription (STAT) proteins, which, after binding to the receptor chains, are phosphorylated by the JAK proteins. Phosphorylation enables the STAT proteins to dimerise and translocate into the nucleus, where they alter the expression of cytokine-regulated genes. This system is known as the JAK-STAT pathway. Four mammalian JAK family members have been identified: JAK1, JAK2, JAK3, and TYK2. They are relatively large kinases of approximately 1150 amino acids, with molecular weights of ~120-130 kDa. Their amino acid sequences are characterised by the presence of 7 highly conserved domains, termed JAK homology (JH) domains. The C-terminal domain (JH1) is responsible for the tyrosine kinase function. The next domain in the sequence (JH2) is known as the tyrosine kinase-like domain, as its sequence shows high similarity to functional kinases but does not possess any catalytic activity. Although the function of this domain is not well established, there is some evidence for a regulatory role on the JH1 domain, thus modulating catalytic activity. 
The N-terminal portion of the JAKs (spanning JH7 to JH3) is important for receptor association and non-catalytic activity, and consists of JH3-JH4, which is homologous to the SH2 domain, and lastly JH5-JH7, which is a FERM domain. This entry represents the non-receptor tyrosine kinase JAK2 []. JAK2 was initially cloned using a PCR-based strategy utilising primers corresponding to conserved motifs within the catalytic domain of protein-tyrosine kinases []. In common with JAK1 and TYK2, and by contrast with JAK3, JAK2 appears to be ubiquitously expressed.
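The JAK homology (JH) domain layout described above can be summarised as a small data structure. This is only an illustrative sketch: the names `JAK_REGIONS`, `n_to_c_order` and `region_of` are hypothetical, and no residue coordinates are given because the text does not supply any.

```python
# JH domains listed from the N terminus to the C terminus, grouped by the
# regions named in the text. Labels paraphrase the description above.
JAK_REGIONS = [
    ("FERM domain", ("JH7", "JH6", "JH5")),
    ("SH2-homologous region", ("JH4", "JH3")),
    ("tyrosine kinase-like domain", ("JH2",)),
    ("tyrosine kinase domain", ("JH1",)),
]


def n_to_c_order() -> list:
    """Return the JH domains in N-terminal to C-terminal order."""
    return [jh for _, domains in JAK_REGIONS for jh in domains]


def region_of(jh: str) -> str:
    """Return the region label for a JH domain name such as 'JH2'."""
    for label, domains in JAK_REGIONS:
        if jh in domains:
            return label
    raise KeyError(f"unknown JH domain: {jh}")


print(n_to_c_order())
print(region_of("JH2"))
```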
The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction [, ]. These proteins differ widely in size and oligomeric state, and have limited sequence homology []. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric []. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices [], and are mostly dimeric or multimeric, containing at least three conserved regions [, , ]. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, valine, and some lysine synthetases (non-eukaryotic group) belong to class I synthetases. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine, threonine, and some lysine synthetases (non-archaeal group), belong to class II synthetases. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c []. Phenylalanyl-tRNA synthetase is an alpha2/beta2 tetramer composed of two different subunits that belongs to class IIc. In eubacteria, a small subunit (pheS gene) can be designated as beta (E. coli) or alpha subunit (nomenclature adopted in InterPro). Reciprocally, the large subunit (pheT gene) can be designated as alpha (E. coli) or beta. 
In the other kingdoms, such as eukaryota, the two subunits have equivalent length and can be identified by specific signatures. The enzyme from Thermus thermophilus has an alpha 2 beta 2 type quaternary structure and is one of the most complicated members of the synthetase family. Identification of phenylalanyl-tRNA synthetase as a member of class II aaRSs was based only on sequence alignment of the small alpha-subunit with other synthetases []. This family describes the alpha subunit, which shows some similarity to class II aminoacyl-tRNA ligases. Mitochondrial phenylalanyl-tRNA synthetase is a single polypeptide chain, active as a monomer, and similar to this chain rather than to the beta chain, but excluded from this family.
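The class I / class II assignment of the aminoacyl-tRNA synthetases given above lends itself to a simple lookup table. The sketch below encodes only what the text states; the identifiers (`CLASS_I`, `CLASS_II`, `ATTACHMENT_SITE`, `synthetase_classes`) are hypothetical names chosen for illustration, and three-letter amino acid codes are used as keys by assumption.

```python
# Amino acid specificities per class, as listed in the text. Lysine appears
# in both sets because some lysine synthetases are class I and others class II.
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val", "Lys"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Phe", "Pro", "Ser", "Thr", "Lys"}

# tRNA hydroxyl to which the aminoacyl group is coupled (class I) or at which
# coupling is preferred (class II), according to the text.
ATTACHMENT_SITE = {"I": "2'-hydroxyl", "II": "3'-hydroxyl"}


def synthetase_classes(amino_acid: str) -> set:
    """Return the set of synthetase classes for a three-letter amino acid code."""
    classes = set()
    if amino_acid in CLASS_I:
        classes.add("I")
    if amino_acid in CLASS_II:
        classes.add("II")
    if not classes:
        raise KeyError(f"unknown amino acid code: {amino_acid}")
    return classes


for aa in ("Phe", "Lys", "Arg"):
    found = sorted(synthetase_classes(aa))
    print(aa, found, [ATTACHMENT_SITE[c] for c in found])
```

Returning a set rather than a single label keeps the lysine case explicit instead of forcing it into one class.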
The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction [, ]. These proteins differ widely in size and oligomeric state, and have limited sequence homology []. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric []. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices [], and are mostly dimeric or multimeric, containing at least three conserved regions [, , ]. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, valine, and some lysine synthetases (non-eukaryotic group) belong to class I synthetases. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine, threonine, and some lysine synthetases (non-archaeal group), belong to class II synthetases. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c []. Phenylalanyl-tRNA synthetase is an alpha2/beta2 tetramer composed of two different subunits that belongs to class IIc. In eubacteria, a small subunit (pheS gene) can be designated as beta (E. coli) or alpha subunit. Reciprocally, the large subunit (pheT gene) can be designated as alpha (E. coli) or beta. 
In the other kingdoms, such as eukaryota, the two subunits have equivalent length and can be identified by specific signatures. The enzyme from Thermus thermophilus has an alpha 2 beta 2 type quaternary structure and is one of the most complicated members of the synthetase family. Identification of phenylalanyl-tRNA synthetase as a member of class II aaRSs was based only on sequence alignment of the small alpha-subunit with other synthetases []. This family describes the beta subunit. The beta subunits break into two subfamilies that are considerably different in sequence, length, and pattern of gaps. This family represents the subfamily that includes the beta subunit from eukaryotic cytosol, the archaea, and spirochetes.
Phenylalanine-tRNA ligase is an alpha2/beta2 tetramer composed of two different subunits that belongs to class IIc. In eubacteria, a small subunit (pheS gene) can be designated as beta (E. coli) or alpha subunit. Reciprocally, the large subunit (pheT gene) can be designated as alpha (E. coli) or beta. In the other kingdoms, such as eukaryota, the two subunits have equivalent length and can be identified by specific signatures. The enzyme from Thermus thermophilus has an alpha 2 beta 2 type quaternary structure and is one of the most complicated members of the synthetase family. Identification of phenylalanine-tRNA ligase as a member of class II aaRSs was based only on sequence alignment of the small alpha-subunit with other ligases []. This family describes the beta subunit. The beta subunits break into two subfamilies that are considerably different in sequence, length, and pattern of gaps. This family represents the subfamily that includes the beta subunit from bacteria other than spirochetes, as well as a chloroplast-encoded form from Porphyra purpurea. The chloroplast-derived sequence is considerably shorter at the N terminus. The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction [, ]. These proteins differ widely in size and oligomeric state, and have limited sequence homology []. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric []. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices [], and are mostly dimeric or multimeric, containing at least three conserved regions [, , ]. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. 
In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, valine, and some lysine synthetases (non-eukaryotic group) belong to class I synthetases. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine, threonine, and some lysine synthetases (non-archaeal group), belong to class-II synthetases. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c [].
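The class assignments listed above can be expressed as a simple lookup. The following Python sketch is illustrative only: the names CLASS_I, CLASS_II and aars_classes are hypothetical, and lysine is placed in both sets because the text assigns some lysine synthetases to each class.

```python
# Amino acid specificities transcribed from the class lists in the text above.
CLASS_I = {
    "arginine", "cysteine", "glutamic acid", "glutamine", "isoleucine",
    "leucine", "methionine", "tyrosine", "tryptophan", "valine", "lysine",
}
CLASS_II = {
    "alanine", "asparagine", "aspartic acid", "glycine", "histidine",
    "phenylalanine", "proline", "serine", "threonine", "lysine",
}


def aars_classes(amino_acid):
    """Return the set of synthetase classes ("I", "II") listed for an amino acid."""
    name = amino_acid.strip().lower()
    classes = set()
    if name in CLASS_I:
        classes.add("I")
    if name in CLASS_II:
        classes.add("II")
    return classes
```

For example, aars_classes("phenylalanine") yields only class II, whereas aars_classes("lysine") yields both classes; an amino acid absent from both lists gives an empty set.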
The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction [, ]. These proteins differ widely in size and oligomeric state, and have limited sequence homology []. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric []. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices [], and are mostly dimeric or multimeric, containing at least three conserved regions [, , ]. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, valine, and some lysine synthetases (non-eukaryotic group) belong to class I synthetases. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine, threonine, and some lysine synthetases (non-archaeal group), belong to class-II synthetases. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c []. Phenylalanyl-tRNA synthetase () is an alpha2/beta2 tetramer built from two distinct subunits and belongs to class IIc. In eubacteria, the small subunit (pheS gene) can be designated as beta (E. coli) or alpha subunit (nomenclature adopted in InterPro). Reciprocally, the large subunit (pheT gene) can be designated as alpha (E. coli) or beta (see and ). 
In all other kingdoms the two subunits have equivalent length in eukaryota, and can be identified by specific signatures. The enzyme from Thermus thermophilus has an alpha 2 beta 2 type quaternary structure and is one of the most complicated members of the synthetase family. Identification of phenylalanyl-tRNA synthetase as a member of class II aaRSs was based only on sequence alignment of the small alpha-subunit with other synthetases []. This family describes the mitochondrial phenylalanyl-tRNA synthetases. Unlike all other known phenylalanyl-tRNA synthetases, the mitochondrial form demonstrated from yeast is monomeric. It is similar to but longer than the alpha subunit (PheS) of the alpha 2 beta 2 form found in bacteria, Archaea, and eukaryotes, and shares the characteristic motifs of class II aminoacyl-tRNA ligases.
The band-7 protein family comprises a diverse set of membrane-bound proteins characterised by the presence of a conserved domain, the band-7 domain, also known as SPFH or PHB domain. The exact function of the band-7 domain is not known, but examples from animal and bacterial stomatin-type proteins demonstrate binding to lipids and the ability to assemble into membrane-bound oligomers that form putative scaffolds []. A variety of proteins belong to the band-7 family. These include the stomatins, prohibitins, flotillins and the HflK/C bacterial proteins. Eukaryotic band 7 proteins tend to be oligomeric and are involved in membrane-associated processes. Stomatins are involved in ion channel function, prohibitins are involved in modulating the activity of a membrane-bound FtsH protease and the assembly of mitochondrial respiratory complexes, and flotillins are involved in signal transduction and vesicle trafficking []. Stomatin, also known as human erythrocyte membrane protein band 7.2b [], was first identified in the band 7 region of human erythrocyte membrane proteins. It is an oligomeric, monotopic membrane protein associated with cholesterol-rich membranes/lipid rafts. Human stomatin is ubiquitously expressed in all tissues; expression is high in hematopoietic cells and relatively low in brain. It is associated with the plasma membrane and cytoplasmic vesicles of fibroblasts, epithelial and endothelial cells []. Stomatin is believed to be involved in regulating monovalent cation transport through lipid membranes. Absence of the protein in hereditary stomatocytosis is believed to be the reason for the leakage of Na+ and K+ ions into and from erythrocytes []. Stomatin is also expressed in mechanosensory neurons, where it may interact directly with transduction components, including cation channels []. Stomatin proteins have been identified in various organisms, including Caenorhabditis elegans. There are nine stomatin-like proteins in C. elegans, MEC-2 being the one best characterised []. 
In mammals, other stomatin family members are stomatin-like proteins SLP1, SLP2 and SLP3, and NPHS2 (podocin), which display selective expression patterns []. Stomatin family members are oligomeric, mostly localise to membrane domains, and in many cases have been shown to modulate ion channel activity. The stomatins and prohibitins, and to a lesser extent flotillins, are highly conserved protein families and are found in a variety of organisms ranging from prokaryotes to higher eukaryotes, whereas HflK and HflC homologues are only present in bacteria []. This entry represents the stomatins and stomatin-like proteins, including podocin, from a wide range of eukaryotes, bacteria, archaea and viruses. It excludes the HflK and HflC proteins, prohibitins and flotillins.
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents ZPR1-type zinc finger domains. ZPR1 was shown experimentally to bind approximately two moles of zinc, and has two copies of a domain homologous to this protein, each containing a putative zinc finger of the form CXXCX(25)CXXC. ZPR1 binds the tyrosine kinase domain of epidermal growth factor receptor but is displaced by receptor activation and autophosphorylation, after which it redistributes in part to the nucleus. 
By analogy, the proteins described by this family may play a role in signal transduction, as has been shown for other Z-finger binding proteins. Deficiencies in ZPR1 may contribute to neurodegenerative disorders. ZPR1 appears to be down-regulated in patients with spinal muscular atrophy (SMA), a disease characterised by degeneration of the alpha-motor neurons in the spinal cord that can arise from mutations affecting the expression of Survival Motor Neurons (SMN) []. ZPR1 interacts with complexes formed by SMN [], and may act as a modifier that affects the severity of SMA.
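The putative zinc finger pattern CXXCX(25)CXXC quoted above translates directly into a regular expression. The sketch below is a minimal illustration, assuming X stands for any residue and the spacer is exactly 25 residues; the names ZPR1_FINGER and find_zpr1_fingers are hypothetical, and the toy sequence is made up rather than taken from a real ZPR1 protein.

```python
import re

# CXXCX(25)CXXC: Cys, two residues, Cys, 25 residues, Cys, two residues, Cys.
ZPR1_FINGER = re.compile(r"C.{2}C.{25}C.{2}C")


def find_zpr1_fingers(sequence):
    """Return (start, end) spans (0-based, end-exclusive) of non-overlapping matches."""
    return [m.span() for m in ZPR1_FINGER.finditer(sequence.upper())]


# Hypothetical toy sequence: two flanking residues on each side of one motif.
toy = "MK" + "CAAC" + "G" * 25 + "CSSC" + "LE"
spans = find_zpr1_fingers(toy)
```

A pattern match of this kind only flags candidate sites; it says nothing about whether zinc is actually bound.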
This entry represents a subfamily of the major facilitator superfamily. Members in this family include sugar transporters, which are responsible for the binding and transport of various carbohydrates, organic alcohols, and acids in a wide range of prokaryotic and eukaryotic organisms []. Most but not all members of this family catalyse sugar transport []. Recent genome-sequencing data and a wealth of biochemical and molecular genetic investigations have revealed the occurrence of dozens of families of primary and secondary transporters. Two such families have been found to occur ubiquitously in all classifications of living organisms. These are the ATP-binding cassette (ABC) superfamily and the major facilitator superfamily (MFS), also called the uniporter-symporter-antiporter family. While ABC family permeases are in general multicomponent primary active transporters, capable of transporting both small molecules and macromolecules in response to ATP hydrolysis, the MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients. Although well over 100 families of transporters have now been recognised and classified, the ABC superfamily and MFS account for nearly half of the solute transporters encoded within the genomes of microorganisms. They are also prevalent in higher organisms. The importance of these two families of transport systems to living organisms can therefore not be overestimated []. The MFS was originally believed to function primarily in the uptake of sugars but subsequent studies revealed that drug efflux systems, Krebs cycle metabolites, organophosphate:phosphate exchangers, oligosaccharide:H+ symport permeases, and bacterial aromatic acid permeases were all members of the MFS. These observations raised the possibility that the MFS is far more widespread in nature and far more diverse in function than had been thought previously. 
Seventeen subgroups of the MFS have been identified []. Evidence suggests that the MFS permeases arose by a tandem intragenic duplication event in the early prokaryotes. This event generated a 12-transmembrane-spanner (TMS) protein topology from a primordial 6-TMS unit. Surprisingly, all currently recognised MFS permeases retain the two six-TMS units within a single polypeptide chain, although in 3 of the 17 MFS families, an additional two TMSs are found []. Moreover, the well-conserved MFS specific motif between TMS2 and TMS3 and the related but less well conserved motif between TMS8 and TMS9 [] prove to be a characteristic of virtually all of the more than 300 MFS proteins identified. This family includes sugar transporters and other types of transporters.
Recent genome-sequencing data and a wealth of biochemical and molecular genetic investigations have revealed the occurrence of dozens of families of primary and secondary transporters. Two such families have been found to occur ubiquitously in all classifications of living organisms. These are the ATP-binding cassette (ABC) superfamily and the major facilitator superfamily (MFS), also called the uniporter-symporter-antiporter family. While ABC family permeases are in general multicomponent primary active transporters, capable of transporting both small molecules and macromolecules in response to ATP hydrolysis, the MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients. Although well over 100 families of transporters have now been recognised and classified, the ABC superfamily and MFS account for nearly half of the solute transporters encoded within the genomes of microorganisms. They are also prevalent in higher organisms. The importance of these two families of transport systems to living organisms can therefore not be overestimated []. The MFS was originally believed to function primarily in the uptake of sugars but subsequent studies revealed that drug efflux systems, Krebs cycle metabolites, organophosphate:phosphate exchangers, oligosaccharide:H+ symport permeases, and bacterial aromatic acid permeases were all members of the MFS. These observations raised the possibility that the MFS is far more widespread in nature and far more diverse in function than had been thought previously. Seventeen subgroups of the MFS have been identified []. Evidence suggests that the MFS permeases arose by a tandem intragenic duplication event in the early prokaryotes. This event generated a 12-transmembrane-spanner (TMS) protein topology from a primordial 6-TMS unit. 
Surprisingly, all currently recognised MFS permeases retain the two six-TMS units within a single polypeptide chain, although in 3 of the 17 MFS families, an additional two TMSs are found []. Moreover, the well-conserved MFS specific motif between TMS2 and TMS3 and the related but less well conserved motif between TMS8 and TMS9 [] prove to be a characteristic of virtually all of the more than 300 MFS proteins identified. This entry represents the metabolite-H(+) symport (MHS) subfamily of the MFS. Members include citrate-proton symporters [], alpha-ketoglutarate permease [], shikimate transporters [], glycine betaine/proline/ectoine/pipecolic acid transporter OusA [] and the proline/betaine transporter ProP [].
Sodium proton exchangers (NHEs) constitute a large family of integral membrane protein transporters that are responsible for the counter-transport of protons and sodium ions across lipid bilayers [, ]. These proteins are found in organisms across all domains of life. In archaea, bacteria, yeast and plants, these exchangers provide increased salt tolerance by removing sodium in exchange for extracellular protons. In mammals they participate in the regulation of cell pH, volume, and intracellular sodium concentration, as well as in the reabsorption of NaCl across renal, intestinal, and other epithelia [, , , ]. Human NHE is also involved in heart disease, cell growth and in cell differentiation []. The removal of intracellular protons in exchange for extracellular sodium effectively eliminates excess acid from actively metabolising cells. In mammalian cells, NHE activity is found in both the plasma membrane and inner mitochondrial membrane. To date, nine mammalian isoforms have been identified (designated NHE1-NHE9) [, ]. These exchangers are highly regulated (glyco)phosphoproteins, which, based on their primary structure, appear to contain 10-12 membrane-spanning regions (M) at the N terminus and a large cytoplasmic region at the C terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6 and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium and hydrogen ions. The cytoplasmic region has little similarity throughout the family. There is some evidence that the exchangers may exist in the cell membrane as homodimers, but little is currently known about the mechanism of their antiport []. Sodium/hydrogen exchanger 1 (NHE-1) is found in virtually all tissues and cells in mammals and is involved in numerous physiological processes, including regulation of intracellular pH, cellular volume, cytoskeletal organisation, heart disease and cancer [, , ]. 
In epithelial cells, NHE-1 is largely restricted to the basolateral membrane, a specific subcellular localisation that is thought to be important to the functioning of these epithelia. This protein comprises two domains: an N-terminal membrane domain that functions to transport ions, and a C-terminal cytoplasmic regulatory domain that regulates the activity and mediates cytoskeletal interactions. NHE-1 plays a role in the survival, migration and invasion of several cancers [, ]. It was shown to be activated at physiological levels of NO [].
Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component []. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) []. The chaperone protein SecB [] is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion []. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both proton motive forces and ATP-driven secretion. The latter is mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains []. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices. The SecD and SecF equivalents of the Gram-positive bacterium Bacillus subtilis are jointly present in one polypeptide, denoted SecDF, that is required to maintain a high capacity for protein secretion. 
Unlike the SecD subunit of the pre-protein translocase of E. coli, SecDF of B. subtilis was not required for the release of a mature secretory protein from the membrane, indicating that SecDF is involved in earlier translocation steps []. Comparison with SecD and SecF proteins from other organisms revealed the presence of 10 conserved regions in SecDF, some of which appear to be important for SecDF function. Interestingly, the SecDF protein of B. subtilis has 12 putative transmembrane domains. Thus, SecDF shows not only sequence similarity but also structural similarity to secondary solute transporters []. This entry represents bacterial SecD and SecF protein export membrane proteins and their archaeal homologues []. SecD and SecF proteins are part of the multimeric protein export complex comprising SecA, D, E, F, G, Y, and YajC []. SecD and SecF are required to maintain a proton motive force [].
Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This group of serine peptidases belongs to MEROPS peptidase family S49 (protease IV family, clan S-). The predicted active site serine for members of this family occurs in a transmembrane domain. This group of sequences represents both long and short forms of the bacterial SppA and homologues found in the archaea and plants. Signal peptides of secretory proteins seem to serve at least two important biological functions. First, they are required for protein targeting to and translocation across membranes, such as the eubacterial plasma membrane and the endoplasmic reticular membrane of eukaryotes. 
Second, in addition to their role as determinants for protein targeting and translocation, certain signal peptides have a signalling function. During or shortly after pre-protein translocation, the signal peptide is removed by signal peptidases. The integral membrane protein, SppA (protease IV), of Escherichia coli was shown experimentally to degrade signal peptides. The member of this family from Bacillus subtilis has only been shown to be required for efficient processing of pre-proteins under conditions of hyper-secretion [].
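The link between the linear order of the catalytic triad and clan membership (HDS for clan PA, DHS for clan SB, SDH for clan SC) can be sketched as a small lookup. This Python example is illustrative only: the function names are hypothetical, the residue positions are illustrative inputs rather than values taken from this text, and a matching order is merely consistent with a clan, not proof of membership.

```python
# Linear triad orders quoted in the text, keyed to their clans.
TRIAD_ORDER_TO_CLAN = {"HDS": "PA", "DHS": "SB", "SDH": "SC"}


def triad_order(positions):
    """Given {residue letter: sequence position}, return letters in N- to C-terminal order."""
    return "".join(sorted(positions, key=positions.get))


def clan_for_triad(positions):
    """Return the clan whose quoted triad order matches, or None if no order matches."""
    return TRIAD_ORDER_TO_CLAN.get(triad_order(positions))


# Illustrative positions (assumed for the example, not stated in the source text).
chymotrypsin_like = {"H": 57, "D": 102, "S": 195}
subtilisin_like = {"D": 32, "H": 64, "S": 221}
carboxypeptidase_like = {"S": 146, "D": 338, "H": 397}
```

With these inputs, clan_for_triad returns "PA", "SB" and "SC" respectively.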
Helicases have been classified in 5 superfamilies (SF1-SF5). All of the proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop) and Walker B (Mg2+-binding aspartic acid) motifs. For the two largest groups, commonly referred to as SF1 and SF2, a total of seven characteristic motifs have been identified [] which are distributed over two structural domains, an N-terminal ATP-binding domain and a C-terminal domain. UvrD-like DNA helicases belong to SF1, but they differ from classical SF1/SF2 by a large insertion in each domain. UvrD-like DNA helicases unwind DNA with a 3'-5' polarity []. Crystal structures of several uvrD-like DNA helicases have been solved [, , ]. They are monomeric enzymes consisting of two domains with a common α-β RecA-like core. The ATP-binding site is situated in a cleft between the N terminus of the ATP-binding domain and the beginning of the C-terminal domain. The enzyme crystallizes in two different conformations (open and closed). The conformational difference between the two forms comprises a large rotation of the end of the C-terminal domain by approximately 130 degrees. This "domain swiveling" was proposed to be an important aspect of the mechanism of the enzyme []. Some proteins that belong to the UvrD-like DNA helicase family are listed below: Bacterial UvrD helicase. It is involved in the post-incision events of nucleotide excision repair and methyl-directed mismatch repair. It unwinds DNA duplexes with 3'-5' polarity with respect to the bound strand and initiates unwinding most effectively when a single-stranded region is present. Gram-positive bacterial pcrA helicase, an essential enzyme involved in DNA repair and rolling circle replication. The Staphylococcus aureus pcrA helicase has both 5'-3' and 3'-5' helicase activities. Bacterial rep proteins, a single-stranded DNA-dependent ATPase involved in DNA replication which can initiate unwinding at a nick in the DNA. 
It binds to the single-stranded DNA and acts in a progressive fashion along the DNA in the 3' to 5' direction. Bacterial helicase IV (helD gene product). It catalyzes the unwinding of duplex DNA in the 3'-5' direction. Bacterial recB protein. RecBCD is a multi-functional enzyme complex that processes DNA ends resulting from a double-strand break. RecB is a helicase with a 3'-5' directionality. Fungal srs2 proteins, an ATP-dependent DNA helicase involved in DNA repair. The polarity of the helicase activity was determined to be 3'-5'. This domain is also found in the bacterial helicase-nuclease complex AddAB, in both the AddA and AddB subunits. The AddA subunit is responsible for the helicase activity. AddB also harbors a putative ATP-binding domain, which does not play a role as a secondary DNA motor but may instead facilitate the recognition of the recombination hotspot sequences []. This entry represents the ATP-binding domain found in AddA, AddB and UvrD-like helicases.
Helicases have been classified in 5 superfamilies (SF1-SF5) []. All of the proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop), and Walker B (Mg2+-binding aspartic acid) motifs []. For the two largest groups, commonly referred to as SF1 and SF2, a total of seven characteristic motifs have been identified [] which are distributed over two structural domains, an N-terminal ATP-binding domain and a C-terminal domain. This entry represents the C-terminal domain. UvrD-like DNA helicases belong to SF1, but they differ from classical SF1/SF2 by a large insertion in each domain. UvrD-like DNA helicases unwind DNA with a 3'-5' polarity []. Crystal structures of several uvrD-like DNA helicases have been solved [, , ]. They are monomeric enzymes consisting of two domains with a common α-β RecA-like core. The ATP-binding site is situated in a cleft between the N terminus of the ATP-binding domain and the beginning of the C-terminal domain. The enzyme crystallizes in two different conformations (open and closed). The conformational difference between the two forms comprises a large rotation of the end of the C-terminal domain by approximately 130 degrees. This "domain swiveling" was proposed to be an important aspect of the mechanism of the enzyme []. Some proteins that belong to the uvrD-like DNA helicase family are listed below: Bacterial UvrD helicase. It is involved in the post-incision events of nucleotide excision repair and methyl-directed mismatch repair. It unwinds DNA duplexes with 3'-5' polarity with respect to the bound strand and initiates unwinding most effectively when a single-stranded region is present. Gram-positive bacterial pcrA helicase, an essential enzyme involved in DNA repair and rolling circle replication. The Staphylococcus aureus pcrA helicase has both 5'-3' and 3'-5' helicase activities. 
Bacterial rep proteins, a single-stranded DNA-dependent ATPase involved in DNA replication which can initiate unwinding at a nick in the DNA. It binds to the single-stranded DNA and acts in a progressive fashion along the DNA in the 3' to 5' direction. Bacterial helicase IV (helD gene product). It catalyzes the unwinding of duplex DNA in the 3'-5' direction. Bacterial recB protein. RecBCD is a multi-functional enzyme complex that processes DNA ends resulting from a double-strand break. RecB is a helicase with a 3'-5' directionality. Fungal srs2 proteins, an ATP-dependent DNA helicase involved in DNA repair. The polarity of the helicase activity was determined to be 3'-5'.
Family X DNA polymerases (PolX) are involved in DNA repair and are evolutionarily conserved in prokaryotes, eukaryotes and archaea. All DNA polymerases from this family are single-subunit enzymes, lacking the 3'-5' exonuclease activity and displaying very low processivity during primer extension reactions []. Proteins in this family include the well-characterized mammalian Pol beta; the more recently discovered eukaryotic polymerases lambda and mu; and a template-independent polymerase, terminal transferase (TdT) []. In eukaryotes, Pol beta fills short nucleotide gaps produced during base excision repair (BER) []. Pols beta, lambda and mu can also take part in translesion DNA synthesis (TLS) []. Their structures have been revealed [, , ]. DNA carries the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the cell life cycle. This function is mediated by DNA-directed DNA-polymerases, which add nucleotide triphosphate (dNTP) residues to the 3'-end of the growing DNA chain, using a complementary DNA as template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used. Three motifs, A, B and C [], are seen to be conserved across all DNA-polymerases, with motifs A and C also seen in RNA-polymerases. They are centred on invariant residues, and their structural significance was implied from the Klenow (Escherichia coli) structure: motif A contains a strictly-conserved aspartate at the junction of a β-strand and an α-helix; motif B contains an α-helix with positive charges; and motif C has a doublet of negative charges, located in a β-turn-β secondary structure []. DNA polymerases () can be classified, on the basis of sequence similarity [, ], into at least four different groups: A, B, C and X. 
Members of family X are small (about 40 kDa) compared with other polymerases and encompass two distinct polymerase enzymes that have similar functionality: vertebrate polymerase beta (same as yeast pol 4), and terminal deoxynucleotidyl-transferase (TdT) (). The former functions in DNA repair, while the latter terminally adds single nucleotides to polydeoxynucleotide chains. Both enzymes catalyse addition of nucleotides in a distributive manner, i.e. they dissociate from the template-primer after addition of each nucleotide. DNA-polymerases show a degree of structural similarity with RNA-polymerases.
Tensins constitute a eukaryotic family of lipid phosphatases that are defined by the presence of two adjacent domains: a lipid phosphatase domain and a C2-like domain. The tensin-type C2 domain has a structure similar to the classical C2 domain (see ) that mediates the Ca2+-dependent membrane recruitment of several signalling proteins. However, the tensin-type C2 domain lacks two of the three conserved loops that bind Ca2+, and in this respect it is similar to the C2 domains of PKC-type [, ]. The tensin-type C2 domain can bind phospholipid membranes in a Ca2+-independent manner []. In the tumour suppressor protein PTEN, the best characterised member of the family, the lipid phosphatase domain was shown to specifically dephosphorylate the D3 position of the inositol ring of the lipid second messenger, phosphatidylinositol-3,4,5-trisphosphate (PIP3). The lipid phosphatase domain contains the signature motif HCXXGXXR present in the active sites of protein tyrosine phosphatases (PTPs) and dual specificity phosphatases (DSPs). Furthermore, two invariant lysines are found only in the tensin-type phosphatase motif (HCKXGKXR) and are suspected to interact with the phosphate groups at positions D1 and D5 of the inositol ring [, ]. The C2 domain is found at the C terminus of the tumour suppressor protein PTEN (phosphatidylinositol trisphosphate phosphatase). This domain may include a CBR3 loop, indicating a central role in membrane binding. This domain associates across an extensive interface with the N-terminal phosphatase domain DSPc, suggesting that the C2 domain productively positions the catalytic part of the protein on the membrane. The crystal structure of the PTEN tumour suppressor has been solved []. The lipid phosphatase domain has a structure similar to the dual specificity phosphatase (see ). However, PTEN has a larger active site pocket that could be important to accommodate PI(3,4,5)P3. 
Proteins known to contain a phosphatase and a C2 tensin-type domain are listed below: Tensin, a focal-adhesion molecule that binds to actin filaments. It may be involved in cell migration, cartilage development and in linking signal transduction pathways to the cytoskeleton. Phosphatase and tensin homologue deleted on chromosome 10 protein (PTEN). It antagonizes PI 3-kinase signalling by dephosphorylating the 3-position of the inositol ring of PI(3,4,5)P3 and thus inactivates downstream signalling. It plays major roles both during development and in the adult to control cell size, growth, and survival. Auxilin. It binds clathrin heavy chain and promotes its assembly into regular cages. Cyclin G-associated kinase or auxilin-2. It is a potential regulator of clathrin-mediated membrane trafficking.
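The two active-site signatures described for this family, the general HCXXGXXR motif and the tensin-type HCKXGKXR variant with its two invariant lysines, can be distinguished with a pair of regular expressions. The sketch below is illustrative, assuming X means any residue; the names are hypothetical and the example fragments are made up for demonstration rather than quoted from database sequences.

```python
import re

GENERAL_SIGNATURE = re.compile(r"HC.{2}G.{2}R")  # HCXXGXXR (PTPs and DSPs)
TENSIN_SIGNATURE = re.compile(r"HCK.GK.R")       # HCKXGKXR (tensin-type)


def classify_signature(sequence):
    """Return 'tensin-type', 'general', or None for a protein sequence string."""
    seq = sequence.upper()
    # Test the stricter tensin-type motif first, since it is a special case
    # of the general signature.
    if TENSIN_SIGNATURE.search(seq):
        return "tensin-type"
    if GENERAL_SIGNATURE.search(seq):
        return "general"
    return None


# Hypothetical fragments used only to exercise the two patterns.
tensin_like = "AAHCKAGKGRTT"
general_like = "AAHCSAGIGRTT"
```

Here classify_signature(tensin_like) reports the tensin-type motif, while classify_signature(general_like) matches only the general signature.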
The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction [, ]. These proteins differ widely in size and oligomeric state, and have limited sequence homology []. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric []. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices [], and are mostly dimeric or multimeric, containing at least three conserved regions [, , ]. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine, and some lysine synthetases (non-eukaryotic group), belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine and threonine, and some lysine synthetases (non-archaeal group), belong to class II. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c []. Phenylalanyl-tRNA synthetase is an alpha2/beta2 tetramer, built from two types of subunit, that belongs to class IIc. In eubacteria, the small subunit (pheS gene) can be designated as beta (E. coli) or alpha (the nomenclature adopted in InterPro). Reciprocally, the large subunit (pheT gene) can be designated as alpha (E. coli) or beta.
In all other kingdoms, including eukaryota, the two subunits have equivalent length and can be identified by specific signatures. The enzyme from Thermus thermophilus has an alpha2/beta2-type quaternary structure and is one of the most complicated members of the synthetase family. Identification of phenylalanyl-tRNA synthetase as a member of the class II aaRSs was based only on sequence alignment of the small alpha subunit with other synthetases [].
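The class assignment listed above can be captured as a small lookup. This is only an illustrative sketch: the three-letter residue codes, the set names and the function name are choices made here, and lysine is placed in both classes because the text assigns different lysine synthetases to each.

```python
# Class membership as listed in the text, keyed by three-letter amino acid code.
CLASS_I = frozenset({"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"})
CLASS_II = frozenset({"Ala", "Asn", "Asp", "Gly", "His", "Phe", "Pro", "Ser", "Thr"})
# Lysine synthetases occur in both classes, depending on the organism group.
BOTH_CLASSES = frozenset({"Lys"})


def synthetase_classes(amino_acid):
    """Return the set of synthetase classes ("I", "II") for an amino acid."""
    code = amino_acid.capitalize()
    classes = set()
    if code in CLASS_I or code in BOTH_CLASSES:
        classes.add("I")
    if code in CLASS_II or code in BOTH_CLASSES:
        classes.add("II")
    if not classes:
        raise ValueError(f"unknown amino acid code: {amino_acid!r}")
    return classes


if __name__ == "__main__":
    for code in ("Phe", "Trp", "Lys"):
        print(code, sorted(synthetase_classes(code)))
```

The subclass level (1a to 2c) is not modelled, since the text gives a subclass only for phenylalanyl-tRNA synthetase.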
Rodent urinary proteins (mouse major urinary proteins or MUPs and rat alpha-2u globulins) are the major protein components of rodent urine and transport pheromones []. Rodent urine contains an unusually large amount of protein. The major site of MUP synthesis is the liver; the protein is secreted by the liver into serum, where it circulates at relatively low levels before being rapidly filtered by the kidney and excreted. The sex-dependent expression of MUP (adult male mice secrete 5-20 times as much MUP as do females) and its ability to bind a number of odorant molecules is consistent with the suggestion that MUP acts as a pheromone transporter; the protein may be excreted into the urine carrying a bound pheromone, which is released as the urine dries and the protein denatures. The crystal structure of MUP has been solved [] and it is known to be a member of the lipocalin family. Alpha-2u-globulin, a close homologue of MUP, accounts for 30-50% of total excreted protein in adult male rat urine. As its electrophoretic mobility is similar to that of serum a2 globulin, it was named 'alpha-2u-globulin', the subscript 'u' denoting its origin in urine. Alpha-2u-globulin is secreted into the plasma by a number of tissues, where it circulates before filtration through the kidney; between 20 and 50% is reabsorbed by the proximal tubule of the nephron, the rest being excreted. Although the exact physiological role of alpha-2u-globulin is unclear, there is circumstantial evidence that it functions in pheromone transport. This is consistent with its observed binding properties, its close similarity with MUP and the known properties of male rat urine. Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms.
A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806 (1994)]. This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space and an arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation. The allergens in this family include allergens with the following designations: Mus m 1 and Rat n 1.
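The basic designation rule can be sketched in a few lines. The function name is hypothetical, and the species binomials used in the examples (Mus musculus, Felis domesticus) are supplied here for illustration rather than taken from the text; the disambiguation rule for clashing designations is deliberately not implemented.

```python
def allergen_designation(genus, species, number):
    """Build a designation: first three letters of the genus, a space,
    the first letter of the species name, a space, and an arabic number.

    Simplification: the rule that adds extra letters when two species
    would receive identical designations is not handled here.
    """
    if len(genus) < 3 or not species:
        raise ValueError("genus needs at least 3 letters and species at least 1")
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


if __name__ == "__main__":
    print(allergen_designation("Mus", "musculus", 1))      # Mus m 1
    print(allergen_designation("Felis", "domesticus", 1))  # Fel d 1
```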
The short-chain dehydrogenases/reductases family (SDR) [] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases [, , ]. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains [], the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate specific domains []. Insect ADH is very different from yeast and mammalian ADHs. The enzyme from Drosophila lebanonensis (Fruit fly) has been characterised by protein analysis and was found to have a 254-residue protein chain with an acetyl-blocked N-terminal Met []. Comparisons with the enzymes from other species reveal that they have diverged considerably. The structural variation within Drosophila is about as large as that for mammalian zinc-containing alcohol dehydrogenase. The crystal structure of the apo form of Drosophila ADH has been solved to 1.9 Å resolution [].
Three structural features characterise the active site architecture: (i) a deep cavity, covered by a flexible 33-residue loop and an 11-residue C-terminal tail of the neighbouring subunit, whose hydrophobic surface is likely to increase the specificity of the enzyme for secondary aliphatic alcohols; (ii) the Ser-Tyr-Lys residues of the catalytic triad, which are known to be involved in enzymatic catalysis; and (iii) three well-ordered water molecules within hydrogen-bonding distance of the side chains of the catalytic triad, which may be significant for the proton release steps in catalysis. A number of proteins within the SDR family share a strong phylogenetic relationship with insect ADH. Amongst these are Drosophila ADH-related protein (duplicate of Adh or Adh-dup) []; Drosophila fat body protein; and development-specific 25 kDa protein from Sarcophaga peregrina (Flesh fly).
The short-chain dehydrogenases/reductases family (SDR) [] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases [, , ]. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains [], the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate specific domains []. Insect ADH is very different from yeast and mammalian ADHs. The enzyme from Drosophila lebanonensis (Fruit fly) has been characterised by protein analysis and was found to have a 254-residue protein chain with an acetyl-blocked N-terminal Met []. Comparisons with the enzymes from other species reveal that they have diverged considerably. The structural variation within Drosophila is about as large as that for mammalian zinc-containing alcohol dehydrogenase. The crystal structure of the apo form of D. lebanonensis ADH has been solved to 1.9 Å resolution.
Three structural features characterise the active site architecture: (i) a deep cavity, covered by a flexible 33-residue loop and an 11-residue C-terminal tail of the neighbouring subunit, whose hydrophobic surface is likely to increase the specificity of the enzyme for secondary aliphatic alcohols; (ii) the Ser-Tyr-Lys residues of the catalytic triad, which are known to be involved in enzymatic catalysis; and (iii) three well-ordered water molecules within hydrogen-bonding distance of the side chains of the catalytic triad, which may be significant for the proton release steps in catalysis. A number of proteins within the SDR family share a strong phylogenetic relationship with insect ADH. Amongst these are Drosophila ADH-related protein (duplicate of Adh or Adh-dup) []; Drosophila fat body protein; and development-specific 25 kDa protein from Sarcophaga peregrina (Flesh fly). This group specifically identifies proteins related to Ceratitis capitata (Mediterranean fruit fly).
The short-chain dehydrogenases/reductases family (SDR) [] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases [, , ]. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains [], the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate specific domains []. Insect ADH is very different from yeast and mammalian ADHs. The enzyme from Drosophila lebanonensis (Fruit fly) has been characterised by protein analysis and was found to have a 254-residue protein chain with an acetyl-blocked N-terminal Met []. Comparisons with the enzymes from other species reveal that they have diverged considerably. The structural variation within Drosophila is about as large as that for mammalian zinc-containing alcohol dehydrogenase. The crystal structure of the apo form of D. lebanonensis ADH has been solved to 1.9 Å resolution [].
Three structural features characterise the active site architecture: (i) a deep cavity, covered by a flexible 33-residue loop and an 11-residue C-terminal tail of the neighbouring subunit, whose hydrophobic surface is likely to increase the specificity of the enzyme for secondary aliphatic alcohols; (ii) the Ser-Tyr-Lys residues of the catalytic triad, which are known to be involved in enzymatic catalysis; and (iii) three well-ordered water molecules within hydrogen-bonding distance of the side chains of the catalytic triad, which may be significant for the proton release steps in catalysis. A number of proteins within the SDR family share a strong phylogenetic relationship with insect ADH. Amongst these are Drosophila ADH-related protein (duplicate of Adh or Adh-dup) []; Drosophila fat body protein; and development-specific 25 kDa protein from Sarcophaga peregrina (Flesh fly).
5-hydroxytryptamine (5-HT), or serotonin, is a neurotransmitter that is primarily found in the gastrointestinal (GI) tract, platelets and the central nervous system (CNS). It is implicated in a vast array of physiological and pathophysiological pathways. Receptors for 5-HT mediate both excitatory and inhibitory neurotransmission, and modulate the release of many neurotransmitters including glutamate, GABA, dopamine, epinephrine/norepinephrine, and acetylcholine, as well as many hormones, including oxytocin, prolactin, vasopressin and cortisol. In the CNS, 5-HT receptors can influence various neurological processes, such as aggression, anxiety and appetite and, as a result, are the target of a variety of pharmaceutical drugs, including many antidepressants, antipsychotics and anorectics []. The 5-HT receptors are grouped into a number of distinct subtypes, classified according to their antagonist susceptibilities and their affinities for 5-HT. With the exception of the 5-HT3 receptor, which is a ligand-gated ion channel [], all 5-HT receptors are members of the rhodopsin-like G protein-coupled receptor family [], and they activate an intracellular second messenger cascade to produce their responses. The 5-HT2 receptors mediate many of the central and peripheral physiologic functions of 5-hydroxytryptamine. The original 5-HT2 receptor (now renamed the 5-HT2A receptor) was initially classified according to its micromolar affinity for 5-HT, its labelling with [3H]spiperone and its susceptibility to 5-HT antagonists. At least three members of the 5-HT2 receptor subfamily exist (5-HT2A, 5-HT2B, 5-HT2C), all of which share a high degree of sequence similarity and couple to Gq/G11 to stimulate the phosphoinositide pathway and elevate cytosolic calcium.
Cardiovascular effects include contraction of blood vessels and shape changes in platelets; central nervous system effects include neuronal sensitisation to tactile stimuli and mediation of some of the effects of phenylisopropylamine hallucinogens. 5-HT2 receptors display functional selectivity in which the same agonist in different cell types or different agonists in the same cell type can differentially activate multiple, distinct signalling pathways []. The distribution of 5-HT2C is limited to the CNS and choroid plexus []. Activation of the receptor has been shown to exert an inhibitory influence upon frontocortical dopaminergic and adrenergic, but not serotonergic, transmission and, in part, to play a role in neuroendocrine function [, , , ]. Additional characteristic behavioural responses attributed to 5-HT2C receptor activation include hypoactivity [, , ], feeding [, , , , ], reproductive behaviour [] and thermoregulation []. Chronic treatment with antipsychotic drugs that are 5-HT2 antagonists results in downregulation of both 5-HT2A and 5-HT2C receptors, as does chronic treatment with SSRIs and 5-HT agonists []. However, chronic SSRI treatment may increase 5-HT2C expression, specifically in the choroid plexus [].
5-hydroxytryptamine (5-HT), or serotonin, is a neurotransmitter that is primarily found in the gastrointestinal (GI) tract, platelets and the central nervous system (CNS). It is implicated in a vast array of physiological and pathophysiological pathways. Receptors for 5-HT mediate both excitatory and inhibitory neurotransmission, and modulate the release of many neurotransmitters including glutamate, GABA, dopamine, epinephrine/norepinephrine, and acetylcholine, as well as many hormones, including oxytocin, prolactin, vasopressin and cortisol. In the CNS, 5-HT receptors can influence various neurological processes, such as aggression, anxiety and appetite and, as a result, are the target of a variety of pharmaceutical drugs, including many antidepressants, antipsychotics and anorectics []. The 5-HT receptors are grouped into a number of distinct subtypes, classified according to their antagonist susceptibilities and their affinities for 5-HT. With the exception of the 5-HT3 receptor, which is a ligand-gated ion channel [], all 5-HT receptors are members of the rhodopsin-like G protein-coupled receptor family [], and they activate an intracellular second messenger cascade to produce their responses. The 5-HT1 receptors are a subfamily of 5-HT receptors that were originally classified according to their inhibition of adenylyl cyclase, degree of sequence similarity and their overlapping pharmacological specificities. The subfamily comprises five different receptors (5-HT1A, 5-HT1B, 5-HT1D, 5-HT1E and 5-HT1F), which can couple to Gi/Go and mediate inhibitory neurotransmission, although signalling via other transduction systems is known. One of the 5-HT1 receptors, the 5-HT1E receptor, is yet to achieve receptor status from the International Union of Basic and Clinical Pharmacology (IUPHAR), since a robust response mediated via the protein has not been reported in the literature. This entry represents the 5-HT1E receptor.
It was first identified in the frontal cortex of the human brain. The exact function of the receptor is presently unknown, due to the lack of selective ligands []. It is thought to be negatively linked to adenylyl cyclase in recombinant cell systems and may have an important evolutionary role in humans []. It is hypothesized that the 5-HT1E receptor is involved in the regulation of memory, due to the high abundance of receptors in the frontal cortex, hippocampus, and olfactory bulb [, ], all of which are regions of the brain integral to memory regulation []. The 5-HT1E receptor, like the 5-HT1F receptor, has high affinity for 5-HT and low affinity for 5-carboxamidotryptamine and mesulergine []. However, the 5-HT1E receptor has a relatively low affinity for sumatriptan, which sets it apart from the 5-HT1F receptor [].
5-hydroxytryptamine (5-HT), or serotonin, is a neurotransmitter that is primarily found in the gastrointestinal (GI) tract, platelets and the central nervous system (CNS). It is implicated in a vast array of physiological and pathophysiological pathways. Receptors for 5-HT mediate both excitatory and inhibitory neurotransmission, and modulate the release of many neurotransmitters including glutamate, GABA, dopamine, epinephrine/norepinephrine, and acetylcholine, as well as many hormones, including oxytocin, prolactin, vasopressin and cortisol. In the CNS, 5-HT receptors can influence various neurological processes, such as aggression, anxiety and appetite and, as a result, are the target of a variety of pharmaceutical drugs, including many antidepressants, antipsychotics and anorectics []. The 5-HT receptors are grouped into a number of distinct subtypes, classified according to their antagonist susceptibilities and their affinities for 5-HT. With the exception of the 5-HT3 receptor, which is a ligand-gated ion channel [], all 5-HT receptors are members of the rhodopsin-like G protein-coupled receptor family [], and they activate an intracellular second messenger cascade to produce their responses. The 5-HT2 receptors mediate many of the central and peripheral physiologic functions of 5-hydroxytryptamine. The original 5-HT2 receptor (now renamed the 5-HT2A receptor) was initially classified according to its micromolar affinity for 5-HT, its labelling with [3H]spiperone and its susceptibility to 5-HT antagonists. At least three members of the 5-HT2 receptor subfamily exist (5-HT2A, 5-HT2B, 5-HT2C), all of which share a high degree of sequence similarity and couple to Gq/G11 to stimulate the phosphoinositide pathway and elevate cytosolic calcium.
Cardiovascular effects include contraction of blood vessels and shape changes in platelets; central nervous system effects include neuronal sensitisation to tactile stimuli and mediation of some of the effects of phenylisopropylamine hallucinogens. 5-HT2 receptors display functional selectivity in which the same agonist in different cell types or different agonists in the same cell type can differentially activate multiple, distinct signalling pathways []. The 5-HT2B receptor has been shown to be distributed in a range of tissues, including human gut, brain and the cardiovascular system [, , , ]. In the cardiovascular system, the 5-HT2B receptor regulates cardiac structure and function []. 5-HT2B receptor stimulation can also lead to pathological proliferation of cardiac valve fibroblasts [], which, with chronic overstimulation, can lead to a severe valvulopathy. In addition, the 5-HT2B receptor has been shown to be involved in pulmonary hypertension via vasoconstriction []. As a result, 5-HT2B antagonists have been developed as treatments for chronic heart disease [, ]. In the CNS, the 5-HT2B receptor has been shown to be involved in presynaptic inhibition, leading to behavioural effects [], since it is important to the normal regulation of serotonin levels in the blood plasma [] and to the abnormal release produced by drugs such as MDMA [].
Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease [].
These mutations have been demonstrated to reduce or abolish CLC function. CLC-3 is a member of the CLC family initially cloned from rat kidney [] and localised to chromosome 4 in humans []; the human isoform contains 762 amino acid residues. Together with CLC-4 and CLC-5, it forms a distinct branch of the CLC gene family, the three members showing ~80% residue identity. Expression of CLC-3 produces outwardly-rectifying Cl- currents that are inhibited by protein kinase C activation [, ]. More recently, it has been suggested that CLC-3 may be a ubiquitous swelling-activated Cl- channel that has very similar characteristics to those of native volume-regulated Cl- currents [].
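The kind of hydropathy analysis used to count putative TM stretches can be illustrated with a sliding-window average. This is a rough sketch only: it assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a cut-off of 1.6, which are conventional choices and not necessarily those used in the cited CLC studies, and the test sequence is artificial.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def count_hydrophobic_stretches(sequence, window=19, threshold=1.6):
    """Count separate runs of windows whose mean hydropathy reaches the threshold."""
    count = 0
    inside = False
    for value in hydropathy_profile(sequence, window):
        if value >= threshold:
            if not inside:
                count += 1
            inside = True
        else:
            inside = False
    return count


# Artificial sequence: two hydrophobic blocks separated by charged spacers.
toy = "DEKR" * 5 + "L" * 22 + "DEKR" * 5 + "I" * 22 + "DEKR" * 5

if __name__ == "__main__":
    print(count_hydrophobic_stretches(toy))
```

A count obtained this way is only a first estimate; as the text notes for the CLCs, topology inferred from hydropathy alone was later revised.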
Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease [].
These mutations have been demonstrated to reduce or abolish CLC function. CLC-6 (also known as H(+)/Cl(-) exchange transporter 6) is a CLC that, together with CLC-7, forms a distinct branch of the CLC gene family. CLC-6 consists of 869 amino acid residues (human isoform) and is ~45% identical to CLC-7 (at the amino acid level). Analysis of human CLC-6 mRNAs reveals that transcripts of the encoding gene (CLCN6) are alternatively spliced, resulting in the expression of four different CLC-6 isoforms (CLC-6a to CLC-6d). These show different levels of abundance and tissue distribution patterns, with one, CLC-6c, apparently being a kidney-specific isoform []. The functionality of CLC-6 has been proven but its exact biophysical properties remain unknown [, ]. This protein seems to be mostly expressed in neurons of the central and peripheral nervous systems [].
Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease [].
These mutations have been demonstrated to reduce or abolish CLC function. CLC-5, with 746 amino acid residues, is a member of the CLC family that shows most similarity to the CLC-3 and CLC-4 channels, to which it is ~80% identical at the amino acid level. It is predominantly expressed in the kidney, but can be found in the brain and liver []. As mentioned above, mutations in the CLCN5 gene cause certain hereditary kidney stone diseases, including Dent's disease, an X-chromosome linked syndrome characterised by proteinuria, hypercalciuria, and kidney stones (nephrolithiasis), leading to progressive renal failure. When the native protein is expressed, it gives rise to strongly outwardly-rectifying Cl- currents; however, the mutated channel forms show loss-of-function [, ]. Recent studies have suggested that CLC-5 may play an important role in endocytosis in renal proximal tubule cells (probably by providing a shunt for the potential generated by the H+-ATPase), and that disruption of this function may impair endocytosis, accounting for the proteinuria observed in Dent's disease [].
Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease [].
These mutations have been demonstrated to reduce or abolish CLC function. CLC-0 is the principal Cl- channel of the electric organ of Torpedo species. These marine electric rays generate high voltage pulses (to stun their prey) by the concerted action of Cl- channels and nicotinic acetylcholine receptors, in specialised cells known as electrocytes. The properties of the CLC-0 channel (consisting of 805-809 amino acids) have been extensively studied after reconstitution into lipid bilayers. It has a peculiar double-barrelled structure, appearing to have two identical ion pores that close and open independently, but which can also be closed together by another common gate. Further evidence also suggests it may function as a homodimer [, ].
Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease [].
These mutations have been demonstrated to reduce or abolish CLC function. CLC-2 is a member of the CLC family that is ubiquitously expressed in mammalian tissues. It is 898 amino acid residues in length (human isoform) and shows ~50% amino acid identity to CLC-1, to which it is most closely related []. The channel is normally closed at physiological membrane potentials, but can be activated by rather strong hyperpolarisation. However, it is activated by cell swelling, suggesting a role for it in cell volume regulation. It is also activated by acidic extracellular pH; the region of the molecule (near the N terminus) that imparts sensitivity to both cell swelling and extracellular pH has been elucidated [].
Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease [].
These mutations have been demonstrated to reduce or abolish CLC function. CLC-7 is a CLC that, together with CLC-6, forms a distinct branch of the CLC gene family. CLC-7 consists of 789 amino acid residues (human isoform) and is ~45% identical to CLC-6 (at the amino acid level). CLC-7 is broadly expressed, but, to date, functional studies have not generated measurable Cl- currents; its identification as a functional Cl- channel therefore remains putative. Interestingly, CLC-7 is the only known eukaryotic CLC protein to lack a highly conserved glycosylation site between hydrophobic domains D8 and D9 [].
Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease [].
These mutations have been demonstrated to reduce or abolish CLC function. In plants, chloride channels contribute to a number of plant-specific functions, such as regulation of turgor, stomatal movement, nutrient transport and metal tolerance. By contrast with Cl- channels in animal cells, they are also responsible for the generation of action potentials. The best documented examples are the chloride channels of guard cells, which control opening and closing of stomata. Recently, four homologous proteins that belong to the CLC family have been cloned from Arabidopsis thaliana (Mouse-ear cress) []. Hydropathy analysis suggests that they have a similar membrane topology to other CLC proteins, with up to 12 TM domains. Expression in Xenopus oocytes failed to generate measurable Cl- currents, although protein analysis suggested they had been synthesised and inserted into cell membranes. However, similar CLC proteins have since been cloned from other plants, and one, CLC-Nt1 (from tobacco), has been demonstrated to form functional Cl- channels, suggesting that at least some of these proteins do function as Cl- channels in plants [].
Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport []. The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata [], and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains []. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm []. A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly-inherited forms of muscle stiffness or myotonia []. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease [].
These mutations have been demonstrated to reduce or abolish CLC function. Two highly similar members of the CLC family have been cloned that appear to be kidney-specific isoforms. These are known as CLC-Ka and CLC-Kb in humans and are ~90% identical (at the amino acid level); in other species, they are named CLC-K1 and CLC-K2 [, ]. Within species, the two isoforms show differing distribution patterns in the kidney, possibly suggesting different roles in renal function. To date, attempts at functional expression of CLC-K isoforms have not yielded measurable Cl- currents; however, that they play a key role in normal kidney function has been made clear by the fact that naturally occurring mutations in the human gene CLCNKB (encoding CLC-Kb) lead to a form of Bartter's syndrome, an inherited kidney disease characterised by hypokalaemic alkalosis []. Similarly, transgenic mice, whose CLC-K1 channel has been rendered dysfunctional by targeted gene disruption, develop overt diabetes insipidus, suggesting that these channels are important for urinary concentration [].
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups []. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [, , , , ]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [, , ]. Thrombin is a coagulation protease that activates platelets, leukocytes, endothelial and mesenchymal cells at sites of vascular injury, acting partly through an unusual proteolytically activated GPCR [].
Gene knockout experiments have provided definitive evidence for a second thrombin receptor in mouse platelets and have suggested tissue-specific roles for different thrombin receptors. Because the physiological agonist at the receptor was originally unknown, it was provisionally named protease-activated receptor (PAR) []. At least 4 PAR subtypes have now been characterised. Thus, the thrombin and PAR receptors constitute a fledgling receptor family that shares a novel proteolytic activation mechanism [].
The Macro or A1pp domain is a module of about 180 amino acids which can bind ADP-ribose (an NAD metabolite) or related ligands. Binding to ADP-ribose could be either covalent or non-covalent []: in certain cases it is believed to bind non-covalently []; while in other cases (such as Aprataxin) it appears to bind both non-covalently through a zinc finger motif, and covalently through a separate region of the protein []. The domain was described originally in association with ADP-ribose 1''-phosphate (Appr-1''-P) processing activity (A1pp) of the yeast YBR022W protein []. The domain is also called Macro domain as it is the C-terminal domain of mammalian core histone macro-H2A [, ]. Macro domain proteins can be found in eukaryotes, in (mostly pathogenic) bacteria, in archaea and in ssRNA viruses, such as coronaviruses [, ], Rubella and Hepatitis E viruses. In vertebrates the domain occurs e.g. in histone macroH2A, in predicted poly-ADP-ribose polymerases (PARPs) and in B aggressive lymphoma (BAL) protein. The macro domain can be associated with catalytic domains, such as PARP, or sirtuin. The Macro domain can recognise ADP-ribose or in some cases poly-ADP-ribose, which can be involved in ADP-ribosylation reactions that occur in important processes, such as chromatin biology, DNA repair and transcription regulation []. The human macroH2A1.1 Macro domain binds an NAD metabolite O-acetyl-ADP-ribose []. The Macro domain has been suggested to play a regulatory role in ADP-ribosylation, which is involved in inter- and intracellular signaling, transcriptional regulation, DNA repair pathways and maintenance of genomic stability, telomere dynamics, cell differentiation and proliferation, and necrosis and apoptosis. The 3D structure of the SARS-CoV Macro domain has a mixed α/β fold consisting of a central seven-stranded twisted mixed β-sheet sandwiched between two α-helices on one face, and three on the other. 
The final α-helix, located on the edge of the central β-sheet, forms the C terminus of the protein []. The crystal structure of AF1521 (a Macro domain-only protein from Archaeoglobus fulgidus) has also been reported and compared with other Macro domain containing proteins. Several Macro domain only proteins are shorter than AF1521, and appear to lack either the first strand of the β-sheet or the C-terminal helix 5. Well conserved residues form a hydrophobic cleft and cluster around the AF1521-ADP-ribose binding site [, , , ].
Protein phosphorylation, which plays a key role in most cellular activities, is a reversible process mediated by protein kinases and phosphoprotein phosphatases. Protein kinases catalyse the transfer of the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. Phosphoprotein phosphatases catalyse the reverse process. Protein kinases fall into three broad classes, characterised with respect to substrate specificity []: serine/threonine-protein kinases; tyrosine-protein kinases; and dual specificity protein kinases (e.g. MEK, which phosphorylates both Thr and Tyr on target proteins). Protein kinase function is evolutionarily conserved from Escherichia coli to human []. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation []. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins. The catalytic subunits of protein kinases are highly conserved, and several structures have been solved [], leading to large screens to develop kinase-specific inhibitors for the treatments of a number of diseases []. In the absence of cAMP, protein kinase A (PKA) exists as an equimolar tetramer of regulatory (R) and catalytic (C) subunits. In addition to its role as an inhibitor of the C subunit, the R subunit anchors the holoenzyme to specific intracellular locations and prevents the C subunit from entering the nucleus. Typical R subunits have a conserved domain structure, consisting of the N-terminal dimerisation domain, inhibitory region, cAMP-binding domain A and cAMP-binding domain B. R subunits interact with C subunits primarily through the inhibitory site.
The cAMP-binding domains show extensive sequence similarity and bind cAMP cooperatively. On the basis of phylogenetic trees generated from multiple sequence alignment of complete sequences, this family was divided into four sub-families, types I to IV []. Types I and II, found in animals, differ in molecular weight, sequence, autophosphorylation capability, cellular location and tissue distribution. Types I and II are further sub-divided into alpha and beta subtypes, based mainly on sequence similarity. Type III are from fungi and type IV are from alveolates.
Steroid or nuclear hormone receptors (NRs) constitute an important superfamily of transcription regulators that are involved in widely diverse physiological functions, including control of embryonic development, cell differentiation and homeostasis. Members of the superfamily include the steroid hormone receptors and receptors for thyroid hormone, retinoids, 1,25-dihydroxy-vitamin D3 and a variety of other ligands []. The proteins function as dimeric molecules in nuclei to regulate the transcription of target genes in a ligand-responsive manner [, ]. In addition to C-terminal ligand-binding domains, these nuclear receptors contain a highly-conserved, N-terminal zinc-finger that mediates specific binding to target DNA sequences, termed ligand-responsive elements. In the absence of ligand, steroid hormone receptors are thought to be weakly associated with nuclear components; hormone binding greatly increases receptor affinity. NRs are extremely important in medical research, a large number of them being implicated in diseases such as cancer, diabetes, hormone resistance syndromes, etc. While several NRs act as ligand-inducible transcription factors, many do not yet have a defined ligand and are accordingly termed 'orphan' receptors. During the last decade, more than 300 NRs have been described, many of which are orphans, which cannot easily be named due to current nomenclature confusions in the literature. However, a new system has recently been introduced in an attempt to rationalise the increasingly complex set of names used to describe superfamily members. The retinoic acid (retinoid X) receptor consists of 3 functional and structural domains: an N-terminal (modulatory) domain; a DNA binding domain that mediates specific binding to target DNA sequences (ligand-responsive elements); and a hormone binding domain.
The N-terminal domain differs between retinoic acid isoforms; the small highly-conserved DNA-binding domain (~65 residues) occupies the central portion of the protein; and the ligand binding domain lies at the receptor C terminus. This entry represents retinoid X receptors. It also represents hepatocyte nuclear factor 4 (HNF4), which is a nuclear receptor protein expressed in the liver and kidney, and functions as a key regulator of many metabolic pathways. HNF4 was originally classified as an orphan receptor. Linoleic acid has now been identified as the endogenous ligand for HNF4 in mammalian cells [].
Tubby, an autosomal recessive mutation, mapping to mouse chromosome 7, was recently found to be the result of a splicing defect in a novel gene with unknown function. This mutation maps to the tub gene [, ]. The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. By contrast with the rapid juvenile-onset weight gain seen in diabetes (db) and obese (ob) mice, obesity in tubby mice develops gradually, and strongly resembles the late-onset obesity observed in the human population. Excessive deposition of adipose tissue culminates in a two-fold increase of body weight. Tubby mice also suffer retinal degeneration and neurosensory hearing loss. The tripartite character of the tubby phenotype is highly similar to human obesity syndromes, such as Alstrom and Bardet-Biedl. Although these phenotypes indicate a vital role for tubby proteins, no biochemical function has yet been ascribed to any family member []; it has, however, been suggested that the phenotypic features of tubby mice may be the result of cellular apoptosis triggered by expression of the mutated tub gene. TUB is the founding member of the tubby-like proteins, the TULPs. TULPs are found in multicellular organisms from both the plant and animal kingdoms. Ablation of members of this protein family causes disease phenotypes that are indicative of their importance in nervous-system function and development []. Mammalian TUB is a hydrophilic protein of ~500 residues. The N-terminal portion of the protein is conserved neither in length nor sequence, but, in TUB, contains the nuclear localisation signal and may have transcriptional-activation activity. The C-terminal 250 residues are highly conserved. The C-terminal extremity contains a cysteine residue that might play an important role in the normal functioning of these proteins. The crystal structure of the C-terminal core domain from mouse tubby has been determined to 1.9 Å resolution.
This domain is arranged as a 12-stranded, all anti-parallel, closed β-barrel that surrounds a central α-helix (at the extreme carboxyl terminus of the protein), which forms most of the hydrophobic core. Structural analyses suggest that TULPs constitute a unique family of bipartite transcription factors [].
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the AN1-type zinc finger domain, which has a dimetal (zinc)-bound alpha/beta fold. This domain was first identified as a zinc finger at the C terminus of AN1, a ubiquitin-like protein in Xenopus laevis [].
The AN1-type zinc finger contains six conserved cysteines and two histidines that could potentially coordinate 2 zinc atoms. Certain stress-associated proteins (SAP) contain an AN1 domain, often in combination with A20 zinc finger domains (SAP8) or C2H2 domains (SAP16) []. For example, the human protein Znf216 has an A20 zinc-finger at the N terminus and an AN1 zinc-finger at the C terminus, acting to negatively regulate the NFkappaB activation pathway and to interact with components of the immune response like RIP, IKKgamma and TRAF6. The interaction of Znf216 with IKK-gamma and RIP is mediated by the A20 zinc-finger domain, while its interaction with TRAF6 is mediated by the AN1 zinc-finger domain; therefore, both zinc-finger domains are involved in regulating the immune response []. The AN1 zinc finger domain is also found in proteins containing a ubiquitin-like domain, which are involved in the ubiquitination pathway []. Proteins containing an AN1-type zinc finger include: Ascidian posterior end mark 6 (pem-6) protein []. Human AWP1 protein (associated with PRK1), which is expressed during early embryogenesis []. Human immunoglobulin mu binding protein 2 (SMUBP-2), mutations in which cause muscular atrophy with respiratory distress type 1 [].
Two highly similar activities are represented in this group: thymidine phosphorylase (TP, gene deoA) and pyrimidine-nucleoside phosphorylase (PyNP, gene pdp). Both are dimeric enzymes that function in the salvage pathway to catalyse the reversible phosphorolysis of pyrimidine nucleosides to the free base and sugar moieties. In the case of thymidine phosphorylase, thymidine (and to a lesser extent, 2'-deoxyuridine) is lysed to produce thymine (or uracil) and 2'-deoxyribose-1-phosphate. Pyrimidine-nucleoside phosphorylase performs the analogous reaction on thymidine (to produce the same products) and uridine (to produce uracil and ribose-1-phosphate). PyNP is typically the only pyrimidine nucleoside phosphorylase encoded by Gram-positive bacteria, while eukaryotes and proteobacteria encode two: TP, and the unrelated uridine phosphorylase. In humans, TP was originally characterised as platelet-derived endothelial cell growth factor and gliostatin []. Structurally, the enzymes are homodimers, each composed of a rigid all α-helix lobe and a mixed α-helix/β-sheet lobe, which are connected by a flexible hinge [, ]. Prior to substrate binding, the lobes are separated by a large cleft. The functional active site forms, and catalysis follows, upon closing of the cleft. The active site, composed of a phosphate binding site and a (deoxy)ribonucleotide binding site within the cleft region, is highly conserved between the two enzymes of this group. Active site residues (Escherichia coli DeoA numbering) include the phosphate binding Lys84 and Ser86 (close to a glycine-rich loop), Ser113, and Thr123, and the pyrimidine nucleoside-binding Arg171, Ser186, and Lys190. Sequence comparison between the active site residues for both enzymes reveals only one difference [], which has been proposed to partially mediate substrate specificity. In TP, position 111 is a methionine, while the analogous position in PyNP is lysine.
It should be noted that the uncharacterised archaeal members of this family differ in a number of respects from either of the characterised activities. The residue at position 108 is lysine, indicating the activity might be PyNP-like (though the determinants of substrate specificity have not been fully elucidated). Position 171 is glutamate (negatively charged side chain) rather than arginine (positively charged side chain). In addition, a large loop that may "lock in" the substrates within the active site is much smaller than in the characterised members. It is not clear what effect these and other differences have on activity and specificity.
Two highly similar activities are represented in this group: thymidine phosphorylase (TP, gene deoA) and pyrimidine-nucleoside phosphorylase (PyNP, gene pdp). Both are dimeric enzymes that function in the salvage pathway to catalyse the reversible phosphorolysis of pyrimidine nucleosides to the free base and sugar moieties. In the case of thymidine phosphorylase, thymidine (and to a lesser extent, 2'-deoxyuridine) is lysed to produce thymine (or uracil) and 2'-deoxyribose-1-phosphate. Pyrimidine-nucleoside phosphorylase performs the analogous reaction on thymidine (to produce the same products) and uridine (to produce uracil and ribose-1-phosphate). PyNP is typically the only pyrimidine nucleoside phosphorylase encoded by Gram-positive bacteria, while eukaryotes and proteobacteria encode two: TP, and the unrelated uridine phosphorylase. In humans, TP was originally characterised as platelet-derived endothelial cell growth factor and gliostatin []. Structurally, the enzymes are homodimers, each composed of a rigid all α-helix lobe and a mixed α-helix/β-sheet lobe, which are connected by a flexible hinge [, ]. Prior to substrate binding, the lobes are separated by a large cleft. The functional active site forms, and catalysis follows, upon closing of the cleft. The active site, composed of a phosphate binding site and a (deoxy)ribonucleotide binding site within the cleft region, is highly conserved between the two enzymes of this group. Active site residues (Escherichia coli DeoA numbering) include the phosphate binding Lys84 and Ser86 (close to a glycine-rich loop), Ser113, and Thr123, and the pyrimidine nucleoside-binding Arg171, Ser186, and Lys190. Sequence comparison between the active site residues for both enzymes reveals only one difference [], which has been proposed to partially mediate substrate specificity. In TP, position 111 is a methionine, while the analogous position in PyNP is lysine.
It should be noted that the uncharacterised archaeal members of this family differ in a number of respects from either of the characterised activities. The residue at position 108 is lysine, indicating the activity might be PyNP-like (though the determinants of substrate specificity have not been fully elucidated). Position 171 is glutamate (negatively charged side chain) rather than arginine (positively charged side chain). In addition, a large loop that may "lock in" the substrates within the active site is much smaller than in the characterised members. It is not clear what effect these and other differences have on activity and specificity.
The SEA domain has been named after the first three proteins in which it was identified (Sperm protein, Enterokinase and Agrin). The SEA domain has around 120 residues, it is an extracellular domain found in a number of cell surface and secreted proteins in which it could be present in one or two copies []. Many SEA domains possess autoproteolysis activity. The SEA domain is closely associated with regions receiving extensive O-glycosylation and is present adjacent to the transmembrane segment in quite a number of type I transmembrane proteins on the cell surface, such as mucin-1 (MUC1) and Notch receptors and in type II single-pass transmembrane proteins such as enterokinase and matriptases. It also present in interphotoreceptor matrix proteoglycans (IMPG1 and IMPG2) []. It has been proposed that carbohydrates are required to stabilise SEA domains and protect them against proteolytic degradation and that the extent of substitution may control proteolytic processing [, ].The SEA domain contains an about 80-residue conserved region and an about 40-residue segment that separates the conserved region from the subsequent C-terminal domains with an alternating conformation of β-sheets and α-helices. Structural analysis of MUC1 SEA domain revealed that it adopts a ferredoxin-like fold in which the cleavage site is located in the middle of the β-hairpin of the second and third β-strands. MUC1 SEA domain undergoes autoproteolysis at the glycine-serine peptide bond and the Ser responsible of this activity is located in the consensus motif GSXXX (X: a hydrophobic residue) [, , ].Some proteins known to contain a SEA domain include:Vertebrate agrin, an heparan sulfate proteoglycan of the basal lamina of the neuromuscular junction. It is responsible for the clustering of acetylcholine receptors (AChRs) and other proteins at the neuromuscular junction.Mammalian enterokinase. 
It catalyses the conversion of trypsinogen to trypsin, which in turn activates other proenzymes, including chymotrypsinogen, procarboxypeptidases, and proelastases. 63kDa sea urchin sperm protein (SP63). It might mediate sperm-egg or sperm-matrix interactions. Animal perlecan, a heparan sulfate-containing proteoglycan found in all basement membranes. It interacts with other basement membrane components such as laminin and collagen type IV and serves as an attachment substrate for cells. Some vertebrate epithelial mucins. They form a family of secreted and cell surface glycoproteins expressed by epithelial tissues and implicated in epithelial cell protection, adhesion modulation and signaling. Mammalian cell surface antigen 114/A10, an integral transmembrane protein that is highly expressed in hematopoietic progenitor cells and IL-3-dependent cell lines.
The large (alpha, GltB) subunit of bacterial glutamate synthase (GOGAT) consists of three domains: an N-terminal domain (the amidotransferase domain, or a related domain in archaeal GOGAT), the central domain and the FMN-binding domain, and a C-terminal domain. This family represents a stand-alone form of the C-terminal domain. The stand-alone form occurs in the archaeal type of GOGAT, where the large subunit is represented by three separate proteins, corresponding to the three domains of the "standard" bacterial enzyme []. A similar organisation of GOGAT with stand-alone domains has been found in some bacteria (e.g., Sinorhizobium meliloti, Thermotoga maritima), but its function is not clear in those organisms where the "standard" bacterial form is also present (e.g., Sinorhizobium meliloti). This domain is also called the GXGXG structural domain, as it contains the repeated sequence motif G-XX-G-XXX-G. It has a right-handed β-helix topology comprising seven β-helical turns. It does not have a direct function in glutamate synthase activity but rather a structural function through extensive interactions with the amidotransferase and FMN-binding domains [, ]. Originally, only the ORF encoding the central domain of GOGAT was recognised and annotated as GltB in archaea, and the rest of the large subunit was thought to be missing, which may have led to some misannotations []. This led to speculation that the archaeal form of the GOGAT large subunit is the ancestral minimum form of the enzyme. Later analysis showed, however, that in all archaea where the large subunit has been found, its entire sequence is represented by three separate ORFs []. Glutamate synthase (GOGAT, GltS) is a complex iron-sulphur flavoprotein that catalyses the reductive synthesis of L-glutamate from 2-oxoglutarate and L-glutamine via intramolecular channelling of ammonia, a reaction in the bacterial, yeast and plant pathways for ammonia assimilation [].
GOGAT is a multifunctional enzyme that performs L-glutamine hydrolysis, conversion of 2-oxoglutarate into L-glutamate, and electron uptake from an electron donor []. There are four classes of GOGAT [, ]: 1. Bacterial NADPH-dependent GOGAT (NADPH-GOGAT, ). This standard bacterial NADPH-GOGAT is composed of a large (alpha, GltB) subunit and a small (beta, GltD) subunit. 2. Ferredoxin-dependent form in cyanobacteria and plants (Fd-GOGAT, ), which displays a single-subunit structure corresponding to the large bacterial subunit. 3. Pyridine-linked form in both photosynthetic and nonphotosynthetic eukaryotes (eukaryotic GOGAT or NADH-GOGAT, ), which displays a single-subunit structure corresponding to the fusion of the small and the large bacterial subunits (). 4. The archaeal type with stand-alone proteins corresponding to the N-terminal, FMN-binding, and C-terminal domains of the large subunit [, ] (, , ), and to the small subunit.
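The repeated G-XX-G-XXX-G motif that gives the GXGXG structural domain its name can be scanned for with a simple regular expression. The following is a minimal sketch only: the function name and the example sequences are hypothetical, and because the motif is short and degenerate, a hit is no more than a hint that the domain may be present.

```python
import re

# G-XX-G-XXX-G written as a regex: G, any 2 residues, G, any 3 residues, G.
# The lookahead lets overlapping (tandemly repeated) motifs all be reported.
GXGXG_MOTIF = re.compile(r"(?=(G..G...G))")


def gxgxg_positions(sequence):
    """Return 0-based start positions of every G-XX-G-XXX-G match."""
    return [m.start() for m in GXGXG_MOTIF.finditer(sequence.upper())]


# Synthetic sequences, invented purely for illustration.
single = "AAGAAGAAAGAA"
tandem = "GAAGAAAGAAGAAAG"

print(gxgxg_positions(single))  # one motif starting at the first G
print(gxgxg_positions(tandem))  # two motifs sharing the middle G
```

A real analysis would normally rely on a profile-based search rather than a bare pattern, since a regular expression ignores the surrounding β-helix context.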
This entry represents the CRIB domain. Many putative downstream effectors of the small GTPases Cdc42 and Rac contain a GTPase binding domain (GBD), also called p21 binding domain (PBD), which has been shown to specifically bind the GTP-bound form of Cdc42 or Rac, with a preference for Cdc42 [, ]. The most conserved region of GBD/PBD domains is the N-terminal Cdc42/Rac interactive binding motif (CRIB), which consists of about 16 amino acids with the consensus sequence I-S-x-P-x(2,4)-F-x-H-x(2)-H-V-G []. Although the CRIB motif is necessary for binding to Cdc42 and Rac, it is not sufficient to give high-affinity binding [, ]. A less well conserved inhibitory switch (IS) domain, responsible for maintaining the proteins in a basal (autoinhibited) state, is located C-terminal to the CRIB motif [, , ]. GBD domains can adopt related but distinct folds depending on context. Although GBD domains are largely unstructured in the free state, the IS domain forms an N-terminal β-hairpin that immediately follows the conserved CRIB motif and a central bundle of three α-helices in the autoinhibited state. The interaction between GBD domains and their respective G proteins leads to the formation of a high-affinity complex in which unstructured regions of both the effector and the G protein become rigid. CRIB motifs from various GBD domains interact with Cdc42 in a similar manner, forming an intermolecular β-sheet with strand β-2 of Cdc42. Outside the CRIB motif, the C-terminal regions of the various GBD domains are very divergent and show variation in their mode of binding to Cdc42, perhaps determining the specificity of the interaction.
Binding of Cdc42 or Rac to the GBD domain causes a dramatic conformational change, refolding part of the IS domain and unfolding the rest [, , , , ]. Some proteins known to contain a CRIB domain are listed below: Mammalian activated Cdc42-associated kinases (ACKs), nonreceptor tyrosine kinases implicated in integrin-coupled pathways. Mammalian p21-activated kinases (PAK1 to PAK4), serine/threonine kinases that modulate cytoskeletal assembly and activate MAP-kinase pathways. Mammalian actin nucleation-promoting factor WAS (also known as Wiskott-Aldrich Syndrome Proteins, WASPs), non-kinase proteins involved in the organisation of the actin cytoskeleton. Yeast STE20 and CLA4, the homologues of mammalian PAKs. STE20 is involved in the mating/pheromone MAP kinase cascade.
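The CRIB consensus quoted above, I-S-x-P-x(2,4)-F-x-H-x(2)-H-V-G, is written in a PROSITE-like notation that maps directly onto a regular expression. The sketch below shows one way to do that conversion and scan a sequence. It is illustrative only: the helper names and the test sequence are invented here, the converter handles just the notation elements used in this consensus, and real CRIB motifs often deviate from the strict consensus, so an exact-match search will miss genuine family members.

```python
import re

CRIB_CONSENSUS = "I-S-x-P-x(2,4)-F-x-H-x(2)-H-V-G"

# One pattern element: 'x', a single residue, or [ABC], with optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            # (n) becomes {n}; (n,m) becomes {n,m}
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


CRIB_REGEX = re.compile(prosite_to_regex(CRIB_CONSENSUS))


def find_crib(sequence):
    """Return (start, matched_text) for each strict-consensus CRIB match."""
    return [(m.start(), m.group()) for m in CRIB_REGEX.finditer(sequence.upper())]


# Synthetic peptide built to satisfy the consensus; not a real protein sequence.
example = "AAISAPSDFEHTIHVGAA"
print(prosite_to_regex(CRIB_CONSENSUS))
print(find_crib(example))
```

Because the CRIB motif alone is not sufficient for high-affinity binding, a hit from this kind of scan says nothing about whether the flanking GBD/PBD sequence is functional.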
Protein phosphorylation, which plays a key role in most cellular activities, is a reversible process mediated by protein kinases and phosphoprotein phosphatases. Protein kinases catalyse the transfer of the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. Phosphoprotein phosphatases catalyse the reverse process. Protein kinases fall into three broad classes, characterised with respect to substrate specificity []: serine/threonine-protein kinases; tyrosine-protein kinases; and dual specificity protein kinases (e.g. MEK, which phosphorylates both Thr and Tyr on target proteins). Protein kinase function is evolutionarily conserved from Escherichia coli to human []. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation []. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins. The catalytic subunits of protein kinases are highly conserved, and several structures have been solved [], leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases []. Protein kinases are a group of enzymes that possess a catalytic subunit, which transfers the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues (such as serine, threonine, or tyrosine) in a substrate protein's side chain, resulting in a conformational change affecting protein function.
Protein kinase function has been evolutionarily conserved from Escherichia coli to Homo sapiens (Human), where they play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation []. The catalytic subunits of protein kinases are highly conserved, and several structures have been solved [], leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases []. Anti-Mullerian hormone (AMH), also called Mullerian inhibiting substance, is a member of the transforming growth factor beta (TGF-beta) family that represses the development and function of reproductive organs []. Anti-Mullerian hormone is thought to exert its effects through two membrane-bound serine/threonine kinase receptors, type 2 and type 1. Upon ligand binding, these drive receptor-specific cytoplasmic substrates, the Smad molecules, into the nucleus where they act as transcription factors. A type 2 receptor specific for AMH was cloned through its homology with receptors of TGF-beta family members. Components of the AMH signalling pathway have been identified in gonads and gonadal cell lines. The AMH type II receptor is highly specific. In contrast, the identity of the AMH type I receptor is not clear.
Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH in animal cells. Such transport also plays a vital role in acid-base movements in the stomach, pancreas, intestine, kidney, reproductive organs and the central nervous system. Functional studies have suggested four different HCO3- transport modes. Anion exchanger proteins exchange HCO3- for Cl- in a reversible, electroneutral manner []. Na+/HCO3- co-transport proteins mediate the coupled movement of Na+ and HCO3- across plasma membranes, often in an electrogenic manner []. Na+-driven Cl-/HCO3- exchange and K+/HCO3- exchange activities have also been detected in certain cell types, although the molecular identities of the proteins responsible remain to be determined. Sequence analysis of the two families of HCO3- transporters that have been cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals that they are homologous. This is not entirely unexpected, given that they both transport HCO3- and are inhibited by a class of pharmacological agents called disulphonic stilbenes []. They share around 25-30% sequence identity, which is distributed along their entire sequence length, and have similar predicted membrane topologies, suggesting they have ~10 transmembrane (TM) domains. Anion exchange proteins participate in pH and cell volume regulation. They are glycosylated, plasma-membrane transport proteins that exchange hydrogen carbonate (HCO3-) for chloride (Cl-) in a reversible, electroneutral manner [, ]. To date three anion exchanger isoforms have been identified (AE1-3), AE1 being the previously-characterised erythrocyte band 3 protein. They share a predicted topology of 12-14 transmembrane (TM) domains, but have differing distribution patterns and cellular localisation. The best characterised isoform, AE1, is known to be the most abundant membrane protein in mature erythrocytes. It has a molecular mass of ~95kDa and consists of two major domains.
The N-terminal 390 residues form a water-soluble, highly elongated domain that serves as an attachment site for the binding of the membrane skeleton and other cytoplasmic proteins. The remainder of the protein is a 55kDa hydrophobic domain that is responsible for catalysing anion exchange. The function of the analogous domains of AE2 and AE3 remains to be determined []. AE2 (~1240 amino acids) is a non-erythroid anion exchanger. It was cloned from choroid plexus but has been detected in many organs including the gastrointestinal tract and kidney. It is expressed in both epithelial and non-epithelial cells, and may be present in the Golgi apparatus in addition to the cell membrane []. Three AE2 N-terminal variants have been described, arising due to the presence of alternative promoter sites within the gene. They are referred to as AE2a-c and have differing distribution patterns: AE2a is expressed in all tissues; AE2b exhibits a more restricted distribution, with highest levels in the stomach; and AE2c is expressed only in the stomach [].
The proteasome (or macropain) () [, , , , ] is a multicatalytic proteinase complex in eukaryotes and archaea, and in some bacteria, that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the proteasome is composed of 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of about 700kDa. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, alpha (A) and beta (B). These are arranged in four rings of seven proteins, consisting of a ring of alpha subunits, two rings of beta subunits, and a ring of alpha subunits. In eukaryotes, each alpha and each beta ring consists of different proteins. Three of the beta subunits are peptidases in subfamily T1A, and each has a distinctive specificity (trypsin-like, chymotrypsin-like and glutamyl peptidase-like). The peptidases are N-terminal nucleophile hydrolases in which the N-terminal threonine is the nucleophile in the hydrolytic reaction []. In the immunoproteasome, the catalytic components are replaced by three specialist, catalytic beta subunits []. In bacteria and archaea there is only one alpha subunit and one beta subunit, and each ring is a homoheptamer. This entry includes the beta subunit of the archaean proteasome (MEROPS identifier T01.002). The archaean proteasome consists of four stacked rings, each of which contains a homoheptamer of either alpha or beta components, so that the rings are stacked in the order alpha, beta, beta, alpha. Alpha and beta subunits are homologous to one another, but only beta subunits are proteolytically active. The beta subunits are arranged so that the active sites are directed towards the centre of each ring. The proteasome is therefore a torus structure with a large cavity, and entrance and exit pores at the top and bottom. A denatured protein enters through the top pore and is degraded by the beta subunits into short peptides, which exit from the bottom pore.
The archaean proteasome is therefore similar to, but a simplified version of, the eukaryote proteasome. The crystal structure of the proteasome from Thermoplasma acidophilum was the first to be solved, showing a structure similar to that of N-terminal nucleophile hydrolases [], and the beta subunit was found to be the first threonine peptidase [].
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents MYND-type zinc finger domains. The MYND domain (myeloid, Nervy, and DEAF-1) is present in a large group of proteins that includes RP-8 (PDCD2), Nervy, and predicted proteins from Drosophila, mammals, Caenorhabditis elegans, yeast, and plants [, , ]. The MYND domain consists of a cluster of cysteine and histidine residues, arranged with an invariant spacing to form a potential zinc-binding motif []. 
Mutating conserved cysteine residues in the DEAF-1 MYND domain does not abolish DNA binding, which suggests that the MYND domain might be involved in protein-protein interactions []. Indeed, the MYND domain of ETO/MTG8 interacts directly with the N-CoR and SMRT co-repressors [, ]. Aberrant recruitment of co-repressor complexes and inappropriate transcriptional repression is believed to be a general mechanism of leukemogenesis caused by the t(8;21) translocations that fuse ETO with the acute myelogenous leukemia 1 (AML1) protein. ETO has been shown to be a co-repressor recruited by the promyelocytic leukemia zinc finger (PLZF) protein []. A divergent MYND domain present in the adenovirus E1A binding protein BS69 was also shown to interact with N-CoR and mediate transcriptional repression []. The current evidence suggests that the MYND motif in mammalian proteins constitutes a protein-protein interaction domain that functions as a co-repressor-recruiting interface.
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups []. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [, , , , ]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [, , ]. The human APJ gene, which encodes this receptor, was originally cloned in 1993 using a set of primers based on the 7 conserved TM domains.
The putative sequence is closest in terms of identity (40-50% in the TM regions) to the angiotensin receptor (AT1); however, angiotensin II shows no affinity for the receptor []. It is a receptor for apelin receptor early endogenous ligand (APELA) and apelin (APLN) hormones, which are coupled to G proteins and inhibit adenylate cyclase activity []. The mature transcript encodes a preproprotein that yields a 13 amino acid active peptide from the C-terminal end. Apelin has a similar mRNA distribution to angiotensin II and the active peptides share some similarity. It plays a role in regulation of blood vessel formation, blood pressure, heart contractility and heart failure [, , ].
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups []. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [, , , , ]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [, , ]. Neurotensin is a 13-residue peptide transmitter, sharing significant similarity in its 6 C-terminal amino acids with several other neuropeptides, including neuromedin N.
This region is responsible for the biological activity, the N-terminal portion having a modulatory role. Neurotensin is distributed throughout the central nervous system, with highest levels in the hypothalamus, amygdala and nucleus accumbens. It induces a variety of effects, including analgesia, hypothermia and increased locomotor activity. It is also involved in regulation of dopamine pathways. In the periphery, neurotensin is found in endocrine cells of the small intestine, where it leads to secretion and smooth muscle contraction. The existence of 2 neurotensin receptor subtypes, with differing affinities for neurotensin and differing sensitivities to the antihistamine levocabastine, was originally demonstrated by binding studies in rodent brain. Two neurotensin receptors (NT1 and NT2) with such properties have since been cloned and have been found to be G-protein-coupled receptor family members [].
The insulin family of proteins groups together several evolutionarily related active peptides []: these include insulin [, ], relaxin [, ], insect prothoracicotropic hormone (bombyxin) [], insulin-like growth factors (IGF1 and IGF2) [, ], mammalian Leydig cell-specific insulin-like peptide (gene INSL3), early placenta insulin-like peptide (ELIP) (gene INSL4), locust insulin-related peptide (LIRP), molluscan insulin-related peptides (MIP), and Caenorhabditis elegans insulin-like peptides. The 3D structures of a number of family members have been determined [, , ]. The fold comprises two polypeptide chains (A and B) linked by two disulphide bonds: all share a conserved arrangement of 4 cysteines in their A chain, the first of which is linked by a disulphide bond to the third, while the second and fourth are linked by interchain disulphide bonds to cysteines in the B chain. Insulin is found in many animals, and is involved in the regulation of normal glucose homeostasis. It also has other specific physiological effects, such as increasing the permeability of cells to monosaccharides, amino acids and fatty acids, and accelerating glycolysis and glycogen synthesis in the liver []. Insulin exerts its effects by interaction with a cell-surface receptor, which may also result in the promotion of cell growth []. Insulin is synthesised as a prepropeptide from which an endoplasmic reticulum-targeting sequence is cleaved to yield proinsulin. The sequence of proinsulin contains 2 well-conserved regions (designated A and B), separated by an intervening connecting region (C), which is variable between species []. The connecting region is cleaved, liberating the active protein, which contains the A and B chains, held together by 2 disulphide bonds []. Relaxin has diverse actions in the reproductive tract and in other tissues during pregnancy [].
Although binding sites for relaxin have been found in reproductive tissue, the nature of the receptor was previously unknown. Recently, two orphan GPCRs, LGR7 and LGR8, have been identified as receptors for the hormone. These two receptors contain large extracellular N-termini with leucine-rich repeat regions, and are structurally similar to the gonadotropin and thyrotropin receptors. LGR7 is expressed in the brain, kidney, testis, placenta, uterus, ovary, adrenal gland, prostate, skin and heart, while LGR8 is expressed mainly in the brain, kidney, muscle, testis, thyroid, uterus, peripheral blood cells and bone marrow. Upon binding to LGR7 or 8, relaxin stimulates a dose-dependent increase in cyclic AMP production, indicating coupling of the receptors to G proteins.
There are three distinct families of extracellular receptors for purine and pyrimidine nucleotides [], known as P1, P2X and P2Y purinoceptors []. These receptors induce a wide variety of biological effects and are involved in many different cellular functions [, , ]. P2X receptors are ligand-gated ion channels, whereas P1 and P2Y receptors are rhodopsin-like G protein-coupled receptors [, ]. The families also differ by their method of activation: P1 receptors are preferentially activated by adenosine [], P2X via ATP [], whereas the P2Y receptors, in addition to being activated by ATP, are activated by different adenine and/or uridine nucleoside di- and triphosphates (ADP, UTP, UDP and UDP-glucose) []. The P2Y purinoceptors currently consist of eleven subtypes: P2Y1, P2Y2, P2Y3, P2Y4, P2Y6, P2Y8, P2Y10, P2Y11, P2Y12, P2Y13 and P2Y14 [, , ]. P2Y3 has, as yet, only been found in birds [], whilst the rest have been cloned in humans. The gaps in P2Y receptor numbering are due to the reclassification of some receptors that were initially associated with the P2Y family. These include P2Y5 (now known as lysophosphatidic acid receptor 6), P2Y7 (now leukotriene B4 receptor) and P2Y9 (lysophosphatidic acid receptor 4) [, , , ]. P2Y purinoceptor subtypes have different pharmacological selectivities, which overlap in some cases, for various adenosine and uridine nucleotides. They are widely expressed and are involved in platelet aggregation, vasodilation and neuromodulation, and a range of other processes, such as ion flux, differentiation, and synaptic communication [, , , ]. They exert their varied biological functions based on different G-protein coupling [].
Each receptor subtype can couple to multiple G proteins, either Gi, Gq/11 or Gs, triggering the activation of diverse intracellular signalling cascades (stimulation of phospholipase C through Gq/11, stimulation of adenylyl cyclase via Gs, or inhibition of adenylyl cyclase via Gi) [, ]. This entry represents the P2Y8 receptor. It was originally identified in Xenopus and found to be activated equipotently by all naturally occurring nucleoside triphosphates (ATP, CTP, GTP, ITP and UTP) but not by inorganic polyphosphates []. The receptor has been identified in human undifferentiated HL60 cells [, ]. It is currently regarded as an orphan receptor by the International Union of Basic and Clinical Pharmacology (IUPHAR). It has been suggested that this receptor may have a role in early development of the nervous system [, ].
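The account of the P2Y numbering above can be checked with a few lines of arithmetic: the listed subtypes should number eleven, and the gaps in the numbering should be exactly the receptors said to have been reclassified. The sketch below encodes only what the text states; the variable and function names are illustrative choices, not an established nomenclature resource.

```python
# Subtype numbers listed in the text as current P2Y purinoceptors.
P2Y_SUBTYPES = [1, 2, 3, 4, 6, 8, 10, 11, 12, 13, 14]

# Former P2Y numbers that the text says were reclassified, with their current names.
RECLASSIFIED = {
    5: "lysophosphatidic acid receptor 6",
    7: "leukotriene B4 receptor",
    9: "lysophosphatidic acid receptor 4",
}


def numbering_gaps(subtypes):
    """Return the numbers missing between the lowest and highest subtype."""
    present = set(subtypes)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


gaps = numbering_gaps(P2Y_SUBTYPES)
for number in gaps:
    print(f"P2Y{number}: now {RECLASSIFIED[number]}")
```

Every gap is accounted for by a reclassified receptor, which is consistent with the explanation given for the non-contiguous numbering.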
Formate--tetrahydrofolate ligase () (formyltetrahydrofolate synthetase) (FTHFS) is one of the enzymes participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. FTHFS catalyses the ATP-dependent activation of formate ion via its addition to the N10 position of tetrahydrofolate. FTHFS is a highly expressed key enzyme in both the Wood-Ljungdahl pathway of autotrophic CO2 fixation (acetogenesis) and the glycine synthase/reductase pathways of purinolysis. The key physiological role of this enzyme in acetogens is to catalyse the formylation of tetrahydrofolate, an initial step in the reduction of carbon dioxide and other one-carbon precursors to acetate. In purinolytic organisms, the enzymatic reaction is reversed, liberating formate from 10-formyltetrahydrofolate with concurrent production of ATP [, ]. In many of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase (C1-THF synthase), which also catalyses the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF synthase are known []: one is located in the mitochondrial matrix, while the second is cytoplasmic. In both forms the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF synthase. In prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino acid residues []. The crystal structure of N(10)-formyltetrahydrofolate synthetase from Moorella thermoacetica shows that the subunit is composed of three domains organised around three mixed β-sheets. There are two cavities between adjacent domains. One of them was identified as the nucleotide binding site by homology modelling. The large domain contains a seven-stranded β-sheet surrounded by helices on both sides.
The second domain contains a five-stranded β-sheet with two α-helices packed on one side while the other two are a wall of the active site cavity. The third domain contains a four-stranded β-sheet forming a half-barrel. The concave side is covered by two helices while the convex side is another wall of the large cavity. Arg 97 is likely involved in formyl phosphate binding. The tetrameric molecule is relatively flat with the shape of the letter X, and the active sites are located at the end of the subunits far from the subunit interface [].
Members of this eukaryotic family are part of the group II chaperonin complex called CCT (chaperonin containing TCP-1 or Tailless Complex Polypeptide 1) or TRiC [, ]. Chaperonins are involved in productive folding of proteins []. They share a common general morphology, a double toroid of 2 stacked rings. The archaeal equivalent group II chaperonin is often called the thermosome []. Both the thermosome and the TCP-1 family of proteins are weakly, but significantly [], related to the cpn60/groEL chaperonin family (see ). The TCP-1 protein was first identified in mice, where it is especially abundant in testis but present in all cell types. It has since been found and characterised in many other animal species, as well as in yeast, plants and protists. The TCP1 complex has a double-ring structure with central cavities where protein folding takes place []. TCP-1 is a highly conserved protein of about 60 kDa (556 to 560 residues) which participates in a hetero-oligomeric 900 kDa double-torus shaped particle [] with 6 to 8 other different, but homologous, subunits []. These subunits, the chaperonin containing TCP-1 (CCT) subunit beta, gamma, delta, epsilon, zeta and eta, are evolutionarily related to TCP-1 itself [, ]. Non-native proteins are sequestered inside the central cavity and folding is promoted by using energy derived from ATP hydrolysis [, , ]. The CCT is known to act as a molecular chaperone for tubulin, actin and probably some other proteins [, ]. Thermosome (or cpn60) is the name given to the archaeal rather than eukaryotic form of the group II chaperonin (counterpart to the group I chaperonin, GroEL/GroES, in bacteria), a toroidal, ATP-dependent molecular chaperone that assists in the folding or refolding of nascent or denatured proteins []. Cpn60 consists of two stacked octameric rings, which are composed of one or two different subunits. 
Various homologous subunits, one to five per archaeal genome, may be designated alpha, beta, etc., but phylogenetic analysis does not show distinct alpha subunit and beta subunit lineages traceable to ancient paralogs. TF55 from thermophilic archaea is also included in this entry.
Wnt proteins constitute a large family of secreted molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Drosophila) []. It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals []. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease [, ]. Wnt-mediated signalling is believed to proceed initially through binding to cell surface receptors of the frizzled family; the signal is subsequently transduced through several cytoplasmic components to β-catenin, which enters the nucleus and activates the transcription of several genes important in development []. Several non-canonical Wnt signalling pathways that act independently of β-catenin have also been elucidated. The canonical and non-canonical branches of Wnt signalling are highly interconnected and cross-regulate each other []. Members of the Wnt gene family are defined by their sequence similarity to mouse Wnt-1 and Wingless in Drosophila. They encode proteins of ~350-400 residues in length, with orthologues identified in several, mostly vertebrate, species. Very little is known about the structure of Wnts as they are notoriously insoluble, but they share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines [] that are probably involved in disulphide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters. 
Fifteen major Wnt gene families have been identified in vertebrates, with multiple subtypes within some classes. The Wnt-1 gene was first identified in 1982 as a proto-oncogene activated by the integration of mouse mammary tumour virus (MMTV) in mammary tumours. With the identification of Drosophila wingless, however, it became clear that Wnt genes are important regulators of many developmental decisions. Mutation of the embryonic mouse Wnt-1 gene leads to loss of the midbrain and cerebellum, and a number of processes including segment polarity and limb development are interrupted in Drosophila wingless mutants [].
This entry refers to the EVH1 domain found in WASP family proteins. The EVH1 (WH1, RanBP1-WASP) domain is found in multi-domain proteins implicated in a diverse range of signalling, nuclear transport and cytoskeletal events. This domain of around 115 amino acids is present in species ranging from yeast to mammals. Many EVH1-containing proteins associate with actin-based structures and play a role in cytoskeletal organisation. EVH1 domains recognise and bind the proline-rich motif FPPPP with low affinity; further interactions then form between flanking residues [, ]. WASP family proteins contain an EVH1 (WH1) domain in their N-terminal region, which binds proline-rich sequences in the WASP interacting protein. Proteins of the RanBP1 family contain a WH1 domain in their N-terminal region, which seems to bind a different sequence motif present in the C-terminal part of RanGTP protein [, ]. The tertiary structure of the WH1 domain of the Mena protein revealed structural similarities with the pleckstrin homology (PH) domain. The overall fold consists of a compact parallel β-sandwich, closed along one edge by a long α-helix. A highly conserved cluster of three surface-exposed aromatic side-chains forms the recognition site for the molecule's target ligands []. The actin nucleation-promoting factor WAS (WASP; also called Bee1p) and its homologue N (neuronal)-WASP are signal transduction proteins that promote actin polymerization in response to upstream intracellular signals []. Wiskott-Aldrich Syndrome (WAS) is an X-linked recessive disease, characterized by eczema, immunodeficiency, and thrombocytopenia []. The majority of patients with WAS, or a milder version of the disorder, X-linked thrombocytopenia (XLT), have point mutations in the EVH1 domain of WASP []. 
WASP is an actin regulatory protein consisting of an N-terminal EVH1 domain, a basic region (B), a GTPase binding domain (GBD), a proline-rich region, a WH2 domain, and a verprolin-cofilin-acidic motif (VCA) which activates the actin-related protein (Arp)2/3 actin nucleating complex []. The B region, the GBD, and the proline-rich region are involved in autoinhibitory interactions that repress or block the activity of the VCA. Yeast members lack the GTPase binding domain. The EVH1 domains are part of the PH domain superfamily [].
Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component []. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) []. The chaperone protein SecB [] is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion []. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both the proton motive force and ATP-driven secretion. The latter is mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains []. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices. The SecD and SecF equivalents of the Gram-positive bacterium Bacillus subtilis are jointly present in one polypeptide, denoted SecDF, that is required to maintain a high capacity for protein secretion. Unlike the SecD subunit of the preprotein translocase of E. coli, SecDF of B. subtilis was not required for the release of a mature secretory protein from the membrane, indicating that SecDF is involved in earlier translocation steps []. 
Comparison with SecD and SecF proteins from other organisms revealed the presence of 10 conserved regions in SecDF, some of which appear to be important for SecDF function. Interestingly, the SecDF protein of B. subtilis has 12 putative transmembrane domains. Thus, SecDF shows not only sequence similarity but also structural similarity to secondary solute transporters []. This family consists of various archaeal SecF proteins. They show a high degree of structural and functional similarity to their bacterial homologues, despite the different composition of their translocation machineries [].
Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component []. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) []. The chaperone protein SecB [] is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion []. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both the proton motive force and ATP-driven secretion. The latter is mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains []. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices. The SecD and SecF equivalents of the Gram-positive bacterium Bacillus subtilis are jointly present in one polypeptide, denoted SecDF, that is required to maintain a high capacity for protein secretion. Unlike the SecD subunit of the preprotein translocase of E. coli, SecDF of B. subtilis was not required for the release of a mature secretory protein from the membrane, indicating that SecDF is involved in earlier translocation steps []. 
Comparison with SecD and SecF proteins from other organisms revealed the presence of 10 conserved regions in SecDF, some of which appear to be important for SecDF function. Interestingly, the SecDF protein of B. subtilis has 12 putative transmembrane domains. Thus, SecDF shows not only sequence similarity but also structural similarity to secondary solute transporters []. This family consists of various archaeal SecD proteins. They show a high degree of structural and functional similarity to their bacterial homologues, despite the different composition of their translocation machineries [].
This entry represents the CRIB domain superfamily. Many putative downstream effectors of the small GTPases Cdc42 and Rac contain a GTPase binding domain (GBD), also called p21 binding domain (PBD), which has been shown to specifically bind the GTP-bound form of Cdc42 or Rac, with a preference for Cdc42 [, ]. The most conserved region of GBD/PBD domains is the N-terminal Cdc42/Rac interactive binding motif (CRIB), which consists of about 16 amino acids with the consensus sequence I-S-x-P-x(2,4)-F-x-H-x(2)-H-V-G []. Although the CRIB motif is necessary for binding to Cdc42 and Rac, it is not sufficient to give high-affinity binding [, ]. A less well conserved inhibitory switch (IS) domain, responsible for maintaining the proteins in a basal (autoinhibited) state, is located C-terminal to the CRIB motif [, , ]. GBD domains can adopt related but distinct folds depending on context. Although GBD domains are largely unstructured in the free state, in the autoinhibited state the IS domain forms an N-terminal β-hairpin that immediately follows the conserved CRIB motif and a central bundle of three α-helices. The interaction between GBD domains and their respective G proteins leads to the formation of a high-affinity complex in which unstructured regions of both the effector and the G protein become rigid. CRIB motifs from various GBD domains interact with Cdc42 in a similar manner, forming an intermolecular β-sheet with strand β2 of Cdc42. Outside the CRIB motif, the C-termini of the various GBD domains are very divergent and show variation in their mode of binding to Cdc42, perhaps determining the specificity of the interaction. 
Binding of Cdc42 or Rac to the GBD domain causes a dramatic conformational change, refolding part of the IS domain and unfolding the rest [, , , , ]. Some proteins known to contain a CRIB domain are listed below: Mammalian activated Cdc42-associated kinases (ACKs), nonreceptor tyrosine kinases implicated in integrin-coupled pathways. Mammalian p21-activated kinases (PAK1 to PAK4), serine/threonine kinases that modulate cytoskeletal assembly and activate MAP-kinase pathways. Mammalian actin nucleation-promoting factor WAS proteins (WASPs), non-kinase proteins involved in the organisation of the actin cytoskeleton. Yeast STE20 and CLA4, the homologues of mammalian PAKs. STE20 is involved in the mating/pheromone MAP kinase cascade.
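The CRIB consensus quoted in this entry, I-S-x-P-x(2,4)-F-x-H-x(2)-H-V-G, is written in PROSITE-style pattern notation, which can be translated mechanically into a regular expression. The following Python sketch illustrates one way to do this. The function names and the test peptide are hypothetical and chosen only for illustration; the converter handles only the subset of the notation used here (single residues, bracketed residue classes, x, and repeat counts), and a consensus match is only a first screen, since the entry notes that the motif alone is not sufficient for high-affinity binding.

```python
import re

# Consensus taken from the entry text (PROSITE-style notation).
CRIB_PATTERN = "I-S-x-P-x(2,4)-F-x-H-x(2)-H-V-G"

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supported elements: one residue letter, a bracketed class such as
    [WYF], the wildcard x, each optionally followed by (n) or (n,m).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"([A-Zx]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue  # x means any residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)

def find_crib_motifs(sequence: str):
    """Return (start, matched_text) for each consensus match in a protein sequence."""
    regex = re.compile(prosite_to_regex(CRIB_PATTERN))
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]

if __name__ == "__main__":
    # Hypothetical peptide built to satisfy the consensus; not a real protein.
    example = "MKK" + "ISAPSNFEHTIHVG" + "DDD"
    print(prosite_to_regex(CRIB_PATTERN))
    print(find_crib_motifs(example))
```

Because of the variable spacer x(2,4), matches are 14 to 16 residues long, which agrees with the "about 16 amino acids" stated for the motif.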
This entry represents a group of inositol monophosphatases (IMPases) mainly from bacteria. E. coli SuhB is part of the rRNA transcription anti-termination complex (rrnTAC) that prevents Rho-dependent termination. The rrnTAC consists of Nus factors A, B, E and G, ribosomal protein S4 and inositol monophosphatase SuhB. SuhB directly binds a C-terminal acidic repeat domain of NusA and contacts at least one other region in NusA as well as the nut-like RNA signal element. It thereby facilitates entry of the NusB/E dimer into an rrnTAC []. E. coli SuhB has D,L-inositol-1-monophosphatase and beta-glycerophosphatase activity []. However, E. coli makes very low amounts of myo-inositol-containing phospholipids, so the catalytic necessity for this enzyme is low []. Inositol polyphosphate 1-phosphatase (1PTASE) and inositol monophosphatase (MPTASE) are enzymes of the inositol signalling pathway that share similar enzymatic activity []. Both enzymes exhibit an absolute requirement for metal ions (Mg2+ is preferred), and both are uncompetitively inhibited by submillimolar concentrations of Li+. Their amino acid sequences contain a number of conserved motifs, which are also shared by several other proteins related to MPTASE (including products of fungal QaX and qutG, bacterial suhB and cysQ, and yeast hal2) []. Structural analysis of these proteins has revealed a common core of 155 residues: the core comprises 5 α-helices and 11 β-strands, and includes residues essential for metal binding and catalysis. While the core has been conserved, presumably to impart catalytic function, the loops and regions of structure outside the core have evolved unique regulatory domains []. An interesting property of the enzymes of this family is their sensitivity to Li+ at levels achieved in patients undergoing therapy for manic depression. The targets and mechanism of action of Li+ are unknown, but over-active inositol phosphate signalling may account for symptoms of the disease [, ]. 
It has been proposed that these Li+-sensitive proteins could represent targets for Li+ in manic depressive disease [, , ]. The structures of several members of the superfamily have been determined by X-ray crystallography [, ]. The fold of fructose 1,6-bisphosphatase (FBPTASE) was noted to be identical to that of MPTASE []. The suhB gene product from Escherichia coli, which is thought to participate in post-transcriptional control of gene expression, also possesses inositol-1-phosphatase activity [, ]. The major difference between this enzyme and other characterised inositol-1-phosphatases is that it exists as a monomer in solution (rather than a dimer or tetramer); it is also more hydrophobic. It is thought that these physical differences might underlie the biological role of wild-type SuhB in E. coli [].
This is the highly conserved DNA-binding domain of T-box transcription factors. Transcription factors of the T-box family are required for early cell-fate decisions, such as those necessary for formation of the basic vertebrate body plan, and for differentiation and organogenesis []; they have also been associated with multiple aspects of development and with adult terminal cell-type differentiation in different animal lineages []. The T-box is defined as the minimal region within the T-box protein that is both necessary and sufficient for sequence-specific DNA binding. All members of the family so far examined bind to the DNA consensus sequence TCACACCT and function as transcriptional repressors and/or activators []. The T-box is a relatively large DNA-binding domain, generally comprising about a third of the entire protein (17-26 kDa) []. These genes were uncovered on the basis of similarity to the DNA binding domain [] of the Mus musculus (Mouse) Brachyury (T) gene product; this similarity is the defining feature of the family. The Brachyury gene is named for its phenotype, which was identified 70 years ago as a mutant mouse strain with a short blunted tail. The gene, and its paralogues, have become a well-studied model for the family, and hence much of what is known about the T-box family is derived from the murine Brachyury gene. Consistent with its nuclear location, Brachyury protein has a sequence-specific DNA-binding activity and can act as a transcriptional regulator []. Homozygous mutants for the gene undergo extensive developmental anomalies, thus rendering the mutation lethal []. 
The postulated role of Brachyury is as a transcription factor, regulating the specification and differentiation of posterior mesoderm during gastrulation in a dose-dependent manner []. T-box proteins tend to be expressed in specific organs or cell types, especially during development, and they are generally required for the development of those tissues. For example, Brachyury is expressed in posterior mesoderm and in the developing notochord, and it is required for the formation of these cells in mice []. The T-box family is an ancient group that appears to play a critical role in development in all animal species [].
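As an illustration of what binding to the consensus sequence TCACACCT means in practice, the short Python sketch below scans a DNA string for exact copies of the consensus on either strand. It is a minimal sketch only: the function names and the example sequence are hypothetical, and exact string matching is an assumption made for simplicity, since real binding sites may tolerate mismatches that this code does not model.

```python
# T-box DNA consensus as given in the entry text.
TBOX_SITE = "TCACACCT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]

def find_tbox_sites(dna: str):
    """Return sorted (position, strand) pairs for exact consensus matches.

    Positions are 0-based offsets on the given (forward) strand; strand
    "-" means the consensus lies on the complementary strand.
    """
    dna = dna.upper()
    hits = []
    for strand, site in (("+", TBOX_SITE), ("-", reverse_complement(TBOX_SITE))):
        start = dna.find(site)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(site, start + 1)  # allow overlapping hits
    return sorted(hits)

if __name__ == "__main__":
    # Hypothetical sequence with one site on each strand.
    print(find_tbox_sites("GGTCACACCTAAAGGTGTGACC"))
```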
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups []. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [, , , , ]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The secretin-like GPCRs include secretin [], calcitonin [], parathyroid hormone/parathyroid hormone-related peptides [] and vasoactive intestinal peptide [], all of which activate adenylyl cyclase and the phosphatidyl-inositol-calcium pathway. These receptors contain seven transmembrane regions, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins (however, there is no significant sequence identity between these families; the secretin-like receptors thus bear their own unique '7TM' signature). Their N terminus is probably located on the extracellular side of the membrane and is potentially glycosylated. 
This N-terminal region contains a long conserved region which allows the binding of large peptidic ligands such as glucagon, secretin, VIP and PACAP; this region contains five conserved cysteine residues which could be involved in disulphide bonds. The C-terminal region of these receptors is probably cytoplasmic. Every receptor gene in this family is encoded on multiple exons, and several of these genes are alternatively spliced to yield functionally distinct products. Several 7TM receptors have been cloned but their endogenous ligands are unknown; these have been termed orphan receptors. GPR1 (formerly GPR56) was isolated from a human heart cDNA library using oligonucleotide primers corresponding to TM domains 4 and 7 of the secretin-like receptor family. The mRNA transcript is widely distributed throughout most tissues, the highest levels being found in thyroid, brain and heart. Within the brain, the hippocampus and hypothalamic nuclei express GPR1 at particularly high levels. This entry also includes other orphan receptors, such as human adhesion G-protein coupled receptor G3 and G5 (AGRG3/5).
The band-7 protein family comprises a diverse set of membrane-bound proteins characterised by the presence of a conserved domain, the band-7 domain, also known as SPFH or PHB domain. The exact function of the band-7 domain is not known, but examples from animal and bacterial stomatin-type proteins demonstrate binding to lipids and the ability to assemble into membrane-bound oligomers that form putative scaffolds []. A variety of proteins belong to the band-7 family. These include the stomatins, prohibitins, flotillins and the HflK/C bacterial proteins. Eukaryotic band 7 proteins tend to be oligomeric and are involved in membrane-associated processes. Stomatins are involved in ion channel function, prohibitins are involved in modulating the activity of a membrane-bound FtsH protease and the assembly of mitochondrial respiratory complexes, and flotillins are involved in signal transduction and vesicle trafficking []. Stomatin, also known as human erythrocyte membrane protein band 7.2b [], was first identified in the band 7 region of human erythrocyte membrane proteins. It is an oligomeric, monotopic membrane protein associated with cholesterol-rich membranes/lipid rafts. Human stomatin is ubiquitously expressed, at high levels in hematopoietic cells and at relatively low levels in brain. It is associated with the plasma membrane and cytoplasmic vesicles of fibroblasts, epithelial and endothelial cells []. Stomatin is believed to be involved in regulating monovalent cation transport through lipid membranes. Absence of the protein in hereditary stomatocytosis is believed to be the reason for the leakage of Na+ and K+ ions into and from erythrocytes []. Stomatin is also expressed in mechanosensory neurons, where it may interact directly with transduction components, including cation channels []. Stomatin proteins have been identified in various organisms, including Caenorhabditis elegans. There are nine stomatin-like proteins in C. elegans, MEC-2 being the best characterised []. 
In mammals, other stomatin family members are stomatin-like proteins SLP1, SLP2 and SLP3, and NPHS2 (podocin), which display selective expression patterns []. Stomatin family members are oligomeric, mostly localise to membrane domains, and in many cases have been shown to modulate ion channel activity. The stomatins and prohibitins, and to a lesser extent flotillins, are highly conserved protein families and are found in a variety of organisms ranging from prokaryotes to higher eukaryotes, whereas HflK and HflC homologues are only present in bacteria []. This entry matches Stomatin, HflK and HflC proteins.
Transient receptor potential (TRP) channels can be described as tetramers formed by subunits with six transmembrane domains and containing cation-selective pores, which in several cases show high calcium permeability. The molecular architecture of TRP channels is reminiscent of voltage-gated channels and comprises six putative transmembrane segments (S1-S6), intracellular N- and C-termini, and a pore-forming reentrant loop between S5 and S6 []. TRP channels represent a superfamily conserved from worms to humans that comprises seven subfamilies []: TRPC (canonical), TRPV (vanilloid), TRPM (melastatin or long TRPs), TRPA (ankyrin, whose only member is Transient receptor potential cation channel subfamily A member 1, TrpA1), TRPP (polycystin), TRPML (mucolipin) and TRPN (Nomp-C homologues), which has a single member that can be found in worms, flies, and zebrafish. TRPs are classified essentially according to their primary amino acid sequence rather than selectivity or ligand affinity, due to their heterogeneous properties and complex regulation. TRP channels are involved in many physiological functions, ranging from pure sensory functions, such as pheromone signalling, taste transduction, nociception, and temperature sensation, over homeostatic functions, such as Ca2+ and Mg2+ reabsorption and osmoregulation, to many other motile functions, such as muscle contraction and vaso-motor control []. The TRPV (vanilloid) subfamily can be divided into two distinct groups. The first, which comprises TrpV1, TrpV2, TrpV3, and TrpV4, with nonselective cation conducting pores, has members which can be activated by temperature as well as chemical stimuli. They are involved in a range of functions including nociception, thermosensing and osmolarity sensing. 
The second group, which consists of TrpV5 and TrpV6 (also known as epithelial calcium channels 1 and 2), is highly calcium selective and is involved in renal Ca2+ absorption/reabsorption [, ]. TRPV1 was the first vanilloid receptor identified. It is a nonselective cation channel with a preference for calcium and is activated by noxious stimuli, heat, protons, pH 5.9, and various, mostly obnoxious, natural products []. TRPV1 is predominantly expressed in sensory neurons [] and is believed to play a crucial role in temperature sensing and nociception [], qualifying therefore as a molecular target for pain treatment.
Eukaryotic P1 and P2 are functionally equivalent to the bacterial protein L7/L12, but are not homologous to L7/L12. P2 is located in the L12 stalk, with proteins P1, P0, L11, and 28S rRNA. P1 and P2 are the only proteins in the ribosome to occur as multimers, always appearing as sets of heterodimers. Eukaryotes have four copies (two heterodimers), while most archaeal species contain six copies of L12p (three homodimers). Bacteria may have four or six copies of L7/L12 (two or three homodimers) depending on the species [, , ]. Experiments using S. cerevisiae P1 and P2 indicate that P1 proteins are positioned more internally with limited reactivity in the C-terminal domains, while P2 proteins seem to be more externally located and are more likely to interact with other cellular components []. In lower eukaryotes, P1 and P2 are further subdivided into P1A, P1B, P2A, and P2B, which form P1A/P2B and P1B/P2A heterodimers []. Some plants have a third P-protein, called P3, which is not homologous to P1 and P2 []. In humans, P1 and P2 are strongly autoimmunogenic. They play a significant role in the etiology and pathogenesis of systemic lupus erythematosus (SLE). In addition, the ribosome-inactivating protein trichosanthin (TCS) interacts with human P0, P1, and P2, with its primary binding site in the C-terminal region of P2. TCS inactivates the ribosome by depurinating a specific adenine in the sarcin-ricin loop of 28S rRNA []. Archaeal L12 is functionally equivalent to L7/L12 in bacteria and the P1 and P2 proteins in eukaryotes. L12 is homologous to P1 and P2 but is not homologous to bacterial L7/L12. It is located in the L12 stalk, with proteins L10, L11, and 23S rRNA. In several mesophilic and thermophilic archaeal species, the binding of 23S rRNA to protein L11 and to the L10/L12p pentameric complex was found to be temperature-dependent and cooperative []. This entry includes eukaryotic 60S acidic ribosomal protein P1/P2, as well as archaeal 50S ribosomal protein L12. 
These proteins play an important role in the elongation step of protein synthesis [, ].
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [, ]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. 
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [, ]. A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families includes mammalian ribosomal protein L6 (L6 was previously known as TAX-responsive enhancer element binding protein 107); Caenorhabditis elegans ribosomal protein L6 (R151.3); Saccharomyces cerevisiae (Baker's yeast) ribosomal protein YL16A/YL16B; and Mesembryanthemum crystallinum (Common ice plant) ribosomal protein YL16-like. These proteins have 175 (yeast) to 287 (mammalian) amino acids.
Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component []. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) []. The chaperone protein SecB [] is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion []. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both proton motive force and ATP-driven secretion. The latter is mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains []. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices. The SecD and SecF equivalents of the Gram-positive bacterium Bacillus subtilis are jointly present in one polypeptide, denoted SecDF, that is required to maintain a high capacity for protein secretion. Unlike the SecD subunit of the pre-protein translocase of E. coli, SecDF of B. subtilis was not required for the release of a mature secretory protein from the membrane, indicating that SecDF is involved in earlier translocation steps [].
Comparison with SecD and SecF proteins from other organisms revealed the presence of 10 conserved regions in SecDF, some of which appear to be important for SecDF function. Interestingly, the SecDF protein of B. subtilis has 12 putative transmembrane domains. Thus, SecDF shows not only sequence similarity but also structural similarity to secondary solute transporters []. This entry represents bacterial SecD and SecF protein export membrane proteins. It is found in association with . SecD and SecF proteins are part of the multimeric protein export complex comprising SecA, D, E, F, G, Y, and YajC []. SecD and SecF are required to maintain a proton motive force [].
Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component []. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) []. The chaperone protein SecB [] is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion []. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both proton motive force and ATP-driven secretion. The latter is mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains []. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices. The SecD and SecF equivalents of the Gram-positive bacterium Bacillus subtilis are jointly present in one polypeptide, denoted SecDF, that is required to maintain a high capacity for protein secretion. Unlike the SecD subunit of the pre-protein translocase of E. coli, SecDF of B. subtilis was not required for the release of a mature secretory protein from the membrane, indicating that SecDF is involved in earlier translocation steps [].
Comparison with SecD and SecF proteins from other organisms revealed the presence of 10 conserved regions in SecDF, some of which appear to be important for SecDF function. Interestingly, the SecDF protein of B. subtilis has 12 putative transmembrane domains. Thus, SecDF shows not only sequence similarity but also structural similarity to secondary solute transporters []. This entry represents a GG-containing domain found in the N-terminal region of prokaryotic SecD and SecF protein export membrane proteins. It is found in association with . SecD and SecF proteins are part of the multimeric protein export complex comprising SecA, D, E, F, G, Y, and YajC []. SecD and SecF are required to maintain a proton motive force [].
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [, ]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. 
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [, ]. This is a family of short mitochondrial ribosomal proteins, less than 200 amino acids long. MRP-S35 was proposed as a more appropriate name for this group of proteins [].
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups []. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [, , , , ]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The nematode Caenorhabditis elegans has only 14 types of chemosensory neuron, yet is able to sense and respond to several hundred different chemicals because each neuron detects several stimuli []. Chemoperception is one of the central senses of soil nematodes like C. elegans, which are otherwise 'blind' and 'deaf' []. Chemoreception in C. elegans is mediated by members of the seven-transmembrane G-protein-coupled receptor class (7TM GPCRs). More than 1300 potential chemoreceptor genes have been identified in C. elegans, which are generally prefixed sr for serpentine receptor. 
The receptor superfamilies include Sra (Sra, Srb, Srab, Sre), Str (Srh, Str, Sri, Srd, Srj, Srm, Srn) and Srg (Srx, Srt, Srg, Sru, Srv, Srxa), as well as the families Srw, Srz, Srbc, Srsx and Srr [, , ]. Many of these proteins have homologues in Caenorhabditis briggsae. Srab is part of the Sra superfamily of chemoreceptors. The expression pattern of the srab genes is biologically intriguing. Of the six promoters successfully expressed in transgenic organisms, one was exclusively expressed in the tail phasmid neurons, two were exclusively expressed in a head amphid neuron, and two were expressed both in the head and tail neurons as well as a limited number of other cells [].
Formate--tetrahydrofolate ligase () (formyltetrahydrofolate synthetase) (FTHFS) is one of the enzymes participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. FTHFS catalyzes the ATP-dependent activation of formate ion via its addition to the N10 position of tetrahydrofolate. FTHFS is a highly expressed key enzyme in both the Wood-Ljungdahl pathway of autotrophic CO2 fixation (acetogenesis) and the glycine synthase/reductase pathways of purinolysis. The key physiological role of this enzyme in acetogens is to catalyze the formylation of tetrahydrofolate, an initial step in the reduction of carbon dioxide and other one-carbon precursors to acetate. In purinolytic organisms, the enzymatic reaction is reversed, liberating formate from 10-formyltetrahydrofolate with concurrent production of ATP [, ]. In many of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase (C1-THF synthase), which also catalyses the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF synthase are known []: one is located in the mitochondrial matrix, while the second is cytoplasmic. In both forms the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF synthase. In prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino acid residues []. The crystal structure of N(10)-formyltetrahydrofolate synthetase from Moorella thermoacetica shows that the subunit is composed of three domains organised around three mixed β-sheets. There are two cavities between adjacent domains. One of them was identified as the nucleotide binding site by homology modelling. The large domain contains a seven-stranded β-sheet surrounded by helices on both sides. 
The second domain contains a five-stranded β-sheet with two α-helices packed on one side while the other two are a wall of the active site cavity. The third domain contains a four-stranded β-sheet forming a half-barrel. The concave side is covered by two helices while the convex side is another wall of the large cavity. Arg 97 is likely involved in formyl phosphate binding. The tetrameric molecule is relatively flat with the shape of the letter X, and the active sites are located at the end of the subunits far from the subunit interface []. These signature patterns cover two regions that are almost perfectly conserved. The first one is a glycine-rich segment located in the N-terminal part of FTHFS and which could be part of an ATP-binding domain []. The second pattern is located in the central section of FTHFS.
Tensins constitute a eukaryotic family of lipid phosphatases that are defined by the presence of two adjacent domains: a lipid phosphatase domain and a C2-like domain. The tensin-type C2 lacks the canonical Ca(2+) ligands found in classical C2 domains, and in this respect it is similar to the C2 domains of PKC-type [, ]. The tensin-type C2 domain can bind phospholipid membranes in a Ca(2+)-independent manner []. In the tumor suppressor protein PTEN, the best characterized member of the family, the lipid phosphatase domain was shown to specifically dephosphorylate the D3 position of the inositol ring of the lipid second messenger, phosphatidylinositol-3,4,5-trisphosphate (PIP3). The lipid phosphatase domain contains the signature motif HCXXGXXR present in the active sites of protein tyrosine phosphatases (PTPs) and dual specificity phosphatases (DSPs). Furthermore, two invariant lysines are found only in the tensin-type phosphatase motif (HCKXGKXR) and are suspected to interact with the phosphate groups at positions D1 and D5 of the inositol ring [, ]. The crystal structure of the PTEN tumor suppressor has been solved []. The lipid phosphatase domain has a structure similar to that of the dual specificity phosphatases. However, PTEN has a larger active site pocket that could be important to accommodate PI(3,4,5)P3. The tensin-type C2 domain has a structure similar to the classical C2 domain that mediates the Ca2+-dependent membrane recruitment of several signaling proteins. However, the tensin-type C2 domain lacks two of the three conserved loops that bind Ca2+. Proteins known to contain a phosphatase and a C2 tensin-type domain are listed below: Tensin, a focal-adhesion molecule that binds to actin filaments. 
It may be involved in cell migration, cartilage development and in linking signal transduction pathways to the cytoskeleton. Phosphatase and tensin homologue deleted on chromosome 10 protein (PTEN). It antagonizes PI 3-kinase signalling by dephosphorylating the 3-position of the inositol ring of PI(3,4,5)P3 and thus inactivates downstream signalling. It plays major roles both during development and in the adult to control cell size, growth, and survival. Auxilin. It binds clathrin heavy chain and promotes its assembly into regular cages. Cyclin G-associated kinase or auxilin-2. It is a potential regulator of clathrin-mediated membrane trafficking. PTEN homologues in fungi have the tensin phosphatase domain, but they lack the C2 domain. This entry represents the phosphatase domain.
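The two signature motifs quoted above can be searched for in a sequence with a simple regular expression. The following is a minimal sketch, assuming that X in HCXXGXXR and HCKXGKXR stands for any residue; the function names and the toy sequences are hypothetical and serve only as illustration.

```python
import re

# Motifs quoted in the entry text. Reading X as "any residue" is an
# assumption about the notation, which the entry does not define.
PTP_DSP_MOTIF = "HCXXGXXR"   # general PTP/DSP active-site signature
TENSIN_MOTIF = "HCKXGKXR"    # tensin-type variant with two invariant lysines


def motif_to_regex(motif):
    """Translate a motif string into a regex source string (X -> any residue)."""
    return "".join("." if ch == "X" else re.escape(ch) for ch in motif)


def find_motif(sequence, motif):
    """Return the 1-based start positions of all matches, overlaps included."""
    # A lookahead lets the scan report overlapping occurrences too.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


# Hypothetical toy fragments, not real database sequences.
tensin_like = "MAAHCKAGKGRTGLVV"
generic_ptp_like = "AHCSAGVGRS"

for name, seq in (("tensin_like", tensin_like), ("generic_ptp_like", generic_ptp_like)):
    print(name, find_motif(seq, PTP_DSP_MOTIF), find_motif(seq, TENSIN_MOTIF))
```

Because the tensin-type motif is a stricter form of the general one, any sequence matching HCKXGKXR also matches HCXXGXXR at the same position, while the reverse does not hold.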
Neuropeptide FF receptors [] bind members of a family of neuropeptides containing an RF-amide motif at their C terminus, and have a high affinity for the pain modulatory peptide neuropeptide FF (NPFF) []. Neuropeptide FF (NPFF) receptors have two subtypes, neuropeptide FF receptor type 1 (NPFF1) and neuropeptide FF receptor type 2 (NPFF2), which are members of the rhodopsin G protein-coupled receptor family. Neuropeptide FF is found at high concentrations in the posterior pituitary, spinal cord, hypothalamus and medulla and is believed to be involved in pain modulation, opioid tolerance, cardiovascular regulation, memory and neuroendocrine regulation [, , , ]. Comparing the distribution of NPFF1 and NPFF2 receptors in different species reveals important species differences []. The NPFF1 receptor is broadly distributed in the central nervous system, with the highest levels found in the limbic system and the hypothalamus, and is thought to participate in neuroendocrine functions. The NPFF2 receptor, by contrast, is present at high density, particularly in mammals, in the superficial layers of the spinal cord [], where it is involved in nociception and modulation of opioid functions [], consistent with a potential role of NPFF in the modulation of sensory inputs, like pain responses [, , ]. This entry represents NPFF2, which is expressed at high levels in the thymus and placenta, with moderate levels in the pituitary, spleen, testis and brain. Low levels were detected in the spinal cord, pancreas, small intestine, uterus, stomach, lung, heart and skeletal muscle. No expression was detected in liver or kidney []. The NPFF2 receptor has been found to regulate adenylyl cyclase in some recombinant cell lines [, ]. In acutely dissociated neurons, the NPFF2 receptors specifically counteract N-type Ca2+ channel inhibition by opioids [, ]. 
In SH-SY5Y neuroblastoma cells stably expressing human NPFF receptors, NPFF agonists also reduce the inhibitory effect of mu-opioid and delta-opioid receptor activation on an N-type Ca2+ channel [, ]. These regulations could be due in part to receptor heteromerisation, since NPFF2 receptors have been shown to physically interact with mu-opioid receptors [] and induce their trans-phosphorylation [].
This entry represents members of the SMP-30/CGR1 family that act as lactonases, such as regucalcin. Regucalcin (RGN) is a gluconolactonase () that converts D-glucono-1,5-lactone to D-gluconate, and also hydrolyzes other carbohydrate lactones. This enzyme is required for the penultimate step in vitamin C biosynthesis. From its crystal structure, regucalcin has a six-bladed β-propeller fold, and binds a single metal ion (either Ca2+ or Zn2+) []. Homologues with similar catalytic activity have been isolated and characterized from bacteria []. There are other bacterial homologues: L-arabinolactonase (), from Azospirillum brasilense, converts L-arabino-gamma-lactone to L-arabonate, allowing the bacterium to utilize L-arabinose as a sole carbon source []; lactonase drp35 from Staphylococcus species acts as a lactonase on dihydrocoumarin or 2-coumaranone []. A homologue from the squid Loligo vulgaris can act as a diisopropyl-fluorophosphatase (), but its physiological substrate is unknown []. Regucalcin, also known as senescence marker protein-30 (SMP30), was discovered in 1978 as a Ca2+ binding protein that does not contain EF-hand motifs, suggesting a novel class of Ca2+ binding protein. It is primarily localised to the liver and kidney cortex of animals. Expression of its mRNA in the liver and renal cortex of rats is stimulated by an increase in cellular Ca2+ levels [, ]. Regucalcin, as a regulatory protein of Ca2+, has a pivotal role in the control of many cell functions. The protein has a reversible effect on Ca2+-induced activation and inhibition of many enzymes in both the liver and renal cortex cells []. It has also been shown to inhibit various protein kinases (including Ca2+/calmodulin-dependent protein kinase [], protein kinase C [] and tyrosine kinase) and protein phosphatases, indicating a regulatory role in signal transduction within the cell. 
In addition, regucalcin regulates intracellular Ca2+ homeostasis by enhancing Ca2+-pumping activity in the plasma membrane through activation of the pump enzymes []. Moreover, it can inhibit RNA synthesis in the nuclei of normal and regenerating rat livers in vitro []. Hydropathy profiles indicate hydrophobic domains in both N- and C-terminal regions of the regucalcin molecule; the protein also exhibits hydrophilic characteristics. Human and rodent regucalcins share 89% sequence identity, the high degree of conservation between species suggesting that the complete structure is required for physiological function. SMP30 sequences also share a high level of similarity with proteins from a wide taxonomic range: these include fly anterior fat body proteins; firefly luciferin regenerating enzyme; putative calcium binding transcriptional regulatory proteins from Rhizobium meliloti and Streptomyces coelicolor; gluconolactonase from Brucella melitensis; cell growth protein CGR1 from Candida albicans; and homologues from Thermoplasma acidophilum, Thermoplasma volcanium, Sulfolobus tokodaii, Sulfolobus solfataricus, Bacillus subtilis and Rhizobium loti. As such, a number of lactonases are included in this family.
Two highly similar activities are represented in this group: thymidine phosphorylase (TP, gene deoA, ) and pyrimidine-nucleoside phosphorylase (PyNP, gene pdp, ). Both are dimeric enzymes that function in the salvage pathway to catalyse the reversible phosphorolysis of pyrimidine nucleosides to the free base and sugar moieties. In the case of thymidine phosphorylase, thymidine (and to a lesser extent, 2'-deoxyuridine) is lysed to produce thymine (or uracil) and 2'-deoxyribose-1-phosphate. Pyrimidine-nucleoside phosphorylase performs the analogous reaction on thymidine (to produce the same products) and uridine (to produce uracil and ribose-1-phosphate). PyNP is typically the only pyrimidine nucleoside phosphorylase encoded by Gram-positive bacteria, while eukaryotes and proteobacteria encode two: TP, and the unrelated uridine phosphorylase. In humans, TP was originally characterised as platelet-derived endothelial cell growth factor and gliostatin []. Structurally, the enzymes are homodimers, each composed of a rigid all α-helix lobe and a mixed α-helix/β-sheet lobe, which are connected by a flexible hinge [, ]. Prior to substrate binding, the lobes are separated by a large cleft. A functional active site forms, and catalysis follows, upon closing of the cleft. The active site, composed of a phosphate binding site and a (deoxy)ribonucleotide binding site within the cleft region, is highly conserved between the two enzymes of this group. Active site residues (Escherichia coli DeoA numbering) include the phosphate binding Lys84 and Ser86 (close to a glycine-rich loop), Ser113, and Thr123, and the pyrimidine nucleoside-binding Arg171, Ser186, and Lys190. Sequence comparison between the active site residues for both enzymes reveals only one difference [], which has been proposed to partially mediate substrate specificity. In TP, position 111 is a methionine, while the analogous position in PyNP is lysine. 
It should be noted that the uncharacterised archaeal members of this family differ in a number of respects from either of the characterised activities. The residue at position 108 is lysine, indicating the activity might be PyNP-like (though the determinants of substrate specificity have not been fully elucidated). Position 171 is glutamate (negatively charged side chain) rather than arginine (positively charged side chain). In addition, a large loop that may "lock in" the substrates within the active site is much smaller than in the characterised members. It is not clear what effect these and other differences have on activity and specificity.
Thioredoxins [, , , ] are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of two cysteine thiol groups to a disulphide, accompanied by the transfer of two electrons and two protons. The net result is the covalent interconversion of a disulphide and a dithiol. In the NADPH-dependent protein disulphide reduction, thioredoxin reductase (TR) catalyses the reduction of oxidised thioredoxin (trx) by NADPH using FAD and its redox-active disulphide; reduced thioredoxin then directly reduces the disulphide in the substrate protein []. Thioredoxin is present in prokaryotes and eukaryotes and the sequence around the redox-active disulphide bond is well conserved. All thioredoxins contain a cis-proline located in a loop preceding β-strand 4, which makes contact with the active site cysteines, and is important for stability and function []. Thioredoxin belongs to a structural family that includes glutaredoxin, glutathione peroxidase, bacterial protein disulphide isomerase DsbA, and the N-terminal domain of glutathione transferase []. Thioredoxins have a beta-alpha unit preceding the motif common to all these proteins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; most of them are protein disulphide isomerases (PDI). PDI () [, , ] is an endoplasmic reticulum multi-functional enzyme that catalyses the formation and rearrangement of disulphide bonds during protein folding []. All PDIs contain two or three (ERp72) copies of the thioredoxin domain, each of which contributes to disulphide isomerase activity, but which are functionally non-equivalent []. Moreover, PDI exhibits chaperone-like activity towards proteins that contain no disulphide bonds, i.e. behaving independently of its disulphide isomerase activity []. 
The various forms of PDI which are currently known are: PDI major isozyme, a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase (), as a component of oligosaccharyl transferase (), as thyroxine deiodinase (), as glutathione-insulin transhydrogenase () and as a thyroid hormone-binding protein. ERp60 (ER-60; 58 kDa microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific phospholipase C isozyme and later to be a protease. ERp72. ERp5. Bacterial proteins that act as thiol:disulphide interchange proteins and allow disulphide bond formation in some periplasmic proteins also contain a thioredoxin domain. These proteins include: Escherichia coli DsbA (or PrfA) and its orthologues in Vibrio cholerae (TcpG) and Haemophilus influenzae (Por). E. coli DsbC (or XpRA) and its orthologues in Erwinia chrysanthemi and H. influenzae. E. coli DsbD (or DipZ) and its H. influenzae orthologue. E. coli DsbE (or CcmG) and orthologues in H. influenzae. Rhodobacter capsulatus (Rhodopseudomonas capsulata) (HelX), Rhizobiaceae (CycY and TlpA). This entry represents the thioredoxin protein family.