Chemical database
A chemical database is a database (possibly bibliographic) specifically dedicated to chemical information. Most chemical databases store information on stable molecules. Chemical structures are traditionally drawn as Lewis structures, which use lines for the chemical bonds (electron pairs) between atoms and are suited to paper (two-dimensional structural formulas). Although these are convenient visual representations for the chemist, they are unsuitable for computational use, in particular for search and storage.
Large chemical databases must be able to store and search information on millions of molecules (or other chemical objects), occupying terabytes of physical memory.
Representation
There are two main techniques for representing chemical structures in digital databases:
- connection tables, adjacency matrices, or lists carrying additional information on the chemical bonds (edges) and the atom data (nodes), such as:
- : MDL Molfile, PDB, CML
- linear notations based on a breadth-first or depth-first traversal, such as:
- : SMILES/SMARTS, SLN, WLN, InChI
These approaches have been refined to represent stereochemical differences, charges, and special types of bonding such as those found in organometallic compounds. The main advantage of a computer representation is that it allows scalable storage and fast, flexible searching.
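As a rough illustration of the two families of representation, the sketch below stores the heavy atoms of ethanol as a small connection table and derives an adjacency matrix from it. The variable and function names are hypothetical, and hydrogens, stereochemistry, and charges are deliberately left out.

```python
# Hypothetical minimal connection table: atoms are nodes, bonds are edges.
atoms = ["C", "C", "O"]          # heavy atoms of ethanol; list index = atom id
bonds = [(0, 1, 1), (1, 2, 1)]   # (atom_i, atom_j, bond_order)


def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric matrix whose entries are bond orders (0 = no bond)."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = order
        matrix[j][i] = order
    return matrix


matrix = adjacency_matrix(len(atoms), bonds)

# The same molecule written as a linear notation (a SMILES string).
linear_notation = "CCO"
```

The connection table makes bonds directly addressable, while the linear notation is compact and easy to store as a character string.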
Search
Chemists can search a database using parts of structures, parts of IUPAC names, or constraints on properties. Chemical databases differ markedly from more general databases in how they handle substructure search. This kind of search is carried out by looking for a subgraph isomorphism (sometimes also called a monomorphism) and is a widely studied application of graph theory. The search algorithms are computationally intensive, sometimes with time complexity O(N^3) or O(N^4) (where N is the number of atoms involved). The intensive part of the search is called atom-by-atom searching (ABAS). ABAS usually relies on the Ullmann algorithm or one of its variants.
Speed gains are obtained by time amortization, that is, by saving search time through the use of precomputed information. This precomputation typically involves building bit strings that represent the presence or absence of molecular fragments. By checking which fragments are present, a structure search can skip the ABAS comparison for any target molecule or chemical object that lacks the fragments required by the query. This elimination is called screening (not to be confused with the screening procedures used in pharmaceutical research, or with screening in atomic physics). The bit strings used for this purpose are also called structural keys. The performance of such keys depends on the choice of the fragments used to build them and on the probability of those fragments occurring in the molecules of the database. Another type of key uses hash codes based on computationally derived fragments. These are called fingerprints, although the term is sometimes used as a synonym for structural keys.
The memory needed to store these structural keys and fingerprints can be reduced by folding, which combines parts of a key using suitable bitwise operations and thereby shortens its total length.
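The screening idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ullmann algorithm: the fragment list `FRAGMENTS`, the function names, and the tiny database are all hypothetical, bond orders and hydrogens are ignored, and the atom-by-atom step is a plain backtracking search.

```python
# Hypothetical fragment dictionary: each bonded element pair owns one bit.
FRAGMENTS = [("C", "C"), ("C", "O"), ("C", "N"), ("N", "O")]


def structural_key(atoms, bonds):
    """Return an integer whose bits flag which fragments occur in the molecule."""
    pairs = {tuple(sorted((atoms[i], atoms[j]))) for i, j in bonds}
    key = 0
    for bit, fragment in enumerate(FRAGMENTS):
        if fragment in pairs:
            key |= 1 << bit
    return key


def passes_screen(query_key, target_key):
    """A target can only match if it has every fragment bit set in the query."""
    return (query_key & target_key) == query_key


def has_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Naive atom-by-atom search: map query atoms onto target atoms by backtracking."""
    def neighbours(n_atoms, bonds):
        adj = [set() for _ in range(n_atoms)]
        for i, j in bonds:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    q_adj = neighbours(len(q_atoms), q_bonds)
    t_adj = neighbours(len(t_atoms), t_bonds)
    mapping = {}  # query atom index -> target atom index

    def extend(q):
        if q == len(q_atoms):
            return True  # every query atom has been placed
        for t in range(len(t_atoms)):
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Each already-mapped query neighbour must be bonded to t in the target.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]
        return False

    return extend(0)


def search(query, database):
    """Screen with structural keys first, then run the atom-by-atom search."""
    q_key = structural_key(*query)
    hits = []
    for name, molecule in database.items():
        # In a real system the target keys would be precomputed and stored.
        if passes_screen(q_key, structural_key(*molecule)) and has_substructure(*query, *molecule):
            hits.append(name)
    return hits


database = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "methylamine": (["C", "N"], [(0, 1)]),
    "dimethyl ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
}
query = (["C", "O"], [(0, 1)])  # a carbon bonded to an oxygen
hits = search(query, database)
```

Here methylamine is rejected by the bit comparison alone, so the more expensive backtracking step runs only on the two molecules that contain a carbon-oxygen bond.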
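One common way to shorten a key is to OR its upper half onto its lower half. The sketch below assumes the key is held as a Python integer of known, even bit length; the function name `fold` is hypothetical.

```python
def fold(key, length):
    """Halve a bit string of even `length` by OR-ing its upper half onto its lower half."""
    half = length // 2
    mask = (1 << half) - 1          # selects the lower half of the bits
    return (key & mask) | (key >> half)


full_key = 0b10010010               # 8-bit key
folded_key = fold(full_key, 8)      # 4-bit key: 0b0010 | 0b1001
```

Folding loses information, so more targets pass the screen, but it never rejects a true match: if every bit of a query key is set in a target key, the same holds for their folded versions.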
Descriptors
All molecular properties beyond the structure can be divided into physicochemical characteristics and pharmacological characteristics, also called descriptors. On top of this, there are many more or less standardized artificial naming systems for molecules and chemical objects, which yield more or less ambiguous names and synonyms. IUPAC nomenclature is usually a good choice for representing molecular structures in a form that is both human-readable and a character string, although it becomes impractical for large species. Trivial names, on the other hand, abound in homonyms and synonyms and are therefore a poor choice as a database key. While physicochemical descriptors such as molar mass, (partial) charge, solubility, etc. can be computed more or less directly from the molecular structure, pharmacological descriptors can only be derived indirectly, from multivariate statistics or from experimental results (screening, bioassay, etc.). All these descriptors can be stored alongside the representation of the molecule to save computation costs, and commonly are.
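As an example of a descriptor computed directly from the structure, the sketch below derives a molar mass from element counts. The function name and the small table of standard atomic masses (rounded to three decimals) are assumptions made for illustration.

```python
# Standard atomic masses in g/mol, rounded to three decimals (illustrative subset).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molar_mass(composition):
    """Sum atomic masses over a mapping of element symbol -> atom count."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


ethanol = {"C": 2, "H": 6, "O": 1}
ethanol_mass = molar_mass(ethanol)  # about 46.07 g/mol
```

A value like this is cheap to compute, but storing it with the record avoids recomputing it for every query that filters on molar mass.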
Similarity
There is no single definition of the similarity between two chemical objects; the concept can, however, be defined according to the application context, and it is sometimes described as the inverse of a distance measure in descriptor space. For example, two objects could be called more similar to each other than to others if the difference between their molar masses is smaller than their difference from the others. A variety of other measures can be combined to produce a multivariate distance measure. Distance measures are sometimes classified as Euclidean or non-Euclidean according to whether the triangle inequality holds. The species in a database can thus be grouped by similarity. Hierarchical or non-hierarchical clustering approaches can be applied to chemical entities with multiple attributes. These attributes (or molecular properties) can be empirically determined or computationally derived descriptors. One of the most common clustering approaches is the Jarvis-Patrick algorithm, based on K nearest neighbours.
In pharmacology-oriented databases, similarity is usually defined in terms of the biological effects of compounds (ADME/toxicity), which can be inferred from similar combinations of physicochemical descriptors using QSAR methods.
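A distance-based similarity can be sketched as follows. The descriptor vectors (molar mass in g/mol, number of heavy atoms), the function names, and the choice of 1 / (1 + distance) as the "inverse" are all assumptions made for illustration; other definitions are equally possible.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map a distance to (0, 1]; identical descriptor vectors give exactly 1.0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


# Descriptor vectors: (molar mass in g/mol, number of heavy atoms).
methanol = (32.042, 2)
ethanol = (46.069, 3)
octane = (114.232, 8)

closer = similarity(ethanol, methanol) > similarity(ethanol, octane)
```

With unscaled descriptors the molar mass dominates the distance, so in practice descriptors are often normalized before being combined.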