MetaFetcheR is an R package that links metabolite IDs across different metabolome databases in order to resolve ambiguity and standardize metabolite representation and annotation. The package currently supports resolving IDs for the following databases:

  • Human Metabolome Database (HMDB)
  • Chemical Entities of Biological Interest (ChEBI)
  • PubChem
  • Kyoto Encyclopedia of Genes and Genomes (KEGG)
  • Lipidomics Gateway (LipidMaps)

Installation

  1. Install the PostgreSQL database on your system and create a user. PostgreSQL can be downloaded from the official PostgreSQL website.
  2. Install devtools in R
  3. Install the MetaFetcheR package
  4. Download the database SQL dump files
  5. Uncompress all downloaded files into a directory you create

  6. Create a new R project and install the MetaFetcheR package

  7. Call write_config, a function that stores the settings used to connect to PostgreSQL and automatically creates a database called metafetcher. It takes the following settings:
  • host: "localhost" (the default host for a local PostgreSQL installation)
  • port: 5432 (the default port for a local PostgreSQL installation)
  • user: "postgres" (the default user created when PostgreSQL is installed)
  • password: the password that you want the database to be created with
  • path_of_tmp_folder: path to the folder that contains the extracted downloaded files
  • HMDB_file_name: name of the SQL dump file downloaded from the HMDB repository
  • ChEBI_file_name: name of the SQL dump file downloaded from the ChEBI repository
  • LIPIDMAPS_file_name: name of the SQL dump file downloaded from the LIPID MAPS repository

  8. Call install_database() to create the tables and insert the data from the SQL dump files. Preferably, put the folder that contains the SQL dump files in your R project directory.

install_database() needs to be called only once: it creates the MetaFetcheR database and its tables and inserts all the data from the SQL dump files. This process may take a while (approximately 45 minutes to 1 hour).
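
Because the import is slow, it can help to confirm that the dump files are in place before starting it. The following is a minimal sketch in base R and is not part of MetaFetcheR: the helper check_dump_files, the folder name, and the dump file names are hypothetical placeholders, and the commented-out calls assume that write_config accepts the settings listed above as named arguments, which should be verified against the package documentation.

```r
# Settings as described in the list above. The folder and the dump file
# names are placeholders: replace them with the files you downloaded.
settings <- list(
  host = "localhost",
  port = 5432,
  user = "postgres",
  password = "change-me",
  path_of_tmp_folder = file.path(getwd(), "dumps"),
  HMDB_file_name = "hmdb_dump.sql",
  ChEBI_file_name = "chebi_dump.sql",
  LIPIDMAPS_file_name = "lipidmaps_dump.sql"
)

# Hypothetical helper (not a MetaFetcheR function): returns the paths of
# the dump files that are missing from the folder.
check_dump_files <- function(s) {
  file_keys <- c("HMDB_file_name", "ChEBI_file_name", "LIPIDMAPS_file_name")
  paths <- file.path(s$path_of_tmp_folder, unlist(s[file_keys]))
  paths[!file.exists(paths)]
}

not_found <- check_dump_files(settings)
if (length(not_found) > 0) {
  message("Missing dump files:\n", paste(not_found, collapse = "\n"))
} else {
  message("All dump files found.")
  # Assumed usage, to be checked against the package documentation:
  # library(MetaFetcheR)
  # do.call(write_config, settings)
  # install_database()
}
```

Running the check first means a mistyped file name shows up immediately rather than partway through the import.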


Example

Create a CSV file with the input IDs in the following format:

Example input table
kegg_id  hmdb_id    chebi_id  pubchem_id  lipidmaps_id
C07326   HMDB02712  NA        64960       NA
NA       HMDB10382  NA        460602      NA
C00956   HMDB00510  NA        469         NA
C02356   HMDB00452  NA        80283       NA
NA       NA         NA        NA          NA
C00233   HMDB00695  NA        70          NA
C01089   HMDB00357  NA        441         NA
NA       HMDB13701  NA        68328       NA
C00334   HMDB00112  NA        119         NA
C00334   HMDB00112  NA        119         NA
NA       HMDB01859  NA        1983        NA
C00417   HMDB00072  NA        643757      NA
C00020   HMDB00045  NA        6083        NA
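
An input file in this format can be written from R itself. The sketch below uses only base R and the first three rows of the example; the file name input_ids.csv is a placeholder, and it assumes that unknown IDs are stored as the string NA, as in the table above.

```r
# Build a small input table with the five ID columns shown above.
# NA marks an ID that is unknown for that metabolite.
input_ids <- data.frame(
  kegg_id      = c("C07326", NA, "C00956"),
  hmdb_id      = c("HMDB02712", "HMDB10382", "HMDB00510"),
  chebi_id     = rep(NA_character_, 3),
  pubchem_id   = c("64960", "460602", "469"),
  lipidmaps_id = rep(NA_character_, 3),
  stringsAsFactors = FALSE
)

# Write the table as CSV; missing values are written as the string "NA".
csv_path <- file.path(tempdir(), "input_ids.csv")
write.csv(input_ids, csv_path, row.names = FALSE)

# Read the file back, keeping every column as text so that numeric-looking
# IDs (for example PubChem IDs) are not converted to numbers.
check <- read.csv(csv_path, colClasses = "character", na.strings = "NA")
print(check)
```

Keeping the ID columns as character avoids losing leading zeros or switching to scientific notation for long numeric IDs.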

Output table
kegg_id            hmdb_id                   chebi_id       pubchem_id         lipidmaps_id
C07326             HMDB02712 , HMDB0002712   16070          64960, 64960       NA
C04230 , C089215   HMDB10382 , HMDB0010382   72998, 17504   460602             LMGP01050018
C00956             HMDB00510 , HMDB0000510   37023, 37024   469, 469 , 92136   NA
C02356             HMDB00452 , HMDB0000452   35619          80283, 80283       LMFA01100034
NA                 NA                        NA             NA                 NA
C00233 , C013082   HMDB00695 , HMDB0000695   48430          70, 70             NA
C01089             HMDB00357 , HMDB0000357   20067          441, 441           LMFA01050005
NA                 HMDB13701 , HMDB0013701   88950          68328, 68328       NA
C00334 , C082430   HMDB00112 , HMDB0000112   16865          119, 119           LMFA01100039
C00334 , C082430   HMDB00112 , HMDB0000112   16865          119, 119           LMFA01100039
C06804 , C083640   HMDB01859 , HMDB0001859   46195          1983, 1983         NA
C00417             HMDB00072 , HMDB0000072   32805          643757             NA
C00020             HMDB00045 , HMDB0000045   16027          6083, 6083         NA
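
Several output cells hold more than one ID, separated by commas with inconsistent spacing and sometimes repeated. The following base R sketch shows one way to turn such a cell into a vector of distinct IDs; split_ids is a hypothetical helper written for this example and is not a MetaFetcheR function.

```r
# Hypothetical helper: split one output cell into its distinct IDs.
# Returns an empty character vector for a missing cell.
split_ids <- function(cell) {
  if (is.na(cell) || cell == "NA") return(character(0))
  # Split on commas, then strip the surrounding whitespace from each part.
  parts <- trimws(strsplit(cell, ",", fixed = TRUE)[[1]])
  # Drop empty fragments and repeated IDs, keeping the original order.
  unique(parts[nzchar(parts)])
}

split_ids("HMDB02712 , HMDB0002712")  # "HMDB02712" "HMDB0002712"
split_ids("469, 469 , 92136")         # "469" "92136"
split_ids(NA)                         # character(0)
```

Applied column by column (for example with lapply), this gives a list of ID vectors per metabolite that is easier to compare or join than the raw comma-separated strings.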

To map only a single ID, use the function resolve_single_id.

Output table (a single row, shown with one column per line)
chebi_id            15412
hmdb_id             HMDB0001005
lipidmaps_id        NA
kegg_id             C00603
pubchem_id          439269
inchi               1S/C3H6N2O4/c4-3(9)5-1(6)2(7)8/h1,6H,(H,7,8)(H3,4,5,9)/t1-/m0/s1
inchikey            NWZYYCVIOKVTII-SFOWXEAESA-N
smiles              NC(=O)N[C@@H](O)C(O)=O , C(C(=O)O)(NC(=O)N)O,[C@H](C(=O)O)(NC(=O)N)O
names               ureidoglycolate , (-)-ureidoglycolic acid , (S)-Ureidoglycolate;,(-)-Ureidoglycolate , (2S)-2-hydroxy-2-ureido-acetic acid , (2S)-2-(carbamoylamino)-2-hydroxyacetic acid , (2S)-2-(carbamoylamino)-2-hydroxyacetic acid , (2S)-2-(aminocarbonylamino)-2-oxidanyl-ethanoic acid
formula             C3H6N2O4
mass                134.0907, 134.0908, 134.0900
monoisotopic_mass   134.0328, 134.0328, 134.0328, 134.0328

Citation

Yones SA, Csombordi R, Komorowski J, and Diamanti K. MetaFetcheR: An R package for complete mapping of small compound data, bioRxiv, March 2021.