MetaFetcheR is an R package that links metabolite IDs across different metabolome databases in order to resolve ambiguity and standardize metabolite representation and annotation. The package currently supports resolving IDs for the following databases:

Human Metabolome Database (HMDB)
Chemical Entities of Biological Interest (ChEBI)
PubChem
Kyoto Encyclopedia of Genes and Genomes (KEGG)
Lipidomics Gateway (LipidMaps)

Installation

Install PostgreSQL on your system and create a user. You can download PostgreSQL from here
Install devtools in R

install.packages("devtools")

Install the MetaFetcheR package

library(devtools)
install_github("komorowskilab/metafetcher")

Download the database SQL dump files

Uncompress all downloaded files into a directory of your choice
Create a new R project and install the MetaFetcheR package

library(devtools)
install_github("komorowskilab/metafetcher")
library(metafetcher)

Call write_config, a function that stores the settings used to connect to PostgreSQL and automatically creates a database called metafetcher

write_config(host,port,db_name,user,password,path_of_tmp_folder,HMDB_file_name,ChEBI_file_name,LIPIDMAPS_file_name)

host: "localhost" (the default host of a local PostgreSQL installation)
port: 5432 (the default port of a local PostgreSQL installation)
user: "postgres" (the default user created when PostgreSQL is installed)
password: the password that the database should be created with
path_of_tmp_folder: path to the folder that contains the extracted downloaded files
HMDB_file_name: name of the SQL dump file downloaded from the HMDB repository
ChEBI_file_name: name of the SQL dump file downloaded from the ChEBI repository
LIPIDMAPS_file_name: name of the SQL dump file downloaded from the LIPID MAPS repository
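
As a minimal sketch of how these arguments fit together, the snippet below collects them in a list, checks that the dump files are present, and only then calls write_config. Every value is a placeholder: the folder name dumps and the three file names are hypothetical, db_name is assumed to be metafetcher, and the values are passed positionally in the order of the signature above because the exact argument names are not confirmed here.

```r
# Placeholder settings; replace every value with your own.
cfg <- list(
  host                = "localhost",
  port                = 5432,
  db_name             = "metafetcher",         # assumed database name
  user                = "postgres",
  password            = "change-me",           # hypothetical password
  path_of_tmp_folder  = "dumps",               # hypothetical folder with the extracted dumps
  HMDB_file_name      = "hmdb_dump.sql",       # hypothetical file name
  ChEBI_file_name     = "chebi_dump.sql",      # hypothetical file name
  LIPIDMAPS_file_name = "lipidmaps_dump.sql"   # hypothetical file name
)

# Pre-flight check: which dump files are missing from the folder?
dump_files <- unlist(cfg[c("HMDB_file_name", "ChEBI_file_name", "LIPIDMAPS_file_name")])
missing_dumps <- dump_files[!file.exists(file.path(cfg$path_of_tmp_folder, dump_files))]

if (length(missing_dumps) > 0) {
  message("Missing dump files: ", paste(missing_dumps, collapse = ", "))
} else if (requireNamespace("metafetcher", quietly = TRUE)) {
  # Pass the values positionally, in the order of the signature above.
  do.call(metafetcher::write_config, unname(cfg))
}
```

Checking for the files first avoids starting the setup with a mistyped path or file name.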

Call install_database() to create the tables and insert the data from the SQL dump files. Preferably, place the folder that contains the SQL dumps in your R project directory

install_database() only needs to be called once. It creates the MetaFetcheR database and its tables, then inserts all data from the SQL dumps. This process may take a while (approximately 45 minutes to 1 hour)

install_database()

Example

Create a CSV file with the input IDs in the following format

Example input table
kegg_id	hmdb_id	chebi_id	pubchem_id	lipidmaps_id
C07326	HMDB02712	NA	64960	NA
NA	HMDB10382	NA	460602	NA
C00956	HMDB00510	NA	469	NA
C02356	HMDB00452	NA	80283	NA
NA	NA	NA	NA	NA
C00233	HMDB00695	NA	70	NA
C01089	HMDB00357	NA	441	NA
NA	HMDB13701	NA	68328	NA
C00334	HMDB00112	NA	119	NA
C00334	HMDB00112	NA	119	NA
NA	HMDB01859	NA	1983	NA
C00417	HMDB00072	NA	643757	NA
C00020	HMDB00045	NA	6083	NA
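
One way to produce such a file from within R is sketched below, using base R only. It writes the first three rows of the example table to discovery.csv, the file name used in the call that follows, and leaves unknown IDs as NA. Storing every ID as a character string is a choice made for this sketch, not a documented requirement of the package.

```r
# Build the input table: one column per ID type, NA marks an unknown ID.
input_ids <- data.frame(
  kegg_id      = c("C07326", NA, "C00956"),
  hmdb_id      = c("HMDB02712", "HMDB10382", "HMDB00510"),
  chebi_id     = NA_character_,
  pubchem_id   = c("64960", "460602", "469"),
  lipidmaps_id = NA_character_,
  stringsAsFactors = FALSE
)

# Write it as comma-separated text without row names.
input_path <- "discovery.csv"
write.csv(input_ids, input_path, row.names = FALSE)

# Read it back the same way the example does and inspect the result.
check <- read.csv(input_path, stringsAsFactors = FALSE)
print(check)
```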

library(metafetcher)
df.res <- read.csv("discovery.csv", stringsAsFactors=FALSE)
resp <- resolve_metabolites(df.res)
print(resp$df)

Output table
kegg_id	hmdb_id	chebi_id	pubchem_id	lipidmaps_id
C07326	HMDB02712 , HMDB0002712	16070	64960, 64960	NA
C04230 , C089215	HMDB10382 , HMDB0010382	72998, 17504	460602	LMGP01050018
C00956	HMDB00510 , HMDB0000510	37023, 37024	469, 469 , 92136	NA
C02356	HMDB00452 , HMDB0000452	35619	80283, 80283	LMFA01100034
NA	NA	NA	NA	NA
C00233 , C013082	HMDB00695 , HMDB0000695	48430	70, 70	NA
C01089	HMDB00357 , HMDB0000357	20067	441, 441	LMFA01050005
NA	HMDB13701 , HMDB0013701	88950	68328, 68328	NA
C00334 , C082430	HMDB00112 , HMDB0000112	16865	119, 119	LMFA01100039
C00334 , C082430	HMDB00112 , HMDB0000112	16865	119, 119	LMFA01100039
C06804 , C083640	HMDB01859 , HMDB0001859	46195	1983, 1983	NA
C00417	HMDB00072 , HMDB0000072	32805	643757	NA
C00020	HMDB00045 , HMDB0000045	16027	6083, 6083	NA
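
In the output, a cell can hold several IDs separated by commas, sometimes with repeats (for example 469, 469 , 92136). The following is a minimal sketch of how such a cell could be split into unique IDs with base R. split_ids is a hypothetical helper, not part of MetaFetcheR, and it assumes that the columns of resp$df are character strings formatted as in the printed table.

```r
# Hypothetical helper: split a comma-separated cell into unique, trimmed IDs.
split_ids <- function(cell) {
  if (is.na(cell)) return(character(0))
  ids <- trimws(strsplit(as.character(cell), ",", fixed = TRUE)[[1]])
  unique(ids[nzchar(ids)])
}

split_ids("469, 469 , 92136")         # "469" "92136"
split_ids("HMDB00510 , HMDB0000510")  # "HMDB00510" "HMDB0000510"
split_ids(NA)                         # character(0)
```

Under the same assumption, a whole column could be processed with lapply(resp$df$pubchem_id, split_ids).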

To map a single ID, use the resolve_single_id function

library(metafetcher)
resp1 <- resolve_single_id('hmdb_id', 'HMDB0001005')

df.out1 <- resp1$df
print(df.out1)

Output table
chebi_id	hmdb_id	lipidmaps_id	kegg_id	pubchem_id	inchi	inchikey	smiles	names	formula	mass	monoisotopic_mass
15412	HMDB0001005	NA	C00603	439269	1S/C3H6N2O4/c4-3(9)5-1(6)2(7)8/h1,6H,(H,7,8)(H3,4,5,9)/t1-/m0/s1	NWZYYCVIOKVTII-SFOWXEAESA-N	NC(=O)NC@@HC(O)=O , C(C(=O)O)(NC(=O)N)O,C@H(NC(=O)N)O	ureidoglycolate , (-)-ureidoglycolic acid , (S)-Ureidoglycolate;,(-)-Ureidoglycolate , (2S)-2-hydroxy-2-ureido-acetic acid , (2S)-2-(carbamoylamino)-2-hydroxyacetic acid , (2S)-2-(carbamoylamino)-2-hydroxyacetic acid , (2S)-2-(aminocarbonylamino)-2-oxidanyl-ethanoic acid	C3H6N2O4	134.0907, 134.0908, 134.0900	134.0328, 134.0328, 134.0328, 134.0328

Citation

Yones SA, Csombordi R, Komorowski J, and Diamanti K. MetaFetcheR: An R package for complete mapping of small compound data, bioRxiv, March 2021.
