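MetaFetcheR is an R package designed to link metabolite IDs from different metabolome databases with each other, as a step towards resolving ambiguity and standardizing metabolite representation and annotation. Currently the package supports resolving IDs for the following databases:

- Kyoto Encyclopedia of Genes and Genomes (KEGG)
- Human Metabolome Database (HMDB)
- Chemical Entities of Biological Interest (ChEBI)
- PubChem
- LIPID MAPS
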
Uncompress all downloaded files into a directory you create.

Create a new R project and install the MetaFetcheR package.

8. Call the function `install_database()` to create the tables and insert the data from the SQL dump files. Preferably, put the folder containing the SQL dump in your R project directory.

The `install_database()` function only needs to be called once: it creates the MetaFetcheR database and its tables and inserts all the data from the SQL dump. This process may take a while (approximately 45 minutes to 1 hour).

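Create a CSV file (for example `discovery.csv`) with the input IDs in the following format, using one column per database and `NA` where an ID is not known:
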
kegg_id | hmdb_id | chebi_id | pubchem_id | lipidmaps_id |
---|---|---|---|---|
C07326 | HMDB02712 | NA | 64960 | NA |
NA | HMDB10382 | NA | 460602 | NA |
C00956 | HMDB00510 | NA | 469 | NA |
C02356 | HMDB00452 | NA | 80283 | NA |
NA | NA | NA | NA | NA |
C00233 | HMDB00695 | NA | 70 | NA |
C01089 | HMDB00357 | NA | 441 | NA |
NA | HMDB13701 | NA | 68328 | NA |
C00334 | HMDB00112 | NA | 119 | NA |
C00334 | HMDB00112 | NA | 119 | NA |
NA | HMDB01859 | NA | 1983 | NA |
C00417 | HMDB00072 | NA | 643757 | NA |
C00020 | HMDB00045 | NA | 6083 | NA |
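
If you prefer to build this file from R rather than by hand, the base-R sketch below writes two of the rows above to `discovery.csv`, the file name read in the next example. It assumes `resolve_metabolites()` expects exactly these five column names; whether other column orders or extra columns are accepted is not covered here.

```r
# Build an input table with one column per supported database.
# Missing IDs are stored as NA (written as "NA" in the CSV).
ids <- data.frame(
  kegg_id      = c("C07326", NA),
  hmdb_id      = c("HMDB02712", "HMDB10382"),
  chebi_id     = c(NA, NA),
  pubchem_id   = c(64960, 460602),
  lipidmaps_id = c(NA, NA),
  stringsAsFactors = FALSE
)

# row.names = FALSE keeps the CSV limited to the five ID columns
write.csv(ids, "discovery.csv", row.names = FALSE)
```

Missing IDs are written as `NA`, which `read.csv()` turns back into `NA` values by default.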
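
```r
library(metafetcher)
# Read the input IDs (the string "NA" is read as a missing value)
df.res <- read.csv("discovery.csv", stringsAsFactors = FALSE)
# Resolve the IDs; the mapped table is returned in resp$df
resp <- resolve_metabolites(df.res)
print(resp$df)
```
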
kegg_id | hmdb_id | chebi_id | pubchem_id | lipidmaps_id |
---|---|---|---|---|
C07326 | HMDB02712 , HMDB0002712 | 16070 | 64960, 64960 | NA |
C04230 , C089215 | HMDB10382 , HMDB0010382 | 72998, 17504 | 460602 | LMGP01050018 |
C00956 | HMDB00510 , HMDB0000510 | 37023, 37024 | 469, 469 , 92136 | NA |
C02356 | HMDB00452 , HMDB0000452 | 35619 | 80283, 80283 | LMFA01100034 |
NA | NA | NA | NA | NA |
C00233 , C013082 | HMDB00695 , HMDB0000695 | 48430 | 70, 70 | NA |
C01089 | HMDB00357 , HMDB0000357 | 20067 | 441, 441 | LMFA01050005 |
NA | HMDB13701 , HMDB0013701 | 88950 | 68328, 68328 | NA |
C00334 , C082430 | HMDB00112 , HMDB0000112 | 16865 | 119, 119 | LMFA01100039 |
C00334 , C082430 | HMDB00112 , HMDB0000112 | 16865 | 119, 119 | LMFA01100039 |
C06804 , C083640 | HMDB01859 , HMDB0001859 | 46195 | 1983, 1983 | NA |
C00417 | HMDB00072 , HMDB0000072 | 32805 | 643757 | NA |
C00020 | HMDB00045 , HMDB0000045 | 16027 | 6083, 6083 | NA |
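
Several cells in this output hold more than one value as a comma-separated string, for example an HMDB ID in both its older five-digit form and its current seven-digit form (`HMDB02712 , HMDB0002712`), or the same PubChem CID listed twice. The base-R sketch below shows one way to split such a cell into unique IDs. The helpers `split_ids()` and `normalize_hmdb()` are hypothetical, not part of MetaFetcheR, and the sketch assumes the cells are plain strings with comma separators as printed above.

```r
# Hypothetical helper (not part of MetaFetcheR): split one resolved cell,
# e.g. "HMDB02712 , HMDB0002712", into a vector of unique, non-empty IDs.
split_ids <- function(cell) {
  cell <- as.character(cell)
  if (is.na(cell)) return(character(0))
  # Split on commas, allowing any amount of whitespace around them
  parts <- trimws(strsplit(cell, "\\s*,\\s*")[[1]])
  unique(parts[nzchar(parts)])
}

# Hypothetical helper: pad old five-digit HMDB IDs to the seven-digit form,
# so that "HMDB02712" and "HMDB0002712" compare equal.
normalize_hmdb <- function(ids) sub("^HMDB([0-9]{5})$", "HMDB00\\1", ids)

# Example cells copied from the output above (NA = nothing resolved)
hmdb_cells <- c("HMDB02712 , HMDB0002712", NA, "HMDB00510 , HMDB0000510")
hmdb_lists <- lapply(hmdb_cells, function(x) unique(normalize_hmdb(split_ids(x))))
# hmdb_lists[[1]] is "HMDB0002712"; hmdb_lists[[2]] is character(0)
```

Whether to normalize IDs this way depends on which HMDB ID form your downstream tools expect.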

To map only a single ID, use the function `resolve_single_id()`:
```r
library(metafetcher)
# Resolve one ID: first the ID type (named as in the input column headers), then the ID
resp1 <- resolve_single_id('hmdb_id', 'HMDB0001005')
df.out1 <- resp1$df
print(df.out1)
```

chebi_id | hmdb_id | lipidmaps_id | kegg_id | pubchem_id | inchi | inchikey | smiles | names | formula | mass | monoisotopic_mass |
---|---|---|---|---|---|---|---|---|---|---|---|
15412 | HMDB0001005 | NA | C00603 | 439269 | 1S/C3H6N2O4/c4-3(9)5-1(6)2(7)8/h1,6H,(H,7,8)(H3,4,5,9)/t1-/m0/s1 | NWZYYCVIOKVTII-SFOWXEAESA-N | `NC(=O)N[C@@H](O)C(O)=O` , `C(C(=O)O)(NC(=O)N)O`,`[C@H](C(=O)O)(NC(=O)N)O` | ureidoglycolate , (-)-ureidoglycolic acid , (S)-Ureidoglycolate;,(-)-Ureidoglycolate , (2S)-2-hydroxy-2-ureido-acetic acid , (2S)-2-(carbamoylamino)-2-hydroxyacetic acid , (2S)-2-(carbamoylamino)-2-hydroxyacetic acid , (2S)-2-(aminocarbonylamino)-2-oxidanyl-ethanoic acid | C3H6N2O4 | 134.0907, 134.0908, 134.0900 | 134.0328, 134.0328, 134.0328, 134.0328 |

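
The `mass` and `monoisotopic_mass` columns above likewise contain several values in one comma-separated string. A short base-R sketch for converting such a cell to numbers, for instance to check how closely the reported masses agree, is shown below; the cell is copied from the output above, and the sketch assumes the column is returned as a character string.

```r
# Cell copied from the `mass` column of the output above
mass_cell <- "134.0907, 134.0908, 134.0900"

# Split on commas (allowing surrounding spaces) and convert to numeric
masses <- as.numeric(strsplit(mass_cell, "\\s*,\\s*")[[1]])

# Spread between the largest and smallest reported value (about 0.0008 here)
mass_spread <- diff(range(masses))
print(masses)
print(mass_spread)
```
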
Yones SA, Csombordi R, Komorowski J, and Diamanti K. MetaFetcheR: An R package for complete mapping of small compound data, bioRxiv, March 2021.