Exported Data Formats

.edge (produced by export_utilities)

'Node1_id' (str):       the internal identifier for the source node
                        of the edge
'Node2_id' (str):       the internal identifier for the target node of
                        the edge
'Edge_weight' (float):  normalized weight of the edge in the subnetwork
'Edge_type' (str):      subnetwork edge type for the edge
'Source_id' (str):      internal identifier for the public source file
                        the edge was extracted from
'Line_num' (int):       original line number of edge information in the
                        public source file

.node_map (produced by export_utilities)

'Internal_id' (str):        the internal identifier for a node in the
                            subnetwork
'Mapped_id' (str):          the mapped internal identifier for a node
                            in the subnetwork
'Node_type' (str):          type of node 'Gene' or 'Property'
'Node_alias' (str):         common name for network node
'Node_description' (str):   full name/description for network node

.pnode_map (produced by export_utilities)

  • This file is produced only for Property type subnetworks and contains information nodes about the property nodes of the subnetwork in the same format as .node_map file.

.metadata (produced by export_utilities)

  • This yaml file contains information about the extracted Knowledge Network subnetwork. Its keys include summarizations about the network size (“data”), its public data source details (“datasets”), information about the meaning of its edges (“edge_type”), and some commands and configurations used in its construction (“export”).

Internal Data Formats

file_metadata (produced by check_utilities and updated by fetch_utilities)

'alias' (str):              the alias name
'alias_info' (str):         a short string with information
                            about the alias
'checksum' (str):           md5 checksum of the downloaded file
'dependencies' (list):      list of other aliases that the alias
                            depends on
'fetch_needed' (bool):      True if file needs to be downloaded
                            from remote source. A fetch will
                            be needed if the local file does
                            not exist, or if the local and
                            remote files have different date
                            modified or file sizes
'file_exists' (bool):       boolean representing if file is
                            already present on local disk
'is_map' (bool):            boolean representing if alias is used for
                            source specific mapping of nodes or edges
'line_count' (int):         line count of the downloaded file
'local_file_name' (str):    name of the downloaded file on local disk
'num_chunks' (int):         number of raw_line chunks downloaded
                            file is split into
'remote_date' (float):      modification date of file on the remote
                            source
'remote_file' (str):        file to extract if remote file
                            location is a directory
'remote_size' (int):        size of file on the remote source
'remote_url' (str):         url of file on the remote source
'remote_version' (str):     release version of the remote source
'source' (str):             the source name

rawline (produced by fetch_utilities)

'line_hash' (str):  md5 checksum of line_str field
'line num' (int):   line number in downloaded file
'file_id' (str):    processed name of downloaded file
'line_str' (str):   original line string from downloaded source

table (produced by table_utilities)

'line_hash' (str):  md5 checksum of original line string from source
'n1name' (str):     node 1 name to map from original source
'n1hint' (str):     suggestion of node 1 name type to aid mapping
'n1type' (str):     type of node 1 ('Gene', 'Property')
'n1spec' (int):     taxon id of node 1 species, 0 if property,
                    'unknown' otherwise
'n2name' (str):     node 2 name to map from original source
'n2hint' (str):     suggestion of node 2 name type to aid mapping
'n2type' (str):     type of node 2 ('Gene', 'Property')
'n2spec' (int):     taxon id of node 2 species, 0 if property,
                    'unknown' otherwise
'et_hint' (str):    name / hint of edge type
'weight' (float):   score for edge
'table_hash' (str): md5 checksum of raw edge generated from source line

edge_meta (produced by table_utilities)

'line_hash' (str):  md5 checksum of original line string from source
'info_type' (str):  type of metadate: 'reference', 'experiment', etc
'info_desc' (str):  description string of metadata

node_meta (produced by table_utilities)

'node_id' (str):    mapped node identifier
'info_type' (str):  type of metadata ('alt_alias', 'link', etc)
'info_desc' (str):  description string of metadata

node (produced by table_utilities)

'node_id' (str):    node identifier
'n_alias' (str):    alternate name for node
'n_type' (str):     type of node ('Gene', 'Property')

edge (produced by conv_utilities)

'edge_hash' (str):  md5 checksum of mapped edge
'n1_id' (str):      node 1 mapped identifier
'n2_id' (str):      node 2 mapped identifier
'et_name' (str):    name edge type
'weight' (float):   score for edge type

edge2line (produced by conv_utilities)

'edge_hash' (str):  md5 checksum of mapped edge
'line_hash' (str):  md5 checksum of original line string from source

status (produced by conv_utilities)

'table_hash' (str):     md5 checksum of raw edge generated from source
                        line
'n1_id' (str):          node 1 mapped identifier
'n2_id' (str):          node 2 mapped identifier
'et_name' (str):        name edge type
'weight' (float):       score for edge type
'edge_hash' (str):      md5 checksum of mapped edge
'line_hash' (str):      md5 checksum of original line string from
                        source
'status' (str):         "production" if both nodes mapped and
                        "unmapped" otherwise
'status_desc' (str):    description of reason for status label