Customize new datasets and tasks
To handle a new dataset, one needs to 1) process the data into either the standard raw-data format or the PyG-compatible format, 2) customize a data processor, and 3) customize a runner for the dataset.
The Data Format that DGRL-Hardware Accepts
Raw Data
DGRL accepts the following CSV files to store the raw data:
| file name | description |
|---|---|
| edge.csv | saves all the edges |
| node-feat.csv | saves all the node features |
| num-edge-list.csv | saves the number of edges in each graph |
| num-node-list.csv | saves the number of nodes in each graph |
| edge-feat.csv | saves the edge features of each graph |
| flexible | may save the labels |
Examples of the raw data can be found at ./data_raw/.
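As a rough illustration of the layout above, the following sketch writes a two-graph toy dataset into these files and checks that the per-graph counts agree with the flat tables. The column layout used here (no header row, one `source,target` pair per row in edge.csv, one feature row per node or edge, node indices restarting at 0 in each graph) is an assumption made for this example and is not specified in this document; the files in ./data_raw/ remain the authoritative reference. The helper names `write_rows` and `read_rows` are hypothetical.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write a list of rows to a headerless CSV file (assumed layout)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Read a headerless CSV file back as a list of string rows."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f) if row]


raw_dir = tempfile.mkdtemp()

# Two toy graphs stored back to back: graph 0 has 3 nodes and 2 edges,
# graph 1 has 2 nodes and 1 edge (assumed convention for this sketch).
write_rows(os.path.join(raw_dir, "num-node-list.csv"), [[3], [2]])
write_rows(os.path.join(raw_dir, "num-edge-list.csv"), [[2], [1]])
write_rows(os.path.join(raw_dir, "edge.csv"), [[0, 1], [1, 2], [0, 1]])
write_rows(os.path.join(raw_dir, "node-feat.csv"),
           [[0.1, 1.0], [0.2, 0.0], [0.3, 1.0], [0.4, 0.0], [0.5, 1.0]])
write_rows(os.path.join(raw_dir, "edge-feat.csv"), [[1.0], [2.0], [3.0]])

num_nodes = [int(r[0]) for r in read_rows(os.path.join(raw_dir, "num-node-list.csv"))]
num_edges = [int(r[0]) for r in read_rows(os.path.join(raw_dir, "num-edge-list.csv"))]
n_node_rows = len(read_rows(os.path.join(raw_dir, "node-feat.csv")))
n_edge_rows = len(read_rows(os.path.join(raw_dir, "edge.csv")))
n_edge_feat_rows = len(read_rows(os.path.join(raw_dir, "edge-feat.csv")))

# The per-graph counts must add up to the number of rows in the flat files.
counts_consistent = (
    sum(num_nodes) == n_node_rows
    and sum(num_edges) == n_edge_rows == n_edge_feat_rows
)
print("consistent:", counts_consistent)
```

A consistency check of this kind is cheap to run before writing a data processor, since a mismatch between the count files and the flat tables would otherwise only surface as misaligned graphs later on.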
DataProcessor to handle the raw data
One needs to customize a data processor to process the raw data into PyG-compatible data. The file name should be NEWDATA_data_processor.py (e.g. AMP_data_processor.py), saved in the folder ./data_processor.
A tutorial for customizing such a data processor is as follows:
class [$NEWDATA]DataProcessor(InMemoryDataset):
    def __init__(self, config, mode):
        # one may directly follow/copy the initialization of the data processors of existing datasets
    def process(self):
        # process the raw data into the PyG-compatible data format here
    def read_csv_graph_raw(self, raw_dir, check_repeat_edge):
        # this is the key function that processes the .csv files into PyG data,
        # for details please see https://github.com/Graph-COM/Benchmark_for_DGRL_in_Hardwares/tree/main/DGRL_Hardware/data_processor.
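The core job of a reader such as read_csv_graph_raw is to cut the flat CSV tables back into one record per graph. The sketch below shows that slicing step on in-memory lists using only the standard library; it is not the repository's implementation. The function name `split_into_graphs`, the dictionary keys, and the reading of check_repeat_edge as "raise an error on duplicate edges" are all assumptions made for illustration. In an actual data processor, each record would presumably be converted into a PyG data object inside process().

```python
def split_into_graphs(edges, node_feats, num_nodes, num_edges,
                      check_repeat_edge=True):
    """Slice flat edge and node-feature tables into per-graph dicts.

    Hypothetical helper: `edges` is a list of (src, dst) tuples and
    `node_feats` a list of feature rows, both concatenated over all graphs;
    `num_nodes` / `num_edges` give the per-graph counts.
    """
    if sum(num_nodes) != len(node_feats):
        raise ValueError("num_nodes does not match node feature rows")
    if sum(num_edges) != len(edges):
        raise ValueError("num_edges does not match edge rows")

    graphs = []
    node_start, edge_start = 0, 0
    for n_nodes, n_edges in zip(num_nodes, num_edges):
        graph_edges = edges[edge_start:edge_start + n_edges]
        # Assumed meaning of the flag: reject graphs with duplicate edges.
        if check_repeat_edge and len(set(graph_edges)) != len(graph_edges):
            raise ValueError(f"graph {len(graphs)} contains a repeated edge")
        graphs.append({
            "edge_index": graph_edges,
            "node_feat": node_feats[node_start:node_start + n_nodes],
            "num_nodes": n_nodes,
        })
        node_start += n_nodes
        edge_start += n_edges
    return graphs


# Usage on a toy input with two graphs (3 nodes / 2 edges, 2 nodes / 1 edge).
toy_graphs = split_into_graphs(
    edges=[(0, 1), (1, 2), (0, 1)],
    node_feats=[[0.1], [0.2], [0.3], [0.4], [0.5]],
    num_nodes=[3, 2],
    num_edges=[2, 1],
)
print(len(toy_graphs), toy_graphs[1]["edge_index"])
```

Keeping running offsets for nodes and edges separately is what lets a single pass recover every graph from the concatenated tables.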
Runner to Run with the New Datasets
One also needs to customize a Runner to run with the new dataset. The file name should be NEWDATA_runner.py (e.g. AMP_runner.py), saved in the folder ./runner.
A tutorial for customizing such a runner is as follows:
class [$NewData]Runner():
    def __init__(self, config):
        # one may follow/copy the implementation of any existing dataset runner
    def train_ray(self, tune_parameter_config):
        # the training function called when tuning with Ray; one may refer to the implementation of existing datasets
        # all the datasets/tasks share almost the same implementation
    def train(self):
        # the training function called for evaluation; one may refer to the implementation of existing datasets
        # all the datasets/tasks share almost the same implementation
    def raytune(self):
        # the function that loads the hyper-parameter design space
        # all the datasets/tasks share almost the same implementation
    def train_one_epoch(self, data_loader, mode, epoch_idx):
        # one may need to customize this because the evaluation metrics differ across datasets
    def test(self, load_statedict=True, test_num_idx=0):
        # one may need to customize this because the evaluation metrics differ across datasets
The other functions in the Runner class share the same implementation across the datasets.
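Since train_one_epoch and test differ across datasets mainly in their evaluation metric, one way to picture the customization is a runner whose metric is swapped per task. The following is a minimal, framework-free sketch under that assumption: `ToyRunner`, `mse`, `accuracy`, and the `metric_fn` argument are hypothetical names, the method signatures are simplified, the "model" is a plain Python callable, and no parameter update is performed. The existing runners in ./runner remain the reference for the real training loop.

```python
def mse(preds, targets):
    """Mean squared error, a typical regression metric."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def accuracy(preds, targets):
    """Fraction of exact matches, a typical classification metric."""
    return sum(int(p == t) for p, t in zip(preds, targets)) / len(targets)


class ToyRunner:
    """Hypothetical runner skeleton; only the metric differs per dataset."""

    def __init__(self, model, metric_fn):
        self.model = model          # any callable mapping input -> prediction
        self.metric_fn = metric_fn  # the dataset-specific part

    def train_one_epoch(self, data_loader, mode, epoch_idx):
        # A real runner would also compute a loss and step the optimizer
        # when mode == "train"; this sketch only evaluates the metric.
        preds, targets = [], []
        for x, y in data_loader:
            preds.append(self.model(x))
            targets.append(y)
        return {"mode": mode, "epoch": epoch_idx,
                "metric": self.metric_fn(preds, targets)}

    def test(self, data_loader):
        # Reuses the same loop so that training and test metrics agree.
        return self.train_one_epoch(data_loader, mode="test", epoch_idx=0)


# Toy (input, target) pairs standing in for a data loader.
toy_loader = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]

regression_runner = ToyRunner(model=lambda x: 2 * x, metric_fn=mse)
regression_result = regression_runner.test(toy_loader)

classification_runner = ToyRunner(model=lambda x: int(x > 1.5), metric_fn=accuracy)
classification_result = classification_runner.test([(1.0, 0), (2.0, 1), (3.0, 0)])
print(regression_result["metric"], classification_result["metric"])
```

Isolating the metric in one place keeps the dataset-specific code confined to train_one_epoch and test, which is consistent with the remaining Runner functions being shared across datasets.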