BIDS Conversion

Converting a raw study dataset to a BIDS-compliant dataset relies on a proper specification. The outcome of such a conversion is again a datalad dataset. It contains a reference to the study dataset it was built from, but can be used and shared without access to that study dataset.

Dataset creation

First, create the to-be BIDS dataset in much the same way as you create a study dataset:

datalad rev-create [TARGET-DIR]

This creates an empty datalad dataset. If you don't provide a target directory, the dataset is created in the current working directory.

To preconfigure it as a BIDS dataset, however, you need to run a dedicated setup procedure from within the dataset:

cd [TARGET-DIR]
datalad run-procedure setup_bids_dataset

Reference needed input

Now the study dataset to be converted is needed. Install it into the BIDS dataset as a subdataset named sourcedata:

datalad install -d . -s [PATH TO STUDY DATASET] sourcedata

Note that this assumes you created your study dataset according to the setup page or the accompanying demo, that is, with the toolbox installed as a subdataset.

Conversion

Assuming that the study dataset comes with a proper study specification, you can now convert it by calling:

datalad hirni-spec2bids --anonymize sourcedata/studyspec.json sourcedata/*/studyspec.json

Several things should be noted here. First, there is the switch --anonymize. It is optional and ensures that, within the resulting BIDS dataset, subjects are referred to only by their anon_subject ID according to the specification. There should be no hint of the original subject ID in the commit messages or paths of the new dataset. By default, this should also run a defacing routine, which is expected to be specified in an acquisition's specification file within a snippet of type dicomseries:all. You can change this by editing that specification.

Furthermore, you may notice that the call shown above references not the study dataset to be converted but the specification files. This means you don't need to convert the entire dataset at once. You can also convert a single acquisition instead. In fact, you can even have several specification files per acquisition and run a conversion based on a single file. You can narrow the conversion further with the option --only-type, which converts only snippets of the given type.
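To illustrate converting only part of a study, here is a minimal Python sketch that assembles the argument list for such a call. It only builds and prints the command; it does not run datalad. The function names, the acquisition directory name acq1, and the assumption that --only-type takes the snippet type as its next argument are illustrative and not taken from hirni's documentation.

```python
from pathlib import Path
import shlex


def acquisition_specs(source_dir, acquisitions=None):
    """Find per-acquisition spec files below `source_dir`.

    Roughly mirrors the shell glob sourcedata/*/studyspec.json (hidden
    directories are skipped, as a shell would). If `acquisitions` is
    given, only those (hypothetical) directory names are kept.
    """
    specs = sorted(
        p for p in Path(source_dir).glob("*/studyspec.json")
        if not p.parent.name.startswith(".")
    )
    if acquisitions is not None:
        specs = [p for p in specs if p.parent.name in acquisitions]
    return specs


def build_spec2bids_command(spec_files, anonymize=True, only_type=None):
    """Return the argv list for a `datalad hirni-spec2bids` call.

    Assumption: --only-type is followed by the snippet type to convert.
    """
    cmd = ["datalad", "hirni-spec2bids"]
    if anonymize:
        cmd.append("--anonymize")
    if only_type is not None:
        cmd += ["--only-type", only_type]
    cmd += [str(p) for p in spec_files]
    return cmd


if __name__ == "__main__":
    # "acq1" is a made-up acquisition name; replace it with one of yours.
    specs = acquisition_specs("sourcedata", acquisitions={"acq1"})
    cmd = build_spec2bids_command(specs, only_type="dicomseries:all")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the call from an explicit list of spec files makes it easy to review exactly which acquisitions a conversion will touch before running it.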

Dropping raw data

Finally, you can uninstall the source dataset by running:

datalad uninstall -d . -r --nocheck sourcedata

This will leave you with just the BIDS dataset. It still contains a reference to the data it was derived from, but doesn't contain that data.
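The steps of this section can be collected into one script. The following Python sketch lists the commands from above in order and only prints them by default; with dry_run=False it would execute them via subprocess. It assumes the BIDS dataset has already been created and is the current working directory. The function names and the placeholder path are hypothetical, and the spec file glob is expanded in Python because no shell is involved.

```python
import glob
import subprocess


def conversion_steps(study_dataset, anonymize=True):
    """Return the datalad calls of this section as a list of argv lists."""
    convert = ["datalad", "hirni-spec2bids"]
    if anonymize:
        convert.append("--anonymize")
    convert += ["sourcedata/studyspec.json", "sourcedata/*/studyspec.json"]
    return [
        ["datalad", "run-procedure", "setup_bids_dataset"],
        ["datalad", "install", "-d", ".", "-s", study_dataset, "sourcedata"],
        convert,
        ["datalad", "uninstall", "-d", ".", "-r", "--nocheck", "sourcedata"],
    ]


def expand(argv):
    """Expand glob patterns in arguments; keep arguments without matches."""
    expanded = []
    for arg in argv:
        matches = sorted(glob.glob(arg)) if any(c in arg for c in "*?[") else []
        expanded.extend(matches or [arg])
    return expanded


def run_steps(steps, dry_run=True):
    """Print each step, or execute it and stop at the first failure."""
    for argv in steps:
        if dry_run:
            print(" ".join(argv))
        else:
            # Globs are expanded right before each call, so the spec files
            # are looked up only after sourcedata has been installed.
            subprocess.run(expand(argv), check=True)


if __name__ == "__main__":
    # Placeholder path; point this at your own study dataset.
    run_steps(conversion_steps("/path/to/study_dataset"))
```

Keeping the dry run as the default lets you review the four calls before anything is installed or uninstalled.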

Demo: Conversion to BIDS

This demo shows how to convert a hirni study dataset into a BIDS-compliant dataset. The study dataset we use is the one created by the study dataset demo. We will use a published version of that dataset available from GitHub, but you can also build it yourself by following said demo and use your own copy instead.

BIDS Dataset

The idea is to create a new dataset that will become the BIDS dataset and that references our study dataset, which bundles all the raw data, by making it a subdataset of the derived one. Please note that this does NOT mean the new BIDS dataset contains the raw data. It just references it and thereby creates a fully reproducible history record of how it came to be. The study dataset does NOT need to be shared if you want to share the BIDS dataset. Rather, anyone who has the BIDS dataset can trace everything back to the original raw data, provided they also have access/permission to get that subdataset.

In order to get our to-be BIDS dataset from the raw dataset, we create a new dataset and run the setup_bids_dataset procedure to configure it:

% datalad rev-create demo_bids
% cd demo_bids
% datalad run-procedure setup_bids_dataset

Now we install our study dataset as a subdataset into our new dataset at its subdirectory sourcedata. By that, we reference the exact state of our study dataset at the moment of installation. While this may create some data duplication, please note several things. First, the new subdataset doesn't need to hold all of the actual content of the study dataset's files: it can retrieve that content, but doesn't do so by default during installation. Rather, the point is to reference the input data (including the code and environments in hirni's toolbox) at their exact version to achieve full reproducibility. We can thereby track the converted data back to the raw data and the exact conversion routine that brought it into existence. Second, this subdataset can later be removed by datalad uninstall, freeing the space on the filesystem while keeping the reference. To install it, run:

% datalad install --dataset . --source https://github.com/psychoinformatics-de/hirni-demo sourcedata --recursive

Note that if you want to use a local study dataset (i.e. one you created yourself via the study dataset demo), you can simply replace that URL with the path to your local one.

The actual conversion is based on the specification files in the study dataset. You can convert a single one of them (meaning everything that file specifies) or an arbitrary number, including everything at once, of course. Let's first convert the study-level specification and then all the acquisitions with the following call:

% datalad hirni-spec2bids --anonymize sourcedata/studyspec.json sourcedata/*/studyspec.json

The --anonymize switch will cause the command to use the anonymized subject identifiers and to encode all records of where exactly the data came from into hidden sidecar files, which can then be excluded from publishing/sharing this dataset.

datalad hirni-spec2bids will run datalad procedures on the raw data as specified in the specification files (remember, for example, that we set a procedure "copy-converter" for our events.tsv file). Those procedures are customizable. The defaults we are using here come from hirni's toolbox dataset.

The default procedure to convert the DICOM files uses a containerized converter. It will NOT use whatever you happen to have installed locally, but the defined environment that is referenced in the dataset. This requires a download of that container (which happens automatically) and makes the routine reproducible, since the exact environment the conversion was run in is recorded in the dataset's history. In addition, the conversion will cause datalad to retrieve the actual data of the study subdataset in sourcedata. Remember that you can datalad uninstall that subdataset after conversion, or use datalad drop to throw away its copy of particular files.

If you use the BIDS-Validator (https://bids-standard.github.io/bids-validator/) to check the resulting dataset, there should be an error message, though. This is because our events.tsv file references stimuli files that we don't actually have available to add to the dataset. For the purpose of this demo, this should be fine.
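To see which stimulus references are behind that validator error, one can compare an events file against the dataset's stimuli directory. The sketch below assumes the BIDS convention that an events.tsv file names stimuli in a stim_file column, relative to a top-level stimuli directory. The function name and the command-line usage are hypothetical, and this check is not a substitute for the BIDS-Validator.

```python
import csv
import sys
from pathlib import Path


def missing_stimuli(events_tsv, bids_root):
    """Return stim_file entries of `events_tsv` not found under <bids_root>/stimuli."""
    stimuli_dir = Path(bids_root) / "stimuli"
    missing = []
    with open(events_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            name = row.get("stim_file")
            # Skip rows without a stimulus: BIDS writes absent values as
            # "n/a", and files lacking the column yield None here.
            if not name or name == "n/a":
                continue
            if not (stimuli_dir / name).exists() and name not in missing:
                missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical usage: python check_stimuli.py <events.tsv> <BIDS root>
    if len(sys.argv) == 3:
        for name in missing_stimuli(sys.argv[1], sys.argv[2]):
            print("missing stimulus:", name)
```

In a datalad dataset, files whose content has not been retrieved may show up as dangling symlinks, in which case this check would report them as missing as well.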

Apart from that validator error, we now have a valid BIDS dataset that can be used with BIDS-Apps or any kind of software able to deal with this standard. Since we have the raw data in a subdataset, we can aggregate DICOM metadata from it into the BIDS dataset, where it would remain available even after the study dataset is uninstalled from the BIDS dataset. If we keep using datalad run / datalad containers-run for any processing to follow (as hirni internally does), we are able to trace the genesis and evolution of each file back to the raw data, the exact code, and the environments it ran in to alter this file or bring it into existence.