Horizontal & Vertical Formats

dgraphpandas takes two kinds of input: vertical and horizontal. In both cases, the input is expected to be in CSV format.

Horizontal

Horizontal input follows a tabular structure and is probably the format you are more likely to find in the wild. It might look like this:

customer_id    weight    height
1              90        190
2              23        120
3              100       56

When you provide the subject fields as ['customer_id'], dgraphpandas treats the remaining columns as data values and pivots them like this:

customer_id    predicate    object
1              weight       90
1              height       190
2              weight       23
2              height       120
3              weight       100
3              height       56
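The pivot step can be sketched with the Python standard library. This is not dgraphpandas's own code: pivot_horizontal and the HORIZONTAL_CSV sample are hypothetical, and joining several subject fields with "_" is an assumption made only so that composite keys produce a single subject string.

```python
import csv
import io

# Hypothetical sample data mirroring the horizontal table above.
HORIZONTAL_CSV = """customer_id,weight,height
1,90,190
2,23,120
3,100,56
"""


def pivot_horizontal(text, subject_fields):
    """Melt a horizontal CSV into (subject, predicate, object) tuples.

    Every column not listed in subject_fields is treated as a data value.
    """
    triples = []
    for record in csv.DictReader(io.StringIO(text)):
        # Assumption: composite subjects are joined with "_".
        subject = "_".join(record[field] for field in subject_fields)
        for column, value in record.items():
            if column not in subject_fields:
                triples.append((subject, column, value))
    return triples


for subject, predicate, obj in pivot_horizontal(HORIZONTAL_CSV, ["customer_id"]):
    print(subject, predicate, obj)
```

In pandas, df.melt(id_vars=["customer_id"], var_name="predicate", value_name="object") produces the same rows, although they come out grouped by column (all weights, then all heights) rather than by customer.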

Combined with the options provided in the configuration, the output RDF might look like this:

<customer_1>     <weight>       "90"^^<xs:int> .
<customer_1>     <height>       "190"^^<xs:int> .
<customer_2>     <weight>       "23"^^<xs:int> .
<customer_2>     <height>       "120"^^<xs:int> .
<customer_3>     <weight>       "100"^^<xs:int> .
<customer_3>     <height>       "56"^^<xs:int> .

Here, customer_ was prefixed to each subject because customer was defined as the type for this export, and the ^^<xs:int> types were attached because those columns were defined inside type_overrides.
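A minimal sketch of this rendering step follows, assuming type_overrides maps a predicate name to an XML Schema type name. The real format of type_overrides in dgraphpandas may differ; to_rdf, TRIPLES and TYPE_OVERRIDES are hypothetical names, and values are not escaped, so the sketch only suits simple literals.

```python
# Hypothetical vertical triples, shaped like the pivoted table above.
TRIPLES = [
    ("1", "weight", "90"), ("1", "height", "190"),
    ("2", "weight", "23"), ("2", "height", "120"),
    ("3", "weight", "100"), ("3", "height", "56"),
]

# Assumed shape of type_overrides: predicate -> XML Schema type name.
TYPE_OVERRIDES = {"weight": "int", "height": "int"}


def to_rdf(triples, node_type, type_overrides):
    """Render (subject, predicate, object) tuples as RDF lines.

    The subject is prefixed with the node type (customer -> <customer_1>),
    and a ^^<xs:...> type is attached only when the predicate has an override.
    Object values are not escaped.
    """
    lines = []
    for subject, predicate, obj in triples:
        literal = f'"{obj}"'
        xs_type = type_overrides.get(predicate)
        if xs_type is not None:
            literal += f"^^<xs:{xs_type}>"
        lines.append(f"<{node_type}_{subject}> <{predicate}> {literal} .")
    return lines


print("\n".join(to_rdf(TRIPLES, "customer", TYPE_OVERRIDES)))
```

A predicate with no override falls back to a plain quoted literal, which keeps untyped columns valid RDF.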

Vertical

The vertical transformation is very similar to the horizontal one described above, except that the initial pivoting step is skipped because the data is already in the customer_id, predicate, object shape.
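Because vertical input skips the pivot, reading it amounts to checking the header and taking each row as a triple. In this sketch the predicate and object column names are taken from the pivoted table earlier on this page; whether dgraphpandas requires exactly these header names is an assumption, and read_vertical is a hypothetical helper.

```python
import csv
import io

# Hypothetical vertical input: already one (subject, predicate, object) per row.
VERTICAL_CSV = """customer_id,predicate,object
1,weight,90
1,height,190
2,weight,23
"""


def read_vertical(text, subject_field):
    """Return the rows of a vertical CSV as triples, without any pivoting."""
    reader = csv.DictReader(io.StringIO(text))
    expected = [subject_field, "predicate", "object"]
    # Assumed header layout; reject input that would still need pivoting.
    if reader.fieldnames != expected:
        raise ValueError(f"expected columns {expected}, got {reader.fieldnames}")
    return [(row[subject_field], row["predicate"], row["object"]) for row in reader]


print(read_vertical(VERTICAL_CSV, "customer_id"))
```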

Edges

Edges are derived from the edge_fields defined in the file-level configuration, and they are supplied in the input file as ordinary columns, just like data fields.

Because these columns are listed in edge_fields, dgraphpandas splits them out and treats them slightly differently when transforming the data and generating the RDF output.
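The splitting described here can be pictured as partitioning the CSV header into subject, edge and data columns. This is a hypothetical split_columns helper, not part of dgraphpandas; it only assumes a file-level configuration dict with subject_fields and edge_fields keys, and the "total" column in the sample header is invented for illustration.

```python
def split_columns(columns, file_config):
    """Partition CSV header columns into subject, edge and data columns."""
    subject_fields = file_config.get("subject_fields", [])
    edge_fields = file_config.get("edge_fields", [])
    subjects = [c for c in columns if c in subject_fields]
    edges = [c for c in columns if c in edge_fields]
    # Whatever remains is pivoted into ordinary data values.
    data = [c for c in columns if c not in subject_fields and c not in edge_fields]
    return subjects, edges, data


# Hypothetical file-level config and header; "total" is an invented data column.
order_config = {"subject_fields": ["order_id"], "edge_fields": ["customer_id", "store_id"]}
print(split_columns(["order_id", "customer_id", "store_id", "total"], order_config))
```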

For example, if we had an e-commerce orders table:

order_id    customer_id    store_id
5           1              1
9           2              2
2           3              1

And we had a configuration like this:

{
    "transform": "horizontal",
    "files": {
        "order": {
            "subject_fields": ["order_id"],
            "edge_fields": ["customer_id", "store_id"]
        }
    }
}

Then the output RDF would look like this:

<order_5> <customer> <customer_1> .
<order_9> <customer> <customer_2> .
<order_2> <customer> <customer_3> .
<order_5> <store> <store_1> .
<order_9> <store> <store_2> .
<order_2> <store> <store_1> .

Here, each order has been linked to its customer and store.
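The edge output can be reproduced with a short sketch. edge_rdf and ORDERS_CSV are hypothetical, and the rule that both the predicate and the target node type come from the column name with its _id suffix removed is inferred from the example output, not taken from dgraphpandas's documentation.

```python
import csv
import io

# Hypothetical copy of the orders table.
ORDERS_CSV = """order_id,customer_id,store_id
5,1,1
9,2,2
2,3,1
"""


def edge_rdf(text, node_type, subject_field, edge_fields):
    """Emit one edge line per (row, edge field), grouped by edge field."""
    records = list(csv.DictReader(io.StringIO(text)))
    lines = []
    for field in edge_fields:
        # Assumption: customer_id -> predicate <customer>, target <customer_N>.
        target = field.removesuffix("_id")
        for record in records:
            subject = f"<{node_type}_{record[subject_field]}>"
            lines.append(f"{subject} <{target}> <{target}_{record[field]}> .")
    return lines


print("\n".join(edge_rdf(ORDERS_CSV, "order", "order_id", ["customer_id", "store_id"])))
```

Unlike data values, the object here is another node rather than a quoted literal, which is what makes the row an edge. str.removesuffix requires Python 3.9 or later.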