How to import data using project APIs
Requirement
Use the following endpoints to initialize the data model and the dataset into which the data will be imported:
Import data
Call the import endpoint, specifying the dataset ID.
To track the progress of the import, use this endpoint with the dataset version ID returned by the import request.
During indexing, items are split into chunks. If the items have many properties, or if their values are large, reduce the chunk size so as not to saturate memory.
Conversely, if the data consists of many "small" items, increasing the chunk size can speed up the import.
To change the chunk size, go to the desired environment, edit the value of import-chunk-size in the data-virt configmap, then restart the pod.
CSV import
The first line of the CSV must contain the data model attributes. All attributes must be listed, even if they have no value.
The separator must be a semicolon (;). The encoding is UTF-8.
The CSV import is much less efficient if the file contains more than 150 columns.
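The rules above can be checked before a file is sent. The following is a minimal sketch using Python's standard csv module; the function name check_csv and the wording of the messages are illustrative assumptions, not part of the product.

```python
import csv
import io

MAX_EFFICIENT_COLUMNS = 150  # threshold quoted in the text above


def check_csv(text, model_attributes):
    """Return a list of problems found in a semicolon-separated CSV string."""
    rows = list(csv.reader(io.StringIO(text), delimiter=";"))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    # Every attribute of the data model must appear in the header line.
    missing = sorted(set(model_attributes) - set(header))
    if missing:
        problems.append(f"missing attributes in header: {missing}")
    if len(header) > MAX_EFFICIENT_COLUMNS:
        problems.append(f"{len(header)} columns: the import will be less efficient")
    # Each data row must have exactly one field per header column.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} fields, got {len(row)}"
            )
    return problems
```

An empty list means that none of these checks failed. The line numbers in the messages assume that no quoted field spans several lines.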
Let car be a data model with the following attributes:
- model: text field
- color: text field
- nbDoors: numeric field
- creationDate: instant field
- position: point field
The corresponding CSV could be:
model;color;nbDoors;creationDate;position
nissan;red;3;1977-04-22T00:00:00Z;"{""type"": ""Point"",""coordinates"": [102.0, 0.5]}"
peugeot;blue;5;1968-09-06T00:00:00Z;"{""type"": ""Point"",""coordinates"": [208.0, 0.10]}"
Or, with a different column order and empty values:
model;nbDoors;color;creationDate;position
nissan;3;red;1977-04-22T00:00:00Z;"{""type"": ""Point"",""coordinates"": [102.0, 0.5]}"
peugeot;;blue;;
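A file in this format can be produced with a standard CSV writer, which handles the quoting of the point value. The following is a minimal sketch in Python; the helper names point and build_csv are illustrative assumptions, and the attribute list is that of the car example.

```python
import csv
import io
import json

FIELDS = ["model", "color", "nbDoors", "creationDate", "position"]


def point(lon, lat):
    """Serialize a GeoJSON Point as a JSON string for a point field."""
    return json.dumps({"type": "Point", "coordinates": [lon, lat]})


def build_csv(cars):
    """Render car dictionaries as semicolon-separated CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        delimiter=";",
        restval="",          # attributes without a value become empty fields
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(cars)
    return buffer.getvalue()


cars = [
    {"model": "nissan", "color": "red", "nbDoors": 3,
     "creationDate": "1977-04-22T00:00:00Z", "position": point(102.0, 0.5)},
    {"model": "peugeot", "color": "blue"},
]
csv_text = build_csv(cars)
```

The writer wraps the JSON value in double quotes and doubles the inner quotes, as in the examples above. To write the result to disk, open the file with encoding="utf-8" and newline="" so that the encoding and line endings are not altered.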
Geo attributes
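Geo data is added to the CSV as a GeoJSON object in a double-quoted field, with each inner double quote doubled, as in the position column of the examples above.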
Multivalued attributes
The separator used for multivalued attributes is a pipe (|). For example:
- valueA and valueB are written as:
valueA|valueB
- the points [102.0, 0.5] and [92.0, 2.5] are written as:
"{""type"": ""Point"", ""coordinates"": [102.0, 0.5]}|{""type"": ""Point"", ""coordinates"": [92.0, 2.5]}"
Geo import
A geodetic datum is a reference frame used to measure locations on Earth.
To import geodetic data, you must provide a zipped folder containing at least the .shp, .dbf and .shx files.
The archive may also contain the following files:
- .cpg
- .prj
The content-type must be application/shp.
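The archive requirements above can be enforced when the zip file is built. The following is a minimal sketch in Python using only the standard library; the function name build_shapefile_archive is an illustrative assumption. The upload request itself is not shown, since the endpoint is not given here; it would need to carry the application/shp content-type.

```python
import zipfile
from pathlib import Path

REQUIRED = {".shp", ".dbf", ".shx"}
OPTIONAL = {".cpg", ".prj"}


def build_shapefile_archive(folder, archive_path):
    """Zip the shapefile components found in `folder`.

    Raises ValueError if a required component is missing.
    Returns the sorted list of extensions that were archived.
    """
    files = [
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in REQUIRED | OPTIONAL
    ]
    found = {path.suffix.lower() for path in files}
    missing = sorted(REQUIRED - found)
    if missing:
        raise ValueError(f"missing required files: {missing}")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(files):
            # Store files at the archive root, without the folder path.
            archive.write(path, arcname=path.name)
    return sorted(found)
```

Files with other extensions in the folder are ignored rather than archived.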
The attribute carrying the geometry must be named the_geom in the data model, and there can only be one. All attributes must be filled in, even if their values are empty.
The geometry must correspond exactly to the attribute type: it is not possible to mix lines and polygons in the same dataset. It is possible, however, to mix lines and multilines. To do this, set the normalizeGeo parameter to true when importing the data.
When you create a field, you can specify the Coordinate Reference System (CRS) of your choice; otherwise, WGS84 is used by default.
All data will be added with the CRS specified in the field.
If the precision of the geometry is greater than 15, it will be truncated to 15.