FlightGoggles : Writing a custom interface to the FlightGoggles binary

This page describes how to write a custom interface to the FlightGoggles binary or extend the client API. Messages between the client and the renderer are passed over TCP using ZeroMQ. Both outgoing and incoming messages are serialized as JSON.

Outgoing Message

To request a render or to modify the positions of objects (including cameras), the client sends an outgoing state message. The state message has the following fields:

sceneIsInternal: Type boolean. This must always be set to true. The field is intended for loading external scenes in the Unity editor through TriLib, which is currently unsupported.
sceneFilename: Type string. The name of the scene to load. Currently supported values are Abandoned_Factory_Sunset, Stata_GroundFloor and Stata_Basement.
ntime: Type int64_t. The timestamp of the requested render in nanoseconds, stored as a 64-bit integer.
camWidth: Type int. The width of the requested render in pixels.
camHeight: Type int. The height of the requested render in pixels.
camFOV: Type float. The camera field of view.
camDepthScale: Type double. The depth resolution of the camera, e.g. 0.02 corresponds to a resolution of 2 cm.
cameras: Type List<Camera> (see Camera Type). The list of cameras to be rendered by the binary.
objects: Type List<Object> (see Object Type). The list of objects and their poses in the environment.

An example of this struct in C++ is provided below:

struct StateMessage_t
{
  bool sceneIsInternal = true;
  std::string sceneFilename = "Abandoned_Factory_Sunset";
  int64_t ntime;
  int camWidth = 1024;
  int camHeight = 768;
  float camFOV = 70.0f;
  double camDepthScale = 0.20;  // 0.20 corresponds to a depth resolution of 20 cm
  std::vector<Camera_t> cameras;
  std::vector<Object_t> objects;
};
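
Since outgoing messages are serialized as JSON, the scalar fields of the state message might map to a JSON object as sketched below. This is only a minimal hand-rolled sketch: it assumes the JSON keys mirror the field names listed above, and both the reduced `StateHeader` struct and the `toJson` helper are hypothetical names introduced for illustration. The camera and object lists are left empty, and a real client would more likely use a JSON library.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical, reduced form of the state message: scalar fields only.
struct StateHeader
{
  bool sceneIsInternal = true;
  std::string sceneFilename = "Abandoned_Factory_Sunset";
  std::int64_t ntime = 0;
  int camWidth = 1024;
  int camHeight = 768;
  float camFOV = 70.0f;
  double camDepthScale = 0.20;
};

// Assumption: the JSON keys mirror the field names listed above.
// No string escaping is done, which is only safe for simple scene names.
std::string toJson(const StateHeader& s)
{
  std::ostringstream out;
  out << "{"
      << "\"sceneIsInternal\":" << (s.sceneIsInternal ? "true" : "false") << ","
      << "\"sceneFilename\":\"" << s.sceneFilename << "\","
      << "\"ntime\":" << s.ntime << ","
      << "\"camWidth\":" << s.camWidth << ","
      << "\"camHeight\":" << s.camHeight << ","
      << "\"camFOV\":" << s.camFOV << ","
      << "\"camDepthScale\":" << s.camDepthScale << ","
      << "\"cameras\":[],"  // camera list left empty in this sketch
      << "\"objects\":[]"   // object list left empty in this sketch
      << "}";
  return out.str();
}
```

Calling `toJson` on a default-constructed `StateHeader` yields a single-line JSON object that could then be handed to the transport layer.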

Camera Type

The camera type has the following fields:

ID: type string. The unique ID of the camera.
position: type List<double>. A list of 3 elements giving the translation of the camera in the environment. Note that this is specified in Unity coordinates (X right, Y up, Z forward).
rotation: type List<double>. A list of 4 elements giving the rotation of the camera in the environment as a quaternion.
channels: type int. The number of image channels: 1 for grayscale, 3 for RGB or semantic cameras.
isDepth: type boolean. Whether this camera is a depth camera.
outputIndex: type int. The output index of the camera in the incoming data packet.
hasCollisionCheck: type boolean. Whether this camera should check for collisions.
doesLandmarkVisCheck: type boolean. Whether this camera should check for visible infrared beacons.

An example struct in C++ is shown below:

struct Camera_t
{
  std::string ID;
  std::vector<double> position;
  std::vector<double> rotation;
  int channels;
  bool isDepth;
  int outputIndex;
  bool hasCollisionCheck = true;
  bool doesLandmarkVisCheck = false;
};
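
The field constraints described above (3 position elements, 4 rotation elements, 1 or 3 channels) can be checked before a camera is sent. The following is a sketch only: `validateCamera` and `makeRgbCamera` are hypothetical helpers, the quaternion component order (x, y, z, w) is an assumption borrowed from Unity's own convention, and the non-negative `outputIndex` check is likewise assumed rather than documented.

```cpp
#include <string>
#include <vector>

struct Camera_t
{
  std::string ID;
  std::vector<double> position;
  std::vector<double> rotation;
  int channels;
  bool isDepth;
  int outputIndex;
  bool hasCollisionCheck = true;
  bool doesLandmarkVisCheck = false;
};

// Hypothetical helper: returns an empty string if the camera looks well-formed,
// otherwise a short description of the first problem found.
std::string validateCamera(const Camera_t& cam)
{
  if (cam.ID.empty()) return "ID must not be empty";
  if (cam.position.size() != 3) return "position needs 3 elements";
  if (cam.rotation.size() != 4) return "rotation needs 4 elements";
  // Only the two channel counts named in the field list are accepted here.
  if (cam.channels != 1 && cam.channels != 3) return "channels must be 1 or 3";
  // Assumption: output indices are non-negative.
  if (cam.outputIndex < 0) return "outputIndex must be non-negative";
  return "";
}

// Hypothetical helper: an RGB camera one unit above the origin.
Camera_t makeRgbCamera(const std::string& id, int outputIndex)
{
  Camera_t cam;
  cam.ID = id;
  cam.position = {0.0, 1.0, 0.0};       // Unity coordinates: Y is up
  cam.rotation = {0.0, 0.0, 0.0, 1.0};  // identity, assuming (x, y, z, w) order
  cam.channels = 3;                     // RGB
  cam.isDepth = false;
  cam.outputIndex = outputIndex;
  return cam;
}
```

A client could call `validateCamera` on every entry of the cameras list and refuse to send the state message if any call returns a non-empty string.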

Object Type

The object type has the following fields:

ID: type string. The unique ID of the object.
prefabID: type string. The name of the prefab object to instantiate and place. This must match a prefab in the Resources folder of the binary.
position: type List<double>. A list of 3 elements giving the translation of the object in the environment. Note that this is specified in Unity coordinates (X right, Y up, Z forward).
rotation: type List<double>. The rotation of the object in the environment as a quaternion.
size: type List<double>. The scaling of the object along the x, y, and z axes.

An example of the object structure in C++ is shown below:

struct Object_t
{
  std::string ID;
  std::string prefabID;
  std::vector<double> position;
  std::vector<double> rotation;
  std::vector<double> size;
};
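
Because positions must be given in Unity coordinates, a client that works in a different frame has to convert them first. The sketch below assumes, purely for illustration, a right-handed client frame with X forward, Y left and Z up. The helpers `toUnityPosition` and `makeObject` are hypothetical, the quaternion order (x, y, z, w) is an assumption, and rotation conversion is deliberately left out.

```cpp
#include <string>
#include <vector>

struct Object_t
{
  std::string ID;
  std::string prefabID;
  std::vector<double> position;
  std::vector<double> rotation;
  std::vector<double> size;
};

// Assumed client frame: X forward, Y left, Z up (right-handed).
// Unity frame:          X right,   Y up,   Z forward (left-handed).
std::vector<double> toUnityPosition(double xForward, double yLeft, double zUp)
{
  // right = -left, up = up, forward = forward
  return {-yLeft, zUp, xForward};
}

// Hypothetical helper: builds an unrotated, unscaled object at the given
// position, expressed in the assumed client frame.
Object_t makeObject(const std::string& id, const std::string& prefabID,
                    double xForward, double yLeft, double zUp)
{
  Object_t obj;
  obj.ID = id;
  obj.prefabID = prefabID;  // must match a prefab shipped with the binary
  obj.position = toUnityPosition(xForward, yLeft, zUp);
  obj.rotation = {0.0, 0.0, 0.0, 1.0};  // identity, assuming (x, y, z, w) order
  obj.size = {1.0, 1.0, 1.0};           // no scaling
  return obj;
}
```

Under these assumptions, a point 1 unit ahead, 2 units to the left and 3 units up becomes (-2, 3, 1) in Unity coordinates.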

Incoming Message

The incoming message from Unity has the following fields:

renderMetadata: type RenderMetadata_t. All the returned render metadata.
images: type List<Mat>. All the requested renders, as a list.

An example of the struct in C++ is shown below:

struct RenderOutput_t
{
  RenderMetadata_t renderMetadata;
  std::vector<cv::Mat> images;
};

RenderMetadata_t Type

The render metadata type has the following fields:

ntime: Type int64_t. The timestamp of the render in nanoseconds, as a 64-bit integer.
camWidth: Type int. The camera width of the rendered image.
camHeight: Type int. The camera height of the rendered image.
camDepthScale: Type double. The camera depth scale.
cameraIDs: Type List<string>. The list of rendered camera IDs.
channels: Type List<int>. The number of channels of each rendered camera, as a list.
hasCameraCollision: Type boolean. The collision state of the camera.
lidarReturn: Type float. The height measured by the downward-facing lidar.

An example C++ struct is shown below:

struct RenderMetadata_t
{
  int64_t ntime;
  int camWidth;
  int camHeight;
  double camDepthScale;
  std::vector<std::string> cameraIDs;
  std::vector<int> channels;
  bool hasCameraCollision;
  float lidarReturn;
};
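
The metadata can be used to sanity-check a reply before the images are touched. The sketch below is illustrative only: `isConsistent`, `findCameraIndex` and `expectedImageBytes` are hypothetical helpers, and the size calculation assumes tightly packed images with 8 bits per channel, which this page does not confirm.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct RenderMetadata_t
{
  std::int64_t ntime;
  int camWidth;
  int camHeight;
  double camDepthScale;
  std::vector<std::string> cameraIDs;
  std::vector<int> channels;
  bool hasCameraCollision;
  float lidarReturn;
};

// Hypothetical check: one channel count per camera and a positive image size.
bool isConsistent(const RenderMetadata_t& meta)
{
  return meta.cameraIDs.size() == meta.channels.size() &&
         meta.camWidth > 0 && meta.camHeight > 0;
}

// Returns the position of `id` in cameraIDs, or -1 if it was not rendered.
int findCameraIndex(const RenderMetadata_t& meta, const std::string& id)
{
  for (std::size_t i = 0; i < meta.cameraIDs.size(); ++i)
  {
    if (meta.cameraIDs[i] == id) return static_cast<int>(i);
  }
  return -1;
}

// Assumption: images are tightly packed with 8 bits per channel.
// Throws std::out_of_range if `i` is not a valid camera index.
std::size_t expectedImageBytes(const RenderMetadata_t& meta, std::size_t i)
{
  return static_cast<std::size_t>(meta.camWidth) *
         static_cast<std::size_t>(meta.camHeight) *
         static_cast<std::size_t>(meta.channels.at(i));
}
```

A client could compare `expectedImageBytes` against the size of each received image buffer and drop the frame on a mismatch.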