
DNN_angular1_v

DNN_euclidean_neighbors

Compute the indices of the neighbors of a given v using DNN mapping. To avoid ambiguity on equidistant neighbors, odd neighborhood sizes are preferred.
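The preference for odd sizes can be seen in a small Python sketch. It assumes, purely for illustration, that vectors sit at evenly spaced points along a line in ordinal order and that v counts as its own nearest neighbor. The function name and the tie-breaking rule are hypothetical and are not taken from the NoSQLBench implementation.

```python
def neighbors(v, k, population):
    """Return the k ordinals closest to v among 0..population-1.

    Assumption: distance between ordinals i and v is |i - v|.
    Ties are broken toward the lower ordinal (an arbitrary choice).
    """
    ranked = sorted(range(population), key=lambda i: (abs(i - v), i))
    return sorted(ranked[:k])


# Odd size: v plus two neighbors on each side, no ambiguity.
odd = neighbors(10, 5, 100)

# Even size: ordinals 8 and 12 are equally distant from 10,
# so which one is kept depends only on the tie-breaking rule.
even = neighbors(10, 4, 100)
```

With an odd size the neighborhood is symmetric around v. With an even size the last slot is always contested by two equidistant candidates.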

DNN_euclidean_v

DNN_euclidean_v_series

DNN_euclidean_v_wrap

This represents an enumerated population of vectors of some dimension, where any ordinal values which address outside of that enumeration simply wrap with modulo.
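The wrapping behavior can be illustrated with a minimal Python sketch. The `vector_at` helper and the sample population are hypothetical and only show the modulo addressing described above, not how the function builds its vectors.

```python
def vector_at(ordinal, vectors):
    """Address an enumerated population, wrapping out-of-range ordinals."""
    return vectors[ordinal % len(vectors)]


# A stand-in population of three 2-dimensional vectors.
population = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]

in_range = vector_at(1, population)   # addressed directly
wrapped = vector_at(4, population)    # 4 % 3 == 1, same vector
```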

DoubleArrayCache

Precompute the interior double[] values to use as a LUT.

DoubleCache

Precompute the interior double[] values to use as a LUT.
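As a rough illustration of the lookup-table idea, the Python sketch below evaluates a wrapped function once per slot and then answers later calls from the table. The helper name, the table size parameter, and the modulo indexing are assumptions made for this sketch and may not match the actual constructor arguments.

```python
import math


def make_double_cache(func, size):
    """Precompute func(0..size-1) and serve lookups from the table."""
    table = [float(func(i)) for i in range(size)]

    def lookup(value):
        # Assumption: inputs beyond the table wrap with modulo.
        return table[value % size]

    return lookup


cached_sqrt = make_double_cache(math.sqrt, 16)
```

After construction, `cached_sqrt(4)` reads a stored value instead of calling `math.sqrt` again.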

DoubleVectorPadLeft

Prefix the incoming array with an empty double[] so that it is sized up to at least the given size. If it is already at least that size, pass it through as-is.

DoubleVectorPadRight

Suffix the incoming array with an empty double[] so that it is sized up to at least the given size. If it is already at least that size, pass it through as-is.

DoubleVectorPrefix

Prefix the incoming array with an empty double[] of the given size.

DoubleVectorSuffix

Suffix the incoming array with an empty double[] of the given size.
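The difference between the pad and the prefix/suffix variants is easy to miss, so here is a hedged Python sketch of the four behaviors. It assumes that an "empty" array means zero-filled, as a newly allocated Java double[] is. The function names are illustrative only.

```python
def pad_left(values, size):
    """Grow to at least `size` by adding zeros in front."""
    return [0.0] * max(0, size - len(values)) + list(values)


def pad_right(values, size):
    """Grow to at least `size` by adding zeros at the end."""
    return list(values) + [0.0] * max(0, size - len(values))


def prefix(values, size):
    """Always add `size` zeros in front, whatever the input length."""
    return [0.0] * size + list(values)


def suffix(values, size):
    """Always add `size` zeros at the end, whatever the input length."""
    return list(values) + [0.0] * size
```

Padding is conditional on the input length, so `pad_left([1.0, 2.0, 3.0], 2)` passes the input through, while `prefix([1.0, 2.0, 3.0], 2)` still adds two elements.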

DoubleVectors

This is a version of the NoSQLBench Combiner (io.nosqlbench.virtdata.library.basics.shared.util.Combiner) which is especially suited to constructing unique sequences of doubles. It can be used to create arbitrarily long vectors in double[] form, where each vector corresponds to a specific character encoding. Based on the maximum cardinality of symbol values in each position, a step function on the unit interval is created for you and used as a source of magnitudes.

For example, with a combiner spec of "a-yA-Y*1024", the "a-yA-Y" part creates a character set mapping for 50 distinct indexed character values with the letter acting as a code, and then the "*1024" repeats this mapping over 1024 digits of values, which are then concatenated into an array of values as a uniquely encoded vector. In actuality, the internal model is computed separately from the character encoding, so it is efficient, although the character encoding can be used to uniquely identify each vector.

Note that, as with other combiner forms, you can specify a different cardinality for each position, although the automatically computed step function for the unit interval will be based on the largest cardinality. It is not computed separately for each position. Thus, with a specifier like "a-z*5;0-9*2", the last two positions will only use a fraction of the possible magnitudes, as the a-z element has the most steps at 26 between 0.0 and 1.0.
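The effect of a shared step function can be sketched in Python. Several details here are assumptions for illustration: the step is taken as 1 divided by the largest cardinality, the ordinal is decomposed into per-position indexes as a mixed-radix number with the last position varying fastest, and all function names are hypothetical.

```python
def expand_charset(spec):
    """Expand ranges such as 'a-yA-Y' into a list of characters."""
    chars, i = [], 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            chars.append(spec[i])
            i += 1
    return chars


def parse_combiner(spec):
    """Turn 'a-z*5;0-9*2' into one character set per position."""
    positions = []
    for segment in spec.split(";"):
        charset_spec, _, count = segment.partition("*")
        positions.extend([expand_charset(charset_spec)] * (int(count) if count else 1))
    return positions


def vector_for(ordinal, positions):
    """Return (character code, magnitudes) for one ordinal."""
    # Assumption: one step size for every position, from the largest cardinality.
    step = 1.0 / max(len(p) for p in positions)
    indexes = []
    for charset in reversed(positions):
        ordinal, index = divmod(ordinal, len(charset))
        indexes.append(index)
    indexes.reverse()
    code = "".join(p[i] for p, i in zip(positions, indexes))
    return code, [i * step for i in indexes]


positions = parse_combiner("a-z*5;0-9*2")
code, magnitudes = vector_for(11, positions)
```

Under these assumptions the two digit positions can never exceed 9/26, which is the "fraction of the possible magnitudes" described above.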

FloatVectorPadLeft

Prefix the incoming array with an empty float[] so that it is sized up to at least the given size. If it is already at least that size, pass it through as-is.

FloatVectorPadRight

Suffix the incoming array with an empty float[] so that it is sized up to at least the given size. If it is already at least that size, pass it through as-is.

FloatVectorPrefix

Prefix the incoming array with an empty float[] of the given size.

FloatVectorSuffix

Suffix the incoming array with an empty float[] of the given size.

HashedDoubleVectors

Construct an arbitrarily large vector with hashes. The initial value is assumed to be non-hashed, and is thus hashed on input to ensure that inputs are non-contiguous. Once the starting value is hashed, the sequence of long values is walked and each value added to the vector is hashed from the values in that sequence.

HashedFloatVectors

Construct an arbitrarily large float vector with hashes. The initial value is assumed to be non-hashed, and is thus hashed on input to ensure that inputs are non-contiguous. Once the starting value is hashed, the sequence of long values is walked and each value added to the vector is hashed from the values in that sequence.
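Both hashed vector functions describe the same two-stage pattern, which the Python sketch below mirrors. The hash used here (an 8-byte BLAKE2b digest) and the scaling of each hash to the unit interval are stand-ins chosen for the sketch. The hash function and value range used by NoSQLBench are not stated above.

```python
import hashlib

MASK_64 = (1 << 64) - 1


def hash64(value):
    """Stand-in 64-bit hash; not the hash NoSQLBench uses."""
    data = (value & MASK_64).to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def hashed_vector(cycle, size):
    # Stage 1: hash the input so adjacent cycles start far apart.
    start = hash64(cycle)
    # Stage 2: walk the sequence start, start+1, ... and hash each value.
    # Assumption: each hash is scaled to a double between 0.0 and 1.0.
    return [hash64(start + i) / 2**64 for i in range(size)]


first = hashed_vector(1, 4)
second = hashed_vector(2, 4)
```

The same cycle always yields the same vector, while adjacent cycles yield unrelated ones.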

HdfDatasetToCqlPredicates

A binding function that accepts a long input value for the cycle and returns a string consisting of the CQL predicate parsed from a single record in an HDF5 dataset.

HdfDatasetToString

This function reads a vector dataset from an HDF5 file. The entire dataset is parsed into a single String Object with the discrete values separated by the user-supplied separator character. It is intended for use only with small datasets where the entire dataset can be read into memory and there is no need to read individual vectors from the dataset. The lambda function simply returns the String representation of the dataset.

HdfDatasetToStrings

This function reads a dataset of any supported type from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value.

HdfDatasetsToString

HdfFileToFloatArray

This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.

This implementation is specific to returning an array of floats.

HdfFileToFloatList

This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.

This implementation is specific to returning a List of Floats, so as to work with the normalization functions e.g. NormalizeListVector and its variants.

HdfFileToIntArray

This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.

This implementation is specific to returning an array of ints.

HdfFileToIntList

This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.

This implementation is specific to returning a List of Integers.

HdfFileToLongArray

This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.

This implementation is specific to returning an array of longs.

HdfFileToLongList

This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.

This implementation is specific to returning a List of Longs.

NormalizeCqlVector

Normalize a vector in List form, calling the appropriate conversion function depending on the component (Class) type of the incoming List values.

NormalizeDoubleListVector

Normalize a vector.

NormalizeDoubleVector

NormalizeFloatListVector

Normalize a vector.

NormalizeFloatVector

NormalizeListVector

Normalize a vector in List form, calling the appropriate conversion function depending on the component (Class) type of the incoming List values.
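The descriptions above do not say which norm is used. The Python sketch below assumes the common case of scaling a vector to unit Euclidean (L2) length. The zero-vector handling is also an assumption.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 length (assumed meaning of 'normalize')."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # Assumption: an all-zero vector is returned unchanged.
        return [float(x) for x in vector]
    return [x / norm for x in vector]


unit = normalize([3.0, 4.0])
```

For the input `[3.0, 4.0]` the norm is 5.0, so each component is divided by 5.0.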

RepeatList

Repeat the incoming list into a new list, filling it to the given size.
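A minimal Python sketch of this behavior follows, assuming the elements are repeated in order and the final repetition is cut off once the target size is reached. The function name is illustrative.

```python
from itertools import cycle, islice


def repeat_list(values, size):
    """Fill a new list of `size` elements by cycling through `values`."""
    return list(islice(cycle(values), size))


filled = repeat_list([1, 2, 3], 7)
```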

ToCqlVector

ToFloatVector

TokenMapFileCycle

Utility function used for advanced data generation experiments.

TokenMapFileNextCycle

Utility function used for advanced data generation experiments.

TokenMapFileNextToken

Utility function used for advanced data generation experiments.

TokenMapFileToken

Utility function used for advanced data generation experiments.

TriangularStep

Compute a value which increases monotonically with respect to the cycle value. For any m >= 0, f(X+m) will be equal to or greater than f(X). In effect, this means that with a sequence of monotonic inputs, the results will be monotonic as well as clustered. The values will approximate input/average, but will vary in frequency around a simple binomial distribution.

The practical effect of this is to be able to compute a sequence of values over inputs which can act as foreign keys, but which are effectively ordered.
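To make the idea of monotonic, clustered output concrete, here is a simplified Python sketch. It is not the TriangularStep algorithm and does not reproduce its binomial frequency pattern: cluster sizes simply ramp from average - variance to average + variance and then repeat. The parameter names `average` and `variance` are assumptions, and the sketch requires average > variance.

```python
from bisect import bisect_right
from itertools import accumulate


def make_clustered_step(average, variance):
    """Build a pure, non-decreasing step function with varying cluster sizes."""
    sizes = list(range(average - variance, average + variance + 1))
    # Cumulative end offsets of each cluster inside one repeating window.
    boundaries = list(accumulate(sizes))
    inputs_per_window = boundaries[-1]
    outputs_per_window = len(sizes)

    def step(cycle):
        window, offset = divmod(cycle, inputs_per_window)
        return window * outputs_per_window + bisect_right(boundaries, offset)

    return step


step = make_clustered_step(3, 1)
values = [step(c) for c in range(10)]
```

With average 3 and variance 1, the first clusters hold 2, 3, and 4 inputs. Each output value could serve as a foreign key shared by a varying number of consecutive cycles.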

Call for Ideas

Due to the complexity of generalizing this as a pure function over other distributions, this is the only function of this type for now. If you are interested in this problem domain and have some suggestions for how to extend it to other distributions, please join the project or let us know.

UniformVectorSizedStepped

Create a vector which consists of a number of uniform vector ranges. Each range is set as [min,max] inclusive by a pair of double values such as 3.0d, 5.0d, ... You may provide an initial integer to set the number of components in the vector. After the initial (optional) size integer, you may provide successive min, max pairs. If a range is not specified for a component which is expected from the size, then it is automatically replaced with a unit interval double variate.
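The argument handling described above can be sketched in Python as follows. The function name, the use of a seeded pseudo-random generator as the source of variates, and the truncation of extra ranges when the size is smaller than the number of pairs are all assumptions made for this sketch.

```python
import random


def uniform_vector(cycle, *spec):
    """Build a vector with one uniform [min, max] range per component."""
    args = list(spec)
    # An optional leading integer sets the number of components.
    size = args.pop(0) if args and isinstance(args[0], int) else None
    if len(args) % 2 != 0:
        raise ValueError("ranges must be given as min, max pairs")
    ranges = list(zip(args[0::2], args[1::2]))
    if size is None:
        size = len(ranges)
    # Components without an explicit range fall back to the unit interval.
    ranges = ranges[:size] + [(0.0, 1.0)] * max(0, size - len(ranges))
    rng = random.Random(cycle)  # stand-in for the real variate source
    return [rng.uniform(low, high) for low, high in ranges]


vector = uniform_vector(7, 4, 3.0, 5.0, 10.0, 20.0)
```

Here the size is 4 but only two ranges are given, so the last two components are drawn from the unit interval.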
