Compute the indices of the neighbors of a given v using DNN mapping. To avoid ambiguity on equidistant neighbors, odd neighborhood sizes are preferred.
int -> DNN_angular1_neighbors(int: k, int: N, int: modulus) -> int[]
- notes: @param k The size of neighborhood @param N The number of total vectors, necessary for boundary conditions of defined vector @param modulus The modulus used during training of angular1 data; this corresponds to how periodically we cycle back to vectors with the same angle (hence have angular distance zero between them)
Compute the indices of the neighbors of a given v using DNN mapping. To avoid ambiguity on equidistant neighbors, odd neighborhood sizes are preferred.
int -> DNN_euclidean_neighbors(int: k, int: N, int: D) -> int[]
- notes: @param k The size of neighborhood @param N The number of total vectors, necessary for boundary conditions of defined vector @param D Number of dimensions in each vector
long -> DNN_euclidean_v(int: D, long: N) -> float[]
long -> DNN_euclidean_v(int: D, long: N, double: scale) -> float[]
long -> DNN_euclidean_v_series(int: dimensions, long: population, int: k) -> float[][]
This represents an enumerated population of vectors of some dimension, where any ordinal values which address outside of that enumeration simply wrap with modulo.
long -> DNN_euclidean_v_wrap(int: D, long: N, double: scale) -> float[]
long -> DNN_euclidean_v_wrap(int: D, long: N) -> float[]
long -> DnnAngular1V(int: D, long: N, long: M) -> float[]
- notes: @param D Dimensions in each vector @param N The number of vectors in the training set @param M The modulo which is used to construct equivalence classes
Precompute the interior double[] values to use as a LUT.
long -> DoubleArrayCache(io.nosqlbench.virtdata.lib.vectors.primitive.VectorSequence: function) -> double[]
Precompute the interior double[] values to use as a LUT.
long -> DoubleCache(io.nosqlbench.virtdata.lib.vectors.primitive.DoubleSequence: sequence) -> double
Prefix the incoming array with an empty double[] so that it is sized up to at least the given size. If it is already at least that size, pass it through as-is.
double[] -> DoubleVectorPadLeft(int: size) -> double[]
Suffix the incoming array with an empty double[] so that it is sized up to at least the given size. If it is already at least that size, pass it through as-is.
double[] -> DoubleVectorPadRight(int: size) -> double[]
Prefix the incoming array with an empty double[] of the given size.
double[] -> DoubleVectorPrefix(int: size) -> double[]
Suffix the incoming array with an empty double[] of the given size.
double[] -> DoubleVectorSuffix(int: size) -> double[]
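The difference between the Pad and Prefix/Suffix variants is easy to miss, so here is a minimal Python sketch of the four behaviors described above. It is not the library's Java implementation. The function names are hypothetical, and it assumes that an "empty double[]" means zero-filled elements, as a freshly allocated Java array would be.

```python
from typing import List


def pad_left(values: List[float], size: int) -> List[float]:
    """Prepend zeros until the result has at least `size` elements."""
    missing = max(0, size - len(values))
    return [0.0] * missing + list(values)


def pad_right(values: List[float], size: int) -> List[float]:
    """Append zeros until the result has at least `size` elements."""
    missing = max(0, size - len(values))
    return list(values) + [0.0] * missing


def prefix(values: List[float], size: int) -> List[float]:
    """Always prepend exactly `size` zeros, regardless of the input length."""
    return [0.0] * size + list(values)


def suffix(values: List[float], size: int) -> List[float]:
    """Always append exactly `size` zeros, regardless of the input length."""
    return list(values) + [0.0] * size
```

With a two-element input and size 4, the Pad variants yield four elements while the Prefix/Suffix variants yield six.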
This is a version of the NoSQLBench {@link io.nosqlbench.virtdata.library.basics.shared.util.Combiner} which is especially suited to constructing unique sequences of doubles. This can be used to create arbitrarily long vectors in double[] form, where each vector corresponds to a specific character encoding. Based on the maximum cardinality of symbol values in each position, a step function on the unit interval is created for you and used as a source of magnitudes.
For example, with a combiner spec of "{@code a-yA-Y*1024}", the "{@code a-yA-Y}" part creates a character set mapping for 50 distinct indexed character values with the letter acting as a code, and then the "{@code *1024}" repeats this mapping over 1024 digits of values, which are then concatenated into an array of values as a uniquely encoded vector. In actuality, the internal model is computed separately from the character encoding, so it is efficient, although the character encoding can still be used to uniquely identify each vector.
Note that as with other combiner forms, you can specify a different cardinality for each position, although the automatically computed step function for unit-interval will be based on the largest cardinality. It is not computed separately for each position. Thus, a specifier like "{@code a-z*5;0-9*2}" will only see the last two positions using a fraction of the possible magnitudes, as the a-z element has the most steps at 26 between 0.0 and 1.0.
long -> DoubleVectors(String: spec) -> double[]
- notes: Create a radix-mapped vector function based on a spec of character ranges and combinations. @param spec - The string specifier for a symbolic cardinality and symbol model that represents the vector values
Create a sequence of vectors encoding a 10-valued step function over 12 dimensions
Create a sequence of vectors encoding a 2-valued step function over 1024 dimensions
Create a sequence of vectors encoding a 70-valued step function over 512 dimensions
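To make the shared step function concrete, the following Python sketch decodes an input ordinal into per-position digits and scales every digit by the largest cardinality. It is only an illustration of the idea described for DoubleVectors, not the library's code. The function name, the digit order (last position varies fastest), and the exact step values (k divided by the largest cardinality) are all assumptions.

```python
from typing import List


def step_vector(ordinal: int, cardinalities: List[int]) -> List[float]:
    """Decode `ordinal` as mixed-radix digits, then scale by the largest cardinality."""
    max_card = max(cardinalities)
    digits = []
    # Assumption: the last position varies fastest, as in positional notation.
    for card in reversed(cardinalities):
        digits.append(ordinal % card)
        ordinal //= card
    digits.reverse()
    # Assumption: digit k maps to k / max_card on the unit interval.
    return [d / max_card for d in digits]


# Cardinalities matching a spec shaped like "a-z*5;0-9*2": five positions
# with 26 symbols each, followed by two positions with 10 symbols each.
cards = [26] * 5 + [10] * 2
vec = step_vector(10, cards)
```

Under these assumptions the last two positions never exceed 9/26, which is the "fraction of the possible magnitudes" effect noted above.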
Prefix the incoming array with an empty float[] so that it is sized up to at least the given size. If it is already at least that size, pass it through as-is.
float[] -> FloatVectorPadLeft(int: size) -> float[]
Suffix the incoming array with an empty float[] so that it is sized up to at least the given size. If it is already at least that size, pass it through as-is.
float[] -> FloatVectorPadRight(int: size) -> float[]
Prefix the incoming array with an empty float[] of the given size.
float[] -> FloatVectorPrefix(int: size) -> float[]
Suffix the incoming array with an empty float[] of the given size.
float[] -> FloatVectorSuffix(int: size) -> float[]
Construct an arbitrarily large vector with hashes. The initial value is assumed to be non-hashed, and is thus hashed on input to ensure that inputs are non-contiguous. Once the starting value is hashed, the sequence of long values is walked and each value added to the vector is hashed from the values in that sequence.
long -> HashedDoubleVectors(Object: sizer, Object: valueFunc) -> double[]
- notes: Build a double[] generator with a given size value or size function, and the given long->double function. @param sizer Either a numeric type which sets a fixed dimension, or a long->int function to derive it uniquely for each input @param valueFunc A long->double function
long -> HashedDoubleVectors(Object: sizer, double: min, double: max) -> double[]
long -> HashedDoubleVectors(Object: sizer) -> double[]
Construct an arbitrarily large float vector with hashes. The initial value is assumed to be non-hashed, and is thus hashed on input to ensure that inputs are non-contiguous. Once the starting value is hashed, the sequence of long values is walked and each value added to the vector is hashed from the values in that sequence.
long -> HashedFloatVectors(Object: sizer, Object: valueFunc) -> float[]
- notes: Build a float[] generator with a given size value or size function, and the given long->double function. @param sizer Either a numeric type which sets a fixed dimension, or a long->int function to derive it uniquely for each input @param valueFunc A long->double function
long -> HashedFloatVectors(Object: sizer, double: min, double: max) -> float[]
long -> HashedFloatVectors(Object: sizer) -> float[]
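The hash-then-walk construction described for HashedDoubleVectors and HashedFloatVectors can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's implementation: the hash function used by the library is not named here, so SHA-256 truncated to 64 bits stands in for it, and the function names and the linear scaling into [min, max] are hypothetical.

```python
import hashlib
import struct
from typing import List

MASK64 = (1 << 64) - 1


def hash_long(value: int) -> int:
    """Stand-in 64-bit hash (assumption: the library's actual hash differs)."""
    data = struct.pack("<Q", value & MASK64)
    digest = hashlib.sha256(data).digest()
    return struct.unpack("<Q", digest[:8])[0]


def hashed_vector(cycle: int, size: int, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Hash the input once, then walk consecutive longs and hash each one."""
    start = hash_long(cycle)  # adjacent cycles land on unrelated starting points
    out = []
    for i in range(size):
        h = hash_long(start + i)   # one hashed value per component
        unit = h / float(1 << 64)  # scale into the unit interval
        out.append(lo + unit * (hi - lo))
    return out
```

The same cycle always yields the same vector, while neighboring cycles yield unrelated vectors, which is the purpose of hashing the initial value.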
Binding function that accepts a long input value for the cycle and returns a string consisting of the CQL predicate parsed from a single record in an HDF5 dataset
long -> HdfDatasetToCqlPredicates(String: filename, String: datasetname, String: parsername) -> String
- notes: Create a new binding function that accepts a long input value for the cycle and returns a string @param filename @param datasetname @param parsername
long -> HdfDatasetToCqlPredicates(String: filename, String: datasetname) -> String
This function reads a vector dataset from an HDF5 file. The entire dataset is parsed into a single String Object with the discrete values separated by the user-supplied separator character. It is intended for use only with small datasets where the entire dataset can be read into memory and there is no need to read individual vectors from the dataset. The lambda function simply returns the String representation of the dataset.
long -> HdfDatasetToString(String: filename, String: dataset, String: separator) -> String
- notes: Create a new binding function that accepts a long input value for the cycle and returns a string representation of the specified dataset @param filename @param dataset @param separator
long -> HdfDatasetToString(String: filename, String: dataset) -> String
This function reads a dataset of any supported type from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value.
long -> HdfDatasetToStrings(String: filename, String: datasetName) -> String
long -> HdfDatasetsToString(String: filename, String: DSNameLeft, String: DSNameRight, String: intraSeparator, String: interSeparator) -> String
This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.
This implementation is specific to returning an array of floats
long -> HdfFileToFloatArray(String: filename, String: datasetName) -> float[]
This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.
This implementation is specific to returning a List of Floats, so as to work with the normalization functions e.g. NormalizeListVector and its variants.
long -> HdfFileToFloatList(String: filename, String: datasetName) -> List<Float>
This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.
This implementation is specific to returning an array of ints
long -> HdfFileToIntArray(String: filename, String: datasetName) -> int[]
This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.
This implementation is specific to returning a List of Integers
long -> HdfFileToIntList(String: filename, String: datasetName) -> List<Integer>
This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.
This implementation is specific to returning an array of longs
long -> HdfFileToLongArray(String: filename, String: datasetName) -> long[]
This function reads a vector dataset from an HDF5 file. The dataset itself is not read into memory, only the metadata (the "dataset" Java Object). The lambda function reads a single vector from the dataset, based on the long input value. As currently written this class will only work for datasets with 2 dimensions where the 1st dimension specifies the number of vectors and the 2nd dimension specifies the number of elements in each vector. Only datatypes short, int, and float are supported at this time.
This implementation is specific to returning a List of Longs
long -> HdfFileToLongList(String: filename, String: datasetName) -> List<Long>
Binding function that accepts a long input value for the cycle and returns a string consisting of the CQL predicate parsed from a single record in an HDF5 dataset
long -> HdfPredicatesToCql(String: filename, String: datasetName, String: serDesType) -> String
- notes: Create a new binding function that accepts a long input value for the cycle and returns a string @param filename The HDF5 file to read the predicate dataset from @param datasetName The name of the dataset internal to the HDF5 file @param serDesType The type of serialization/deserialization to use for the predicate
Normalize a vector in List form, calling the appropriate conversion function depending on the component (Class) type of the incoming List values.
com.datastax.oss.driver.api.core.data.CqlVector -> NormalizeCqlVector() -> com.datastax.oss.driver.api.core.data.CqlVector
Normalize a vector.
List<Double> -> NormalizeDoubleListVector() -> List<Double>
double[] -> NormalizeDoubleVector() -> double[]
Normalize a vector.
List<Float> -> NormalizeFloatListVector() -> List<Float>
float[] -> NormalizeFloatVector() -> float[]
Normalize a vector in List form, calling the appropriate conversion function depending on the component (Class) type of the incoming List values.
List -> NormalizeListVector() -> List
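The descriptions above do not say which norm the Normalize functions use. The following Python sketch assumes the common case of Euclidean (L2) normalization to unit length. The function name and the choice to return a zero vector unchanged are assumptions for illustration only.

```python
import math
from typing import List


def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (assumed L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # Assumption: a zero vector has no direction, so return it as-is.
        return list(vector)
    return [x / norm for x in vector]
```

For example, the vector (3, 4) has length 5, so it normalizes to (0.6, 0.8).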
Repeat the incoming list into a new list, filling it to the given size.
List -> RepeatList(int: size) -> List
- notes: Create a list repeater to build up a list from a smaller list. @param size - the total size of the new list
repeat the incoming values into a new List of size 50
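A minimal Python sketch of the list-repeating behavior described for RepeatList follows. The function name is hypothetical, and the handling of an empty input or an input longer than the requested size is an assumption, since the description does not cover those cases.

```python
from typing import Any, List


def repeat_list(values: List[Any], size: int) -> List[Any]:
    """Cycle through `values` until the new list holds `size` elements."""
    if not values:
        return []  # assumption: nothing to repeat yields an empty list
    return [values[i % len(values)] for i in range(size)]
```

Repeating a three-element list to size 7 yields two full copies followed by the first element again.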
SequenceOf bindings allow you to specify an order and count of a set of values which will then be repeated in that order.
long -> SequenceOf(int: ignored, String: spec) -> int
- notes:
This function produces values from a lookup table for direct control of numerical sequences. The sequence spec is a string containing the sequence values and their occurrences, defaulting to 1 each. Example: "1:6 2 3 4 5", which means "1 at a relative frequency of 6, and 2, 3, 4, and 5 at a relative frequency of 1 each". This will yield the pattern "1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, ..."
- notes:
Each implementation of {@link SequenceOf} must include a type sigil as the first parameter to disambiguate it from the others.
@param ignored any long value, discarded after signature matching. The exampleValue is thrown away, but is necessary for matching the right version of SequenceOf. @param spec A string of numbers separated by spaces, semicolons, or commas. This is the sequence spec.
- example:
SequenceOf(1L,'3:3 2:2 1:1')
- Generate sequence 3,3,3,2,2,1
- example:
SequenceOf(1L,'1000:99 1000000:1')
- Generate sequence 1000 (99 times) and then 1000000 (1 time)
long -> SequenceOf(long: ignored, String: spec) -> long
- notes:
This function produces values from a lookup table for direct control of numerical sequences. The sequence spec is a string containing the sequence values and their occurrences, defaulting to 1 each. Example: "1:6 2 3 4 5", which means "1 at a relative frequency of 6, and 2, 3, 4, and 5 at a relative frequency of 1 each". This will yield the pattern "1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, ..."
- notes:
Each implementation of {@link SequenceOf} must include a type sigil as the first parameter to disambiguate it from the others.
@param ignored any long value, discarded after signature matching. The exampleValue is thrown away, but is necessary for matching the right version of SequenceOf. @param spec A string of numbers separated by spaces, semicolons, or commas. This is the sequence spec.
- example:
SequenceOf(1L,'3:3 2:2 1:1')
- Generate sequence 3,3,3,2,2,1
- example:
SequenceOf(1L,'1000:99 1000000:1')
- Generate sequence 1000 (99 times) and then 1000000 (1 time)
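The lookup-table behavior of SequenceOf can be sketched in Python as shown here. This is an illustration of the spec format described above, not the library's parser. The function names are hypothetical, and indexing the table by the cycle value modulo its length is an assumption consistent with the repeating pattern in the notes.

```python
import re
from typing import Callable, List


def parse_sequence_spec(spec: str) -> List[int]:
    """Expand a spec such as '1:6 2 3 4 5' into a lookup table of values."""
    lut: List[int] = []
    # Values may be separated by spaces, semicolons, or commas.
    for token in re.split(r"[ ,;]+", spec.strip()):
        if not token:
            continue
        value, _, count = token.partition(":")
        # A missing ':count' defaults to one occurrence.
        lut.extend([int(value)] * (int(count) if count else 1))
    return lut


def sequence_of(spec: str) -> Callable[[int], int]:
    """Return a function mapping a cycle number to its value in the sequence."""
    lut = parse_sequence_spec(spec)
    # Assumption: the table is addressed by cycle modulo its length.
    return lambda cycle: lut[cycle % len(lut)]
```

Because the table is built once, each lookup is a single modulo and index operation.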
double[] -> ToCqlVector() -> com.datastax.oss.driver.api.core.data.CqlVector
float[] -> ToCqlVector() -> com.datastax.oss.driver.api.core.data.CqlVector
List -> ToCqlVector() -> com.datastax.oss.driver.api.core.data.CqlVector
double[] -> ToFloatVector() -> float[]
Utility function used for advanced data generation experiments.
int -> TokenMapFileCycle(String: filename, boolean: loopdata, boolean: ascending) -> long
Utility function used for advanced data generation experiments.
int -> TokenMapFileNextCycle(String: filename, boolean: loopdata, boolean: ascending) -> long
Utility function used for advanced data generation experiments.
int -> TokenMapFileNextToken(String: filename, boolean: loopdata, boolean: ascending) -> long
Utility function used for advanced data generation experiments.
int -> TokenMapFileToken(String: filename, boolean: loopdata, boolean: ascending) -> long
Compute a value which increases monotonically with respect to the cycle value. All values for f(X+(m>=0)) will be equal or greater than f(X). In effect, this means that with a sequence of monotonic inputs, the results will be monotonic as well as clustered. The values will approximate input/average, but will vary in frequency around a simple binomial distribution.
The practical effect of this is to be able to compute a sequence of values over inputs which can act as foreign keys, but which are effectively ordered.
Call for Ideas
Due to the complexity of generalizing this as a pure function over other distributions, this is the only function of this type for now. If you are interested in this problem domain and have some suggestions for how to extend it to other distributions, please join the project or let us know.
long -> TriangularStep(long: average, long: variance) -> long
- example:
- Create a sequence of values where the average is 100, but the range of values is between 80 and 120.
- example:
- Create a sequence of values where the average is 80, but the range of values is between 70 and 90.
- example:
long -> TriangularStep(long: average) -> long
Create a vector which consists of a number of uniform vector ranges. Each range is set as [min,max] inclusive by a pair of double values such as 3.0d, 5.0d, ... You may provide an initial integer to set the number of components in the vector. After the initial (optional) size integer, you may provide odd, even pairs of min, max. If a range is not specified for a component which is expected from the size, then it is automatically replaced with a unit interval double variate.
long -> UniformVectorSizedStepped(Number[]...: dims) -> List<Double>
- example:
- create a 3-component vector from unit interval variates
- example:
- create a 2-component vector from the specified uniform ranges [1.0d,100.0d] and [5.0d,6.0d]
- example:
- create a 2-component vector from ranges [3.0d,6.0d] and [0.0d,1.0d]
- example:
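The argument handling described for UniformVectorSizedStepped can be sketched in Python as follows. This is only an illustration. The function name is hypothetical, an odd argument count is assumed to signal a leading size, and Python's seeded random generator stands in for whatever value source the library uses.

```python
import random
from typing import List


def uniform_vector(cycle: int, *dims: float) -> List[float]:
    """Build a vector of uniform variates from an optional size and (min, max) pairs."""
    args = list(dims)
    # Assumption: an odd number of arguments means the first one is the size.
    size = int(args.pop(0)) if len(args) % 2 == 1 else len(args) // 2
    ranges = [(args[i], args[i + 1]) for i in range(0, len(args), 2)]
    # Components without an explicit range fall back to the unit interval.
    while len(ranges) < size:
        ranges.append((0.0, 1.0))
    rng = random.Random(cycle)  # stand-in for the library's value source
    return [rng.uniform(lo, hi) for lo, hi in ranges[:size]]
```

Calling `uniform_vector(cycle, 2, 3.0, 6.0)` yields one component in [3.0, 6.0] and one defaulted to the unit interval.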