MyCaffe.common.CudaDnn< T > Class Template Reference

The CudaDnn object is the main interface to the Low-Level Cuda C++ DLL. More...

Public Member Functions

 CudaDnn (int nDeviceID, DEVINIT flags=(DEVINIT.CUBLAS|DEVINIT.CURAND), long? lSeed=null, string strPath="", bool bResetFirst=false, bool bEnableMemoryTrace=false)
 The CudaDnn constructor. More...
 CudaDnn (CudaDnn< T > cuda, bool bEnableGhostMemory)
 Alternate CudaDnn constructor. More...
void Dispose ()
 Disposes this instance freeing up all of its host and GPU memory. More...
void DisableGhostMemory ()
 Disables the ghost memory, if enabled. More...
void ResetGhostMemory ()
 Resets the ghost memory by enabling it if this instance was configured to use ghost memory. More...
void KernelCopy (int nCount, long hSrc, int nSrcOffset, long hDstKernel, long hDst, int nDstOffset, long hHostBuffer, long hHostKernel=-1, long hStream=-1, long hSrcKernel=-1)
 Copy memory from the look-up tables in one kernel to another. More...
void KernelAdd (int nCount, long hA, long hDstKernel, long hB, long hC)
 Add memory from one kernel to memory residing on another kernel. More...
long KernelCopyNccl (long hSrcKernel, long hSrcNccl)
 Copies an Nccl handle from one kernel to the current kernel of the current CudaDnn instance. More...
void SetDeviceID (int nDeviceID=-1, DEVINIT flags=DEVINIT.NONE, long? lSeed=null)
 Set the device ID used by the current instance of CudaDnn. More...
void SetRandomSeed (long lSeed)
 Set the random number generator seed. More...
int GetDeviceID ()
 Returns the current device id set within Cuda. More...
string GetDeviceName (int nDeviceID)
 Query the name of a device. More...
string GetDeviceP2PInfo (int nDeviceID)
 Query the peer-to-peer information of a device. More...
string GetDeviceInfo (int nDeviceID, bool bVerbose=false)
 Query the device information of a device. More...
void ResetDevice ()
 Reset the current device. More...
void SynchronizeDevice ()
 Synchronize the operations on the current device. More...
int GetMultiGpuBoardGroupID (int nDeviceID)
 Query the multi-GPU board group ID for a device. More...
int GetDeviceCount ()
 Query the number of devices (GPUs) installed. More...
bool CheckMemoryAttributes (long hSrc, int nSrcDeviceID, long hDst, int nDstDeviceID)
 Check the memory attributes of two memory blocks on different devices to see if they are compatible for peer-to-peer memory transfers. More...
double GetDeviceMemory (out double dfFree, out double dfUsed, out bool bCudaCallUsed, int nDeviceID=-1)
 Queries the amount of total, free and used memory on a given GPU. More...
string GetRequiredCompute (out int nMinMajor, out int nMinMinor)
 Returns the minimum major and minor compute values required by the CudaDnn DLL in use. More...
bool DeviceCanAccessPeer (int nSrcDeviceID, int nPeerDeviceID)
 Query whether or not two devices can access each other via peer-to-peer memory copies. More...
void DeviceEnablePeerAccess (int nPeerDeviceID)
 Enables peer-to-peer access between the current device used by the CudaDnn instance and a peer device. More...
void DeviceDisablePeerAccess (int nPeerDeviceID)
 Disables peer-to-peer access between the current device used by the CudaDnn instance and a peer device. More...
long AllocMemory (List< double > rg)
 Allocate a block of GPU memory and copy a list of doubles to it. More...
long AllocMemory (List< float > rg)
 Allocate a block of GPU memory and copy a list of floats to it. More...
long AllocMemory (double[] rgSrc, long hStream=0)
 Allocate a block of GPU memory and copy an array of doubles to it, optionally using a stream for the copy. More...
long AllocMemory (float[] rgSrc, long hStream=0)
 Allocate a block of GPU memory and copy an array of floats to it, optionally using a stream for the copy. More...
long AllocMemory (T[] rgSrc, long hStream=0, bool bHalfSize=false)
 Allocate a block of GPU memory and copy an array of type 'T' to it, optionally using a stream for the copy. More...
long AllocMemory (long lCapacity, bool bHalfSize=false)
 Allocate a block of GPU memory with a specified capacity. More...
void FreeMemory (long hMem)
 Free previously allocated GPU memory. More...
void CopyDeviceToHost (long lCount, long hGpuSrc, long hHostDst)
 Copy from GPU memory to Host memory. More...
void CopyHostToDevice (long lCount, long hHostSrc, long hGpuDst)
 Copy from Host memory to GPU memory. More...
long AllocHostBuffer (long lCapacity)
 Allocate a block of host memory with a specified capacity. More...
void FreeHostBuffer (long hMem)
 Free previously allocated host memory. More...
long GetHostBufferCapacity (long hMem)
 Returns the host memory capacity. More...
double[] GetHostMemoryDouble (long hMem)
 Retrieves the host memory as an array of doubles. More...
float[] GetHostMemoryFloat (long hMem)
 Retrieves the host memory as an array of floats. More...
T[] GetHostMemory (long hMem)
 Retrieves the host memory as an array of type 'T'. More...
double[] GetMemoryDouble (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of doubles. More...
float[] GetMemoryFloat (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of floats. More...
T[] GetMemory (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of type 'T'. More...
void SetMemory (long hMem, List< double > rg)
 Copies a list of doubles into a block of already allocated GPU memory. More...
void SetMemory (long hMem, List< float > rg)
 Copies a list of floats into a block of already allocated GPU memory. More...
void SetMemory (long hMem, double[] rgSrc, long hStream=0)
 Copies an array of doubles into a block of already allocated GPU memory. More...
void SetMemory (long hMem, float[] rgSrc, long hStream=0)
 Copies an array of floats into a block of already allocated GPU memory. More...
void SetMemory (long hMem, T[] rgSrc, long hStream=0, int nCount=-1)
 Copies an array of type 'T' into a block of already allocated GPU memory. More...
void SetMemoryAt (long hMem, double[] rgSrc, int nOffset)
 Copies an array of doubles into a block of already allocated GPU memory starting at a specific offset. More...
void SetMemoryAt (long hMem, float[] rgSrc, int nOffset)
 Copies an array of floats into a block of already allocated GPU memory starting at a specific offset. More...
void SetMemoryAt (long hMem, T[] rgSrc, int nOffset)
 Copies an array of type 'T' into a block of already allocated GPU memory starting at a specific offset. More...
T[] SetPixel (long hMem, int nCount, bool bReturnOriginal, int nOffset, params Tuple< int, T >[] rgPixel)
 Set pixel values, where each pixel is defined by an (index, value) tuple. More...
void SetHostMemory (long hMem, T[] rgSrc)
 Copies an array of type 'T' into a block of already allocated host memory. More...
long CreateMemoryPointer (long hData, long lOffset, long lCount)
 Creates a memory pointer into an already existing block of GPU memory. More...
void FreeMemoryPointer (long hData)
 Frees a memory pointer. More...
long CreateMemoryTest (out ulong ulTotalNumBlocks, out double dfMemAllocatedInGB, out ulong ulMemStartAddr, out ulong ulBlockSize, double dfPctToAllocate=1.0)
 Creates a new memory test on the current GPU. More...
void FreeMemoryTest (long h)
 Free a memory test, freeing up all GPU memory used. More...
T[] RunMemoryTest (long h, MEMTEST_TYPE type, ulong ulBlockStartOffset, ulong ulBlockCount, bool bVerbose, bool bWrite, bool bReadWrite, bool bRead)
 The RunMemoryTest method runs the memory test from the block start offset through the block count on the memory previously allocated using CreateMemoryTest. More...
long CreateImageOp (int nNum, double dfBrightnessProb, double dfBrightnessDelta, double dfContrastProb, double dfContrastLower, double dfContrastUpper, double dfSaturationProb, double dfSaturationLower, double dfSaturationUpper, long lRandomSeed=0)
 Create a new ImageOp used to perform image operations on the GPU. More...
void FreeImageOp (long h)
 Free an image op, freeing up all GPU memory used. More...
void DistortImage (long h, int nCount, int nNum, int nDim, long hX, long hY)
 Distort an image using the ImageOp handle provided. More...
long CreateStream (bool bNonBlocking=false, int nIndex=-1)
 Create a new stream on the current GPU. More...
void FreeStream (long h)
 Free a stream. More...
void SynchronizeStream (long h=0)
 Synchronize a stream on the current GPU, waiting for its operations to complete. More...
void SynchronizeThread ()
 Synchronize all kernel threads on the current GPU. More...
long CreateCuDNN (long hStream=0)
 Create a new instance of NVIDIA's cuDnn. More...
void FreeCuDNN (long h)
 Free an instance of cuDnn. More...
long CreateNCCL (int nDeviceId, int nCount, int nRank, Guid guid)
 Create an instance of NVIDIA's NCCL 'Nickel'. More...
void FreeNCCL (long hNccl)
 Free an instance of NCCL. More...
void NcclInitializeSingleProcess (params long[] rghNccl)
 Initializes a set of NCCL instances for use in a single process. More...
void NcclInitializeMultiProcess (long hNccl)
 Initializes a set of NCCL instances for use in different processes. More...
void NcclBroadcast (long hNccl, long hStream, long hX, int nCount)
 Broadcasts a block of GPU data to all NCCL instances. More...
void NcclAllReduce (long hNccl, long hStream, long hX, int nCount, NCCL_REDUCTION_OP op, double dfScale=1.0)
 Performs a reduction on all NCCL instances as specified by the reduction operation. More...
long CreateExtension (string strExtensionDllPath)
 Create an instance of an Extension DLL. More...
void FreeExtension (long hExtension)
 Free an instance of an Extension. More...
T[] RunExtension (long hExtension, long lfnIdx, T[] rgParam)
 Run a function on the extension specified. More...
long CreateTensorDesc ()
 Create a new instance of a tensor descriptor for use with NVIDIA's cuDnn. More...
void FreeTensorDesc (long h)
 Free a tensor descriptor instance. More...
void SetTensorNdDesc (long hHandle, int[] rgDim, int[] rgStride, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
void SetTensorDesc (long hHandle, int n, int c, int h, int w, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
void SetTensorDesc (long hHandle, int n, int c, int h, int w, int nStride, int cStride, int hStride, int wStride, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
void AddTensor (long hCuDnn, long hSrcDesc, long hSrc, int nSrcOffset, long hDstDesc, long hDst, int nDstOffset)
 Add two tensors together. More...
void AddTensor (long hCuDnn, T fAlpha, long hSrcDesc, long hSrc, int nSrcOffset, T fBeta, long hDstDesc, long hDst, int nDstOffset)
 Add two tensors together. More...
long CreateFilterDesc ()
 Create a new instance of a filter descriptor for use with NVIDIA's cuDnn. More...
void FreeFilterDesc (long h)
 Free a filter descriptor instance. More...
void SetFilterNdDesc (long hHandle, int[] rgDim, bool bHalf=false)
 Sets the values of a filter descriptor. More...
void SetFilterDesc (long hHandle, int n, int c, int h, int w, bool bHalf=false)
 Sets the values of a filter descriptor. More...
long CreateConvolutionDesc ()
 Create a new instance of a convolution descriptor for use with NVIDIA's cuDnn. More...
void FreeConvolutionDesc (long h)
 Free a convolution descriptor instance. More...
void SetConvolutionDesc (long hHandle, int hPad, int wPad, int hStride, int wStride, int hDilation, int wDilation, bool bUseTensorCores, bool bHalf=false)
 Set the values of a convolution descriptor. More...
void GetConvolutionInfo (long hCuDnn, long hBottomDesc, long hFilterDesc, long hConvDesc, long hTopDesc, ulong lWorkspaceSizeLimitInBytes, bool bUseTensorCores, out CONV_FWD_ALGO algoFwd, out ulong lWsSizeFwd, out CONV_BWD_FILTER_ALGO algoBwdFilter, out ulong lWsSizeBwdFilter, out CONV_BWD_DATA_ALGO algoBwdData, out ulong lWsSizeBwdData, CONV_FWD_ALGO preferredFwdAlgo=CONV_FWD_ALGO.NONE)
 Queries the algorithms and workspace sizes used for a given convolution descriptor. More...
void ConvolutionForward (long hCuDnn, long hBottomDesc, long hBottomData, int nBottomOffset, long hFilterDesc, long hWeight, int nWeightOffset, long hConvDesc, CONV_FWD_ALGO algoFwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hTopDesc, long hTopData, int nTopOffset, bool bSyncStream=true)
 Perform a convolution forward pass. More...
void ConvolutionForward (long hCuDnn, T fAlpha, long hBottomDesc, long hBottomData, int nBottomOffset, long hFilterDesc, long hWeight, int nWeightOffset, long hConvDesc, CONV_FWD_ALGO algoFwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hTopDesc, long hTopData, int nTopOffset, bool bSyncStream=true)
 Perform a convolution forward pass. More...
void ConvolutionBackwardBias (long hCuDnn, long hTopDesc, long hTopDiff, int nTopOffset, long hBiasDesc, long hBiasDiff, int nBiasOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the bias. More...
void ConvolutionBackwardBias (long hCuDnn, T fAlpha, long hTopDesc, long hTopDiff, int nTopOffset, T fBeta, long hBiasDesc, long hBiasDiff, int nBiasOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the bias. More...
void ConvolutionBackwardFilter (long hCuDnn, long hBottomDesc, long hBottomData, int nBottomOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_FILTER_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hFilterDesc, long hWeightDiff, int nWeightOffset, bool bSyncStream)
 Perform a convolution backward pass on the filter. More...
void ConvolutionBackwardFilter (long hCuDnn, T fAlpha, long hBottomDesc, long hBottomData, int nBottomOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_FILTER_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hFilterDesc, long hWeightDiff, int nWeightOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the filter. More...
void ConvolutionBackwardData (long hCuDnn, long hFilterDesc, long hWeight, int nWeightOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_DATA_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hBottomDesc, long hBottomDiff, int nBottomOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the data. More...
void ConvolutionBackwardData (long hCuDnn, T fAlpha, long hFilterDesc, long hWeight, int nWeightOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_DATA_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hBottomDesc, long hBottomDiff, int nBottomOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the data. More...
long CreatePoolingDesc ()
 Create a new instance of a pooling descriptor for use with NVIDIA's cuDnn. More...
void FreePoolingDesc (long h)
 Free a pooling descriptor instance. More...
void SetPoolingDesc (long hHandle, PoolingMethod method, int h, int w, int hPad, int wPad, int hStride, int wStride)
 Set the values of a pooling descriptor. More...
void PoolingForward (long hCuDnn, long hPoolingDesc, T fAlpha, long hBottomDesc, long hBottomData, T fBeta, long hTopDesc, long hTopData)
 Perform a pooling forward pass. More...
void PoolingBackward (long hCuDnn, long hPoolingDesc, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a pooling backward pass. More...
void DeriveBatchNormDesc (long hFwdScaleBiasMeanVarDesc, long hFwdBottomDesc, long hBwdScaleBiasMeanVarDesc, long hBwdBottomDesc, BATCHNORM_MODE mode)
 Derive the batch norm descriptors for both the forward and backward passes. More...
void BatchNormForward (long hCuDnn, BATCHNORM_MODE mode, T fAlpha, T fBeta, long hFwdBottomDesc, long hBottomData, long hFwdTopDesc, long hTopData, long hFwdScaleBiasMeanVarDesc, long hScaleData, long hBiasData, double dfFactor, long hGlobalMean, long hGlobalVar, double dfEps, long hSaveMean, long hSaveInvVar, bool bTraining)
 Run the batch norm forward pass. More...
void BatchNormBackward (long hCuDnn, BATCHNORM_MODE mode, T fAlphaDiff, T fBetaDiff, T fAlphaParamDiff, T fBetaParamDiff, long hBwdBottomDesc, long hBottomData, long hTopDiffDesc, long hTopDiff, long hBottomDiffDesc, long hBottomDiff, long hBwdScaleBiasMeanVarDesc, long hScaleData, long hScaleDiff, long hBiasDiff, double dfEps, long hSaveMean, long hSaveInvVar)
 Run the batch norm backward pass. More...
long CreateDropoutDesc ()
 Create a new instance of a dropout descriptor for use with NVIDIA's cuDnn. More...
void FreeDropoutDesc (long h)
 Free a dropout descriptor instance. More...
void SetDropoutDesc (long hCuDnn, long hDropoutDesc, double dfDropout, long hStates, long lSeed)
 Set the dropout descriptor values. More...
void GetDropoutInfo (long hCuDnn, long hBottomDesc, out ulong ulStateCount, out ulong ulReservedCount)
 Query the dropout state and reserved counts. More...
void DropoutForward (long hCuDnn, long hDropoutDesc, long hBottomDesc, long hBottomData, long hTopDesc, long hTopData, long hReserved)
 Performs a dropout forward pass. More...
void DropoutBackward (long hCuDnn, long hDropoutDesc, long hTopDesc, long hTop, long hBottomDesc, long hBottom, long hReserved)
 Performs a dropout backward pass. More...
long CreateLRNDesc ()
 Create a new instance of an LRN descriptor for use with NVIDIA's cuDnn. More...
void FreeLRNDesc (long h)
 Free an LRN descriptor instance. More...
void SetLRNDesc (long hHandle, uint nSize, double fAlpha, double fBeta, double fK)
 Set the LRN descriptor values. More...
void LRNCrossChannelForward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDesc, long hBottomData, T fBeta, long hTopDesc, long hTopData)
 Perform LRN cross channel forward pass. More...
void LRNCrossChannelBackward (long hCuDnn, long hNormDesc, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform LRN cross channel backward pass. More...
void DivisiveNormalizationForward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDataDesc, long hBottomData, long hTemp1, long hTemp2, T fBeta, long hTopDataDesc, long hTopData)
 Performs a Divisive Normalization forward pass. More...
void DivisiveNormalizationBackward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDataDesc, long hBottomData, long hTopDiff, long hTemp1, long hTemp2, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Performs a Divisive Normalization backward pass. More...
void TanhForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Tanh forward pass. More...
void TanhBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Tanh backward pass. More...
void EluForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform an ELU forward pass. More...
void EluBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform an ELU backward pass. More...
void SigmoidForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Sigmoid forward pass. More...
void SigmoidBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Sigmoid backward pass. More...
void ReLUForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a ReLU forward pass. More...
void ReLUBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a ReLU backward pass. More...
void SoftmaxForward (long hCuDnn, SOFTMAX_ALGORITHM alg, SOFTMAX_MODE mode, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Softmax forward pass. More...
void SoftmaxBackward (long hCuDnn, SOFTMAX_ALGORITHM alg, SOFTMAX_MODE mode, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Softmax backward pass. More...
long CreateRnnDataDesc ()
 Create the RNN Data Descriptor. More...
void FreeRnnDataDesc (long h)
 Free an existing RNN Data descriptor. More...
void SetRnnDataDesc (long hRnnDataDesc, RNN_DATALAYOUT layout, int nMaxSeqLen, int nBatchSize, int nVectorSize, bool bBidirectional=false, int[] rgSeqLen=null)
 Sets the RNN Data Descriptor values. More...
long CreateRnnDesc ()
 Create the RNN Descriptor. More...
void FreeRnnDesc (long h)
 Free an existing RNN descriptor. More...
void SetRnnDesc (long hCuDnn, long hRnnDesc, int nHiddenCount, int nNumLayers, long hDropoutDesc, RNN_MODE mode, bool bUseTensorCores, RNN_DIRECTION direction=RNN_DIRECTION.RNN_UNIDIRECTIONAL)
 Sets the RNN Descriptor values. More...
int GetRnnParamCount (long hCuDnn, long hRnnDesc, long hXDesc)
 Returns the RNN parameter count. More...
ulong GetRnnWorkspaceCount (long hCuDnn, long hRnnDesc, long hXDesc, out ulong nReservedCount)
 Returns the workspace and reserved counts. More...
void GetRnnLinLayerParams (long hCuDnn, long hRnnDesc, int nLayer, long hXDesc, long hWtDesc, long hWtData, int nLinLayer, out int nWtCount, out long hWt, out int nBiasCount, out long hBias)
 Returns the linear layer parameters (weights). More...
void RnnForward (long hCuDnn, long hRnnDesc, long hXDesc, long hXData, long hHxDesc, long hHxData, long hCxDesc, long hCxData, long hWtDesc, long hWtData, long hYDesc, long hYData, long hHyDesc, long hHyData, long hCyDesc, long hCyData, long hWorkspace, ulong nWsCount, long hReserved, ulong nResCount, bool bTraining)
 Run the RNN through a forward pass. More...
void RnnBackwardData (long hCuDnn, long hRnnDesc, long hYDesc, long hYData, long hYDiff, long hHyDesc, long hHyDiff, long hCyDesc, long hCyDiff, long hWtDesc, long hWtData, long hHxDesc, long hHxData, long hCxDesc, long hCxData, long hXDesc, long hXDiff, long hdHxDesc, long hHxDiff, long hdCxDesc, long hCxDiff, long hWorkspace, ulong nWsCount, long hReserved, ulong nResCount)
 Run the RNN backward pass through the data. More...
void RnnBackwardWeights (long hCuDnn, long hRnnDesc, long hXDesc, long hXData, long hHxDesc, long hHxData, long hYDesc, long hYData, long hWorkspace, ulong nWsCount, long hWtDesc, long hWtDiff, long hReserved, ulong nResCount)
 Run the RNN backward pass on the weights. More...
bool IsRnn8Supported ()
 Returns whether or not RNN8 is supported. More...
long CreateRnn8 ()
 Create the RNN8. More...
void FreeRnn8 (long h)
 Free an existing RNN8. More...
void SetRnn8 (long hCuDnn, long hRnn, bool bTraining, RNN_DATALAYOUT layout, RNN_MODE cellMode, RNN_BIAS_MODE biasMode, int nSequenceLen, int nBatchSize, int nInputs, int nHidden, int nOutputs, int nProjection, int nNumLayers, float fDropout, ulong lSeed, bool bBidirectional=false)
 Set the RNN8 parameters. More...
void GetRnn8MemorySizes (long hCuDnn, long hRnn, out ulong szWtCount, out ulong szWorkSize, out ulong szReservedSize)
 Returns the memory sizes required for the RNN8. More...
void InitializeRnn8Weights (long hCuDnn, long hRnn, long hWt, RNN_FILLER_TYPE wtFt, double fWtVal, double fWtVal2, RNN_FILLER_TYPE biasFt, double fBiasVal, double fBiasVal2)
 Initialize the RNN8 weights. More...
void Rnn8Forward (long hCuDnn, long hRnn, long hX, long hY, long hhX, long hhY, long hcX, long hcY, long hWts, long hWork, long hReserved)
 Calculate the forward pass through the RNN8. More...
void Rnn8Backward (long hCuDnn, long hRnn, long hY, long hdY, long hX, long hdX, long hhX, long hdhY, long hdhX, long hcX, long hdcY, long hdcX, long hWt, long hdWt, long hWork, long hReserved)
 Calculate the backward pass through the RNN8 for both data and weights. More...
long AllocPCAData (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA Data. More...
long AllocPCAScores (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA scores. More...
long AllocPCALoads (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA loads. More...
long AllocPCAEigenvalues (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA eigenvalues. More...
long CreatePCA (int nMaxIterations, int nM, int nN, int nK, long hData, long hScoresResult, long hLoadsResult, long hResiduals=0, long hEigenvalues=0)
 Creates a new PCA instance and returns the handle to it. More...
bool RunPCA (long hPCA, int nSteps, out int nCurrentK, out int nCurrentIteration)
 Runs a number of steps of the iterative PCA algorithm. More...
void FreePCA (long hPCA)
 Free the PCA instance associated with handle. More...
long CreateSSD (int nNumClasses, bool bShareLocation, int nLocClasses, int nBackgroundLabelId, bool bUseDiffcultGt, SSD_MINING_TYPE miningType, SSD_MATCH_TYPE matchType, float fOverlapThreshold, bool bUsePriorForMatching, SSD_CODE_TYPE codeType, bool bEncodeVariantInTgt, bool bBpInside, bool bIgnoreCrossBoundaryBbox, bool bUsePriorForNms, SSD_CONF_LOSS_TYPE confLossType, SSD_LOC_LOSS_TYPE locLossType, float fNegPosRatio, float fNegOverlap, int nSampleSize, bool bMapObjectToAgnostic, bool bNmsParam, float? fNmsThreshold=null, int? nNmsTopK=null, float? fNmsEta=null)
 Create an instance of the SSD GPU support. More...
void SetupSSD (long hSSD, int nNum, int nNumPriors, int nNumGt)
 Setup the SSD GPU support. More...
void FreeSSD (long hSSD)
 Free the instance of SSD GPU support. More...
int SsdMultiBoxLossForward (long hSSD, int nLocDataCount, long hLocGpuData, int nConfDataCount, long hConfGpuData, int nPriorDataCount, long hPriorGpuData, int nGtDataCount, long hGtGpuData, out List< DictionaryMap< List< int > > > rgAllMatchIndices, out List< List< int > > rgrgAllNegIndices, out int nNumNegs)
 Performs the SSD MultiBoxLoss forward operation. More...
void SsdEncodeLocPrediction (long hSSD, int nLocPredCount, long hLocPred, int nLocGtCount, long hLocGt)
 Encodes the SSD data into the location prediction and location ground truths. More...
void SsdEncodeConfPrediction (long hSSD, int nConfPredCount, long hConfPred, int nConfGtCount, long hConfGt)
 Encodes the SSD data into the confidence prediction and confidence ground truths. More...
long CreateLayerNorm (int nGpuID, int nCount, int nOuterNum, int nChannels, int nInnerNum, float fEps=1e-10f)
 Create the Cuda version of LayerNorm. More...
void FreeLayerNorm (long hLayerNorm)
 Free the instance of LayerNorm GPU support. More...
void LayerNormForward (long hLayerNorm, long hXdata, long hYdata)
 Run the LayerNorm forward pass. More...
void LayerNormBackward (long hLayerNorm, long hYdata, long hYdiff, long hXdiff)
 Run the LayerNorm backward pass. More...
void set (int nCount, long hHandle, double fVal, int nIdx=-1)
 Set the values of GPU memory to a specified value of type 'double'. More...
void set (int nCount, long hHandle, float fVal, int nIdx=-1)
 Set the values of GPU memory to a specified value of type 'float'. More...
void set (int nCount, long hHandle, T fVal, int nIdx=-1, int nXOff=0)
 Set the values of GPU memory to a specified value of type 'T'. More...
double[] get_double (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of type 'double'. More...
float[] get_float (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of type 'float'. More...
T[] get (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of type 'T'. More...
void copy (int nCount, long hSrc, long hDst, int nSrcOffset=0, int nDstOffset=0, long hStream=-1, bool? bSrcHalfSizeOverride=null, bool? bDstHalfSizeOverride=null)
 Copy data from one block of GPU memory to another. More...
void copy (int nCount, int nNum, int nDim, long hSrc1, long hSrc2, long hDst, long hSimilar, bool bInvert=false)
 Copy similar items of length 'nDim' from hSrc1 (where hSimilar(i) = 1) and dissimilar items of length 'nDim' from hSrc2 (where hSimilar(i) = 0). More...
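
The selection rule described above can be sketched on the CPU. This is a minimal illustration that reads the description literally; it does not call CudaDnn, it ignores the bInvert option, and the helper name CopySimilarReference and the use of an int[] for the similarity flags are assumptions made for the example (the GPU method takes memory handles instead of arrays).

```csharp
using System;

// CPU sketch of the selection described for copy(nCount, nNum, nDim, hSrc1, hSrc2, hDst, hSimilar):
// item i (a run of 'dim' values) is taken from src1 when similar[i] == 1, otherwise from src2.
// CopySimilarReference is a hypothetical helper, not part of MyCaffe.
static void CopySimilarReference(int num, int dim, double[] src1, double[] src2,
                                 double[] dst, int[] similar)
{
    for (int i = 0; i < num; i++)
    {
        double[] src = (similar[i] == 1) ? src1 : src2;
        Array.Copy(src, i * dim, dst, i * dim, dim);
    }
}

double[] simSrc1 = { 1, 1, 2, 2, 3, 3 };
double[] simSrc2 = { 7, 7, 8, 8, 9, 9 };
int[] simFlags = { 1, 0, 1 };
double[] simDst = new double[6];
CopySimilarReference(3, 2, simSrc1, simSrc2, simDst, simFlags);
Console.WriteLine(string.Join(", ", simDst)); // prints 1, 1, 8, 8, 3, 3
```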
void copy_batch (int nCount, int nNum, int nDim, long hSrcData, long hSrcLbl, int nDstCount, long hDstCache, long hWorkDevData, int nLabelStart, int nLabelCount, int nCacheSize, long hCacheHostCursors, long hWorkDataHost)
 Copy a batch of labeled items into a cache organized by label where older data is removed and replaced by newer data. More...
void copy_sequence (int nK, int nNum, int nDim, long hSrcData, long hSrcLbl, int nSrcCacheCount, long hSrcCache, int nLabelStart, int nLabelCount, int nCacheSize, long hCacheHostCursors, bool bOutputLabels, List< long > rghTop, List< int > rgnTopCount, long hWorkDataHost, bool bCombinePositiveAndNegative=false, int nSeed=0)
 Copy a sequence of cached items, organized by label, into anchor, positive (if nK > 0), and negative blobs. More...
void copy_sequence (int n, long hSrc, int nSrcStep, int nSrcStartIdx, int nCopyCount, int nCopyDim, long hDst, int nDstStep, int nDstStartIdx, int nSrcSpatialDim, int nDstSpatialDim, int nSrcSpatialDimStartIdx=0, int nDstSpatialDimStartIdx=0, int nSpatialDimCount=-1)
 Copy a sequence from a source to a destination and allow for skip steps. More...
void copy_expand (int n, int nNum, int nDim, long hX, long hA)
 Expand a vector of length 'nNum' into a matrix of size 'nNum' x 'nDim' by copying each value of the vector into all elements of the corresponding matrix row. More...
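
As a rough illustration of the expansion described for copy_expand, the sketch below performs the same operation on host arrays. It assumes the matrix is stored row-major in a flat array; the helper name CopyExpandReference is hypothetical, and the real method operates on GPU memory handles (hX, hA).

```csharp
using System;

// CPU sketch: expand a vector x of length 'num' into a num x dim matrix a,
// where every element of row i is set to x[i].
// Assumption: a is stored row-major. CopyExpandReference is a hypothetical helper.
static void CopyExpandReference(int num, int dim, double[] x, double[] a)
{
    for (int i = 0; i < num; i++)
        for (int j = 0; j < dim; j++)
            a[i * dim + j] = x[i];
}

double[] expandX = { 7, 9 };
double[] expandA = new double[2 * 3];
CopyExpandReference(2, 3, expandX, expandA);
Console.WriteLine(string.Join(", ", expandA)); // prints 7, 7, 7, 9, 9, 9
```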
void fill (int n, int nDim, long hSrc, int nSrcOff, int nCount, long hDst)
 Fill data from the source data 'n' times in the destination. More...
void sort (int nCount, long hY)
 Sort the data in the GPU memory specified. More...
void gemm (bool bTransA, bool bTransB, int m, int n, int k, double fAlpha, long hA, long hB, double fBeta, long hC)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
void gemm (bool bTransA, bool bTransB, int m, int n, int k, float fAlpha, long hA, long hB, float fBeta, long hC)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
void gemm (bool bTransA, bool bTransB, int m, int n, int k, T fAlpha, long hA, long hB, T fBeta, long hC, int nAOffset=0, int nBOffset=0, int nCOffset=0, int nGroups=1, int nGroupOffsetA=0, int nGroupOffsetB=0, int nGroupOffsetC=0)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
void gemm (bool bTransA, bool bTransB, int m, int n, int k, double fAlpha, long hA, long hB, double fBeta, long hC, uint lda, uint ldb, uint ldc)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
void gemm (bool bTransA, bool bTransB, int m, int n, int k, double fAlpha, long hA, long hB, double fBeta, long hC, uint lda, uint ldb, uint ldc, uint stridea, uint strideb, uint stridec, uint batch_count)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
void geam (bool bTransA, bool bTransB, int m, int n, double fAlpha, long hA, long hB, double fBeta, long hC)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
void geam (bool bTransA, bool bTransB, int m, int n, float fAlpha, long hA, long hB, float fBeta, long hC)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
void geam (bool bTransA, bool bTransB, int m, int n, T fAlpha, long hA, long hB, T fBeta, long hC, int nAOffset=0, int nBOffset=0, int nCOffset=0)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
void gemv (bool bTransA, int m, int n, double fAlpha, long hA, long hX, double fBeta, long hY)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
void gemv (bool bTransA, int m, int n, float fAlpha, long hA, long hX, float fBeta, long hY)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
void gemv (bool bTransA, int m, int n, T fAlpha, long hA, long hX, T fBeta, long hY, int nAOffset=0, int nXOffset=0, int nYOffset=0)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
void ger (int m, int n, double fAlpha, long hX, long hY, long hA)
 Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix) More...
void ger (int m, int n, float fAlpha, long hX, long hY, long hA)
 Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix) More...
void ger (int m, int n, T fAlpha, long hX, long hY, long hA)
 Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix) More...
void axpy (int n, double fAlpha, long hX, long hY)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
void axpy (int n, float fAlpha, long hX, long hY)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
void axpy (int n, T fAlpha, long hX, long hY, int nXOff=0, int nYOff=0)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
void axpby (int n, double fAlpha, long hX, double fBeta, long hY)
 Scale the vector X by alpha and the vector Y by beta, then add the two together. More...
void axpby (int n, float fAlpha, long hX, float fBeta, long hY)
 Scale the vector X by alpha and the vector Y by beta, then add the two together. More...
void axpby (int n, T fAlpha, long hX, T fBeta, long hY)
 Scale the vector x by Alpha and scale vector y by Beta and then add both together. More...
void mulbsx (int n, long hA, int nAOff, long hX, int nXOff, int nC, int nSpatialDim, bool bTranspose, long hB, int nBOff)
 Multiply a matrix with a vector. More...
void divbsx (int n, long hA, int nAOff, long hX, int nXOff, int nC, int nSpatialDim, bool bTranspose, long hB, int nBOff)
 Divide a matrix by a vector. More...
void matmul (uint nOuterCount, int m, int n, int k, long hA, long hB, long hC, double dfScale=1.0, bool bTransA=false, bool bTransB=false)
 Perform matmul operation hC = matmul(hA, hB), where hA, hB and hC are all in row-major format. More...
void transposeHW (int n, int c, int h, int w, long hSrc, long hDst)
 Transpose n*c matrices along the height and width dimensions. All matrices are in row-major format. More...
void set_bounds (int n, double dfMin, double dfMax, long hX)
 Set the bounds of all items within the data to a set range of values. More...
void scal (int n, double fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
void scal (int n, float fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
void scal (int n, T fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
double dot_double (int n, long hX, long hY)
 Computes the dot product of X and Y. More...
float dot_float (int n, long hX, long hY)
 Computes the dot product of X and Y. More...
T dot (int n, long hX, long hY, int nXOff=0, int nYOff=0)
 Computes the dot product of X and Y. More...
double asum_double (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
float asum_float (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
T asum (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
void scale (int n, double fAlpha, long hX, long hY)
 Scales the values in X and places them in Y. More...
void scale (int n, float fAlpha, long hX, long hY)
 Scales the values in X and places them in Y. More...
void scale (int n, T fAlpha, long hX, long hY, int nXOff=0, int nYOff=0)
 Scales the values in X and places them in Y. More...
void scale_to_range (int n, long hX, long hY, double fMin, double fMax)
 Scales the values in X and places the result in Y (can also run inline where X = Y). More...
double erf (double dfVal)
 Calculates the erf() function. More...
float erf (float fVal)
 Calculates the erf() function. More...
T erf (T fVal)
 Calculates the erf() function. More...
void mask (int n, int nMaskDim, T fSearch, T fReplace, long hX, long hMask, long hY)
 Mask the data in the source by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
void mask (int n, int nMaskDim, double fSearch, double fReplace, long hX, long hMask, long hY)
 Mask the data in the source by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
void mask (int n, int nMaskDim, float fSearch, float fReplace, long hX, long hMask, long hY)
 Mask the data in the source by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
void mask_batch (int n, int nBatch, int nMaskDim, T fSearch, T fReplace, long hX, long hMask, long hY)
 Mask the batch of data in the source by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
void mask_batch (int n, int nBatch, int nMaskDim, double fSearch, double fReplace, long hX, long hMask, long hY)
 Mask the batch of data in the source by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
void mask_batch (int n, int nBatch, int nMaskDim, float fSearch, float fReplace, long hX, long hMask, long hY)
 Mask the batch of data in the source by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
void interp2 (int nChannels, long hData1, int nX1, int nY1, int nHeight1, int nWidth1, int nHeight1A, int nWidth1A, long hData2, int nX2, int nY2, int nHeight2, int nWidth2, int nHeight2A, int nWidth2A, bool bBwd=false)
 Interpolates between two sizes within the spatial dimensions. More...
void add_scalar (int n, double fAlpha, long hY)
 Adds a scalar value to each element of Y. More...
void add_scalar (int n, float fAlpha, long hY)
 Adds a scalar value to each element of Y. More...
void add_scalar (int n, T fAlpha, long hY, int nYOff=0)
 Adds a scalar value to each element of Y. More...
void add (int n, long hA, long hB, long hC, long hY)
 Adds A, B and C and places the result in Y. More...
void add (int n, long hA, long hB, long hY)
 Adds A to B and places the result in Y. More...
void add (int n, long hA, long hB, long hY, double dfAlpha)
 Adds A to (B times scalar) and places the result in Y. More...
void add (int n, long hA, long hB, long hY, float fAlpha)
 Adds A to (B times scalar) and places the result in Y. More...
void add (int n, long hA, long hB, long hY, double dfAlphaA, double dfAlphaB, int nAOff=0, int nBOff=0, int nYOff=0)
 Adds (A times dfAlphaA) to (B times dfAlphaB) and places the result in Y. More...
void sub (int n, long hA, long hB, long hY, int nAOff=0, int nBOff=0, int nYOff=0, int nB=0)
 Subtracts B from A and places the result in Y. More...
void mul (int n, long hA, long hB, long hY, int nAOff=0, int nBOff=0, int nYOff=0)
 Multiplies each element of A with each element of B and places the result in Y. More...
void sub_and_dot (int n, int nN, int nInnerNum, long hA, long hB, long hY, int nAOff, int nBOff, int nYOff)
 Subtracts every nInnerNum element of B from A and performs a dot product on the result. More...
void mul_scalar (int n, double fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
void mul_scalar (int n, float fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
void mul_scalar (int n, T fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
void div (int n, long hA, long hB, long hY)
 Divides each element of A by each element of B and places the result in Y. More...
void abs (int n, long hA, long hY)
 Calculates the absolute value of A and places the result in Y. More...
void exp (int n, long hA, long hY)
 Calculates the exponent value of A and places the result in Y. More...
void exp (int n, long hA, long hY, int nAOff, int nYOff, double dfBeta)
 Calculates the exponent value of A * beta and places the result in Y. More...
void log (int n, long hA, long hY)
 Calculates the log value of A and places the result in Y. More...
void log (int n, long hA, long hY, double dfBeta, double dfAlpha=0)
 Calculates the log value of (A * beta) + alpha, and places the result in Y. More...
void powx (int n, long hA, double fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates A raised to the power alpha and places the result in Y. More...
void powx (int n, long hA, float fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates A raised to the power alpha and places the result in Y. More...
void powx (int n, long hA, T fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates A raised to the power alpha and places the result in Y. More...
void sign (int n, long hX, long hY, int nXOff=0, int nYOff=0)
 Computes the sign of each element of X and places the result in Y. More...
void sqrt (int n, long hX, long hY)
 Computes the square root of each element of X and places the result in Y. More...
void sqrt_scale (int nCount, long hX, long hY)
 Scale the data by the sqrt of the data. y = sqrt(abs(x)) * sign(x) More...
void compare_signs (int n, long hA, long hB, long hY)
 Compares the signs of each value in A and B and places the result in Y. More...
void max (int n, long hA, long hB, long hY)
 Calculates the max of A and B and places the result in Y. This max is only computed on a per item basis, so the shape of Y = the shape of A and B and Y(0) contains the max of A(0) and B(0), etc. More...
void max_bwd (int n, long hAdata, long hBdata, long hYdiff, long hAdiff, long hBdiff)
 Propagates the Y diff back to the max of A or B and places the result in A if its data has the max, or B if its data has the max. More...
void min (int n, long hA, long hB, long hY)
 Calculates the min of A and B and places the result in Y. This min is only computed on a per item basis, so the shape of Y = the shape of A and B and Y(0) contains the min of A(0) and B(0), etc. More...
double max (int n, long hA, out long lPos, int nAOff=0, long hWork=0)
 Finds the maximum value of A. More...
double min (int n, long hA, out long lPos, int nAOff=0, long hWork=0)
 Finds the minimum value of A. More...
Tuple< double, double, double, double > minmax (int n, long hA, long hWork1, long hWork2, bool bDetectNans=false, int nAOff=0)
 Finds the minimum and maximum values within A. More...
void minmax (int n, long hA, long hWork1, long hWork2, int nK, long hMin, long hMax, bool bNonZeroOnly)
 Finds up to 'nK' minimum and maximum values within A. More...
void transpose (int n, long hX, long hY, long hXCounts, long hYCounts, long hMapping, int nNumAxes, long hBuffer)
 Perform a transpose on X producing Y, similar to the numpy.transpose operation. More...
double sumsq (int n, long hW, long hA, int nAOff=0)
 Calculates the sum of squares of A. More...
double sumsqdiff (int n, long hW, long hA, long hB, int nAOff=0, int nBOff=0)
 Calculates the sum of squares of differences between A and B More...
void width (int n, long hMean, long hMin, long hMax, double dfAlpha, long hWidth)
 Calculates the width values. More...
bool contains_point (int n, long hMean, long hWidth, long hX, long hWork, int nXOff=0)
 Returns true if the point is contained within the bounds. More...
void denan (int n, long hX, double dfReplacement)
 Replaces all NAN values within X with a replacement value. More...
void im2col (long hDataIm, int nDataImOffset, int nChannels, int nHeight, int nWidth, int nKernelH, int nKernelW, int nPadH, int nPadW, int nStrideH, int nStrideW, int nDilationH, int nDilationW, long hDataCol, int nDataColOffset)
 Rearranges image blocks into columns. More...
void im2col_nd (long hDataIm, int nDataImOffset, int nNumSpatialAxes, int nImCount, int nChannelAxis, long hImShape, long hColShape, long hKernelShape, long hPad, long hStride, long hDilation, long hDataCol, int nDataColOffset)
 Rearranges image blocks into columns. More...
void col2im (long hDataCol, int nDataColOffset, int nChannels, int nHeight, int nWidth, int nKernelH, int nKernelW, int nPadH, int nPadW, int nStrideH, int nStrideW, int nDilationH, int nDilationW, long hDataIm, int nDataImOffset)
 Rearranges the columns into image blocks. More...
void col2im_nd (long hDataCol, int nDataColOffset, int nNumSpatialAxes, int nColCount, int nChannelAxis, long hImShape, long hColShape, long hKernelShape, long hPad, long hStride, long hDilation, long hDataIm, int nDataImOffset)
 Rearranges the columns into image blocks. More...
void channel_min (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, bool bReturnIdx=false)
 Calculates the minimum value within each channel of X and places the result in Y. More...
void channel_max (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, bool bReturnIdx=false)
 Calculates the maximum value within each channel of X and places the result in Y. More...
void channel_mean (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Calculates the mean value of each channel of X and places the result in Y. More...
void channel_compare (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Compares the values of the channels from X and places the result in Y where 1 is set if the values are equal otherwise 0 is set. More...
void channel_fillfrom (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, DIR dir)
 Fills each channel with the values stored in the source data, where the X data contains nOuterNum x nChannels items (e.g. one item per channel), each of which is copied to all nInnerNum elements of its channel in Y. More...
void channel_fill (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, int nLabelDim, long hLabels, long hY)
 Fills each channel item of Y with the data of X matching the label index specified by hLabels. More...
void channel_sub (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hA, long hX, long hY)
 Subtracts the values across the channels of X from A and places the result in Y. More...
void channel_sub (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Subtracts the values across the channels from X and places the result in Y. More...
void channel_sum (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, bool bSumAcrossChannels=true, DIR dir=DIR.FWD, int nChannelsY=-1)
 Calculates the sum of the values either across or within each channel (depending on the bSumAcrossChannels setting) of X and places the result in Y. More...
void channel_div (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, int nMethod=1)
 Divides the values of the channels from X and places the result in Y. More...
void channel_mul (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, int nMethod=1)
 Multiplies the values of the channels from X and places the result in Y. More...
void channel_mulv (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hA, long hX, long hC)
 Multiplies the values in vector X by each channel in matrix A and places the result in matrix C. More...
void channel_scale (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hA, long hY)
 Multiplies the values of the channels from X with the scalar values in A and places the result in Y. More...
void channel_dot (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hA, long hY)
 Calculates the dot product of the values within each channel of X and places the result in Y. More...
void channel_duplicate (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Duplicates each channel 'nInnerNum' times in the destination. More...
void channel_percentile (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, double dfPercentile)
 Calculates the percentile along axis = 0. More...
void channel_op_fwd (OP op, int nCount, int nC, int nN1, int nSD1, int nN2, int nSD2, long hA, long hB, long hY)
 Performs a channel operation forward on the data. More...
void channel_op_bwd (OP op, int nCount, int nC, int nN1, int nSD1, int nN2, int nSD2, int nCy, int nSDy, long hA, long hB, long hY, long hAd, long hBd, long hYd, long hWork)
 Performs a channel operation backward on the data. More...
void channel_add (int nCount, int nOuterNum, int nChannels, int nBlocks, int nInnerNum, int nOffset, long hX, long hY, DIR dir)
 Add data along channels similar to numpy split function but where the data is added instead of copied. More...
void channel_copy (int nCount, int nOuterNum, int nChannels, int nBlocks, int nInnerNum, int nOffset, long hX, long hY, DIR dir)
 Copy data along channels similar to numpy split function. More...
void channel_copyall (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Copy all data from X (shape 1,c,sd) to each num in Y (shape n,c,sd). More...
void sum (int nCount, int nOuterNum, int nInnerNum, long hX, long hY)
 Calculates the sum of inner values of X and places the result in Y. More...
void rng_setseed (long lSeed)
 Sets the random number generator seed used by random number operations. More...
void rng_uniform (int n, double fMin, double fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
void rng_uniform (int n, float fMin, float fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
void rng_uniform (int n, T fMin, T fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
void rng_gaussian (int n, double fMu, double fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
void rng_gaussian (int n, float fMu, float fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
void rng_gaussian (int n, T fMu, T fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
void rng_bernoulli (int n, double fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
void rng_bernoulli (int n, float fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
void rng_bernoulli (int n, T fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
void accuracy_fwd (int nCount, int nOuterNum, int nInnerNum, long hBottomData, long hBottomLabel, long hAccData, long hAccTotals, int? nIgnoreLabel, bool bLastElementOnly, int nBatch)
 Performs the forward pass for the accuracy layer More...
void batchreidx_fwd (int nCount, int nInnerDim, long hBottomData, long hPermutData, long hTopData)
 Performs the forward pass for batch re-index More...
void batchreidx_bwd (int nCount, int nInnerDim, long hTopDiff, long hTopIdx, long hBegins, long hCounts, long hBottomDiff)
 Performs the backward pass for batch re-index More...
void embed_fwd (int nCount, long hBottomData, long hWeight, int nM, int nN, int nK, long hTopData)
 Performs the forward pass for embed More...
void embed_bwd (int nCount, long hBottomData, long hTopDiff, int nM, int nN, int nK, long hWeightDiff)
 Performs the backward pass for embed More...
void pooling_fwd (POOLING_METHOD method, int nCount, long hBottomData, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hTopData, long hMask, long hTopMask)
 Performs the forward pass for pooling using Cuda More...
void pooling_bwd (POOLING_METHOD method, int nCount, long hTopDiff, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hBottomDiff, long hMask, long hTopMask)
 Performs the backward pass for pooling using Cuda More...
void unpooling_fwd (POOLING_METHOD method, int nCount, long hBottomData, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hTopData, long hMask)
 Performs the forward pass for unpooling using Cuda More...
void unpooling_bwd (POOLING_METHOD method, int nCount, long hTopDiff, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hBottomDiff, long hMask)
 Performs the backward pass for unpooling using Cuda More...
void clip_fwd (int nCount, long hBottomData, long hTopData, T fMin, T fMax)
 Performs a Clip forward pass in Cuda. More...
void clip_bwd (int nCount, long hTopDiff, long hBottomData, long hBottomDiff, T fMin, T fMax)
 Performs a Clip backward pass in Cuda. More...
void math_fwd (int nCount, long hBottomData, long hTopData, MATH_FUNCTION function)
 Performs a Math function forward pass in Cuda. More...
void math_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, MATH_FUNCTION function)
 Performs a Math function backward pass in Cuda. More...
void mean_error_loss_bwd (int nCount, long hPredicted, long hTarget, long hBottomDiff, MEAN_ERROR merr)
 Performs a Mean Error Loss backward pass in Cuda. More...
void mish_fwd (int nCount, long hBottomData, long hTopData, double dfThreshold)
 Performs a Mish forward pass in Cuda. More...
void mish_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, double dfThreshold, int nMethod=0)
 Performs a Mish backward pass in Cuda. More...
void gelu_fwd (int nCount, long hBottomData, long hTopData, bool bEnableBertVersion)
 Performs a GELU forward pass in Cuda. More...
void gelu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, bool bEnableBertVersion)
 Performs a GELU backward pass in Cuda. More...
void silu_fwd (int nCount, long hBottomData, long hTopData)
 Performs the Sigmoid-weighted Linear Unit (SiLU) activation forward pass in Cuda. More...
void silu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData)
 Performs the Sigmoid-weighted Linear Unit (SiLU) activation backward pass in Cuda. More...
void softplus_fwd (int nCount, long hBottomData, long hTopData)
 Performs the Softplus function forward, a smooth approximation of the ReLU function More...
void softplus_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData)
 Performs the Softplus function backward, a smooth approximation of the ReLU function More...
void lecun_fwd (int nCount, long hBottomData, long hTopData)
 Performs LeCun's Tanh function forward More...
void lecun_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData)
 Performs LeCun's Tanh function backward More...
void serf_fwd (int nCount, long hBottomData, long hTopData, double dfThreshold)
 Performs a Serf forward pass in Cuda. More...
void serf_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, double dfThreshold)
 Performs a Serf backward pass in Cuda. More...
void tanh_fwd (int nCount, long hBottomData, long hTopData)
 Performs a TanH forward pass in Cuda. More...
void tanh_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff)
 Performs a TanH backward pass in Cuda. More...
void sigmoid_fwd (int nCount, long hBottomData, long hTopData)
 Performs a Sigmoid forward pass in Cuda. More...
void sigmoid_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff)
 Performs a Sigmoid backward pass in Cuda. More...
void swish_bwd (int nCount, long hTopDiff, long hTopData, long hSigmoidOutputData, long hBottomDiff, double dfBeta)
 Performs a Swish backward pass in Cuda. More...
void relu_fwd (int nCount, long hBottomData, long hTopData, T fNegativeSlope)
 Performs a Rectified Linear Unit (ReLU) forward pass in Cuda. More...
void relu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, T fNegativeSlope)
 Performs a Rectified Linear Unit (ReLU) backward pass in Cuda. More...
void elu_fwd (int nCount, long hBottomData, long hTopData, double dfAlpha)
 Performs an Exponential Linear Unit (ELU) forward pass in Cuda. More...
void elu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomData, long hBottomDiff, double dfAlpha)
 Performs an Exponential Linear Unit (ELU) backward pass in Cuda. More...
void dropout_fwd (int nCount, long hBottomData, long hMask, uint uiThreshold, T fScale, long hTopData)
 Performs a dropout forward pass in Cuda. More...
void dropout_bwd (int nCount, long hTopDiff, long hMask, uint uiThreshold, T fScale, long hBottomDiff)
 Performs a dropout backward pass in Cuda. More...
void bnll_fwd (int nCount, long hBottomData, long hTopData)
 Performs a binomial normal log likelihood (BNLL) forward pass in Cuda. More...
void bnll_bwd (int nCount, long hTopDiff, long hBottomData, long hBottomDiff)
 Performs a binomial normal log likelihood (BNLL) backward pass in Cuda. More...
void prelu_fwd (int nCount, int nChannels, int nDim, long hBottomData, long hTopData, long hSlopeData, int nDivFactor)
 Performs a Parameterized Rectified Linear Unit (PReLU) forward pass in Cuda. More...
void prelu_bwd_param (int nCDim, int nNum, int nTopOffset, long hTopDiff, long hBottomData, long hBackBuffDiff)
 Performs a Parameterized Rectified Linear Unit (PReLU) backward param pass in Cuda. More...
void prelu_bwd (int nCount, int nChannels, int nDim, long hTopDiff, long hBottomData, long hBottomDiff, long hSlopeData, int nDivFactor)
 Performs a Parameterized Rectified Linear Unit (PReLU) backward pass in Cuda. More...
void softmaxloss_fwd (int nCount, long hProbData, long hLabel, long hLossData, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs Softmax Loss forward pass in Cuda. More...
void softmaxloss_bwd (int nCount, long hTopData, long hLabel, long hBottomDiff, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs Softmax Loss backward pass in Cuda. More...
void nllloss_fwd (int nCount, long hProbData, long hLabel, long hLossData, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs NLL Loss forward pass in Cuda. More...
void nllloss_bwd (int nCount, long hTopData, long hLabel, long hBottomDiff, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs NLL Loss backward pass in Cuda. More...
void max_fwd (int nCount, long hBottomDataA, long hBottomDataB, int nIdx, long hTopData, long hMask)
 Performs a max forward pass in Cuda. More...
void max_bwd (int nCount, long hTopDiff, int nIdx, long hMask, long hBottomDiff)
 Performs a max backward pass in Cuda. More...
void min_fwd (int nCount, long hBottomDataA, long hBottomDataB, int nIdx, long hTopData, long hMask)
 Performs a min forward pass in Cuda. More...
void min_bwd (int nCount, long hTopDiff, int nIdx, long hMask, long hBottomDiff)
 Performs a min backward pass in Cuda. More...
void crop_fwd (int nCount, int nNumAxes, long hSrcStrides, long hDstStrides, long hOffsets, long hBottomData, long hTopData)
 Performs the crop forward operation. More...
void crop_bwd (int nCount, int nNumAxes, long hSrcStrides, long hDstStrides, long hOffsets, long hBottomDiff, long hTopDiff)
 Performs the crop backward operation. More...
void concat_fwd (int nCount, long hBottomData, int nNumConcats, int nConcatInputSize, int nTopConcatAxis, int nBottomConcatAxis, int nOffsetConcatAxis, long hTopData)
 Performs a concat forward pass in Cuda. More...
void concat_bwd (int nCount, long hTopDiff, int nNumConcats, int nConcatInputSize, int nTopConcatAxis, int nBottomConcatAxis, int nOffsetConcatAxis, long hBottomDiff)
 Performs a concat backward pass in Cuda. More...
void slice_fwd (int nCount, long hBottomData, int nNumSlices, int nSliceSize, int nBottomSliceAxis, int nTopSliceAxis, int nOffsetSliceAxis, long hTopData)
 Performs a slice forward pass in Cuda. More...
void slice_bwd (int nCount, long hTopDiff, int nNumSlices, int nSliceSize, int nBottomSliceAxis, int nTopSliceAxis, int nOffsetSliceAxis, long hBottomDiff)
 Performs a slice backward pass in Cuda. More...
void tile_fwd (int nCount, long hBottomData, int nInnerDim, int nTiles, int nBottomTileAxis, long hTopData)
 Performs a tile forward pass in Cuda. More...
void tile_bwd (int nCount, long hTopDiff, int nTileSize, int nTiles, int nBottomTileAxis, long hBottomDiff)
 Performs a tile backward pass in Cuda. More...
void bias_fwd (int nCount, long hBottomData, long hBiasData, int nBiasDim, int nInnerDim, long hTopData)
 Performs a bias forward pass in Cuda. More...
void scale_fwd (int nCount, long hX, long hScaleData, int nScaleDim, int nInnerDim, long hY, long hBiasData=0)
 Performs a scale forward pass in Cuda. More...
void threshold_fwd (int nCount, double dfThreshold, long hX, long hY)
 Performs a threshold pass in Cuda. More...
void cll_bwd (int nCount, int nChannels, double dfMargin, bool bLegacyVersion, double dfAlpha, long hY, long hDiff, long hDistSq, long hBottomDiff)
 Performs a contrastive loss layer backward pass in Cuda. More...
void smoothl1_fwd (int nCount, long hX, long hY)
 Performs the forward operation for the SmoothL1 loss. More...
void smoothl1_bwd (int nCount, long hX, long hY)
 Performs the backward operation for the SmoothL1 loss. More...
void permute (int nCount, long hBottom, bool bFwd, long hPermuteOrder, long hOldSteps, long hNewSteps, int nNumAxes, long hTop)
 Performs data permutation on the input and reorders the data which is placed in the output. More...
void gather_fwd (int nCount, long hBottom, long hTop, int nAxis, int nDim, int nDimAtAxis, int nM, int nN, long hIdx)
 Performs a gather forward pass where data at specified indexes along a given axis are copied to the output data. More...
void gather_bwd (int nCount, long hTop, long hBottom, int nAxis, int nDim, int nDimAtAxis, int nM, int nN, long hIdx)
 Performs a gather backward pass where data at specified indexes along a given axis are copied to the output data. More...
void lrn_fillscale (int nCount, long hBottomData, int nNum, int nChannels, int nHeight, int nWidth, int nSize, T fAlphaOverSize, T fK, long hScaleData)
 Performs the fill scale operation used to calculate the LRN cross channel forward pass in Cuda. More...
void lrn_computeoutput (int nCount, long hBottomData, long hScaleData, T fNegativeBeta, long hTopData)
 Computes the output used to calculate the LRN cross channel forward pass in Cuda. More...
void lrn_computediff (int nCount, long hBottomData, long hTopData, long hScaleData, long hTopDiff, int nNum, int nChannels, int nHeight, int nWidth, int nSize, T fNegativeBeta, T fCacheRatio, long hBottomDiff)
 Computes the diff used to calculate the LRN cross channel backward pass in Cuda. More...
void sgd_update (int nCount, long hNetParamsDiff, long hHistoryData, T fMomentum, T fLocalRate)
 Perform the Stochastic Gradient Descent (SGD) update More...
void nesterov_update (int nCount, long hNetParamsDiff, long hHistoryData, T fMomentum, T fLocalRate)
 Perform the Nesterov update More...
void adagrad_update (int nCount, long hNetParamsDiff, long hHistoryData, T fDelta, T fLocalRate)
 Perform the AdaGrad update More...
void adadelta_update (int nCount, long hNetParamsDiff, long hHistoryData1, long hHistoryData2, T fMomentum, T fDelta, T fLocalRate)
 Perform the AdaDelta update More...
void adam_update (int nCount, long hNetParamsDiff, long hValM, long hValV, T fBeta1, T fBeta2, T fEpsHat, T fLearningRate, T fCorrection)
 Perform the Adam update More...
void adamw_update (int nCount, long hNetParamsDiff, long hValM, long hValV, T fBeta1, T fBeta2, T fEpsHat, T fLearningRate, T fDecayRate, long hNetParamsData, int nStep)
 Perform the AdamW update More...
void rmsprop_update (int nCount, long hNetParamsDiff, long hHistoryData, T fRmsDecay, T fDelta, T fLocalRate)
 Perform the RMSProp update More...
void lstm_fwd (int t, int nN, int nH, int nI, long hWeight_h, long hWeight_i, long hClipData, int nClipOffset, long hTopData, int nTopOffset, long hCellData, int nCellOffset, long hPreGateData, int nPreGateOffset, long hGateData, int nGateOffset, long hHT1Data, int nHT1Offset, long hCT1Data, int nCT1Offset, long hHtoGateData, long hContext=0, long hWeight_c=0, long hCtoGetData=0)
 Performs the simple LSTM forward pass in Cuda. More...
void lstm_bwd (int t, int nN, int nH, int nI, double dfClippingThreshold, long hWeight_h, long hClipData, int nClipOffset, long hTopDiff, int nTopOffset, long hCellData, long hCellDiff, int nCellOffset, long hPreGateDiff, int nPreGateOffset, long hGateData, long hGateDiff, int nGateOffset, long hCT1Data, int nCT1Offset, long hDHT1Diff, int nDHT1Offset, long hDCT1Diff, int nDCT1Offset, long hHtoHData, long hContextDiff=0, long hWeight_c=0)
 Performs the simple LSTM backward pass in Cuda. More...
void lstm_unit_fwd (int nCount, int nHiddenDim, int nXCount, long hX, long hX_acts, long hC_prev, long hCont, long hC, long hH)
 Performs the simple LSTM forward pass in Cuda for a given LSTM unit. More...
void lstm_unit_bwd (int nCount, int nHiddenDim, int nXCount, long hC_prev, long hX_acts, long hC, long hH, long hCont, long hC_diff, long hH_diff, long hC_prev_diff, long hX_acts_diff, long hX_diff)
 Performs the simple LSTM backward pass in Cuda for a given LSTM unit. More...
void coeff_sum_fwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hBottom, long hTop)
 Performs a coefficient sum forward pass in Cuda. More...
void coeff_sum_bwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hTopDiff, long hBottomDiff)
 Performs a coefficient sum backward pass in Cuda. More...
void coeff_sub_fwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hBottom, long hTop)
 Performs a coefficient sub forward pass in Cuda. More...
void coeff_sub_bwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hTopDiff, long hBottomDiff)
 Performs a coefficient sub backward pass in Cuda. More...
void sigmoid_cross_entropy_fwd (int nCount, long hInput, long hTarget, long hLoss, bool bHasIgnoreLabel, int nIgnoreLabel, long hCountData)
 Performs a sigmoid cross entropy forward pass in Cuda. More...
void sigmoid_cross_entropy_bwd (int nCount, int nIgnoreLabel, long hTarget, long hBottomDiff)
 Performs a sigmoid cross entropy backward pass in Cuda when an ignore label is specified. More...
void softmax_cross_entropy_fwd (int nCount, long hProbData, long hLabel, long hLossDiff, long hLossData, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs a softmax cross entropy forward pass in Cuda. More...
void softmax_cross_entropy_bwd (int nCount, int nIgnoreLabel, long hTarget, long hBottomDiff)
 Performs a softmax cross entropy backward pass in Cuda when an ignore label is specified. More...
void debug ()
 The debug function is used only when debugging the debug version of the low-level DLL. More...
void matrix_meancenter_by_column (int nWidth, int nHeight, long hA, long hB, long hY, bool bNormalize=false)
 Mean center the data by columns, where each column is summed and then subtracted from each column value. More...
void gaussian_blur (int n, int nChannels, int nHeight, int nWidth, double dfSigma, long hX, long hY)
 The gaussian_blur runs a Gaussian blurring operation over each channel of the data using the sigma. More...
double hamming_distance (int n, double dfThreshold, long hA, long hB, long hY, int nOffA=0, int nOffB=0, int nOffY=0)
 The hamming_distance calculates the Hamming Distance between A and B, both of length n. More...
void calc_dft_coefficients (int n, long hX, int m, long hY)
 Calculates the discrete Fourier Transform (DFT) coefficients across the frequencies 1...n/2 (Nyquist limit) for the array of values in host memory referred to by hX. Return values are placed in the host memory referenced by hY. More...
double[] calculate_batch_distances (DistanceMethod distMethod, double dfThreshold, int nItemDim, long hSrc, long hTargets, long hWork, int[,] rgOffsets)
 The calculate_batch_distances method calculates a set of distances based on the DistanceMethod specified. More...
void ReportMemory (Log log, string strLocation)
 Report the memory use on the current GPU managed by the CudaDnn object. More...

Static Public Member Functions

static string GetCudaDnnDllPath ()
 Returns the path to the CudaDnnDll module to use for low level CUDA processing. More...
static void SetDefaultCudaPath (string strPath)
 Used to optionally set the default path to the Low-Level Cuda Dnn DLL file. More...
static ulong basetype_size (bool bUseHalfSize)
 Returns the base type size in bytes. More...
static ulong ConvertByteSizeToCount (ulong ulSizeInBytes)
 Converts the byte size into the number of items in the base data type of float or double. More...

Protected Member Functions

virtual void Dispose (bool bDisposing)
 Disposes this instance freeing up all of its host and GPU memory. More...

Properties

ulong TotalMemoryUsed [get]
 Returns the total amount of GPU memory used by this instance. More...
string TotalMemoryUsedAsText [get]
 Returns the total amount of memory used. More...
long KernelHandle [get]
 Returns the Low-Level kernel handle used for this instance. Each Low-Level kernel maintains its own set of look-up tables for memory, streams, cuDnn constructs, etc. More...
string Path [get]
 Specifies the file path used to load the Low-Level Cuda DNN Dll file. More...
static string DefaultPath [get]
 Specifies the default path used to load the Low-Level Cuda DNN Dll file. More...
int OriginalDeviceID [get]
 Returns the original device ID used to create the instance of CudaDnn. More...
static ulong BaseSize [get]
 Returns the base data type size (e.g. float= 4, double = 8). More...

Detailed Description

The CudaDnn object is the main interface to the Low-Level Cuda C++ DLL.

This is the transition location where C# meets C++.

Template Parameters
TSpecifies the base type float or double. Using float is recommended to conserve GPU memory.

Definition at line 968 of file CudaDnn.cs.

Constructor & Destructor Documentation

◆ CudaDnn() [1/2]

MyCaffe.common.CudaDnn< T >.CudaDnn ( int  nDeviceID,
DEVINIT  flags = (DEVINIT.CUBLAS|DEVINIT.CURAND),
long?  lSeed = null,
string  strPath = "",
bool  bResetFirst = false,
bool  bEnableMemoryTrace = false 

The CudaDnn constructor.

nDeviceIDSpecifies the zero-based device (GPU) id. Note, if there are 5 GPUs in the system, the device IDs will be numbered 0, 1, 2, 3, 4.
flagsSpecifies the flags under which to initialize the Low-Level Cuda system.
lSeedOptionally specifies the random number generator seed. Typically this is only used during testing.
strPathSpecifies the file path of the Low-Level Cuda DNN Dll file. When NULL or empty, the Low-Level Cuda DNN Dll file in the directory of the currently executing process (that is using the CudaDnn object) is used.
bResetFirstSpecifies to reset the device before initializing. IMPORTANT: It is only recommended to set this to true when testing.
bEnableMemoryTraceOptionally, specifies to enable the memory tracing (only supported in debug mode and dramatically slows down processing).

Definition at line 1488 of file CudaDnn.cs.

◆ CudaDnn() [2/2]

MyCaffe.common.CudaDnn< T >.CudaDnn ( CudaDnn< T >  cuda,
bool  bEnableGhostMemory 

Alternate CudaDnn constructor.

cudaSpecifies an already created CudaDnn instance. The internal Cuda Control of this instance is used by the new instance.
bEnableGhostMemorySpecifies to enable the ghost memory used to estimate GPU memory usage without allocating any GPU memory.

Definition at line 1587 of file CudaDnn.cs.

Member Function Documentation

◆ abs()

void MyCaffe.common.CudaDnn< T >.abs ( int  n,
long  hA,
long  hY 

Calculates the absolute value of A and places the result in Y.

Y = abs(A)

nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7437 of file CudaDnn.cs.

◆ accuracy_fwd()

void MyCaffe.common.CudaDnn< T >.accuracy_fwd ( int  nCount,
int  nOuterNum,
int  nInnerNum,
long  hBottomData,
long  hBottomLabel,
long  hAccData,
long  hAccTotals,
int?  nIgnoreLabel,
bool  bLastElementOnly,
int  nBatch 

Performs the forward pass for the accuracy layer

nCountSpecifies the number of items.
nOuterNumSpecifies the outer count.
nInnerNumSpecifies the inner count.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomLabelSpecifies a handle to the bottom labels in GPU memory.
hAccDataSpecifies a handle to temporary accuracy correct items in GPU memory.
hAccTotalsSpecifies a handle to the temporary accuracy totals in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.
bLastElementOnlyOptionally specifies to only test the last element in each set.
nBatchOptionally specifies the batch size.

Definition at line 8700 of file CudaDnn.cs.

◆ adadelta_update()

void MyCaffe.common.CudaDnn< T >.adadelta_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData1,
long  hHistoryData2,
T  fMomentum,
T  fDelta,
T  fLocalRate 

Perform the AdaDelta update

See ADADELTA: An Adaptive Learning Rate Method by Zeiler, 2012

nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryData1Specifies a handle to history data in GPU memory.
hHistoryData2Specifies a handle to history data in GPU memory.
fMomentumSpecifies the momentum to use.
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 10264 of file CudaDnn.cs.
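
As an illustration of the AdaDelta rule described in the Zeiler paper, the following is a minimal CPU sketch. It is not the MyCaffe kernel: the class name `AdaDeltaSketch` is hypothetical, plain arrays stand in for the GPU memory handles, and the assumption that the update is written back into the diff array follows the original Caffe convention, which the low-level DLL may or may not share.

```csharp
using System;

// Hypothetical CPU reference, not MyCaffe code: plain arrays replace GPU handles.
public static class AdaDeltaSketch
{
    // diff:     gradients, overwritten with the scaled update (assumed convention)
    // history1: running average of squared gradients
    // history2: running average of squared updates
    public static void Update(float[] diff, float[] history1, float[] history2,
                              float momentum, float delta, float localRate)
    {
        for (int i = 0; i < diff.Length; i++)
        {
            float g = diff[i];
            history1[i] = momentum * history1[i] + (1 - momentum) * g * g;
            // Scale the gradient by RMS(previous updates) / RMS(gradients).
            float step = g * (float)Math.Sqrt((history2[i] + delta) / (history1[i] + delta));
            history2[i] = momentum * history2[i] + (1 - momentum) * step * step;
            diff[i] = localRate * step;
        }
    }
}
```

Here `delta` plays the role of the numerical stability factor and `momentum` the decay of both running averages.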

◆ adagrad_update()

void MyCaffe.common.CudaDnn< T >.adagrad_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,
T  fDelta,
T  fLocalRate 

Perform the AdaGrad update

See Adaptive Subgradient Methods for Online Learning and Stochastic Optimization by Duchi, et al., 2011

nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 10243 of file CudaDnn.cs.
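
The AdaGrad rule from Duchi et al. can be sketched on the CPU as follows. This is only an illustration under stated assumptions: `AdaGradSketch` is a hypothetical name, arrays replace the GPU handles, and writing the result back into the diff array mirrors the original Caffe convention rather than a confirmed detail of the low-level DLL.

```csharp
using System;

// Hypothetical CPU reference, not MyCaffe code.
public static class AdaGradSketch
{
    // diff:    gradients, overwritten with the scaled update (assumed convention)
    // history: accumulated sum of squared gradients
    public static void Update(float[] diff, float[] history, float delta, float localRate)
    {
        for (int i = 0; i < diff.Length; i++)
        {
            float g = diff[i];
            history[i] += g * g;
            // delta keeps the denominator away from zero.
            diff[i] = localRate * g / ((float)Math.Sqrt(history[i]) + delta);
        }
    }
}
```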

◆ adam_update()

void MyCaffe.common.CudaDnn< T >.adam_update ( int  nCount,
long  hNetParamsDiff,
long  hValM,
long  hValV,
T  fBeta1,
T  fBeta2,
T  fEpsHat,
T  fLearningRate,
T  fCorrection 

Perform the Adam update

See Adam: A Method for Stochastic Optimization by Kingma, et al., 2014

nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hValMFirst moment.
hValVSecond moment.
fBeta1Momentum for first moment.
fBeta2Momentum for second moment.
fEpsHatSmall value used to avoid NaN.
fLearningRateLearning rate.
fCorrectionCorrection where Local Learning Rate = 'fCorrection' * 'fLearningRate'

Definition at line 10287 of file CudaDnn.cs.
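
To make the roles of the moment buffers and the correction factor concrete, here is a minimal CPU sketch of an Adam step. It assumes the formulation used by the original Caffe solver (moments updated in place, update written back into the diff array) and uses the relation Local Learning Rate = fCorrection * fLearningRate from the parameter list above. The class name `AdamSketch` is hypothetical and the actual kernel may differ in detail.

```csharp
using System;

// Hypothetical CPU reference, not MyCaffe code.
public static class AdamSketch
{
    // diff: gradients, overwritten with the update (assumed convention)
    // m, v: first and second moment estimates, updated in place
    public static void Update(float[] diff, float[] m, float[] v,
                              float beta1, float beta2, float epsHat,
                              float learningRate, float correction)
    {
        float localRate = correction * learningRate;
        for (int i = 0; i < diff.Length; i++)
        {
            float g = diff[i];
            m[i] = beta1 * m[i] + (1 - beta1) * g;
            v[i] = beta2 * v[i] + (1 - beta2) * g * g;
            diff[i] = localRate * m[i] / ((float)Math.Sqrt(v[i]) + epsHat);
        }
    }
}
```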

◆ adamw_update()

void MyCaffe.common.CudaDnn< T >.adamw_update ( int  nCount,
long  hNetParamsDiff,
long  hValM,
long  hValV,
T  fBeta1,
T  fBeta2,
T  fEpsHat,
T  fLearningRate,
T  fDecayRate,
long  hNetParamsData,
int  nStep 

Perform the AdamW update

See also
Decoupled Weight Decay Regularization by Loshchilov, I. and Hutter, F., 2019. See Adam: A Method for Stochastic Optimization by Kingma, et al., 2014
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hValMFirst moment.
hValVSecond moment.
fBeta1Momentum for first moment.
fBeta2Momentum for second moment.
fEpsHatSmall value used to avoid NaN.
fLearningRateLearning rate.
fDecayRateOptionally, enable detached weight decay for AdamW optimization using this decay rate (when 0, Adam update is used).
hNetParamsDataOptionally, specifies the net params weight data (used when fDecayRate != 0)
nStepOptionally, specifies the current step - used with AdamW optimization updates.

Definition at line 10313 of file CudaDnn.cs.

◆ add() [1/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hC,
long  hY 

Adds A, B and C and places the result in Y.

Y = A + B + C

nSpecifies the number of items (not bytes) in the vectors A, B, C and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hCSpecifies a handle to the vector C in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7209 of file CudaDnn.cs.

◆ add() [2/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY 

Adds A to B and places the result in Y.

Y = A + B

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7227 of file CudaDnn.cs.

◆ add() [3/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
double  dfAlpha 

Adds A to (B times scalar) and places the result in Y.

Y = A + (B * alpha)

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfAlphaSpecifies a scalar of type double.

Definition at line 7246 of file CudaDnn.cs.

◆ add() [4/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
double  dfAlphaA,
double  dfAlphaB,
int  nAOff = 0,
int  nBOff = 0,
int  nYOff = 0 

Adds (A times scalar) to (B times scalar) and places the result in Y.

Y = (A * alphaA) + (B * alphaB)

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfAlphaASpecifies a scalar applied to A.
dfAlphaBSpecifies a scalar applied to B.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
nBOffOptionally, specifies an offset (in items, not bytes) into the memory of B.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 7288 of file CudaDnn.cs.

◆ add() [5/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
float  fAlpha 

Adds A to (B times scalar) and places the result in Y.

Y = A + (B * alpha)

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
fAlphaSpecifies a scalar of type float.

Definition at line 7265 of file CudaDnn.cs.

◆ add_scalar() [1/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
double  fAlpha,
long  hY 

Adds a scalar value to each element of Y.

Y = Y + alpha

nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value in type double.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7161 of file CudaDnn.cs.

◆ add_scalar() [2/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
float  fAlpha,
long  hY 

Adds a scalar value to each element of Y.

Y = Y + alpha

nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value in type float.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7175 of file CudaDnn.cs.

◆ add_scalar() [3/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
T  fAlpha,
long  hY,
int  nYOff = 0 

Adds a scalar value to each element of Y.

Y = Y + alpha

nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value in type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.
nYOffOptionally, specifies an offset into Y. The default is 0.

Definition at line 7190 of file CudaDnn.cs.

◆ AddTensor() [1/2]

void MyCaffe.common.CudaDnn< T >.AddTensor ( long  hCuDnn,
long  hSrcDesc,
long  hSrc,
int  nSrcOffset,
long  hDstDesc,
long  hDst,
int  nDstOffset 

Add two tensors together.

hCuDnnSpecifies a handle to the cuDnn instance.
hSrcDescSpecifies a handle to the source tensor descriptor.
hSrcSpecifies a handle to the source GPU memory.
nSrcOffsetSpecifies an offset within the GPU memory.
hDstDescSpecifies a handle to the destination tensor descriptor.
hDstSpecifies a handle to the destination GPU memory.
nDstOffsetSpecifies an offset within the GPU memory.

Definition at line 3638 of file CudaDnn.cs.

◆ AddTensor() [2/2]

void MyCaffe.common.CudaDnn< T >.AddTensor ( long  hCuDnn,
long  hSrcDesc,
long  hSrc,
int  nSrcOffset,
long  hDstDesc,
long  hDst,
int  nDstOffset 

Add two tensors together.

hCuDnnSpecifies a handle to the cuDnn instance.
fAlphaSpecifies a scaling factor applied to the source GPU memory before the add.
hSrcDescSpecifies a handle to the source tensor descriptor.
hSrcSpecifies a handle to the source GPU memory.
nSrcOffsetSpecifies an offset within the GPU memory.
fBetaSpecifies a scaling factor applied to the destination GPU memory before the add.
hDstDescSpecifies a handle to the destination tensor descriptor.
hDstSpecifies a handle to the destination GPU memory.
nDstOffsetSpecifies an offset within the GPU memory.

Definition at line 3655 of file CudaDnn.cs.

◆ AllocHostBuffer()

long MyCaffe.common.CudaDnn< T >.AllocHostBuffer ( long  lCapacity)

Allocate a block of host memory with a specified capacity.

lCapacitySpecifies the capacity to allocate (in items, not bytes).
The handle to the host memory is returned.

Definition at line 2581 of file CudaDnn.cs.

◆ AllocMemory() [1/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( double[]  rgSrc,
long  hStream = 0 

Allocate a block of GPU memory and copy an array of doubles to it, optionally using a stream for the copy.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

rgSrcSpecifies an array of doubles to copy to the GPU.
hStreamOptionally specifies a stream to use for the copy.
The handle to the GPU memory is returned.

Definition at line 2314 of file CudaDnn.cs.

◆ AllocMemory() [2/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( float[]  rgSrc,
long  hStream = 0 

Allocate a block of GPU memory and copy an array of float to it, optionally using a stream for the copy.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

rgSrcSpecifies an array of float to copy to the GPU.
hStreamOptionally specifies a stream to use for the copy.
The handle to the GPU memory is returned.

Definition at line 2326 of file CudaDnn.cs.

◆ AllocMemory() [3/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( List< double >  rg)

Allocate a block of GPU memory and copy a list of doubles to it.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

rgSpecifies a list of doubles to copy to the GPU.
The handle to the GPU memory is returned.

Definition at line 2291 of file CudaDnn.cs.

◆ AllocMemory() [4/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( List< float >  rg)

Allocate a block of GPU memory and copy a list of floats to it.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

rgSpecifies a list of floats to copy to the GPU.
The handle to the GPU memory is returned.

Definition at line 2302 of file CudaDnn.cs.

◆ AllocMemory() [5/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( long  lCapacity,
bool  bHalfSize = false 

Allocate a block of GPU memory with a specified capacity.

lCapacitySpecifies the capacity to allocate (in items, not bytes).
bHalfSizeOptionally, specifies to use half size float memory - only available with the 'float' base type.
The handle to the GPU memory is returned.

Definition at line 2449 of file CudaDnn.cs.

◆ AllocMemory() [6/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( T[]  rgSrc,
long  hStream = 0,
bool  bHalfSize = false 

Allocate a block of GPU memory and copy an array of type 'T' to it, optionally using a stream for the copy.

rgSrcSpecifies an array of 'T' to copy to the GPU.
hStreamOptionally, specifies a stream to use for the copy.
bHalfSizeOptionally, specifies to use half size float memory - only available with the 'float' base type.
The handle to the GPU memory is returned.

Definition at line 2338 of file CudaDnn.cs.

◆ AllocPCAData()

long MyCaffe.common.CudaDnn< T >.AllocPCAData ( int  nM,
int  nN,
int  nK,
out int  nCount 

Allocates the GPU memory for the PCA Data.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).

Definition at line 5319 of file CudaDnn.cs.

◆ AllocPCAEigenvalues()

long MyCaffe.common.CudaDnn< T >.AllocPCAEigenvalues ( int  nM,
int  nN,
int  nK,
out int  nCount 

Allocates the GPU memory for the PCA eigenvalues.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).

Definition at line 5370 of file CudaDnn.cs.

◆ AllocPCALoads()

long MyCaffe.common.CudaDnn< T >.AllocPCALoads ( int  nM,
int  nN,
int  nK,
out int  nCount 

Allocates the GPU memory for the PCA loads.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).

Definition at line 5353 of file CudaDnn.cs.

◆ AllocPCAScores()

long MyCaffe.common.CudaDnn< T >.AllocPCAScores ( int  nM,
int  nN,
int  nK,
out int  nCount 

Allocates the GPU memory for the PCA scores.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).

Definition at line 5336 of file CudaDnn.cs.

◆ asum()

T MyCaffe.common.CudaDnn< T >.asum ( int  n,
long  hX,
int  nXOff = 0 

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
The absolute value sum is returned as type 'T'.

Definition at line 6901 of file CudaDnn.cs.
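
For readers unfamiliar with the BLAS naming, the quantity returned by asum can be sketched on the CPU as below. The class `AsumSketch` and the use of a plain array in place of a GPU handle are illustrative assumptions; the library itself delegates the work to cuBlas.

```csharp
using System;

// Hypothetical CPU reference, not MyCaffe code.
public static class AsumSketch
{
    // Sums |x[xOff]| ... |x[xOff + n - 1]|; n and xOff are counted in items, not bytes.
    public static double Asum(int n, double[] x, int xOff = 0)
    {
        double sum = 0;
        for (int i = 0; i < n; i++)
            sum += Math.Abs(x[xOff + i]);
        return sum;
    }
}
```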

◆ asum_double()

double MyCaffe.common.CudaDnn< T >.asum_double ( int  n,
long  hX,
int  nXOff = 0 

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
The absolute sum is returned as type double.

Definition at line 6871 of file CudaDnn.cs.

◆ asum_float()

float MyCaffe.common.CudaDnn< T >.asum_float ( int  n,
long  hX,
int  nXOff = 0 

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
The absolute sum is returned as type float.

Definition at line 6886 of file CudaDnn.cs.

◆ axpby() [1/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
double  fAlpha,
long  hX,
double  fBeta,
long  hY 

Scale the vector X by alpha, scale the vector Y by beta, and add the two together, placing the result in Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type double.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor to apply to vector Y, where the scaling factor is of type double.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6595 of file CudaDnn.cs.

◆ axpby() [2/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
float  fAlpha,
long  hX,
float  fBeta,
long  hY 

Scale the vector X by alpha, scale the vector Y by beta, and add the two together, placing the result in Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type float.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor to apply to vector Y, where the scaling factor is of type float.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6611 of file CudaDnn.cs.

◆ axpby() [3/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
T  fAlpha,
long  hX,
T  fBeta,
long  hY 

Scale the vector x by Alpha and scale vector y by Beta and then add both together.

Y = (X * fAlpha) + (Y * fBeta)

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor to apply to vector Y, where the scaling factor is of type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6629 of file CudaDnn.cs.
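
The formula Y = (X * fAlpha) + (Y * fBeta) can be checked against a small CPU sketch such as the one below. It only illustrates the arithmetic: `AxpbySketch` is a hypothetical name and the arrays stand in for the GPU memory that hX and hY refer to.

```csharp
using System;

// Hypothetical CPU reference, not MyCaffe code.
public static class AxpbySketch
{
    // Overwrites y with (x * alpha) + (y * beta) for the first n items.
    public static void Axpby(int n, double alpha, double[] x, double beta, double[] y)
    {
        for (int i = 0; i < n; i++)
            y[i] = x[i] * alpha + y[i] * beta;
    }
}
```

Setting beta to 1 reduces this to the axpy operation, and setting beta to 0 reduces it to a scaled copy of X.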

◆ axpy() [1/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
double  fAlpha,
long  hX,
long  hY 

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type double.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6544 of file CudaDnn.cs.

◆ axpy() [2/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
float  fAlpha,
long  hX,
long  hY 

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type float.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6559 of file CudaDnn.cs.

◆ axpy() [3/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
T  fAlpha,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6576 of file CudaDnn.cs.
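
The effect of the item offsets nXOff and nYOff is easiest to see in a small CPU sketch. The class `AxpySketch` below is hypothetical and uses arrays rather than GPU handles; it assumes the usual BLAS meaning of axpy, Y = (X * alpha) + Y, with offsets counted in items.

```csharp
using System;

// Hypothetical CPU reference, not MyCaffe code.
public static class AxpySketch
{
    // Adds alpha * x[xOff + i] to y[yOff + i] for i in [0, n).
    public static void Axpy(int n, double alpha, double[] x, double[] y,
                            int xOff = 0, int yOff = 0)
    {
        for (int i = 0; i < n; i++)
            y[yOff + i] += alpha * x[xOff + i];
    }
}
```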

◆ basetype_size()

static ulong MyCaffe.common.CudaDnn< T >.basetype_size ( bool  bUseHalfSize)

Returns the base type size in bytes.

bUseHalfSizeSpecifies whether or not to use half size or the base size.

Definition at line 1899 of file CudaDnn.cs.

◆ BatchNormBackward()

void MyCaffe.common.CudaDnn< T >.BatchNormBackward ( long  hCuDnn,
long  hBwdBottomDesc,
long  hBottomData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDiffDesc,
long  hBottomDiff,
long  hBwdScaleBiasMeanVarDesc,
long  hScaleData,
long  hScaleDiff,
long  hBiasDiff,
double  dfEps,
long  hSaveMean,
long  hSaveInvVar 

Run the batch norm backward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
modeSpecifies the batch normalization mode.
fAlphaDiffSpecifies the alpha value applied to the diff.
fBetaDiffSpecifies the beta value applied to the diff.
fAlphaParamDiffSpecifies the alpha value applied to the param diff.
fBetaParamDiffSpecifies the beta value applied to the param diff.
hBwdBottomDescSpecifies a handle to the backward bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data tensor.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff tensor.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff tensor.
hBwdScaleBiasMeanVarDescSpecifies a handle to the backward scale bias mean var descriptor.
hScaleDataSpecifies a handle to the scale data tensor.
hScaleDiffSpecifies a handle to the scale diff tensor.
hBiasDiffSpecifies a handle to the bias diff tensor.
dfEpsSpecifies the epsilon value.
hSaveMeanSpecifies a handle to the saved mean tensor.
hSaveInvVarSpecifies a handle to the saved variance tensor.

Definition at line 4191 of file CudaDnn.cs.

◆ BatchNormForward()

void MyCaffe.common.CudaDnn< T >.BatchNormForward ( long  hCuDnn,
long  hFwdBottomDesc,
long  hBottomData,
long  hFwdTopDesc,
long  hTopData,
long  hFwdScaleBiasMeanVarDesc,
long  hScaleData,
long  hBiasData,
double  dfFactor,
long  hGlobalMean,
long  hGlobalVar,
double  dfEps,
long  hSaveMean,
long  hSaveInvVar,
bool  bTraining 

Run the batch norm forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
modeSpecifies the batch normalization mode.
fAlphaSpecifies the alpha value.
fBetaSpecifies the beta value.
hFwdBottomDescSpecifies a handle to the forward bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data tensor.
hFwdTopDescSpecifies a handle to the forward top tensor descriptor.
hTopDataSpecifies a handle to the top tensor.
hFwdScaleBiasMeanVarDescSpecifies a handle to the forward scale bias mean variance descriptor.
hScaleDataSpecifies a handle to the scale tensor.
hBiasDataSpecifies a handle to the bias tensor.
dfFactorSpecifies a scaling factor.
hGlobalMeanSpecifies a handle to the global mean tensor.
hGlobalVarSpecifies a handle to the global variance tensor.
dfEpsSpecifies the epsilon value to avoid dividing by zero.
hSaveMeanSpecifies a handle to the saved mean tensor.
hSaveInvVarSpecifies a handle to the saved variance tensor.
bTrainingSpecifies that this is a training pass when true, and a testing pass when false.

Definition at line 4161 of file CudaDnn.cs.

◆ batchreidx_bwd()

void MyCaffe.common.CudaDnn< T >.batchreidx_bwd ( int  nCount,
int  nInnerDim,
long  hTopDiff,
long  hTopIdx,
long  hBegins,
long  hCounts,
long  hBottomDiff 

Performs the backward pass for batch re-index

nCountSpecifies the number of items.
nInnerDimSpecifies the inner dimension.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopIdxSpecifies a handle to the top indexes in GPU memory.
hBeginsSpecifies a handle to the begin data in GPU memory.
hCountsSpecifies a handle to the counts in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 8745 of file CudaDnn.cs.

◆ batchreidx_fwd()

void MyCaffe.common.CudaDnn< T >.batchreidx_fwd ( int  nCount,
int  nInnerDim,
long  hBottomData,
long  hPermutData,
long  hTopData 

Performs the forward pass for batch re-index

nCountSpecifies the number of items.
nInnerDimSpecifies the inner dimension.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hPermutDataSpecifies a handle to the permutation data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8727 of file CudaDnn.cs.
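
The following CPU sketch illustrates what a forward re-index computes. It assumes the Caffe BatchReindex convention, where row i of the top is a copy of row permut[i] of the bottom and each row holds nInnerDim values, and it assumes nCount is the total number of top elements. The name `batchreidx_fwd_ref` and the flat-list layout are hypothetical and are not part of the CudaDnn API.

```python
def batchreidx_fwd_ref(n_count, inner_dim, bottom, permut):
    """CPU sketch (assumed semantics): top row i = bottom row permut[i]."""
    top = [0.0] * n_count
    for index in range(n_count):
        n = index // inner_dim        # destination row in the top
        src_row = int(permut[n])      # source row selected by the permutation
        top[index] = bottom[src_row * inner_dim + index % inner_dim]
    return top


bottom = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # 3 rows of 2 values each
print(batchreidx_fwd_ref(4, 2, bottom, [2, 0]))
```

The same source row may appear more than once in the permutation data, which is why the backward pass needs the begins and counts buffers to accumulate gradients.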

◆ bias_fwd()

void MyCaffe.common.CudaDnn< T >.bias_fwd ( int  nCount,
long  hBottomData,
long  hBiasData,
int  nBiasDim,
int  nInnerDim,
long  hTopData 

Performs a bias forward pass in Cuda.

nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBiasDataSpecifies a handle to the bias data in GPU memory.
nBiasDimSpecifies the bias dimension.
nInnerDimSpecifies the inner dimension.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9958 of file CudaDnn.cs.
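
As a rough illustration of how nBiasDim and nInnerDim interact, the sketch below assumes the Caffe bias kernel indexing, where the bias element for a flat index is `(index / nInnerDim) % nBiasDim`. The function name `bias_fwd_ref` is hypothetical; this is a CPU reference under that assumption, not the CUDA implementation.

```python
def bias_fwd_ref(n_count, bottom, bias, bias_dim, inner_dim):
    """CPU sketch (assumed Caffe-style indexing): top = bottom + broadcast bias."""
    top = [0.0] * n_count
    for index in range(n_count):
        # Each run of inner_dim consecutive items shares one bias value,
        # and the bias values repeat every bias_dim runs.
        bias_index = (index // inner_dim) % bias_dim
        top[index] = bottom[index] + bias[bias_index]
    return top


print(bias_fwd_ref(8, [0.0] * 8, [1.0, 2.0], 2, 2))
```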

◆ bnll_bwd()

void MyCaffe.common.CudaDnn< T >.bnll_bwd ( int  nCount,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff 

Performs a binomial normal log likelihood (BNLL) backward pass in Cuda.

nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9516 of file CudaDnn.cs.

◆ bnll_fwd()

void MyCaffe.common.CudaDnn< T >.bnll_fwd ( int  nCount,
long  hBottomData,
long  hTopData 

Performs a binomial normal log likelihood (BNLL) forward pass in Cuda.

Computes $ f(x) = \ln(1 + e^x) $

nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9501 of file CudaDnn.cs.
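
The function $ f(x) = \ln(1 + e^x) $ overflows for large x when evaluated literally. The sketch below shows one common numerically stable way to evaluate it on the CPU. The name `bnll_fwd_ref` is hypothetical, and the rearrangement used here is an assumption for illustration; the CUDA kernel may guard against overflow differently.

```python
import math


def bnll_fwd_ref(bottom):
    """CPU sketch: f(x) = ln(1 + e^x), rearranged to avoid overflow."""
    top = []
    for x in bottom:
        if x > 0:
            # ln(1 + e^x) = x + ln(1 + e^-x), which is safe for large positive x
            top.append(x + math.log1p(math.exp(-x)))
        else:
            top.append(math.log1p(math.exp(x)))
    return top


print(bnll_fwd_ref([-2.0, 0.0, 2.0]))
```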

◆ calc_dft_coefficients()

void MyCaffe.common.CudaDnn< T >.calc_dft_coefficients ( int  n,
long  hX,
int  m,
long  hY 

Calculates the discrete Fourier transform (DFT) coefficients across the frequencies 1...n/2 (Nyquist limit) for the array of values in host memory referred to by hX. Return values are placed in the host memory referenced by hY.

nSpecifies the number of items.
hXSpecifies a handle to the host memory holding the input values.
mSpecifies the number of items in hY, must = n/2 (Nyquist limit).
hYSpecifies a handle to the host memory holding the n/2 output values (Nyquist limit).
See also
Implement the Spectrogram from scratch in python by Yumi, Yumi's Blog, 2018

Definition at line 11027 of file CudaDnn.cs.
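
To make the frequency range concrete, the sketch below evaluates the DFT directly for k = 1...n/2. It assumes that the output holds the unnormalized magnitude of each coefficient; the source does not state whether magnitudes, scaled magnitudes, or another form is returned, so treat that choice and the name `dft_coefficients_ref` as assumptions.

```python
import cmath


def dft_coefficients_ref(x):
    """CPU sketch: |X_k| for k = 1..n/2 (assumed output form: raw magnitude)."""
    n = len(x)
    m = n // 2                      # number of outputs, up to the Nyquist limit
    y = []
    for k in range(1, m + 1):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        y.append(abs(coeff))
    return y


# A pure cosine with 2 cycles over 8 samples has its energy at k = 2.
print(dft_coefficients_ref([1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]))
```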

◆ calculate_batch_distances()

double[] MyCaffe.common.CudaDnn< T >.calculate_batch_distances ( DistanceMethod  distMethod,
double  dfThreshold,
int  nItemDim,
long  hSrc,
long  hTargets,
long  hWork,
int  rgOffsets[,] 

The calculate_batch_distances method calculates a set of distances based on the DistanceMethod specified.

distMethodSpecifies the DistanceMethod to use (i.e. HAMMING or EUCLIDEAN).
dfThresholdSpecifies the threshold used when binarizing the values for the HAMMING distance. This parameter is ignored when calculating the EUCLIDEAN distance.
nItemDimSpecifies the dimension of a single item.
hSrcSpecifies the GPU memory containing the source items.
hTargetsSpecifies the GPU memory containing the target items that are compared against the source items.
hWorkSpecifies the GPU memory containing the work memory - this must be the same size as the maximum size of the src or targets.
rgOffsetsSpecifies the array of offset pairs where the first offset is into the source and the second is into the target.
The array of distances corresponding to each offset pair is returned.

Definition at line 11046 of file CudaDnn.cs.
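
The sketch below shows how the two distance methods and the offset pairs might fit together. It makes several assumptions that the source does not confirm: values are binarized with `value > threshold`, the Hamming distance is the count of differing bits, and each offset is expressed in items from the start of the buffer. The name `batch_distances_ref` and the string method names are hypothetical.

```python
import math


def batch_distances_ref(method, threshold, item_dim, src, targets, offsets):
    """CPU sketch (assumed semantics): one distance per (src, target) offset pair."""
    distances = []
    for src_off, tgt_off in offsets:
        a = src[src_off:src_off + item_dim]
        b = targets[tgt_off:tgt_off + item_dim]
        if method == "HAMMING":
            # Binarize both items at the threshold, then count differing bits.
            bits_a = [1 if v > threshold else 0 for v in a]
            bits_b = [1 if v > threshold else 0 for v in b]
            distances.append(float(sum(p != q for p, q in zip(bits_a, bits_b))))
        else:
            # EUCLIDEAN: the threshold is ignored.
            distances.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b))))
    return distances


src = [0.9, 0.1, 0.8, 0.0, 0.0, 0.0]
targets = [0.2, 0.7, 0.9, 3.0, 4.0, 0.0]
print(batch_distances_ref("HAMMING", 0.5, 3, src, targets, [(0, 0), (3, 3)]))
```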

◆ channel_add()

void MyCaffe.common.CudaDnn< T >.channel_add ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nBlocks,
int  nInnerNum,
int  nOffset,
long  hX,
long  hY,
DIR  dir 

Adds data along channels, similar to the NumPy split function, but where the data is added instead of copied.

nCountSpecifies the total number of elements in Y which = count(X)/nBlocks in length.
nOuterNumSpecifies the number of items.
nChannelsSpecifies the number of channels.
nBlocksSpecifies the number of blocks in each channel.
nInnerNumSpecifies the dimension of each inner dim within the channel.
nOffsetSpecifies the offset of the inner dim.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dirSpecifies the direction of data flow (0 = fwd X->Y, 1 = bwd Y->X).

Definition at line 8437 of file CudaDnn.cs.

◆ channel_compare()

void MyCaffe.common.CudaDnn< T >.channel_compare ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 

Compares the values of the channels from X and places the result in Y where 1 is set if the values are equal otherwise 0 is set.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory of length nOuterNum.

Definition at line 8133 of file CudaDnn.cs.

◆ channel_copy()

void MyCaffe.common.CudaDnn< T >.channel_copy ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nBlocks,
int  nInnerNum,
int  nOffset,
long  hX,
long  hY,
DIR  dir 

Copies data along channels, similar to the NumPy split function.

nCountSpecifies the total number of elements in Y which = count(X)/nBlocks in length.
nOuterNumSpecifies the number of items.
nChannelsSpecifies the number of channels.
nBlocksSpecifies the number of blocks in each channel.
nInnerNumSpecifies the dimension of each inner dim within the channel.
nOffsetSpecifies the offset of the inner dim.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dirSpecifies the direction of data flow (0 = fwd X->Y, 1 = bwd Y->X).

Definition at line 8457 of file CudaDnn.cs.

◆ channel_copyall()

void MyCaffe.common.CudaDnn< T >.channel_copyall ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 

Copy all data from X (shape 1,c,sd) to each num in Y (shape n,c,sd).

nCountSpecifies the full count of Y.
nOuterNumSpecifies the outer num of Y.
nChannelsSpecifies the channels in X and Y.
nInnerNumSpecifies the spatial dimension of X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8474 of file CudaDnn.cs.

◆ channel_div()

void MyCaffe.common.CudaDnn< T >.channel_div ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
int  nMethod = 1 

Divides the values of the channels from X and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nMethodSpecifies the method of traversing the channel, nMethod = 1 (the default) is used by the SoftmaxLayer and nMethod = 2 is used by the GRNLayer.

Definition at line 8254 of file CudaDnn.cs.

◆ channel_dot()

void MyCaffe.common.CudaDnn< T >.channel_dot ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hA,
long  hY 

Calculates the dot product of the values within each channel of X and A and places the result in Y.

nCountSpecifies the number of elements.
nOuterNumSpecifies the number of images.
nChannelsSpecifies the number of channels per image.
nInnerNumSpecifies the dimension of each image.
hXSpecifies a handle to the vector X in GPU memory.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8326 of file CudaDnn.cs.

◆ channel_duplicate()

void MyCaffe.common.CudaDnn< T >.channel_duplicate ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 

Duplicates each channel 'nInnerNum' times in the destination.

nCountSpecifies the total number of elements in Y which = count(X)*nInnerNum in length.
nOuterNumSpecifies the number of items.
nChannelsSpecifies the number of channels.
nInnerNumSpecifies the dimension of each inner dim within the channel.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8343 of file CudaDnn.cs.

◆ channel_fill()

void MyCaffe.common.CudaDnn< T >.channel_fill ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
int  nLabelDim,
long  hLabels,
long  hY 

Fills each channel item of Y with the data of X matching the label index specified by hLabels.

nCountSpecifies the number of items in Y.
nOuterNumSpecifies the num of Y and Labels.
nChannelsSpecifies the channel size of Y and X.
nInnerNumSpecifies the spatial dimension of X and Y, but is normally 1.
hXSpecifies the GPU memory containing the encodings (usually centroids) of each label 0, ... max label.
nLabelDimSpecifies the dimension of the label channels. A value > 1 indicates that more than one label is stored per channel, in which case only the first label is used.
hLabelsSpecifies the label ordering that determines how Y is filled using data from X.
hYSpecifies the GPU memory of the output data.

This function is used to fill a blob with data matching a set of labels. For example in a 3 item encoding based system with 4 labels: X = 4 channels of 3 items each (e.g. an encoding for each label). The values of hLabels show the ordering for which to fill hY with the labeled encodings. So if hLabels = 0, 2, 1, 3, 1, then Y = size { 5, 3, 1, 1 }, 5 items each with encoding sizes of 3 items which are then filled with the encoding at position 0, (for label 0), followed by the encoding for label 2, then 1, 3 and ending with the encoding for 1 as specified by the labels.

Definition at line 8179 of file CudaDnn.cs.
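
The worked example in the description (4 labels, encodings of 3 items, hLabels = 0, 2, 1, 3, 1) can be reproduced with the CPU sketch below. The name `channel_fill_ref` and the flat-list layout are hypothetical; the sketch follows the documented behavior, including the use of only the first label when nLabelDim > 1.

```python
def channel_fill_ref(outer_num, channels, inner_num, x, label_dim, labels):
    """CPU sketch: item n of Y receives the encoding of X selected by its label."""
    dim = channels * inner_num               # size of one encoding
    y = []
    for n in range(outer_num):
        label = int(labels[n * label_dim])   # only the first label per item is used
        y.extend(x[label * dim:(label + 1) * dim])
    return y


# One 3-item encoding per label 0..3; each encoding repeats its label value.
x = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
print(channel_fill_ref(5, 3, 1, x, 1, [0, 2, 1, 3, 1]))
```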

◆ channel_fillfrom()

void MyCaffe.common.CudaDnn< T >.channel_fillfrom ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
DIR  dir 

Fills each channel with the values stored in the src data, where the X data contains nOuterNum x nChannels of data (e.g. one item per channel) that is then copied to all nInnerNum elements of each channel in Y.

nCountSpecifies the number of items in Y.
nOuterNumSpecifies the num of Y and Labels.
nChannelsSpecifies the channel size of Y and X.
nInnerNumSpecifies the spatial dimension of X and Y, but is normally 1.
hXSpecifies the GPU memory containing the src data of shape (nOuterNum, nChannels, 1).
hYSpecifies the GPU memory of the output data where the X src data is copied where each item per channel is filled across all nInnerNum elements of Y. Y should have shape (nOuterNum, nChannels, nInnerNum).
dirSpecifies the direction of data flow. When FWD, data flows X->Y; when BWD, data flows Y->X.

Definition at line 8152 of file CudaDnn.cs.
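
For the FWD direction, the documented shapes imply a simple broadcast of each (num, channel) value of X across the nInnerNum elements of Y. The sketch below covers only that direction; how BWD reduces Y back into X is not stated in the source and is left out. The name `channel_fillfrom_fwd_ref` is hypothetical.

```python
def channel_fillfrom_fwd_ref(outer_num, channels, inner_num, x):
    """CPU sketch (FWD only): X of shape (N, C, 1) is broadcast to Y of shape (N, C, inner)."""
    y = []
    for i in range(outer_num * channels):
        y.extend([x[i]] * inner_num)   # repeat the single channel value
    return y


print(channel_fillfrom_fwd_ref(1, 2, 3, [7.0, 9.0]))
```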

◆ channel_max()

void MyCaffe.common.CudaDnn< T >.channel_max ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
bool  bReturnIdx = false 

Calculates the maximum value within each channel of X and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
bReturnIdxOptionally, specifies to return the index of the maximum value, otherwise the maximum value is returned.

Definition at line 8099 of file CudaDnn.cs.

◆ channel_mean()

void MyCaffe.common.CudaDnn< T >.channel_mean ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 

Calculates the mean value of each channel of X and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8116 of file CudaDnn.cs.

◆ channel_min()

void MyCaffe.common.CudaDnn< T >.channel_min ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
bool  bReturnIdx = false 

Calculates the minimum value within each channel of X and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
bReturnIdxOptionally, specifies to return the index of the minimum value, otherwise the minimum value is returned.

Definition at line 8081 of file CudaDnn.cs.

◆ channel_mul()

void MyCaffe.common.CudaDnn< T >.channel_mul ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
int  nMethod = 1 

Multiplies the values of the channels from X and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nMethodSpecifies the method of traversing the channel, nMethod = 1 (the default) is used by the SoftmaxLayer and nMethod = 2 is used by the GRNLayer.

Definition at line 8272 of file CudaDnn.cs.

◆ channel_mulv()

void MyCaffe.common.CudaDnn< T >.channel_mulv ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hA,
long  hX,
long  hC 

Multiplies the values in vector X by each channel in matrix A and places the result in matrix C.

nCountSpecifies the number of elements in A.
nOuterNumSpecifies the number of items within A.
nChannelsSpecifies the number of channels per item of A.
nInnerNumSpecifies the dimension of each item in A and X.
hASpecifies a handle to the matrix A in GPU memory.
hXSpecifies a handle to the vector X in GPU memory (must be of length nInnerNum).
hCSpecifies a handle to the matrix C in GPU memory where the results are placed (matrix A and C are the same shape).

Definition at line 8290 of file CudaDnn.cs.
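
A minimal CPU sketch of this broadcast multiply is shown below, assuming A is stored row-major as (nOuterNum, nChannels, nInnerNum) and X holds one value per inner position. The name `channel_mulv_ref` is hypothetical.

```python
def channel_mulv_ref(outer_num, channels, inner_num, a, x):
    """CPU sketch: C[n, c, i] = A[n, c, i] * X[i] (assumed row-major layout)."""
    count = outer_num * channels * inner_num
    c = [0.0] * count
    for idx in range(count):
        c[idx] = a[idx] * x[idx % inner_num]   # inner position selects the X value
    return c


print(channel_mulv_ref(1, 2, 2, [1.0, 2.0, 3.0, 4.0], [10.0, 100.0]))
```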

◆ channel_op_bwd()

void MyCaffe.common.CudaDnn< T >.channel_op_bwd ( OP  op,
int  nCount,
int  nC,
int  nN1,
int  nSD1,
int  nN2,
int  nSD2,
int  nCy,
int  nSDy,
long  hA,
long  hB,
long  hY,
long  hAd,
long  hBd,
long  hYd,
long  hWork 

Performs a channel operation backward on the data.

opSpecifies the operation to perform.
nCountSpecifies the number of items in Y which should equal max(nN1, nN2) x nC x max(nSD1, nSD2).
nCSpecifies the channels in A, B and Y.
nN1Specifies the number of items in A.
nSD1Specifies the spatial dimension of each item of A.
nN2Specifies the number of items in B.
nSD2Specifies the spatial dimension of each item of B.
nCySpecifies the channels of each item of Y.
nSDySpecifies the spatial dimension of each item of Y.
hASpecifies a handle to the memory of A which has the size nN1 x nC x nSD1.
hBSpecifies a handle to the memory of B which has the size nN2 x nC x nSD2.
hYSpecifies a handle to the memory where the result is placed during FWD with size max(nN1, nN2) x nC x max(nSD1, nSD2).
hAdOptionally, specifies a handle to the memory of the diff for A (filled during BWD) with size nN1, nC, nSD1.
hBdOptionally, specifies a handle to the memory of the diff for B (filled during BWD) with size nN2, nC, nSD2.
hYdOptionally, specifies a handle to the memory of the diff for Y (used during BWD).
hWorkOptionally, specifies a handle to work memory with the same size as Y (used during BWD).

Definition at line 8413 of file CudaDnn.cs.

◆ channel_op_fwd()

void MyCaffe.common.CudaDnn< T >.channel_op_fwd ( OP  op,
int  nCount,
int  nC,
int  nN1,
int  nSD1,
int  nN2,
int  nSD2,
long  hA,
long  hB,
long  hY 

Performs a channel operation forward on the data.

opSpecifies the operation to perform.
nCountSpecifies the number of items in Y which should equal max(nN1, nN2) x nC x max(nSD1, nSD2).
nCSpecifies the channels in A, B and Y.
nN1Specifies the number of items in A.
nSD1Specifies the spatial dimension of each item of A.
nN2Specifies the number of items in B.
nSD2Specifies the spatial dimension of each item of B.
hASpecifies a handle to the memory of A which has the size nN1 x nC x nSD1.
hBSpecifies a handle to the memory of B which has the size nN2 x nC x nSD2.
hYSpecifies a handle to the memory where the result is placed during FWD with size max(nN1, nN2) x nC x max(nSD1, nSD2).

Definition at line 8382 of file CudaDnn.cs.

◆ channel_percentile()

void MyCaffe.common.CudaDnn< T >.channel_percentile ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
double  dfPercentile 

Calculates the percentile along axis = 0.

nCountSpecifies the total number of elements in Y which = count(X)*nInnerDim in length.
nOuterNumSpecifies the number of items.
nChannelsSpecifies the number of channels.
nInnerNumSpecifies the dimension of each inner dim within the channel.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfPercentileSpecifies the percentile to calculate.

Definition at line 8361 of file CudaDnn.cs.

◆ channel_scale()

void MyCaffe.common.CudaDnn< T >.channel_scale ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hA,
long  hY 

Multiplies the values of the channels from X with the scalar values in A and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of items within X and A.
nChannelsSpecifies the number of channels per item of X and A.
nInnerNumSpecifies the dimension of each data item in X (A should have data dimension = 1).
hXSpecifies a handle to the vector X in GPU memory.
hASpecifies a handle to the vector A containing the scalar values, one per num * channel.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8308 of file CudaDnn.cs.

◆ channel_sub() [1/2]

void MyCaffe.common.CudaDnn< T >.channel_sub ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hA,
long  hX,
long  hY 

Subtracts the values across the channels of X from A and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hASpecifies a handle to the vector A in GPU memory.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8197 of file CudaDnn.cs.

◆ channel_sub() [2/2]

void MyCaffe.common.CudaDnn< T >.channel_sub ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 

Subtracts the values across the channels from X and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8214 of file CudaDnn.cs.

◆ channel_sum()

void MyCaffe.common.CudaDnn< T >.channel_sum ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
bool  bSumAcrossChannels = true,
DIR  dir = DIR.FWD,
int  nChannelsY = -1 

Calculates the sum of the values either across or within each channel (depending on the bSumAcrossChannels setting) of X and places the result in Y.

nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory (with expected size nOuterNum, nChannels, nInnerNum).
hYSpecifies a handle to the vector Y in GPU memory (with expected size nOuterNum, nChannels, 1).
bSumAcrossChannelsSpecifies to sum across channels (true), or within each channel (false), default = true.
dirOptionally, specifies the direction (default = DIR.FWD). When DIR.BWD is used, data flows from Y to X where Y data is copied to X and duplicated across the channels of Y. When using bSumAcrossChannels = true, ordering is based on Y ordering Y(c1,c2,c3,c1,c2,c3,c1,c2,c3), and when using bSumAcrossChannels = false, ordering is based on X ordering Y(c1,c1,c1,c2,c2,c2,c3,c3,c3).
nChannelsYOptionally, specifies the channels of Y (used in special case where Y channels = 1).

Definition at line 8236 of file CudaDnn.cs.

◆ CheckMemoryAttributes()

bool MyCaffe.common.CudaDnn< T >.CheckMemoryAttributes ( long  hSrc,
int  nSrcDeviceID,
long  hDst,
int  nDstDeviceID 

Check the memory attributes of two memory blocks on different devices to see if they are compatible for peer-to-peer memory transfers.

hSrcSpecifies the handle to the source memory.
nSrcDeviceIDSpecifies the device id where the source memory resides.
hDstSpecifies the handle to the destination memory.
nDstDeviceIDSpecifies the device id where the destination memory resides.
This function returns true when both devices support peer-to-peer communication, and false otherwise.

Definition at line 2160 of file CudaDnn.cs.

◆ clip_bwd()

void MyCaffe.common.CudaDnn< T >.clip_bwd ( int  nCount,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff,

Performs a Clip backward pass in Cuda.

nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
fMinSpecifies the bottom value to clip to.
fMaxSpecifies the top value to clip to.

Definition at line 8931 of file CudaDnn.cs.

◆ clip_fwd()

void MyCaffe.common.CudaDnn< T >.clip_fwd ( int  nCount,
long  hBottomData,
long  hTopData,

Performs a Clip forward pass in Cuda.

Calculation $ Y[i] = \max(min, \min(max,X[i])) $

nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
fMinSpecifies the bottom value to clip to.
fMaxSpecifies the top value to clip to.

Definition at line 8914 of file CudaDnn.cs.
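
The clip formula above translates directly into the CPU sketch below. The backward function is an assumption modeled on the usual clip gradient, which passes the top diff through where the input lies inside [min, max] and zeroes it elsewhere; the source does not spell out the backward rule. Both function names are hypothetical.

```python
def clip_fwd_ref(bottom, f_min, f_max):
    """CPU sketch: Y[i] = max(min, min(max, X[i]))."""
    return [max(f_min, min(f_max, x)) for x in bottom]


def clip_bwd_ref(top_diff, bottom, f_min, f_max):
    """CPU sketch (assumed rule): gradient flows only where min <= x <= max."""
    return [d if f_min <= x <= f_max else 0.0 for d, x in zip(top_diff, bottom)]


print(clip_fwd_ref([-2.0, 0.5, 3.0], -1.0, 1.0))
print(clip_bwd_ref([1.0, 1.0, 1.0], [-2.0, 0.5, 3.0], -1.0, 1.0))
```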

◆ cll_bwd()

void MyCaffe.common.CudaDnn< T >.cll_bwd ( int  nCount,
int  nChannels,
double  dfMargin,
bool  bLegacyVersion,
double  dfAlpha,
long  hY,
long  hDiff,
long  hDistSq,
long  hBottomDiff 

Performs a contrastive loss layer backward pass in Cuda.

See Dimensionality Reduction by Learning an Invariant Mapping by Hadsell, et al., 2006

nCountSpecifies the number of items.
nChannelsSpecifies the number of channels.
dfMarginSpecifies the margin to use. The default is 1.0.
bLegacyVersionWhen false, the calculation proposed by Hadsell, et al., 2006 is used where $ (margin - d)^2 $, otherwise the legacy version is used where $ (margin - d^2) $. The default is
hYSpecifies the Y data in GPU memory used to determine similar pairs.
hDiffSpecifies the diff in GPU memory.
hDistSqSpecifies the distance squared data in GPU memory.
hBottomDiffSpecifies the bottom diff in GPU memory.

Definition at line 10025 of file CudaDnn.cs.
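
The difference between the two margin terms is easiest to see numerically. The sketch below evaluates the loss term for a single dissimilar pair from its squared distance, which is the quantity the backward pass differentiates. It assumes the hinge form used by the Caffe contrastive loss layer (the term is clamped at zero), and the name `dissimilar_pair_term` is hypothetical.

```python
import math


def dissimilar_pair_term(dist_sq, margin=1.0, legacy_version=False):
    """CPU sketch (assumed hinge form) of the loss term for one dissimilar pair."""
    if legacy_version:
        # Legacy version: (margin - d^2), clamped at zero.
        return max(margin - dist_sq, 0.0)
    # Hadsell et al. version: (margin - d)^2, clamped at zero before squaring.
    d = math.sqrt(dist_sq)
    return max(margin - d, 0.0) ** 2


print(dissimilar_pair_term(0.25, legacy_version=False))
print(dissimilar_pair_term(0.25, legacy_version=True))
```

With a squared distance of 0.25 and a margin of 1.0, the two variants penalize the same pair quite differently, so gradients from the two versions are not interchangeable.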

◆ coeff_sub_bwd()

void MyCaffe.common.CudaDnn< T >.coeff_sub_bwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hTopDiff,
long  hBottomDiff 

Performs a coefficient sub backward pass in Cuda.

nCountSpecifies the number of items.
nDimSpecifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffsetSpecifies the offset applied to the coefficient indexing.
dfCoeffSpecifies a primary coefficient value applied to each input before summing.
hCoeffDataOptionally specifies a handle to coefficient data that is applied to the primary coefficient.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10537 of file CudaDnn.cs.

◆ coeff_sub_fwd()

void MyCaffe.common.CudaDnn< T >.coeff_sub_fwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hBottom,
long  hTop 

Performs a coefficient sub forward pass in Cuda.

nCountSpecifies the number of items.
nDimSpecifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffsetSpecifies the offset applied to the coefficient indexing.
dfCoeffSpecifies a primary coefficient value applied to each input before summing.
hCoeffDataOptionally specifies a handle to coefficient data that is applied to the primary coefficient.
hBottomSpecifies a handle to the bottom data in GPU memory.
hTopSpecifies a handle to the top data in GPU memory.

Definition at line 10518 of file CudaDnn.cs.

◆ coeff_sum_bwd()

void MyCaffe.common.CudaDnn< T >.coeff_sum_bwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hTopDiff,
long  hBottomDiff 

Performs a coefficient sum backward pass in Cuda.

nCountSpecifies the number of items.
nDimSpecifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffsetSpecifies the offset applied to the coefficient indexing.
dfCoeffSpecifies a primary coefficient value applied to each input before summing.
hCoeffDataOptionally specifies a handle to coefficient data that is applied to the primary coefficient.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10500 of file CudaDnn.cs.

◆ coeff_sum_fwd()

void MyCaffe.common.CudaDnn< T >.coeff_sum_fwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hBottom,
long  hTop 

Performs a coefficient sum forward pass in Cuda.

nCountSpecifies the number of items.
nDimSpecifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffsetSpecifies the offset applied to the coefficient indexing.
dfCoeffSpecifies a primary coefficient value applied to each input before summing.
hCoeffDataOptionally specifies a handle to coefficient data that is applied to the primary coefficient.
hBottomSpecifies a handle to the bottom data in GPU memory.
hTopSpecifies a handle to the top data in GPU memory.

Definition at line 10481 of file CudaDnn.cs.

◆ col2im()

void MyCaffe.common.CudaDnn< T >.col2im ( long  hDataCol,
int  nDataColOffset,
int  nChannels,
int  nHeight,
int  nWidth,
int  nKernelH,
int  nKernelW,
int  nPadH,
int  nPadW,
int  nStrideH,
int  nStrideW,
int  nDilationH,
int  nDilationW,
long  hDataIm,
int  nDataImOffset 

Rearranges the columns into image blocks.

hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.
nChannelsSpecifies the number of channels in the image.
nHeightSpecifies the height of the image.
nWidthSpecifies the width of the image.
nKernelHSpecifies the kernel height.
nKernelWSpecifies the kernel width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nDilationHSpecifies the dilation along the height.
nDilationWSpecifies the dilation along the width.
hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.

Definition at line 8039 of file CudaDnn.cs.
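
The sketch below is a CPU reference for the 2D case, assuming the Caffe column layout: the column buffer has one row per (channel, kernel row, kernel column) and one column per output position, and overlapping patches are summed into the image. The name `col2im_ref` is hypothetical and the memory offsets are omitted.

```python
def col2im_ref(data_col, channels, height, width, kernel_h, kernel_w,
               pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w):
    """CPU sketch (assumed Caffe layout): accumulate column data back into an image."""
    out_h = (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1
    out_w = (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) // stride_w + 1
    data_im = [0.0] * (channels * height * width)
    col_idx = 0
    for c in range(channels):
        for kh in range(kernel_h):
            for kw in range(kernel_w):
                for oh in range(out_h):
                    ih = oh * stride_h - pad_h + kh * dilation_h
                    for ow in range(out_w):
                        iw = ow * stride_w - pad_w + kw * dilation_w
                        # Positions that fall in the padding are skipped.
                        if 0 <= ih < height and 0 <= iw < width:
                            data_im[(c * height + ih) * width + iw] += data_col[col_idx]
                        col_idx += 1
    return data_im


# 3x3 image, 2x2 kernel, stride 1: all-ones columns count how many patches cover each pixel.
print(col2im_ref([1.0] * 16, 1, 3, 3, 2, 2, 0, 0, 1, 1, 1, 1))
```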

◆ col2im_nd()

void MyCaffe.common.CudaDnn< T >.col2im_nd ( long  hDataCol,
int  nDataColOffset,
int  nNumSpatialAxes,
int  nColCount,
int  nChannelAxis,
long  hImShape,
long  hColShape,
long  hKernelShape,
long  hPad,
long  hStride,
long  hDilation,
long  hDataIm,
int  nDataImOffset 

Rearranges the columns into image blocks.

hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.
nNumSpatialAxesSpecifies the number of spatial axes.
nColCountSpecifies the number of kernels.
nChannelAxisSpecifies the axis containing the channel.
hImShapeSpecifies a handle to the image shape data in GPU memory.
hColShapeSpecifies a handle to the column shape data in GPU memory.
hKernelShapeSpecifies a handle to the kernel shape data in GPU memory.
hPadSpecifies a handle to the pad data in GPU memory.
hStrideSpecifies a handle to the stride data in GPU memory.
hDilationSpecifies a handle to the dilation data in GPU memory.
hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.

Definition at line 8063 of file CudaDnn.cs.

◆ compare_signs()

void MyCaffe.common.CudaDnn< T >.compare_signs ( int  n,
long  hA,
long  hB,
long  hY 

Compares the signs of each value in A and B and places the result in Y.

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7653 of file CudaDnn.cs.

◆ concat_bwd()

void MyCaffe.common.CudaDnn< T >.concat_bwd ( int  nCount,
long  hTopDiff,
int  nNumConcats,
int  nConcatInputSize,
int  nTopConcatAxis,
int  nBottomConcatAxis,
int  nOffsetConcatAxis,
long  hBottomDiff 

Performs a concat backward pass in Cuda.

nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nNumConcatsSpecifies the number of concatenations.
nConcatInputSizeSpecifies the concatenation input size.
nTopConcatAxisSpecifies the top axis to concatenate.
nBottomConcatAxisNEEDS REVIEW
nOffsetConcatAxisNEEDS REVIEW
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9869 of file CudaDnn.cs.

◆ concat_fwd()

void MyCaffe.common.CudaDnn< T >.concat_fwd ( int  nCount,
long  hBottomData,
int  nNumConcats,
int  nConcatInputSize,
int  nTopConcatAxis,
int  nBottomConcatAxis,
int  nOffsetConcatAxis,
long  hTopData 

Performs a concat forward pass in Cuda.

nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nNumConcatsSpecifies the number of concatenations.
nConcatInputSizeSpecifies the concatenation input size.
nTopConcatAxisSpecifies the top axis to concatenate.
nBottomConcatAxisNEEDS REVIEW
nOffsetConcatAxisNEEDS REVIEW
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9849 of file CudaDnn.cs.
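
The axis parameters are easier to follow with a worked index calculation. The sketch below assumes the indexing of the Caffe Concat kernel, in which nTopConcatAxis and nBottomConcatAxis are the sizes of the concat axis in the top and in the current bottom, nOffsetConcatAxis is where this bottom starts along that axis in the top, and nConcatInputSize is the number of elements after the concat axis. These meanings are assumptions, since the source marks two of the parameters as needing review. The name `concat_fwd_ref` is hypothetical, and nNumConcats is omitted because the assumed indexing does not use it.

```python
def concat_fwd_ref(n_count, bottom, concat_input_size,
                   top_concat_axis, bottom_concat_axis, offset_concat_axis, top):
    """CPU sketch (assumed Caffe indexing): copy one bottom into its slot of the top."""
    total = concat_input_size * bottom_concat_axis
    for index in range(n_count):
        concat_num = index // total      # outer item this element belongs to
        concat_index = index % total     # position within that item's slice
        top_index = concat_index + (
            concat_num * top_concat_axis + offset_concat_axis) * concat_input_size
        top[top_index] = bottom[index]
    return top


# Concatenate A of shape (2, 1) and B of shape (2, 2) along axis 1 into a (2, 3) top.
top = [0.0] * 6
concat_fwd_ref(2, [1.0, 2.0], 1, 3, 1, 0, top)
concat_fwd_ref(4, [3.0, 4.0, 5.0, 6.0], 1, 3, 2, 1, top)
print(top)
```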

◆ contains_point()

bool MyCaffe.common.CudaDnn< T >.contains_point ( int  n,
long  hMean,
long  hWidth,
long  hX,
long  hWork,
int  nXOff = 0 

Returns true if the point is contained within the bounds.

nSpecifies the number of items.
hMeanSpecifies a handle to the mean values in GPU memory.
hWidthSpecifies a handle to the width values in GPU memory.
hXSpecifies a handle to the X values in GPU memory.
hWorkSpecifies a handle to the work data in GPU memory.
nXOffOptionally, specifies an offset into the X vector (default = 0).
If the X values are within the bounds, true is returned, otherwise false.

Definition at line 7943 of file CudaDnn.cs.

◆ ConvertByteSizeToCount()

static ulong MyCaffe.common.CudaDnn< T >.ConvertByteSizeToCount ( ulong  ulSizeInBytes)

Converts the byte size into the number of items in the base data type of float or double.

ulSizeInBytesSpecifies the size in bytes to convert.
The number of items is returned.

Definition at line 2438 of file CudaDnn.cs.
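
The conversion amounts to dividing by the size of the base type. The sketch below assumes 4 bytes for float, 8 bytes for double, and truncating integer division; how the actual method handles sizes that are not a multiple of the item size is not stated in the source. The name `convert_byte_size_to_count` and the string type selector are hypothetical.

```python
def convert_byte_size_to_count(size_in_bytes, base_type="float"):
    """Sketch: number of items of the base type that fit in size_in_bytes."""
    item_size = {"float": 4, "double": 8}[base_type]   # bytes per item
    return size_in_bytes // item_size                  # assumed truncating division


print(convert_byte_size_to_count(1024, "float"))
print(convert_byte_size_to_count(1024, "double"))
```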

◆ ConvolutionBackwardBias() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardBias ( long  hCuDnn,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hBiasDesc,
long  hBiasDiff,
int  nBiasOffset,
bool  bSyncStream = true 

Perform a convolution backward pass on the bias.

hCuDnnSpecifies a handle to the instance of cuDnn.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hBiasDescSpecifies a handle to the bias tensor descriptor.
hBiasDiffSpecifies a handle to the bias diff in GPU memory.
nBiasOffsetSpecifies an offset into the diff memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3901 of file CudaDnn.cs.

◆ ConvolutionBackwardBias() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardBias ( long  hCuDnn,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hBiasDesc,
long  hBiasDiff,
int  nBiasOffset,
bool  bSyncStream = true 

Perform a convolution backward pass on the bias.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
fBetaSpecifies a scaling factor applied to the prior destination value.
hBiasDescSpecifies a handle to the bias tensor descriptor.
hBiasDiffSpecifies a handle to the bias diff in GPU memory.
nBiasOffsetSpecifies an offset into the diff memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3919 of file CudaDnn.cs.

◆ ConvolutionBackwardData() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardData ( long  hCuDnn,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hBottomDesc,
long  hBottomDiff,
int  nBottomOffset,
bool  bSyncStream = true 

Perform a convolution backward pass on the data.

hCuDnnSpecifies a handle to the instance of cuDnn.
hFilterDescSpecifies a handle to the filter descriptor.
hWeightSpecifies a handle to the weight data in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoBwdSpecifies the algorithm to use when performing the backward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3999 of file CudaDnn.cs.

◆ ConvolutionBackwardData() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardData ( long  hCuDnn,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hBottomDesc,
long  hBottomDiff,
int  nBottomOffset,
bool  bSyncStream = true 

Perform a convolution backward pass on the data.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hFilterDescSpecifies a handle to the filter descriptor.
hWeightSpecifies a handle to the weight data in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoBwdSpecifies the algorithm to use when performing the backward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 4025 of file CudaDnn.cs.

◆ ConvolutionBackwardFilter() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardFilter ( long  hCuDnn,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hFilterDesc,
long  hWeightDiff,
int  nWeightOffset,
bool  bSyncStream 

Perform a convolution backward pass on the filter.

hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoBwdSpecifies the algorithm to use when performing the backward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
hFilterDescSpecifies a handle to the filter descriptor.
hWeightDiffSpecifies a handle to the weight diff in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3946 of file CudaDnn.cs.

◆ ConvolutionBackwardFilter() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardFilter ( long  hCuDnn,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hFilterDesc,
long  hWeightDiff,
int  nWeightOffset,
bool  bSyncStream = true 

Perform a convolution backward pass on the filter.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoBwdSpecifies the algorithm to use when performing the backward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
fBetaSpecifies a scaling factor applied to the prior destination value.
hFilterDescSpecifies a handle to the filter descriptor.
hWeightDiffSpecifies a handle to the weight diff in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3972 of file CudaDnn.cs.

◆ ConvolutionForward() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionForward ( long  hCuDnn,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hConvDesc,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hTopDesc,
long  hTopData,
int  nTopOffset,
bool  bSyncStream = true 

Perform a convolution forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
hFilterDescSpecifies a handle to the filter descriptor.
hWeightSpecifies a handle to the weight data in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoFwdSpecifies the algorithm to use for the forward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3856 of file CudaDnn.cs.

◆ ConvolutionForward() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionForward ( long  hCuDnn,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hConvDesc,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hTopDesc,
long  hTopData,
int  nTopOffset,
bool  bSyncStream = true 

Perform a convolution forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
hFilterDescSpecifies a handle to the filter descriptor.
hWeightSpecifies a handle to the weight data in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoFwdSpecifies the algorithm to use for the forward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3882 of file CudaDnn.cs.
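
The fAlpha and fBeta parameters follow the scaling convention documented for NVIDIA's cuDnn, where the destination becomes alpha times the new result plus beta times the prior destination value. The sketch below shows that rule on host arrays; the Blend helper is hypothetical, and it is an assumption that this method passes both factors through to cuDnn unchanged.

```csharp
using System;

static class AlphaBetaSketch
{
    // dst[i] = fAlpha * result[i] + fBeta * dst[i]
    public static void Blend(float fAlpha, float[] result, float fBeta, float[] dst)
    {
        for (int i = 0; i < dst.Length; i++)
            dst[i] = fAlpha * result[i] + fBeta * dst[i];
    }

    static void Main()
    {
        float[] result = { 1f, 2f };
        float[] top = { 10f, 20f };

        Blend(1f, result, 0f, top);   // beta = 0 overwrites the destination
        Console.WriteLine(string.Join(" ", top)); // 1 2

        Blend(1f, result, 1f, top);   // beta = 1 accumulates into the destination
        Console.WriteLine(string.Join(" ", top)); // 2 4
    }
}
```

With fAlpha = 1 and fBeta = 0 the call behaves like a plain overwrite, while fBeta = 1 accumulates into the existing destination values.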

◆ copy() [1/2]

void MyCaffe.common.CudaDnn< T >.copy ( int  nCount,
int  nNum,
int  nDim,
long  hSrc1,
long  hSrc2,
long  hDst,
long  hSimilar,
bool  bInvert = false 

Copy similar items of length 'nDim' from hSrc1 (where hSimilar(i) = 1) and dissimilar items of length 'nDim' from hSrc2 (where hSimilar(i) = 0).

nCountSpecifies the total data length of hSrc1, hSrc2 and hDst.
nNumSpecifies the number of outer items in hSrc1, hSrc2, hDst, and the number of elements in hSimilar.
nDimSpecifies the inner dimension of hSrc1, hSrc2 and hDst.
hSrc1Specifies a handle to the GPU memory of source 1.
hSrc2Specifies a handle to the GPU memory of source 2.
hDstSpecifies a handle to the GPU memory of the destination.
hSimilarSpecifies a handle to the GPU memory of the similar data.
bInvertOptionally, specifies whether or not to invert the similar values (e.g. copy when similar = 0 instead of similar = 1).

Definition at line 6035 of file CudaDnn.cs.
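
The selection rule can be sketched on the CPU as follows. Host arrays stand in for the GPU memory handles, the CopySimilar helper is hypothetical, and the reading of bInvert (swapping which source is used for each item) is an assumption based on the parameter description.

```csharp
using System;

static class CopySimilarSketch
{
    // For each of the nNum items, copies nDim values from src1 when the item is
    // marked similar, and from src2 otherwise. bInvert flips the choice.
    public static void CopySimilar(int nNum, int nDim, float[] src1, float[] src2,
        float[] dst, float[] similar, bool bInvert = false)
    {
        for (int i = 0; i < nNum; i++)
        {
            bool bSimilar = similar[i] != 0;
            if (bInvert)
                bSimilar = !bSimilar;

            float[] src = bSimilar ? src1 : src2;
            Array.Copy(src, i * nDim, dst, i * nDim, nDim);
        }
    }

    static void Main()
    {
        float[] src1 = { 1, 1, 2, 2 };
        float[] src2 = { 9, 9, 8, 8 };
        float[] similar = { 1, 0 };
        float[] dst = new float[4];

        CopySimilar(2, 2, src1, src2, dst, similar);
        Console.WriteLine(string.Join(" ", dst)); // 1 1 8 8
    }
}
```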

◆ copy() [2/2]

void MyCaffe.common.CudaDnn< T >.copy ( int  nCount,
long  hSrc,
long  hDst,
int  nSrcOffset = 0,
int  nDstOffset = 0,
long  hStream = -1,
bool?  bSrcHalfSizeOverride = null,
bool?  bDstHalfSizeOverride = null 

Copy data from one block of GPU memory to another.

This function uses NVIDIA's cuBlas but with a different parameter ordering.

nCountSpecifies the number of items (not bytes) to copy.
hSrcSpecifies a handle to GPU memory containing the source data.
hDstSpecifies a handle to GPU memory containing the destination data.
nSrcOffsetOptionally specifies the offset into the source data where the copying starts.
nDstOffsetOptionally specifies the offset into the destination data where the copying starts.
hStreamOptionally, specifies a handle to a stream to use for the operation.
bSrcHalfSizeOverrideOptionally, specifies an override for the half size state of the source (default = null, which is ignored).
bDstHalfSizeOverrideOptionally, specifies an override for the half size state of the destination (default = null, which is ignored).

Definition at line 6007 of file CudaDnn.cs.

◆ copy_batch()

void MyCaffe.common.CudaDnn< T >.copy_batch ( int  nCount,
int  nNum,
int  nDim,
long  hSrcData,
long  hSrcLbl,
int  nDstCount,
long  hDstCache,
long  hWorkDevData,
int  nLabelStart,
int  nLabelCount,
int  nCacheSize,
long  hCacheHostCursors,
long  hWorkDataHost 

Copy a batch of labeled items into a cache organized by label where older data is removed and replaced by newer data.

nCountSpecifies the total data length of hSrcData.
nNumSpecifies the number of outer items in hSrcData.
nDimSpecifies the inner dimension of hSrcData.
hSrcDataSpecifies a handle to the GPU memory of source data.
hSrcLblSpecifies a handle to the GPU memory of source labels.
nDstCountSpecifies the total data length of hDstCache.
hDstCacheSpecifies a handle to the GPU memory of the destination cache.
hWorkDevDataSpecifies a handle to the GPU memory of the device work data that is the same size as the hDstCache.
nLabelStartSpecifies the first label of all possible labels.
nLabelCountSpecifies the total number of labels (expects labels to be sequential from 'nLabelStart').
nCacheSizeSpecifies the size of each labeled data cache.
hCacheHostCursorsSpecifies a handle to host memory (allocated using AllocateHostBuffer) containing the label cursors - there should be 'nLabelCount' cursors.
hWorkDataHostSpecifies a handle to host memory (allocated using AllocateHostBuffer) used for work - must be nNum in item length.

NOTE: The cache size must be large enough to cover the maximum number of items for any given label within a batch, otherwise cached items will be overwritten by items in the current batch.

Definition at line 6062 of file CudaDnn.cs.

◆ copy_expand()

void MyCaffe.common.CudaDnn< T >.copy_expand ( int  n,
int  nNum,
int  nDim,
long  hX,
long  hA 

Expand a vector of length 'nNum' into a matrix of size 'nNum' x 'nDim' by copying each value of the vector into all elements of the corresponding matrix row.

nSpecifies the total number of items in the matrix 'A'
nNumSpecifies the total number of rows in the matrix 'A' and the total number of items in the vector 'X'.
nDimSpecifies the total number of columns in the matrix 'A'.
hXSpecifies the 'nNum' length vector to expand.
hASpecifies the 'nNum' x 'nDim' matrix.

Definition at line 6182 of file CudaDnn.cs.
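
A CPU sketch of the expansion, assuming the matrix 'A' is stored in row-major order. Host arrays stand in for the GPU memory handles and the CopyExpand helper is hypothetical.

```csharp
using System;

static class CopyExpandSketch
{
    // Fills row i of the nNum x nDim matrix 'a' with the value x[i].
    public static void CopyExpand(int nNum, int nDim, float[] x, float[] a)
    {
        for (int i = 0; i < nNum; i++)
        {
            for (int j = 0; j < nDim; j++)
                a[i * nDim + j] = x[i];
        }
    }

    static void Main()
    {
        float[] x = { 1, 2 };
        float[] a = new float[2 * 3];
        CopyExpand(2, 3, x, a);
        Console.WriteLine(string.Join(" ", a)); // 1 1 1 2 2 2
    }
}
```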

◆ copy_sequence() [1/2]

void MyCaffe.common.CudaDnn< T >.copy_sequence ( int  n,
long  hSrc,
int  nSrcStep,
int  nSrcStartIdx,
int  nCopyCount,
int  nCopyDim,
long  hDst,
int  nDstStep,
int  nDstStartIdx,
int  nSrcSpatialDim,
int  nDstSpatialDim,
int  nSrcSpatialDimStartIdx = 0,
int  nDstSpatialDimStartIdx = 0,
int  nSpatialDimCount = -1 

Copy a sequence from a source to a destination and allow for skip steps.

nSpecifies the total number of items in src.
hSrcSpecifies a handle to the source GPU memory.
nSrcStepSpecifies the stepping used across the source.
nSrcStartIdxSpecifies the starting index into the source.
nCopyCountSpecifies the number of items to copy.
nCopyDimSpecifies the dimension to copy (which x spatial dim = total copy amount).
hDstSpecifies a handle to the destination GPU memory.
nDstStepSpecifies the stepping used across the destination.
nDstStartIdxSpecifies the starting index where data is to be copied in the destination.
nSrcSpatialDimSpecifies the src spatial dim of each item copied. Src and Dst spatial dims should be equal when nSpatialDimCount is not used.
nDstSpatialDimSpecifies the dst spatial dim of each item copied. Src and Dst spatial dims should be equal when nSpatialDimCount is not used.
nSrcSpatialDimStartIdxOptionally, specifies the start index within the source spatial dim to start the copy (default = 0)
nDstSpatialDimStartIdxOptionally, specifies the start index within the destination spatial dim to start the copy (default = 0)
nSpatialDimCountOptionally, specifies the number of items to copy from within the spatial dim (default = -1, copy all)

Definition at line 6165 of file CudaDnn.cs.

◆ copy_sequence() [2/2]

void MyCaffe.common.CudaDnn< T >.copy_sequence ( int  nK,
int  nNum,
int  nDim,
long  hSrcData,
long  hSrcLbl,
int  nSrcCacheCount,
long  hSrcCache,
int  nLabelStart,
int  nLabelCount,
int  nCacheSize,
long  hCacheHostCursors,
bool  bOutputLabels,
List< long >  rghTop,
List< int >  rgnTopCount,
long  hWorkDataHost,
bool  bCombinePositiveAndNegative = false,
int  nSeed = 0 

Copy a sequence of cached items, organized by label, into an anchor, positive (if nK > 0), and negative blobs.

nKSpecifies the output type expected where: nK = 0, outputs to 2 tops (anchor and one negative), or nK > 0, outputs to 2 + nK tops (anchor, positive, nK negatives). The rghTop and rgnTopCount must be sized accordingly.
nNumSpecifies the number of outer items in hSrcData.
nDimSpecifies the inner dimension of hSrcData.
hSrcDataSpecifies a handle to the GPU memory of source data.
hSrcLblSpecifies a handle to the GPU memory of source labels.
nSrcCacheCountSpecifies the number of items in hSrcCache (nCacheSize * nLabelCount).
hSrcCacheSpecifies a handle to the cached labeled data.
nLabelStartSpecifies the first label of all possible labels.
nLabelCountSpecifies the total number of labels (expects labels to be sequential from 'nLabelStart').
nCacheSizeSpecifies the size of each labeled data cache.
hCacheHostCursorsSpecifies a handle to host memory containing the label cursors - there should be 'nLabelCount' cursors.
bOutputLabelsSpecifies whether or not to output labels. When true, one additional top is expected for the labels.
rghTopSpecifies a list of the GPU memory for each top item. The number of top items expected depends on the 'nK' value.
rgnTopCountSpecifies a list of the item count for each top item. The number of top items expected depends on the 'nK' value.
hWorkDataHostSpecifies a handle to host memory (allocated using AllocateHostBuffer) used for work - must be nNum in item length and must be the same hWorkDataHost passed to 'copy_batch'.
bCombinePositiveAndNegativeOptionally, specifies to combine the positive and negative items by alternating between each and placing both in Top[1], while also making sure the output labels reflect the alternation.
nSeedOptionally, specifies a seed for the random number generator (default = 0, which ignores this parameter).

Receiving an error ERROR_BATCH_TOO_SMALL indicates that the batch size is too small and does not have enough labels to choose from. Each batch should have at least two instances of each labeled item.

NOTE: When 'nK' = 1 and 'bCombinePositiveAndNegative' = true, the label output has a dimension of 2, and the tops used are as follows: top(0) = anchor; top(1) = alternating negative/positive; top(2) = labels if 'bOutputLabels' = true.

Definition at line 6095 of file CudaDnn.cs.

◆ CopyDeviceToHost()

void MyCaffe.common.CudaDnn< T >.CopyDeviceToHost ( long  lCount,
long  hGpuSrc,
long  hHostDst 

Copy from GPU memory to Host memory.

lCountSpecifies the number of items (of base type each) to copy.
hGpuSrcSpecifies the GPU memory containing the source data.
hHostDstSpecifies the Host memory containing the host destination.

Definition at line 2554 of file CudaDnn.cs.

◆ CopyHostToDevice()

void MyCaffe.common.CudaDnn< T >.CopyHostToDevice ( long  lCount,
long  hHostSrc,
long  hGpuDst 

Copy from Host memory to GPU memory.

lCountSpecifies the number of items (of base type each) to copy.
hHostSrcSpecifies the Host memory containing the host source data.
hGpuDstSpecifies the GPU memory containing the destination.

Definition at line 2568 of file CudaDnn.cs.

◆ CreateConvolutionDesc()

long MyCaffe.common.CudaDnn< T >.CreateConvolutionDesc ( )

Create a new instance of a convolution descriptor for use with NVIDIA's cuDnn.

The convolution descriptor handle is returned.

Definition at line 3747 of file CudaDnn.cs.

◆ CreateCuDNN()

long MyCaffe.common.CudaDnn< T >.CreateCuDNN ( long  hStream = 0)

Create a new instance of NVIDIA's cuDnn.

hStreamSpecifies a stream used by cuDnn.
The handle to cuDnn is returned.

Definition at line 3263 of file CudaDnn.cs.

◆ CreateDropoutDesc()

long MyCaffe.common.CudaDnn< T >.CreateDropoutDesc ( )

Create a new instance of a dropout descriptor for use with NVIDIA's cuDnn.

The dropout descriptor handle is returned.

Definition at line 4203 of file CudaDnn.cs.

◆ CreateExtension()

long MyCaffe.common.CudaDnn< T >.CreateExtension ( string  strExtensionDllPath)

Create an instance of an Extension DLL.

strExtensionDllPathSpecifies the file path to the extension DLL.
The handle to a new instance of Extension is returned.

Definition at line 3456 of file CudaDnn.cs.

◆ CreateFilterDesc()

long MyCaffe.common.CudaDnn< T >.CreateFilterDesc ( )

Create a new instance of a filter descriptor for use with NVIDIA's cuDnn.

The filter descriptor handle is returned.

Definition at line 3668 of file CudaDnn.cs.

◆ CreateImageOp()

long MyCaffe.common.CudaDnn< T >.CreateImageOp ( int  nNum,
double  dfBrightnessProb,
double  dfBrightnessDelta,
double  dfContrastProb,
double  dfContrastLower,
double  dfContrastUpper,
double  dfSaturationProb,
double  dfSaturationLower,
double  dfSaturationUpper,
long  lRandomSeed = 0 

Create a new ImageOp used to perform image operations on the GPU.

nNumSpecifies the number of items (usually the blob.num).
dfBrightnessProbSpecifies the brightness probability [0,1].
dfBrightnessDeltaSpecifies the brightness delta.
dfContrastProbSpecifies the contrast probability [0,1]
dfContrastLowerSpecifies the contrast lower bound value.
dfContrastUpperSpecifies the contrast upper bound value.
dfSaturationProbSpecifies the saturation probability [0,1]
dfSaturationLowerSpecifies the saturation lower bound value.
dfSaturationUpperSpecifies the saturation upper bound value.
lRandomSeedOptionally, specifies the random seed or 0 to ignore (default = 0).
A handle to the ImageOp is returned.

Definition at line 3153 of file CudaDnn.cs.

◆ CreateLayerNorm()

long MyCaffe.common.CudaDnn< T >.CreateLayerNorm ( int  nGpuID,
int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
float  fEps = 1e-10f 

Create the Cuda version of LayerNorm.

nGpuIDSpecifies the GPUID to use.
nCountSpecifies the total number of items in the input (and output).
nOuterNumSpecifies the outer number of items (e.g., num)
nChannelsSpecifies the number of channels in the data.
nInnerNumSpecifies the spatial dimensions of the inner data.
fEpsOptionally, specifies the epsilon value to avoid numeric issues (default = 1e-10).
The handle to the LayerNorm configuration is returned. This handle is used with all other layer norm functions.

Definition at line 5828 of file CudaDnn.cs.

◆ CreateLRNDesc()

long MyCaffe.common.CudaDnn< T >.CreateLRNDesc ( )

Create a new instance of a LRN descriptor for use with NVIDIA's cuDnn.

The LRN descriptor handle is returned.

Definition at line 4308 of file CudaDnn.cs.

◆ CreateMemoryPointer()

long MyCaffe.common.CudaDnn< T >.CreateMemoryPointer ( long  hData,
long  lOffset,
long  lCount 

Creates a memory pointer into an already existing block of GPU memory.

hDataSpecifies a handle to the GPU memory.
lOffsetSpecifies the offset into the GPU memory (in items, not bytes), where the pointer is to start.
lCountSpecifies the number of items (not bytes) in the 'virtual' memory block pointed to by the memory pointer.
A handle to the memory pointer is returned. Handles to memory pointers can be used like any other handle to GPU memory.

Definition at line 3028 of file CudaDnn.cs.
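
As a host-side analogy only, a memory pointer behaves like a view over part of an existing buffer: lOffset and lCount are measured in items, and no data is copied. The sketch below uses .NET's ArraySegment to illustrate this; the CreateView helper is hypothetical and does not touch GPU memory or the CudaDnn API.

```csharp
using System;

static class MemoryPointerSketch
{
    // A view of nCount items that starts nOffset items into an existing buffer.
    public static ArraySegment<float> CreateView(float[] data, int nOffset, int nCount)
    {
        return new ArraySegment<float>(data, nOffset, nCount);
    }

    static void Main()
    {
        float[] block = new float[10];
        ArraySegment<float> view = CreateView(block, 4, 3);

        // Writing to the first item of the view changes the parent block.
        view.Array[view.Offset + 0] = 42f;
        Console.WriteLine(block[4]); // 42
    }
}
```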

◆ CreateMemoryTest()

long MyCaffe.common.CudaDnn< T >.CreateMemoryTest ( out ulong  ulTotalNumBlocks,
out double  dfMemAllocatedInGB,
out ulong  ulMemStartAddr,
out ulong  ulBlockSize,
double  dfPctToAllocate = 1.0 

Creates a new memory test on the current GPU.

ulTotalNumBlocksReturns the total number of blocks available to test.
dfMemAllocatedInGBReturns the total amount of allocated memory, specified in GB.
ulMemStartAddrReturns the start address of the memory test.
ulBlockSizeReturns the block size of the memory to be tested.
dfPctToAllocateSpecifies the percentage of available memory to test, where 1.0 = 100%.
A handle to the memory test is returned.

Definition at line 3069 of file CudaDnn.cs.

◆ CreateNCCL()

long MyCaffe.common.CudaDnn< T >.CreateNCCL ( int  nDeviceId,
int  nCount,
int  nRank,
Guid  guid 

Create an instance of NVIDIA's NCCL 'Nickel'.

nDeviceIdSpecifies the device where this instance of NCCL is going to run.
nCountSpecifies the total number of NCCL instances used.
nRankSpecifies the zero-based rank of this instance of NCCL.
guidSpecifies the unique Guid for this instance of NCCL.
The handle to a new instance of NCCL is returned.

Definition at line 3297 of file CudaDnn.cs.

◆ CreatePCA()

long MyCaffe.common.CudaDnn< T >.CreatePCA ( int  nMaxIterations,
int  nM,
int  nN,
int  nK,
long  hData,
long  hScoresResult,
long  hLoadsResult,
long  hResiduals = 0,
long  hEigenvalues = 0 

Creates a new PCA instance and returns the handle to it.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

nMaxIterationsSpecifies the number of iterations to run.
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K less than or equal to N).
hDataSpecifies a handle to the data allocated using AllocatePCAData.
hScoresResultSpecifies a handle to the data allocated using AllocatePCAScores.
hLoadsResultSpecifies a handle to the data allocated using AllocatePCALoads.
hResidualsSpecifies a handle to the data allocated using AllocatePCAData.
hEigenvaluesSpecifies a handle to the data allocated using AllocatePCAEigenvalues.

Definition at line 5392 of file CudaDnn.cs.

◆ CreatePoolingDesc()

long MyCaffe.common.CudaDnn< T >.CreatePoolingDesc ( )

Create a new instance of a pooling descriptor for use with NVIDIA's cuDnn.

The pooling descriptor handle is returned.

Definition at line 4037 of file CudaDnn.cs.

◆ CreateRnn8()

long MyCaffe.common.CudaDnn< T >.CreateRnn8 ( )

Create the RNN8.

A handle to the RNN8 is returned.

Definition at line 5160 of file CudaDnn.cs.

◆ CreateRnnDataDesc()

long MyCaffe.common.CudaDnn< T >.CreateRnnDataDesc ( )

Create the RNN Data Descriptor.

A handle to the RNN Data descriptor is returned.

Definition at line 4652 of file CudaDnn.cs.

◆ CreateRnnDesc()

long MyCaffe.common.CudaDnn< T >.CreateRnnDesc ( )

Create the RNN Descriptor.

A handle to the RNN descriptor is returned.

Definition at line 4733 of file CudaDnn.cs.

◆ CreateSSD()

long MyCaffe.common.CudaDnn< T >.CreateSSD ( int  nNumClasses,
bool  bShareLocation,
int  nLocClasses,
int  nBackgroundLabelId,
bool  bUseDiffcultGt,
SSD_MINING_TYPE  miningType,
SSD_MATCH_TYPE  matchType,
float  fOverlapThreshold,
bool  bUsePriorForMatching,
SSD_CODE_TYPE  codeType,
bool  bEncodeVariantInTgt,
bool  bBpInside,
bool  bIgnoreCrossBoundaryBbox,
bool  bUsePriorForNms,
float  fNegPosRatio,
float  fNegOverlap,
int  nSampleSize,
bool  bMapObjectToAgnostic,
bool  bNmsParam,
float?  fNmsThreshold = null,
int?  nNmsTopK = null,
float?  fNmsEta = null 

Create an instance of the SSD GPU support.

nNumClassesSpecifies the number of classes.
bShareLocationSpecifies whether or not to share the location.
nLocClassesSpecifies the number of location classes.
nBackgroundLabelIdSpecifies the background label ID.
bUseDiffcultGtSpecifies whether or not to use difficult ground truths.
miningTypeSpecifies the mining type to use.
matchTypeSpecifies the matching method to use.
fOverlapThresholdSpecifies the overlap threshold for each box.
bUsePriorForMatchingSpecifies whether or not to use priors for matching.
codeTypeSpecifies the code type to use.
bEncodeVariantInTgtSpecifies whether or not to encode the variant in the target.
bBpInsideSpecifies whether or not the BP is inside.
bIgnoreCrossBoundaryBboxSpecifies whether or not to ignore cross boundary boxes.
bUsePriorForNmsSpecifies whether or not to use priors for NMS.
confLossTypeSpecifies the confidence loss type.
locLossTypeSpecifies the location loss type.
fNegPosRatioSpecifies the negative/positive ratio to use.
fNegOverlapSpecifies the negative overlap to use.
nSampleSizeSpecifies the sample size.
bMapObjectToAgnosticSpecifies whether or not to map objects to agnostic.
bNmsParamSpecifies whether or not the NMS parameters are specified.
fNmsThresholdSpecifies the NMS threshold, which is only used when the 'bNmsParam' = true.
nNmsTopKSpecifies the NMS top-k selection, which is only used when the 'bNmsParam' = true.
fNmsEtaSpecifies the NMS eta, which is only used when the 'bNmsParam' = true.
A handle to the SSD instance is returned.

Definition at line 5482 of file CudaDnn.cs.

◆ CreateStream()

long MyCaffe.common.CudaDnn< T >.CreateStream ( bool  bNonBlocking = false,
int  nIndex = -1 

Create a new stream on the current GPU.

bNonBlockingWhen false (the default), the created stream is a 'blocking' stream; otherwise it is an asynchronous, non-blocking stream.
nIndexSpecifies an index for the stream where indexed streams are shared when the index = 0 or greater.
The handle to the stream is returned.

Definition at line 3209 of file CudaDnn.cs.

◆ CreateTensorDesc()

long MyCaffe.common.CudaDnn< T >.CreateTensorDesc ( )

Create a new instance of a tensor descriptor for use with NVIDIA's cuDnn.

The tensor descriptor handle is returned.

Definition at line 3518 of file CudaDnn.cs.

◆ crop_bwd()

void MyCaffe.common.CudaDnn< T >.crop_bwd ( int  nCount,
int  nNumAxes,
long  hSrcStrides,
long  hDstStrides,
long  hOffsets,
long  hBottomDiff,
long  hTopDiff 

Performs the crop backward operation.

nCountSpecifies the count.
nNumAxesSpecifies the number of axes in the bottom.
hSrcStridesSpecifies a handle to the GPU memory containing the source strides.
hDstStridesSpecifies a handle to the GPU memory containing the destination strides.
hOffsetsSpecifies a handle to the GPU memory containing the offsets.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.

Definition at line 9830 of file CudaDnn.cs.

◆ crop_fwd()

void MyCaffe.common.CudaDnn< T >.crop_fwd ( int  nCount,
int  nNumAxes,
long  hSrcStrides,
long  hDstStrides,
long  hOffsets,
long  hBottomData,
long  hTopData 

Performs the crop forward operation.

nCountSpecifies the count.
nNumAxesSpecifies the number of axes in the bottom.
hSrcStridesSpecifies a handle to the GPU memory containing the source strides.
hDstStridesSpecifies a handle to the GPU memory containing the destination strides.
hOffsetsSpecifies a handle to the GPU memory containing the offsets.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9812 of file CudaDnn.cs.
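
The way the strides and offsets map each top index to a bottom index can be sketched on the CPU. The sketch assumes the kernel follows the crop layer of the original Caffe project, with one stride and one offset per axis; host arrays stand in for the GPU memory handles and the CropForward helper is hypothetical.

```csharp
using System;

static class CropSketch
{
    // For every top (destination) index, recovers the per-axis coordinate from the
    // destination strides, shifts it by the axis offset, and rebuilds the bottom index.
    public static void CropForward(int nCount, int nNumAxes, int[] srcStrides,
        int[] dstStrides, int[] offsets, float[] bottom, float[] top)
    {
        for (int index = 0; index < nCount; index++)
        {
            int nDstIdx = index;
            int nSrcIdx = 0;
            for (int i = 0; i < nNumAxes; i++)
            {
                int nCoord = nDstIdx / dstStrides[i];
                nDstIdx -= nCoord * dstStrides[i];
                nSrcIdx += srcStrides[i] * (nCoord + offsets[i]);
            }
            top[index] = bottom[nSrcIdx];
        }
    }

    static void Main()
    {
        // Crop a 3x4 bottom (values 0..11) to a 2x2 top starting at row 1, column 1.
        float[] bottom = new float[12];
        for (int i = 0; i < bottom.Length; i++)
            bottom[i] = i;

        float[] top = new float[4];
        CropForward(4, 2, new[] { 4, 1 }, new[] { 2, 1 }, new[] { 1, 1 }, bottom, top);
        Console.WriteLine(string.Join(" ", top)); // 5 6 9 10
    }
}
```

Under the same assumption, the backward pass would use the identical index mapping but copy in the opposite direction, from the top diff into the bottom diff.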

◆ debug()

void MyCaffe.common.CudaDnn< T >.debug ( )

The debug function is used only when debugging the debug version of the low-level DLL.

Definition at line 10637 of file CudaDnn.cs.

◆ denan()

void MyCaffe.common.CudaDnn< T >.denan ( int  n,
long  hX,
double  dfReplacement 

Replaces all NaN values within X with a replacement value.

nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
dfReplacementSpecifies the replacement value.

Definition at line 7963 of file CudaDnn.cs.
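
As an illustration of what denan describes, here is a minimal CPU reference sketch in Python. It is not the library's CUDA implementation; the plain list standing in for the GPU vector X and the function name are assumptions made for this example only.

```python
import math


def denan(n, x, replacement):
    """Replace every NaN among the first n items of x, in place.

    x is a plain Python list standing in for the GPU vector X
    (an assumption of this sketch, not the library's handle type).
    """
    for i in range(n):
        if math.isnan(x[i]):
            x[i] = replacement
    return x


values = [1.0, float("nan"), -2.5, float("nan")]
denan(len(values), values, 0.0)
```

Note that n counts items, not bytes, so only the first n entries are inspected.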

◆ DeriveBatchNormDesc()

void MyCaffe.common.CudaDnn< T >.DeriveBatchNormDesc ( long  hFwdScaleBiasMeanVarDesc,
long  hFwdBottomDesc,
long  hBwdScaleBiasMeanVarDesc,
long  hBwdBottomDesc,

Derive the batch norm descriptors for both the forward and backward passes.

hFwdScaleBiasMeanVarDescSpecifies a handle to the scale bias mean var tensor descriptor for the forward pass.
hFwdBottomDescSpecifies a handle to the forward bottom tensor descriptor.
hBwdScaleBiasMeanVarDescSpecifies a handle to the scale bias mean var tensor descriptor for the backward pass.
hBwdBottomDescSpecifies a handle to the backward bottom tensor descriptor.

Definition at line 4132 of file CudaDnn.cs.

◆ DeviceCanAccessPeer()

bool MyCaffe.common.CudaDnn< T >.DeviceCanAccessPeer ( int  nSrcDeviceID,
int  nPeerDeviceID 

Query whether or not two devices can access each other via peer-to-peer memory copies.

nSrcDeviceIDSpecifies the device id of the source.
nPeerDeviceIDSpecifies the device id of the peer to the source device.
Returns true if the source device can access the peer device via peer-to-peer communication, and false otherwise.

Definition at line 2240 of file CudaDnn.cs.

◆ DeviceDisablePeerAccess()

void MyCaffe.common.CudaDnn< T >.DeviceDisablePeerAccess ( int  nPeerDeviceID)

Disables peer-to-peer access between the current device used by the CudaDnn instance and a peer device.

nPeerDeviceIDSpecifies the device id of the peer device.

Definition at line 2270 of file CudaDnn.cs.

◆ DeviceEnablePeerAccess()

void MyCaffe.common.CudaDnn< T >.DeviceEnablePeerAccess ( int  nPeerDeviceID)

Enables peer-to-peer access between the current device used by the CudaDnn instance and a peer device.

nPeerDeviceIDSpecifies the device id of the peer device.

Definition at line 2258 of file CudaDnn.cs.

◆ DisableGhostMemory()

void MyCaffe.common.CudaDnn< T >.DisableGhostMemory ( )

Disables the ghost memory, if enabled.

Definition at line 1775 of file CudaDnn.cs.

◆ Dispose() [1/2]

void MyCaffe.common.CudaDnn< T >.Dispose ( )

Disposes this instance freeing up all of its host and GPU memory.

Definition at line 1629 of file CudaDnn.cs.

◆ Dispose() [2/2]

virtual void MyCaffe.common.CudaDnn< T >.Dispose ( bool  bDisposing)

Disposes this instance freeing up all of its host and GPU memory.

bDisposingWhen true, specifies that the call is from a Dispose call.

Definition at line 1612 of file CudaDnn.cs.

◆ DistortImage()

void MyCaffe.common.CudaDnn< T >.DistortImage ( long  h,
int  nCount,
int  nNum,
int  nDim,
long  hX,
long  hY 

Distort an image using the ImageOp handle provided.

hSpecifies a handle to the ImageOp that defines how the image will be distorted.
nCountSpecifies the total number of data elements within 'hX' and 'hY'.
nNumSpecifies the number of items to be distorted (typically blob.num) in 'hX' and 'hY'.
nDimSpecifies the dimension of each item.
hXSpecifies a handle to the GPU memory containing the source data to be distorted.
hYSpecifies a handle to the GPU memory containing the destination of the distortion.

Definition at line 3188 of file CudaDnn.cs.

◆ div()

void MyCaffe.common.CudaDnn< T >.div ( int  n,
long  hA,
long  hB,
long  hY 

Divides each element of A by each element of B and places the result in Y.

Y = A / B (element by element)

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7420 of file CudaDnn.cs.

◆ divbsx()

void MyCaffe.common.CudaDnn< T >.divbsx ( int  n,
long  hA,
int  nAOff,
long  hX,
int  nXOff,
int  nC,
int  nSpatialDim,
bool  bTranspose,
long  hB,
int  nBOff 

Divide a matrix by a vector.

nSpecifies the number of items.
hASpecifies the matrix to divide.
nAOffSpecifies the offset to apply to the GPU memory of hA.
hXSpecifies the divisor vector.
nXOffSpecifies the offset to apply to the GPU memory of hX.
nCSpecifies the number of channels.
nSpatialDimSpecifies the spatial dimension.
bTransposeSpecifies whether or not to transpose the matrix.
hBSpecifies the output matrix.
nBOffSpecifies the offset to apply to the GPU memory of hB.

Definition at line 6671 of file CudaDnn.cs.

◆ DivisiveNormalizationBackward()

void MyCaffe.common.CudaDnn< T >.DivisiveNormalizationBackward ( long  hCuDnn,
long  hNormDesc,
long  hBottomDataDesc,
long  hBottomData,
long  hTopDiff,
long  hTemp1,
long  hTemp2,
long  hBottomDiffDesc,
long  hBottomDiff 

Performs a Divisive Normalization backward pass.

See What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures? by Jarrett, et al.

hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTemp1Temporary data in GPU memory.
hTemp2Temporary data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4433 of file CudaDnn.cs.

◆ DivisiveNormalizationForward()

void MyCaffe.common.CudaDnn< T >.DivisiveNormalizationForward ( long  hCuDnn,
long  hNormDesc,
long  hBottomDataDesc,
long  hBottomData,
long  hTemp1,
long  hTemp2,
long  hTopDataDesc,
long  hTopData 

Performs a Divisive Normalization forward pass.

See What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures? by Jarrett, et al.

hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTemp1Temporary data in GPU memory.
hTemp2Temporary data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4408 of file CudaDnn.cs.

◆ dot()

T MyCaffe.common.CudaDnn< T >.dot ( int  n,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.
The dot product is returned as a type 'T'.

Definition at line 6847 of file CudaDnn.cs.
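
The role of the item offsets in dot can be shown with a small CPU reference sketch in Python. This is only an illustration of the arithmetic; the real function runs on the GPU through cuBlas, and the lists used here in place of GPU memory handles are an assumption of the example.

```python
def dot(n, x, y, x_off=0, y_off=0):
    """Dot product of n items of x and y, starting at the given item offsets.

    x and y are plain lists standing in for GPU memory (sketch assumption).
    Offsets are counted in items, not bytes.
    """
    return sum(x[x_off + i] * y[y_off + i] for i in range(n))


full = dot(3, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
windowed = dot(2, [0.0, 1.0, 2.0], [3.0, 4.0, 5.0, 6.0], x_off=1, y_off=2)
```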

◆ dot_double()

double MyCaffe.common.CudaDnn< T >.dot_double ( int  n,
long  hX,
long  hY 

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
The dot product is returned as a type 'double'.

Definition at line 6815 of file CudaDnn.cs.

◆ dot_float()

float MyCaffe.common.CudaDnn< T >.dot_float ( int  n,
long  hX,
long  hY 

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
The dot product is returned as a type 'float'.

Definition at line 6830 of file CudaDnn.cs.

◆ dropout_bwd()

void MyCaffe.common.CudaDnn< T >.dropout_bwd ( int  nCount,
long  hTopDiff,
long  hMask,
uint  uiThreshold,
long  hBottomDiff 

Performs a dropout backward pass in Cuda.

See also
Improving neural networks by preventing co-adaptation of feature detectors by Hinton, et al., 2012
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
uiThresholdSpecifies the threshold value: when a mask value is less than the threshold, the data item is 'dropped out' by setting the data item to zero.
fScaleSpecifies a scale value applied to each item that is not dropped out.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9484 of file CudaDnn.cs.

◆ dropout_fwd()

void MyCaffe.common.CudaDnn< T >.dropout_fwd ( int  nCount,
long  hBottomData,
long  hMask,
uint  uiThreshold,
long  hTopData 

Performs a dropout forward pass in Cuda.

See also
Improving neural networks by preventing co-adaptation of feature detectors by Hinton, et al., 2012
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
uiThresholdSpecifies the threshold value: when a mask value is less than the threshold, the data item is 'dropped out' by setting the data item to zero.
fScaleSpecifies a scale value applied to each item that is not dropped out.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9464 of file CudaDnn.cs.
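
The thresholding rule described for dropout_fwd can be sketched on the CPU as follows. This is a hedged illustration in Python, not the library's kernel: the explicit scale argument is hypothetical (the signature above does not take one), and keeping items whose mask value equals the threshold is an assumption, since the description only covers mask values below the threshold.

```python
def dropout_fwd(count, bottom, mask, threshold, scale):
    """Zero items whose mask value is below threshold; scale the rest.

    bottom and mask are plain lists standing in for GPU memory, and
    scale is a hypothetical parameter used only in this sketch.
    """
    top = []
    for i in range(count):
        if mask[i] < threshold:
            top.append(0.0)                # dropped out
        else:
            top.append(bottom[i] * scale)  # kept and rescaled
    return top


top = dropout_fwd(3, [1.0, 2.0, 3.0], [5, 10, 15], 10, 2.0)
```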

◆ DropoutBackward()

void MyCaffe.common.CudaDnn< T >.DropoutBackward ( long  hCuDnn,
long  hDropoutDesc,
long  hTopDesc,
long  hTop,
long  hBottomDesc,
long  hBottom,
long  hReserved 

Performs a dropout backward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
hDropoutDescSpecifies a handle to the dropout descriptor.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopSpecifies a handle to the top data in GPU memory.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomSpecifies a handle to the bottom data in GPU memory.
hReservedSpecifies a handle to the reserved data in GPU memory.

Definition at line 4296 of file CudaDnn.cs.

◆ DropoutForward()

void MyCaffe.common.CudaDnn< T >.DropoutForward ( long  hCuDnn,
long  hDropoutDesc,
long  hBottomDesc,
long  hBottomData,
long  hTopDesc,
long  hTopData,
long  hReserved 

Performs a dropout forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
hDropoutDescSpecifies a handle to the dropout descriptor.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hReservedSpecifies a handle to the reserved data in GPU memory.

Definition at line 4278 of file CudaDnn.cs.

◆ elu_bwd()

void MyCaffe.common.CudaDnn< T >.elu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomData,
long  hBottomDiff,
double  dfAlpha 

Performs an Exponential Linear Unit (ELU) backward pass in Cuda.

See also
Deep Residual Networks with Exponential Linear Unit by Shah, et al., 2016
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
dfAlphaSpecifies the alpha value.

Definition at line 9444 of file CudaDnn.cs.

◆ elu_fwd()

void MyCaffe.common.CudaDnn< T >.elu_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
double  dfAlpha 

Performs an Exponential Linear Unit (ELU) forward pass in Cuda.

Calculates $ f(x) = (x > 0) ? x : \alpha * (e^x - 1) $

See also
Deep Residual Networks with Exponential Linear Unit by Shah, et al., 2016
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
dfAlphaSpecifies the alpha value.

Definition at line 9424 of file CudaDnn.cs.
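
The ELU formula given for elu_fwd can be checked with a short CPU reference sketch in Python. It only mirrors the stated formula; the list-based interface is an assumption of the example and does not reflect how the GPU kernel is invoked.

```python
import math


def elu_fwd(count, bottom, alpha):
    """Apply f(x) = x if x > 0 else alpha * (exp(x) - 1) to count items."""
    return [
        x if x > 0 else alpha * (math.exp(x) - 1.0)
        for x in bottom[:count]
    ]


top = elu_fwd(3, [1.5, 0.0, -1.0], 1.0)
```

Positive inputs pass through unchanged, while negative inputs approach -alpha as x decreases.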

◆ EluBackward()

void MyCaffe.common.CudaDnn< T >.EluBackward ( long  hCuDnn,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
long  hBottomDiffDesc,
long  hBottomDiff 

Performs an ELU backward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4513 of file CudaDnn.cs.

◆ EluForward()

void MyCaffe.common.CudaDnn< T >.EluForward ( long  hCuDnn,
long  hBottomDataDesc,
long  hBottomData,
long  hTopDataDesc,
long  hTopData 

Performs an ELU forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4491 of file CudaDnn.cs.

◆ embed_bwd()

void MyCaffe.common.CudaDnn< T >.embed_bwd ( int  nCount,
long  hBottomData,
long  hTopDiff,
int  nM,
int  nN,
int  nK,
long  hWeightDiff 

Performs the backward pass for embed.

nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hWeightDiffSpecifies a handle to the weight diff in GPU memory.

Definition at line 8781 of file CudaDnn.cs.

◆ embed_fwd()

void MyCaffe.common.CudaDnn< T >.embed_fwd ( int  nCount,
long  hBottomData,
long  hWeight,
int  nM,
int  nN,
int  nK,
long  hTopData 

Performs the forward pass for embed.

nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hWeightSpecifies a handle to the weight data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8763 of file CudaDnn.cs.

◆ erf() [1/3]

double MyCaffe.common.CudaDnn< T >.erf ( double  dfVal)

Calculates the erf() function.

dfValSpecifies the input value.
The erf result is returned.

Definition at line 6986 of file CudaDnn.cs.

◆ erf() [2/3]

float MyCaffe.common.CudaDnn< T >.erf ( float  fVal)

Calculates the erf() function.

fValSpecifies the input value.
The erf result is returned.

Definition at line 6996 of file CudaDnn.cs.

◆ erf() [3/3]

T MyCaffe.common.CudaDnn< T >.erf ( T  fVal)

Calculates the erf() function.

fValSpecifies the input value.
The erf result is returned.

Definition at line 7006 of file CudaDnn.cs.

◆ exp() [1/2]

void MyCaffe.common.CudaDnn< T >.exp ( int  n,
long  hA,
long  hY 

Calculates the exponent value of A and places the result in Y.

$ f(x) = exp(x) $

nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7454 of file CudaDnn.cs.

◆ exp() [2/2]

void MyCaffe.common.CudaDnn< T >.exp ( int  n,
long  hA,
long  hY,
int  nAOff,
int  nYOff,
double  dfBeta 

Calculates the exponent value of A * beta and places the result in Y.

$ f(x) = exp(x * \beta) $

nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffSpecifies an offset (in items, not bytes) into the memory of A.
nYOffSpecifies an offset (in items, not bytes) into the memory of Y.
dfBetaSpecifies the scalar as type 'double'.

Definition at line 7471 of file CudaDnn.cs.
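
A minimal CPU reference sketch in Python of the scaled exponential with item offsets described for this overload. The lists standing in for GPU memory and the function name exp_scaled are assumptions made for illustration only.

```python
import math


def exp_scaled(n, a, y, a_off, y_off, beta):
    """Write exp(a[a_off + i] * beta) into y[y_off + i] for i in 0..n-1.

    a and y are plain lists standing in for GPU memory (sketch assumption).
    Offsets are counted in items, not bytes.
    """
    for i in range(n):
        y[y_off + i] = math.exp(a[a_off + i] * beta)
    return y


out = [0.0, 0.0, 0.0]
exp_scaled(2, [9.0, 0.0, 1.0], out, 1, 1, 2.0)
```

Items of Y outside the range [nYOff, nYOff + n) are left untouched in this sketch.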

◆ fill()

void MyCaffe.common.CudaDnn< T >.fill ( int  n,
int  nDim,
long  hSrc,
int  nSrcOff,
int  nCount,
long  hDst 

Fill data from the source data 'n' times in the destination.

nSpecifies the number of times to copy the source data.
nDimSpecifies the number of source items to copy.
hSrcSpecifies a handle to the GPU memory of the source data.
nSrcOffSpecifies an offset into the GPU memory where the source data copy starts.
nCountSpecifies the total number of items in the destination. This value must be >= n * nDim.
hDstSpecifies the handle to the GPU memory where the data is to be copied.

Definition at line 6199 of file CudaDnn.cs.
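
The relationship between n, nDim and nCount in fill can be illustrated with a CPU reference sketch in Python. This is an interpretation of the parameter descriptions above, not the library's kernel; the lists standing in for GPU memory and the back-to-back placement of the copies are assumptions of the sketch.

```python
def fill(n, dim, src, src_off, count, dst):
    """Copy dim items of src (starting at src_off) into dst, n times in a row.

    count is the total number of items in dst and must be >= n * dim.
    src and dst are plain lists standing in for GPU memory.
    """
    if count < n * dim:
        raise ValueError("count must be >= n * dim")
    if src_off + dim > len(src):
        raise ValueError("source does not hold dim items at src_off")
    chunk = src[src_off:src_off + dim]
    for rep in range(n):
        # Each repetition lands directly after the previous one.
        dst[rep * dim:(rep + 1) * dim] = chunk
    return dst


dst = [0] * 7
fill(3, 2, [9, 1, 2, 8], 1, len(dst), dst)
```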

◆ FreeConvolutionDesc()

void MyCaffe.common.CudaDnn< T >.FreeConvolutionDesc ( long  h)

Free a convolution descriptor instance.

hSpecifies the handle to the convolution descriptor instance.

Definition at line 3765 of file CudaDnn.cs.

◆ FreeCuDNN()

void MyCaffe.common.CudaDnn< T >.FreeCuDNN ( long  h)

Free an instance of cuDnn.

hSpecifies the handle to cuDnn.

Definition at line 3281 of file CudaDnn.cs.

◆ FreeDropoutDesc()

void MyCaffe.common.CudaDnn< T >.FreeDropoutDesc ( long  h)

Free a dropout descriptor instance.

hSpecifies the handle to the dropout descriptor instance.

Definition at line 4221 of file CudaDnn.cs.

◆ FreeExtension()

void MyCaffe.common.CudaDnn< T >.FreeExtension ( long  hExtension)

Free an instance of an Extension.

hExtensionSpecifies the handle to the Extension.

Definition at line 3474 of file CudaDnn.cs.

◆ FreeFilterDesc()

void MyCaffe.common.CudaDnn< T >.FreeFilterDesc ( long  h)

Free a filter descriptor instance.

hSpecifies the handle to the filter descriptor instance.

Definition at line 3686 of file CudaDnn.cs.

◆ FreeHostBuffer()

void MyCaffe.common.CudaDnn< T >.FreeHostBuffer ( long  hMem)

Free previously allocated host memory.

hMemSpecifies the handle to the host memory.

Definition at line 2602 of file CudaDnn.cs.

◆ FreeImageOp()

void MyCaffe.common.CudaDnn< T >.FreeImageOp ( long  h)

Free an image op, freeing up all GPU memory used.

hSpecifies the handle to the image op.

Definition at line 3171 of file CudaDnn.cs.

◆ FreeLayerNorm()

void MyCaffe.common.CudaDnn< T >.FreeLayerNorm ( long  hLayerNorm)

Free the instance of LayerNorm GPU support.

hLayerNormSpecifies the handle to the LayerNorm instance.

Definition at line 5846 of file CudaDnn.cs.

◆ FreeLRNDesc()

void MyCaffe.common.CudaDnn< T >.FreeLRNDesc ( long  h)

Free an LRN descriptor instance.

hSpecifies the handle to the LRN descriptor instance.

Definition at line 4326 of file CudaDnn.cs.

◆ FreeMemory()

void MyCaffe.common.CudaDnn< T >.FreeMemory ( long  hMem)

Free previously allocated GPU memory.

hMemSpecifies the handle to the GPU memory.

Definition at line 2517 of file CudaDnn.cs.

◆ FreeMemoryPointer()

void MyCaffe.common.CudaDnn< T >.FreeMemoryPointer ( long  hData)

Frees a memory pointer.

hDataSpecifies the handle to the memory pointer.

Definition at line 3046 of file CudaDnn.cs.

◆ FreeMemoryTest()

void MyCaffe.common.CudaDnn< T >.FreeMemoryTest ( long  h)

Free a memory test, freeing up all GPU memory used.

hSpecifies the handle to the memory test.

Definition at line 3095 of file CudaDnn.cs.

◆ FreeNCCL()

void MyCaffe.common.CudaDnn< T >.FreeNCCL ( long  hNccl)

Free an instance of NCCL.

hNcclSpecifies the handle to NCCL.

Definition at line 3355 of file CudaDnn.cs.

◆ FreePCA()

void MyCaffe.common.CudaDnn< T >.FreePCA ( long  hPCA)

Free the PCA instance associated with handle.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

hPCASpecifies a handle to the PCA instance to free.

Definition at line 5446 of file CudaDnn.cs.

◆ FreePoolingDesc()

void MyCaffe.common.CudaDnn< T >.FreePoolingDesc ( long  h)

Free a pooling descriptor instance.

hSpecifies the handle to the pooling descriptor instance.

Definition at line 4055 of file CudaDnn.cs.

◆ FreeRnn8()

void MyCaffe.common.CudaDnn< T >.FreeRnn8 ( long  h)

Free an existing RNN8.

hSpecifies the handle to the RNN8 created with CreateRnn8.

Definition at line 5178 of file CudaDnn.cs.

◆ FreeRnnDataDesc()

void MyCaffe.common.CudaDnn< T >.FreeRnnDataDesc ( long  h)

Free an existing RNN Data descriptor.

hSpecifies the handle to the RNN Data descriptor created with CreateRnnDataDesc.

Definition at line 4672 of file CudaDnn.cs.

◆ FreeRnnDesc()

void MyCaffe.common.CudaDnn< T >.FreeRnnDesc ( long  h)

Free an existing RNN descriptor.

hSpecifies the handle to the RNN descriptor created with CreateRnnDesc.

Definition at line 4751 of file CudaDnn.cs.

◆ FreeSSD()

void MyCaffe.common.CudaDnn< T >.FreeSSD ( long  hSSD)

Free the instance of SSD GPU support.

hSSDSpecifies the handle to the SSD instance.

Definition at line 5637 of file CudaDnn.cs.

◆ FreeStream()

void MyCaffe.common.CudaDnn< T >.FreeStream ( long  h)

Free a stream.

hSpecifies the handle to the stream.

Definition at line 3227 of file CudaDnn.cs.

◆ FreeTensorDesc()

void MyCaffe.common.CudaDnn< T >.FreeTensorDesc ( long  h)

Free a tensor descriptor instance.

hSpecifies the handle to the tensor descriptor instance.

Definition at line 3536 of file CudaDnn.cs.

◆ gather_bwd()

void MyCaffe.common.CudaDnn< T >.gather_bwd ( int  nCount,
long  hTop,
long  hBottom,
int  nAxis,
int  nDim,
int  nDimAtAxis,
int  nM,
int  nN,
long  hIdx 

Performs a gather backward pass where data at specified indexes along a given axis are copied to the output data.

nCountSpecifies the number of items.
hTopSpecifies the input data.
hBottomSpecifies the output data.
nAxisSpecifies the axis along which to copy.
nDimSpecifies the dimension of each item at each index.
nDimAtAxisSpecifies the dimension at the axis.
nMSpecifies the M dimension.
nNSpecifies the N dimension.
hIdxSpecifies the indexes of the data to gather.

Definition at line 10122 of file CudaDnn.cs.

◆ gather_fwd()

void MyCaffe.common.CudaDnn< T >.gather_fwd ( int  nCount,
long  hBottom,
long  hTop,
int  nAxis,
int  nDim,
int  nDimAtAxis,
int  nM,
int  nN,
long  hIdx 

Performs a gather forward pass where data at specified indexes along a given axis are copied to the output data.

nCountSpecifies the number of items.
hBottomSpecifies the input data.
hTopSpecifies the output data.
nAxisSpecifies the axis along which to copy.
nDimSpecifies the dimension of each item at each index.
nDimAtAxisSpecifies the dimension at the axis.
nMSpecifies the M dimension.
nNSpecifies the N dimension.
hIdxSpecifies the indexes of the data to gather.

Definition at line 10102 of file CudaDnn.cs.

◆ gaussian_blur()

void MyCaffe.common.CudaDnn< T >.gaussian_blur ( int  n,
int  nChannels,
int  nHeight,
int  nWidth,
double  dfSigma,
long  hX,
long  hY 

The gaussian_blur runs a Gaussian blurring operation over each channel of the data using the sigma.

The Gaussian blur operation runs a 3x3 patch, initialized with the Gaussian distribution using the formula $ G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} $

See also
Gaussian Blur on Wikipedia for more information.
nSpecifies the number of items in the memory of 'X'.
nChannelsSpecifies the number of channels (i.e. 3 for RGB, 1 for B/W).
nHeightSpecifies the height of each item.
nWidthSpecifies the width of each item.
dfSigmaSpecifies the sigma used in the gaussian blur.
hXSpecifies a handle to GPU memory containing the source data to blur.
hYSpecifies a handle to GPU memory where the blurred information is placed.

Definition at line 10980 of file CudaDnn.cs.
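
To make the 3x3 patch concrete, the following Python sketch evaluates the standard 2D Gaussian, with x^2 + y^2 in the exponent, at the nine integer offsets of a 3x3 window. It is an illustration only: whether the library normalizes the patch so that its weights sum to 1 is not stated above, so the normalize flag is an assumption of this sketch.

```python
import math


def gaussian_kernel_3x3(sigma, normalize=True):
    """Build a 3x3 patch from G(x, y) for x, y in {-1, 0, 1}.

    normalize=True rescales the weights to sum to 1, which is an
    assumption of this sketch rather than documented behavior.
    """
    two_sigma_sq = 2.0 * sigma * sigma
    kernel = [
        [
            math.exp(-(x * x + y * y) / two_sigma_sq) / (math.pi * two_sigma_sq)
            for x in (-1, 0, 1)
        ]
        for y in (-1, 0, 1)
    ]
    if normalize:
        total = sum(sum(row) for row in kernel)
        kernel = [[v / total for v in row] for row in kernel]
    return kernel


kernel = gaussian_kernel_3x3(1.0)
```

A larger sigma flattens the patch toward a uniform average, while a small sigma concentrates the weight in the center element.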

◆ geam() [1/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC 

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A, B and C.
nSpecifies the height (number of rows) of A, B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'double'.
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'double'.
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 6366 of file CudaDnn.cs.
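
The geam operation C = alpha transA (A) + beta transB (B) can be illustrated with a small CPU reference sketch in Python. Matrices are represented as nested lists of rows, which is an assumption of this example; it does not model GPU memory layout, handles, or the m and n arguments of the real call.

```python
def transpose(mat):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*mat)]


def geam(trans_a, trans_b, alpha, a, b, beta):
    """Compute alpha * op(A) + beta * op(B), where op optionally transposes."""
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    same_shape = len(op_a) == len(op_b) and all(
        len(ra) == len(rb) for ra, rb in zip(op_a, op_b)
    )
    if not same_shape:
        raise ValueError("op(A) and op(B) must have the same shape")
    return [
        [alpha * x + beta * y for x, y in zip(ra, rb)]
        for ra, rb in zip(op_a, op_b)
    ]


c = geam(False, False, 2.0, [[1, 2], [3, 4]], [[10, 20], [30, 40]], 0.5)
```

Setting beta to 0 and bTransA to true turns the operation into a plain out-of-place transpose of A.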

◆ geam() [2/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
float  fAlpha,
long  hA,
long  hB,
float  fBeta,
long  hC 

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A, B and C.
nSpecifies the height (number of rows) of A, B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'float'.
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'float'.
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 6386 of file CudaDnn.cs.

◆ geam() [3/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
long  hA,
long  hB,
long  hC,
int  nAOffset = 0,
int  nBOffset = 0,
int  nCOffset = 0 

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A, B and C.
nSpecifies the height (number of rows) of A, B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'T'.
hCSpecifies a handle to the data for matrix C in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nBOffsetSpecifies an offset (in items, not bytes) into the memory of B.
nCOffsetSpecifies an offset (in items, not bytes) into the memory of C.

Definition at line 6409 of file CudaDnn.cs.

◆ gelu_bwd()

void MyCaffe.common.CudaDnn< T >.gelu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
bool  bEnableBertVersion 

Performs a GELU backward pass in Cuda.

Computes the GELU gradient. When bEnableBertVersion=false (default), the GELU non-linearity is $ y = x * cdf $ where $ cdf = 0.5 * (1.0 + erf(x / sqrt(2.0))) $ and $ pdf = 1.0 / sqrt(2.0 * PI) * exp(-0.5 * x^2) $

with $ y' = cdf + x * pdf $

See also
On the GELU Activation Function

When bEnableBertVersion=true, $ y' = 0.5 * tanh(0.797885 * (x + 0.044715 * x^3)) + (0.0535161 * x^3 + 0.398942 * x) * sech^2(0.797885 * (x + 0.044715 * x^3)) + 0.5 $ Note, see Wolfram Alpha with 'derivative of d/dx = 0.5 * x * (1.0 + tanh(sqrt(2.0/PI) * (x + 0.044715 * x^3)))'

See also
Github - Karpathy: NewGELU, line 21 by Karpathy, 2022.
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
bEnableBertVersionSpecifies to use the BERT version, or default version.

Definition at line 9098 of file CudaDnn.cs.
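
The two gradient formulas given for gelu_bwd can be sketched on the CPU in Python as follows. This is a reference illustration, not the library's kernel: multiplying the top diff by y' to obtain the bottom diff is the usual chain rule and is assumed here, and the helper names are hypothetical.

```python
import math


def gelu_grad(x, bert_version=False):
    """Return y' for the default (erf-based) or BERT (tanh-based) GELU."""
    if bert_version:
        u = 0.797885 * (x + 0.044715 * x ** 3)
        tanh_u = math.tanh(u)
        sech_sq = 1.0 - tanh_u * tanh_u  # sech^2(u) = 1 - tanh^2(u)
        return 0.5 * tanh_u + (0.0535161 * x ** 3 + 0.398942 * x) * sech_sq + 0.5
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def gelu_bwd(top_diff, bottom_data, bert_version=False):
    """Chain rule (sketch assumption): bottom_diff = top_diff * y'(bottom)."""
    return [d * gelu_grad(x, bert_version) for d, x in zip(top_diff, bottom_data)]


bottom_diff = gelu_bwd([1.0, 1.0], [0.0, 0.7])
```

Writing sech^2 as 1 - tanh^2 avoids evaluating cosh directly, which can overflow for large inputs.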

◆ gelu_fwd()

void MyCaffe.common.CudaDnn< T >.gelu_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
bool  bEnableBertVersion 

Performs a GELU forward pass in Cuda.

When bEnableBertVersion=false (default), computes the GELU non-linearity $ y = x * cdf $ where $ cdf = 0.5 * (1.0 + erf(x / sqrt(2.0))) $ and $ pdf = 1.0 / sqrt(2.0 * PI) * exp(-0.5 * x^2) $

with $ y' = cdf + x * pdf $

See also
On the GELU Activation Function

When bEnableBertVersion=true, computes the GELU non-linearity $ y = 0.5 * x * (1.0 + tanh(sqrt(2.0/PI) * (x + 0.044715 * x^3))) $.

See also
Github - Karpathy: NewGELU, line 21 by Karpathy, 2022.
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
bEnableBertVersionSpecifies to use the BERT version or the default version.

Definition at line 9064 of file CudaDnn.cs.
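
For comparison of the two GELU variants, here is a CPU reference sketch in Python. It assumes the standard definitions, y = x * cdf for the default version and the tanh approximation y = 0.5 * x * (1 + tanh(...)) for the BERT version. The scalar interface is an assumption of the example and is not how the GPU kernel is called.

```python
import math


def gelu_fwd(x, bert_version=False):
    """Evaluate GELU at x using the erf form or the BERT tanh approximation."""
    if bert_version:
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * cdf


exact = gelu_fwd(1.0)
approx = gelu_fwd(1.0, bert_version=True)
```

The two versions agree closely for typical activation values; the tanh form avoids evaluating erf.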

◆ gemm() [1/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC 

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'double'.
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'double'.
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 6236 of file CudaDnn.cs.

◆ gemm() [2/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC,
uint  lda,
uint  ldb,
uint  ldc 

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'T'.
hCSpecifies a handle to the data for matrix C in GPU memory.
ldaSpecifies the leading dimension of A.
ldbSpecifies the leading dimension of B.
ldcSpecifies the leading dimension of C.

Definition at line 6312 of file CudaDnn.cs.

◆ gemm() [3/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC,
uint  lda,
uint  ldb,
uint  ldc,
uint  stridea,
uint  strideb,
uint  stridec,
uint  batch_count 

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'T'.
hCSpecifies a handle to the data for matrix C in GPU memory.
ldaSpecifies the leading dimension of A.
ldbSpecifies the leading dimension of B.
ldcSpecifies the leading dimension of C.
strideaSpecifies the stride of matrix A.
stridebSpecifies the stride of matrix B.
stridecSpecifies the stride of matrix C.
batch_countSpecifies the number of matrices.

Definition at line 6343 of file CudaDnn.cs.
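As a rough illustration of the stride and batch parameters, the sketch below assumes the usual strided-batched meaning: matrix i of each operand starts i * stride items after the base of its buffer. It is a CPU-only reference with a hypothetical function name, it ignores the transpose flags and leading dimensions for brevity, and it is not the library's implementation.

```python
# CPU-only sketch; assumes row-major, tightly packed matrices and no transposes.

def gemm_strided_batched(m, n, k, alpha, a, b, beta, c,
                         stride_a, stride_b, stride_c, batch_count):
    """For each batch i: C_i = alpha * A_i * B_i + beta * C_i."""
    for i in range(batch_count):
        oa, ob, oc = i * stride_a, i * stride_b, i * stride_c
        for r in range(m):
            for col in range(n):
                acc = sum(a[oa + r * k + p] * b[ob + p * n + col] for p in range(k))
                c[oc + r * n + col] = alpha * acc + beta * c[oc + r * n + col]
    return c

# Two 2 x 2 products packed back to back (stride of 4 items per matrix).
a = [1.0, 2.0, 3.0, 4.0,   1.0, 0.0, 0.0, 1.0]
b = [1.0, 0.0, 0.0, 1.0,   5.0, 6.0, 7.0, 8.0]
c = [0.0] * 8
gemm_strided_batched(2, 2, 2, 1.0, a, b, 0.0, c, 4, 4, 4, 2)
```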

◆ gemm() [4/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
float  fAlpha,
long  hA,
long  hB,
float  fBeta,
long  hC 

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'float'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'float'.
hCSpecifies a handle to the data for matrix C in GPU memory.

Definition at line 6257 of file CudaDnn.cs.

◆ gemm() [5/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
T  fAlpha,
long  hA,
long  hB,
T  fBeta,
long  hC,
int  nAOffset = 0,
int  nBOffset = 0,
int  nCOffset = 0,
int  nGroups = 1,
int  nGroupOffsetA = 0,
int  nGroupOffsetB = 0,
int  nGroupOffsetC = 0 

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'T'.
hCSpecifies a handle to the data for matrix C in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nBOffsetSpecifies an offset (in items, not bytes) into the memory of B.
nCOffsetSpecifies an offset (in items, not bytes) into the memory of C.
nGroupsOptionally, specifies the number of groups (default = 1).
nGroupOffsetAOptionally, specifies an offset multiplied by the current group 'g' and added to the AOffset (default = 0).
nGroupOffsetBOptionally, specifies an offset multiplied by the current group 'g' and added to the BOffset (default = 0).
nGroupOffsetCOptionally, specifies an offset multiplied by the current group 'g' and added to the COffset (default = 0).

Definition at line 6285 of file CudaDnn.cs.
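The group parameters described above combine as base offset plus g times the group offset. A minimal sketch of that arithmetic follows; the function name is hypothetical and the code only mirrors the parameter descriptions, not the actual kernel.

```python
# Mirrors the parameter text: offset for group g = base offset + g * group offset.

def group_offsets(base_offset, group_offset, groups):
    """Item offsets (not bytes) used for each group g in [0, groups)."""
    return [base_offset + g * group_offset for g in range(groups)]

# Example: nAOffset = 10, nGroupOffsetA = 100, nGroups = 3.
offs_a = group_offsets(base_offset=10, group_offset=100, groups=3)
```

With the defaults (nGroups = 1 and all group offsets 0) this reduces to the single base offset.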

◆ gemv() [1/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
double  fAlpha,
long  hA,
long  hX,
double  fBeta,
long  hY 

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'double'.
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector x in GPU memory.
fBetaSpecifies a scalar multiplied by y where the scalar is of type 'double'.
hYSpecifies a handle to the data for vector y in GPU memory.

Definition at line 6431 of file CudaDnn.cs.
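For reference, the operation y = alpha transA (A) x + beta y can be written on the CPU as below. This is a sketch only: it assumes A is stored row-major, it uses its own rows and cols arguments instead of guessing how m and n map onto them, and the function name is hypothetical.

```python
# CPU-only reference for y = alpha * op(A) * x + beta * y.

def gemv(trans_a, rows, cols, alpha, a, x, beta, y):
    """A is a flat row-major rows x cols list; op(A) is A^T when trans_a is True."""
    out_len = cols if trans_a else rows
    in_len = rows if trans_a else cols
    result = []
    for i in range(out_len):
        if trans_a:
            acc = sum(a[p * cols + i] * x[p] for p in range(in_len))
        else:
            acc = sum(a[i * cols + p] * x[p] for p in range(in_len))
        result.append(alpha * acc + beta * y[i])
    return result

a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]                                    # 2 x 3
y1 = gemv(False, 2, 3, 1.0, a, [1.0, 1.0, 1.0], 0.0, [0.0, 0.0])
y2 = gemv(True, 2, 3, 2.0, a, [1.0, 1.0], 1.0, [1.0, 1.0, 1.0])
```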

◆ gemv() [2/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
float  fAlpha,
long  hA,
long  hX,
float  fBeta,
long  hY 

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'float'.
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector x in GPU memory.
fBetaSpecifies a scalar multiplied by y where the scalar is of type 'float'.
hYSpecifies a handle to the data for vector y in GPU memory.

Definition at line 6450 of file CudaDnn.cs.

◆ gemv() [3/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
T  fAlpha,
long  hA,
long  hX,
T  fBeta,
long  hY,
int  nAOffset = 0,
int  nXOffset = 0,
int  nYOffset = 0 

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector x in GPU memory.
fBetaSpecifies a scalar multiplied by y where the scalar is of type 'T'.
hYSpecifies a handle to the data for vector y in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nXOffsetSpecifies an offset (in items, not bytes) into the memory of X.
nYOffsetSpecifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6472 of file CudaDnn.cs.

◆ ger() [1/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
double  fAlpha,
long  hX,
long  hY,
long  hA 

Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type 'T'.
hXSpecifies a handle to the data for vector X (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 6492 of file CudaDnn.cs.
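The formula A = x * (fAlpha * y) is an outer product: element (i, j) of A is x[i] times fAlpha times y[j]. The sketch below is a CPU-only reference with a hypothetical function name. It overwrites A as the description suggests. Note that the underlying cuBLAS ger routine accumulates into the existing contents of A, and the text here does not say which behavior the wrapper exposes.

```python
# CPU-only reference for the outer product described above.

def outer(m, n, alpha, x, y):
    """Return A as a flat row-major m x n list with A[i][j] = x[i] * (alpha * y[j])."""
    return [x[i] * (alpha * y[j]) for i in range(m) for j in range(n)]

a = outer(2, 3, 0.5, [2.0, 4.0], [1.0, 2.0, 3.0])
```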

◆ ger() [2/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
float  fAlpha,
long  hX,
long  hY,
long  hA 

Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type 'T'.
hXSpecifies a handle to the data for vector X (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 6509 of file CudaDnn.cs.

◆ ger() [3/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
T  fAlpha,
long  hX,
long  hY,
long  hA 

Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type 'T'.
hXSpecifies a handle to the data for vector X (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 6526 of file CudaDnn.cs.

◆ get()

T[] MyCaffe.common.CudaDnn< T >.get ( int  nCount,
long  hHandle,
int  nIdx = -1 

Queries the GPU memory by copying it into an array of type 'T'.

nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
An array of type 'T' is returned.

Definition at line 5985 of file CudaDnn.cs.

◆ get_double()

double[] MyCaffe.common.CudaDnn< T >.get_double ( int  nCount,
long  hHandle,
int  nIdx = -1 

Queries the GPU memory by copying it into an array of doubles.

nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
An array of doubles is returned.

Definition at line 5961 of file CudaDnn.cs.

◆ get_float()

float[] MyCaffe.common.CudaDnn< T >.get_float ( int  nCount,
long  hHandle,
int  nIdx = -1 

Queries the GPU memory by copying it into an array of floats.

nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
An array of floats is returned.

Definition at line 5973 of file CudaDnn.cs.

◆ GetConvolutionInfo()

void MyCaffe.common.CudaDnn< T >.GetConvolutionInfo ( long  hCuDnn,
long  hBottomDesc,
long  hFilterDesc,
long  hConvDesc,
long  hTopDesc,
ulong  lWorkspaceSizeLimitInBytes,
bool  bUseTensorCores,
out CONV_FWD_ALGO  algoFwd,
out ulong  lWsSizeFwd,
out CONV_BWD_FILTER_ALGO  algoBwdFilter,
out ulong  lWsSizeBwdFilter,
out CONV_BWD_DATA_ALGO  algoBwdData,
out ulong  lWsSizeBwdData,

Queries the algorithms and workspace sizes used for a given convolution descriptor.

hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hFilterDescSpecifies a handle to the filter descriptor.
hConvDescSpecifies a handle to the convolution descriptor.
hTopDescSpecifies a handle to the top tensor descriptor.
lWorkspaceSizeLimitInBytesSpecifies the workspace limits (in bytes).
bUseTensorCoresSpecifies whether or not to use tensor cores (this parameter must match the setting of the 'bUseTensorCores' specified in the 'SetConvolutionDesc' method).
algoFwdReturns the algorithm used for the convolution forward.
lWsSizeFwdReturns the workspace size (in bytes) for the convolution forward.
algoBwdFilterReturns the algorithm used for the backward filter.
lWsSizeBwdFilterReturns the workspace size (in bytes) for the backward filter.
algoBwdDataReturns the algorithm for the backward data.
lWsSizeBwdDataReturns the workspace size (in bytes) for the backward data.
preferredFwdAlgoOptionally, specifies a preferred forward algo to attempt to use for forward convolution. The new algo is only used if the current device supports it.

Definition at line 3810 of file CudaDnn.cs.

◆ GetCudaDnnDllPath()

static string MyCaffe.common.CudaDnn< T >.GetCudaDnnDllPath ( )

Returns the path to the CudaDnnDll module to use for low level CUDA processing.

The CudaDnnDll path is returned.

Definition at line 1638 of file CudaDnn.cs.

◆ GetDeviceCount()

int MyCaffe.common.CudaDnn< T >.GetDeviceCount ( )

Query the number of devices (GPUs) installed.

The number of GPUs is returned.

Definition at line 2127 of file CudaDnn.cs.

◆ GetDeviceID()

int MyCaffe.common.CudaDnn< T >.GetDeviceID ( )

Returns the current device id set within Cuda.

The device id.

Definition at line 2013 of file CudaDnn.cs.

◆ GetDeviceInfo()

string MyCaffe.common.CudaDnn< T >.GetDeviceInfo ( int  nDeviceID,
bool  bVerbose = false 

Query the device information of a device.

nDeviceIDSpecifies the device id.
bVerboseWhen true, more detailed information is returned.

Definition at line 2064 of file CudaDnn.cs.

◆ GetDeviceMemory()

double MyCaffe.common.CudaDnn< T >.GetDeviceMemory ( out double  dfFree,
out double  dfUsed,
out bool  bCudaCallUsed,
int  nDeviceID = -1 

Queries the amount of total, free and used memory on a given GPU.

dfFreeReturns the amount of free memory in GB.
dfUsedReturns the amount of used memory in GB.
bCudaCallUsedReturns whether or not the used memory is an estimate calculated using the Low-Level Cuda DNN Dll handle table.
nDeviceIDSpecifies the specific device id to query, or if -1, calculates an estimate of the memory used using the current low-level Cuda DNN Dll handle table.
The device's total amount of memory in GB is returned.

Definition at line 2182 of file CudaDnn.cs.

◆ GetDeviceName()

string MyCaffe.common.CudaDnn< T >.GetDeviceName ( int  nDeviceID)

Query the name of a device.

nDeviceIDSpecifies the device id.
The name of the GPU at the device id is returned.

Definition at line 2035 of file CudaDnn.cs.

◆ GetDeviceP2PInfo()

string MyCaffe.common.CudaDnn< T >.GetDeviceP2PInfo ( int  nDeviceID)

Query the peer-to-peer information of a device.

nDeviceIDSpecifies the device id.
The peer-to-peer information of the GPU at the device id is returned.

Definition at line 2049 of file CudaDnn.cs.

◆ GetDropoutInfo()

void MyCaffe.common.CudaDnn< T >.GetDropoutInfo ( long  hCuDnn,
long  hBottomDesc,
out ulong  ulStateCount,
out ulong  ulReservedCount 

Query the dropout state and reserved counts.

hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
ulStateCountReturns the state count.
ulReservedCountReturns the reserved count.

Definition at line 4252 of file CudaDnn.cs.

◆ GetHostBufferCapacity()

long MyCaffe.common.CudaDnn< T >.GetHostBufferCapacity ( long  hMem)

Returns the host memory capacity.

hMemSpecifies the handle to the host memory.
The current host memory capacity is returned.

Definition at line 2621 of file CudaDnn.cs.

◆ GetHostMemory()

T[] MyCaffe.common.CudaDnn< T >.GetHostMemory ( long  hMem)

Retrieves the host memory as an array of type 'T'.

hMemSpecifies the handle to the host memory.
An array of type 'T' is returned.

Definition at line 2662 of file CudaDnn.cs.

◆ GetHostMemoryDouble()

double[] MyCaffe.common.CudaDnn< T >.GetHostMemoryDouble ( long  hMem)

Retrieves the host memory as an array of doubles.

This function converts the output array from the base type 'T', for which the instance of CudaDnn was defined, into doubles.

hMemSpecifies the handle to the host memory.
An array of doubles is returned.

Definition at line 2641 of file CudaDnn.cs.

◆ GetHostMemoryFloat()

float[] MyCaffe.common.CudaDnn< T >.GetHostMemoryFloat ( long  hMem)

Retrieves the host memory as an array of floats.

This function converts the output array from the base type 'T', for which the instance of CudaDnn was defined, into floats.

hMemSpecifies the handle to the host memory.
An array of floats is returned.

Definition at line 2652 of file CudaDnn.cs.

◆ GetMemory()

T[] MyCaffe.common.CudaDnn< T >.GetMemory ( long  hMem,
long  lCount = -1 

Retrieves the GPU memory as an array of type 'T'.

hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
An array of type 'T' is returned.

Definition at line 2700 of file CudaDnn.cs.

◆ GetMemoryDouble()

double[] MyCaffe.common.CudaDnn< T >.GetMemoryDouble ( long  hMem,
long  lCount = -1 

Retrieves the GPU memory as an array of doubles.

This function converts the output array from the base type 'T', for which the instance of CudaDnn was defined, into doubles.

hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
An array of doubles is returned.

Definition at line 2677 of file CudaDnn.cs.

◆ GetMemoryFloat()

float[] MyCaffe.common.CudaDnn< T >.GetMemoryFloat ( long  hMem,
long  lCount = -1 

Retrieves the GPU memory as an array of floats.

This function converts the output array from the base type 'T', for which the instance of CudaDnn was defined, into floats.

hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
An array of floats is returned.

Definition at line 2689 of file CudaDnn.cs.

◆ GetMultiGpuBoardGroupID()

int MyCaffe.common.CudaDnn< T >.GetMultiGpuBoardGroupID ( int  nDeviceID)

Query the multi-GPU board group id for a device.

nDeviceIDSpecifies the device id.
The multi-GPU board group id is returned.

Definition at line 2109 of file CudaDnn.cs.

◆ GetRequiredCompute()

string MyCaffe.common.CudaDnn< T >.GetRequiredCompute ( out int  nMinMajor,
out int  nMinMinor 

The GetRequiredCompute function returns the Major and Minor compute values required by the current CudaDNN DLL used.

nMinMajorReturns the minimum required major compute value.
nMinMinorReturns the minimum required minor compute value.

Together the Major.Minor compute values define the minimum required compute for the CudaDNN DLL used.

The path to the CudaDNN dll in use is returned.

Definition at line 2216 of file CudaDnn.cs.

◆ GetRnn8MemorySizes()

void MyCaffe.common.CudaDnn< T >.GetRnn8MemorySizes ( long  hCuDnn,
long  hRnn,
out ulong  szWtCount,
out ulong  szWorkSize,
out ulong  szReservedSize 

Returns the memory sizes required for the RNN8.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
szWtCountReturns the required weight count (in items).
szWorkSizeReturns the required work size (in bytes).
szReservedSizeReturns the required reserved size (in bytes).

Definition at line 5221 of file CudaDnn.cs.

◆ GetRnnLinLayerParams()

void MyCaffe.common.CudaDnn< T >.GetRnnLinLayerParams ( long  hCuDnn,
long  hRnnDesc,
int  nLayer,
long  hXDesc,
long  hWtDesc,
long  hWtData,
int  nLinLayer,
out int  nWtCount,
out long  hWt,
out int  nBiasCount,
out long  hBias 

Returns the linear layer parameters (weights).

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc.
nLayerSpecifies the current layer index.
hXDescSpecifies the input data element descriptor.
hWtDescSpecifies the weight descriptor.
hWtDataSpecifies the weight memory containing all weights.
nLinLayerSpecifies the linear layer index (e.g. LSTM has 8 linear layers, RNN has 2).
nWtCountReturns the number of weight items.
hWtReturns a handle to the weight GPU memory.
nBiasCountReturns the number of bias items.
hBiasReturns a handle to the bias GPU memory.

Definition at line 4837 of file CudaDnn.cs.

◆ GetRnnParamCount()

int MyCaffe.common.CudaDnn< T >.GetRnnParamCount ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc 

Returns the RNN parameter count.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc.
hXDescSpecifies the handle to the first X descriptor.
The number of parameters (weights) is returned.

Definition at line 4785 of file CudaDnn.cs.

◆ GetRnnWorkspaceCount()

ulong MyCaffe.common.CudaDnn< T >.GetRnnWorkspaceCount ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc,
out ulong  nReservedCount 

Returns the workspace and reserved counts.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc.
hXDescSpecifies a handle to the data descriptor created with CreateRnnDataDesc.
nReservedCountReturns the reserved count needed.
Returns the workspace count needed.

Definition at line 4807 of file CudaDnn.cs.

◆ hamming_distance()

double MyCaffe.common.CudaDnn< T >.hamming_distance ( int  n,
double  dfThreshold,
long  hA,
long  hB,
long  hY,
int  nOffA = 0,
int  nOffB = 0,
int  nOffY = 0 

Calculates the Hamming distance between A and B, both of length n.

To calculate the Hamming distance, A and B are first bitified: each element is converted to 1 if it is greater than the threshold, or 0 otherwise. Next, the bitified versions of A and B are subtracted from one another, and the asum (sum of absolute values) of the result is returned. This is the number of bits that differ, and thus the Hamming distance.

nSpecifies the number of elements to compare in both A and B.
dfThresholdSpecifies the threshold used to 'bitify' both A and B.
hASpecifies the handle to the GPU memory containing the first vector to compare.
hBSpecifies the handle to the GPU memory containing the second vector to compare.
hYSpecifies the handle to the GPU memory where the hamming difference (bitified A - bitified B) is placed.
nOffAOptionally, specifies an offset into the GPU memory of A, the default is 0.
nOffBOptionally, specifies an offset into the GPU memory of B, the default is 0.
nOffYOptionally, specifies an offset into the GPU memory of Y, the default is 0.
The hamming distance is returned.

Definition at line 11005 of file CudaDnn.cs.
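The three steps in the description (bitify, subtract, absolute sum) can be sketched on the CPU as follows. The function name is hypothetical and the code is a reference for the arithmetic only; offsets and GPU handles are left out.

```python
# CPU-only reference for the bitify / subtract / asum steps described above.

def hamming_distance(threshold, a, b):
    """Return (distance, diff); diff is bitified a minus bitified b, element-wise."""
    bits_a = [1.0 if v > threshold else 0.0 for v in a]
    bits_b = [1.0 if v > threshold else 0.0 for v in b]
    diff = [p - q for p, q in zip(bits_a, bits_b)]
    distance = sum(abs(d) for d in diff)
    return distance, diff

dist, y = hamming_distance(0.5, [0.9, 0.1, 0.7, 0.2], [0.8, 0.6, 0.3, 0.1])
```

Here the second and third elements fall on opposite sides of the threshold, so two bits differ.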

◆ im2col()

void MyCaffe.common.CudaDnn< T >.im2col ( long  hDataIm,
int  nDataImOffset,
int  nChannels,
int  nHeight,
int  nWidth,
int  nKernelH,
int  nKernelW,
int  nPadH,
int  nPadW,
int  nStrideH,
int  nStrideW,
int  nDilationH,
int  nDilationW,
long  hDataCol,
int  nDataColOffset 

Rearranges image blocks into columns.

hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.
nChannelsSpecifies the number of channels in the image.
nHeightSpecifies the height of the image.
nWidthSpecifies the width of the image.
nKernelHSpecifies the kernel height.
nKernelWSpecifies the kernel width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nDilationHSpecifies the dilation along the height.
nDilationWSpecifies the dilation along the width.
hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.

Definition at line 7989 of file CudaDnn.cs.
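To make the rearrangement concrete, the sketch below implements im2col on the CPU using the layout from the original Caffe implementation: one output row per (channel, kernel row, kernel column) and one output column per output position. That MyCaffe uses the same layout is an assumption, and the function name is hypothetical.

```python
# CPU-only im2col sketch following the original Caffe layout (an assumption here).

def im2col(im, channels, height, width, kh, kw, pad_h, pad_w,
           stride_h, stride_w, dil_h, dil_w):
    """Return (col, out_h, out_w); col is a flat list laid out as
    (channels * kh * kw) rows by (out_h * out_w) columns."""
    out_h = (height + 2 * pad_h - (dil_h * (kh - 1) + 1)) // stride_h + 1
    out_w = (width + 2 * pad_w - (dil_w * (kw - 1) + 1)) // stride_w + 1
    col = []
    for c in range(channels):
        for ky in range(kh):
            for kx in range(kw):
                for oy in range(out_h):
                    for ox in range(out_w):
                        iy = oy * stride_h - pad_h + ky * dil_h
                        ix = ox * stride_w - pad_w + kx * dil_w
                        inside = 0 <= iy < height and 0 <= ix < width
                        # Positions that fall in the padding contribute zero.
                        col.append(im[(c * height + iy) * width + ix] if inside else 0.0)
    return col, out_h, out_w

# One 3 x 3 channel, 2 x 2 kernel, no padding, stride 1, dilation 1.
im = [1.0, 2.0, 3.0,
      4.0, 5.0, 6.0,
      7.0, 8.0, 9.0]
col, out_h, out_w = im2col(im, 1, 3, 3, 2, 2, 0, 0, 1, 1, 1, 1)
```

After this rearrangement a convolution becomes a single matrix multiplication between the filter matrix and the column buffer, which is why im2col is typically paired with gemm.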

◆ im2col_nd()

void MyCaffe.common.CudaDnn< T >.im2col_nd ( long  hDataIm,
int  nDataImOffset,
int  nNumSpatialAxes,
int  nImCount,
int  nChannelAxis,
long  hImShape,
long  hColShape,
long  hKernelShape,
long  hPad,
long  hStride,
long  hDilation,
long  hDataCol,
int  nDataColOffset 

Rearranges image blocks into columns.

hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.
nNumSpatialAxesSpecifies the number of spatial axes.
nImCountSpecifies the number of kernels.
nChannelAxisSpecifies the axis containing the channel.
hImShapeSpecifies a handle to the image shape data in GPU memory.
hColShapeSpecifies a handle to the column shape data in GPU memory.
hKernelShapeSpecifies a handle to the kernel shape data in GPU memory.
hPadSpecifies a handle to the pad data in GPU memory.
hStrideSpecifies a handle to the stride data in GPU memory.
hDilationSpecifies a handle to the dilation data in GPU memory.
hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.

Definition at line 8013 of file CudaDnn.cs.

◆ InitializeRnn8Weights()

void MyCaffe.common.CudaDnn< T >.InitializeRnn8Weights ( long  hCuDnn,
long  hRnn,
long  hWt,
double  fWtVal,
double  fWtVal2,
double  fBiasVal,
double  fBiasVal2 

Initialize the RNN8 weights.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
hWtSpecifies the handle to the GPU data containing the weights to be initialized.
wtFtSpecifies the weight filler type.
fWtValSpecifies the weight filler value.
fWtVal2Specifies a secondary weight filler value.
biasFtSpecifies the bias filler type.
fBiasValSpecifies the bias filler value.
fBiasVal2Specifies a secondary bias filler value.

Definition at line 5251 of file CudaDnn.cs.

◆ interp2()

void MyCaffe.common.CudaDnn< T >.interp2 ( int  nChannels,
long  hData1,
int  nX1,
int  nY1,
int  nHeight1,
int  nWidth1,
int  nHeight1A,
int  nWidth1A,
long  hData2,
int  nX2,
int  nY2,
int  nHeight2,
int  nWidth2,
int  nHeight2A,
int  nWidth2A,
bool  bBwd = false 

Interpolates between two sizes within the spatial dimensions.

nChannelsSpecifies the channels (usually num * channels).
hData1Specifies the input data when bBwd=false and the output data when bBwd=true.
nX1Specifies the offset along the x axis for data1.
nY1Specifies the offset along the y axis for data1.
nHeight1Specifies the effective height for data1.
nWidth1Specifies the effective width for data1.
nHeight1ASpecifies the input height for data1.
nWidth1ASpecifies the input width for data1.
hData2Specifies the output data when bBwd=false and the input data when bBwd=true.
nX2Specifies the offset along the x axis for data2.
nY2Specifies the offset along the y axis for data2.
nHeight2Specifies the effective height for data2.
nWidth2Specifies the effective width for data2.
nHeight2ASpecifies the output height for data2.
nWidth2ASpecifies the output width for data2.
bBwdOptionally, specifies to perform the backward operation from data2 to data1, otherwise the operation performs on data1 to data2. (default = false).

Definition at line 7138 of file CudaDnn.cs.

◆ IsRnn8Supported()

bool MyCaffe.common.CudaDnn< T >.IsRnn8Supported ( )

Returns whether or not RNN8 is supported.

Definition at line 5142 of file CudaDnn.cs.

◆ KernelAdd()

void MyCaffe.common.CudaDnn< T >.KernelAdd ( int  nCount,
long  hA,
long  hDstKernel,
long  hB,
long  hC 

Add memory from one kernel to memory residing on another kernel.

nCountSpecifies the number of items within both A and B.
hASpecifies the handle to the memory A.
hDstKernelSpecifies the kernel where the memory B and the destination memory C reside.
hBSpecifies the handle to the memory B (to which A will be added).
hCSpecifies the destination data where A+B will be placed.

Definition at line 1848 of file CudaDnn.cs.

◆ KernelCopy()

void MyCaffe.common.CudaDnn< T >.KernelCopy ( int  nCount,
long  hSrc,
int  nSrcOffset,
long  hDstKernel,
long  hDst,
int  nDstOffset,
long  hHostBuffer,
long  hHostKernel = -1,
long  hStream = -1,
long  hSrcKernel = -1 

Copy memory from the look-up tables in one kernel to another.

nCountSpecifies the number of items to copy.
hSrcSpecifies the handle to the source memory.
nSrcOffsetSpecifies the offset (in items, not bytes) from which to start the copy in the source memory.
hDstKernelSpecifies the destination kernel holding the look-up table and memory where the data is to be copied.
hDstSpecifies the handle to the destination memory where the data is to be copied.
nDstOffsetSpecifies the offset (in items, not bytes) where the copy is to be placed within the destination data.
hHostBufferSpecifies the handle to the host buffer to be used when transferring the data from one kernel to another.
hHostKernelOptionally, specifies the handle to the kernel holding the look-up table for the host buffer.
hStreamOptionally, specifies the handle to the CUDA stream to use for the transfer.
hSrcKernelOptionally, specifies the handle to the source kernel.

Definition at line 1829 of file CudaDnn.cs.

◆ KernelCopyNccl()

long MyCaffe.common.CudaDnn< T >.KernelCopyNccl ( long  hSrcKernel,
long  hSrcNccl 

Copies an Nccl handle from one kernel to the current kernel of the current CudaDnn instance.

Nccl handles are created on the main kernel, but when used they must be transferred to the destination kernel (running on a different thread), where the secondary Nccl handle is used.

hSrcKernelSpecifies the source kernel (typically where the Nccl handle was created).
hSrcNcclSpecifies the source Nccl handle to be copied.

Definition at line 1866 of file CudaDnn.cs.

◆ LayerNormBackward()

void MyCaffe.common.CudaDnn< T >.LayerNormBackward ( long  hLayerNorm,
long  hYdata,
long  hYdiff,
long  hXdiff 

Run the LayerNorm backward pass.

hLayerNormSpecifies the handle to the LayerNorm instance.
hYdataSpecifies the normalized output data.
hYdiffSpecifies the input diff to be un-normalized.
hXdiffSpecifies the un-normalized output diff.

Definition at line 5875 of file CudaDnn.cs.

◆ LayerNormForward()

void MyCaffe.common.CudaDnn< T >.LayerNormForward ( long  hLayerNorm,
long  hXdata,
long  hYdata 

Run the LayerNorm forward pass.

hLayerNormSpecifies the handle to the LayerNorm instance.
hXdataSpecifies the input data to be normalized.
hYdataSpecifies the normalized output data.

Definition at line 5860 of file CudaDnn.cs.

◆ lecun_bwd()

void MyCaffe.common.CudaDnn< T >.lecun_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData 

Performs the backward pass of LeCun's Tanh function.

Computes the LeCun non-linearity $ y = 1.7159 * \tanh(2/3 * x) $ with derivative $ y' = 1.7159 * 2/3 * (1 - \tanh(2/3 * x)^2) $

See also
LeCun's Tanh by PapersWithCode.
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.

Definition at line 9225 of file CudaDnn.cs.
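A backward pass for this non-linearity applies the chain rule: each bottom diff is the top diff multiplied by the derivative given above, evaluated at the bottom data. The sketch below is a CPU-only reference with a hypothetical function name. Whether the actual kernel evaluates the derivative from the bottom data or from the top data is not stated here, so that choice is an assumption.

```python
import math

# CPU-only reference: bottom_diff = top_diff * y'(x), using the derivative above.

def lecun_bwd(top_diff, bottom_data):
    out = []
    for dy, x in zip(top_diff, bottom_data):
        t = math.tanh(2.0 / 3.0 * x)
        out.append(dy * 1.7159 * (2.0 / 3.0) * (1.0 - t * t))
    return out

grad = lecun_bwd([1.0, 2.0, 1.0], [0.0, 0.0, 0.5])
```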

◆ lecun_fwd()

void MyCaffe.common.CudaDnn< T >.lecun_fwd ( int  nCount,
long  hBottomData,
long  hTopData 

Performs the forward pass of LeCun's Tanh function.

Computes the LeCun non-linearity $ y = 1.7159 * \tanh(2/3 * x) $ with derivative $ y' = 1.7159 * 2/3 * (1 - \tanh(2/3 * x)^2) $

See also
LeCun's Tanh by PapersWithCode.
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9203 of file CudaDnn.cs.
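The forward formula is easy to check on the CPU. The sketch below uses a hypothetical function name and plain Python lists in place of GPU memory.

```python
import math

# CPU-only reference for y = 1.7159 * tanh(2/3 * x).

def lecun_fwd(bottom_data):
    return [1.7159 * math.tanh(2.0 / 3.0 * x) for x in bottom_data]

top = lecun_fwd([-1.0, 0.0, 1.0])
```

The constants are chosen so that the function maps 1 to approximately 1 and -1 to approximately -1, which the example inputs make visible.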

◆ log() [1/2]

void MyCaffe.common.CudaDnn< T >.log ( int  n,
long  hA,
long  hY 

Calculates the log value of A and places the result in Y.

$ f(x) = log(x) $

nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7488 of file CudaDnn.cs.

◆ log() [2/2]

void MyCaffe.common.CudaDnn< T >.log ( int  n,
long  hA,
long  hY,
double  dfBeta,
double  dfAlpha = 0 

Calculates the log value of (A * beta) + alpha, and places the result in Y.

$ f(x) = \ln((x * \beta) + \alpha) $

nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfBetaSpecifies the scalar, as type 'double', that is multiplied with the value before taking the log.
dfAlphaOptionally, specifies a scalar added to the value before taking the log.

Definition at line 7504 of file CudaDnn.cs.
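A CPU-only reference for the formula above, with a hypothetical function name and plain lists in place of GPU memory:

```python
import math

# CPU-only reference for f(x) = ln(x * beta + alpha).

def log_scaled(a, beta, alpha=0.0):
    return [math.log(x * beta + alpha) for x in a]

y = log_scaled([1.0, math.e], beta=1.0)           # ln(1), ln(e)
y2 = log_scaled([1.0, 3.0], beta=2.0, alpha=2.0)  # ln(4), ln(8)
```

With beta = 1 and alpha = 0 this reduces to the plain log overload.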

◆ lrn_computediff()

void MyCaffe.common.CudaDnn< T >.lrn_computediff ( int  nCount,
long  hBottomData,
long  hTopData,
long  hScaleData,
long  hTopDiff,
int  nNum,
int  nChannels,
int  nHeight,
int  nWidth,
int  nSize,
long  hBottomDiff 

Computes the diff used to calculate the LRN cross channel backward pass in Cuda.

nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hScaleDataSpecifies a handle to the scale data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nNumSpecifies the number of input items.
nChannelsSpecifies the number of channels per input item.
nHeightSpecifies the height of each input item.
nWidthSpecifies the width of each input item.
fNegativeBetaSpecifies the negative beta value.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10184 of file CudaDnn.cs.

◆ lrn_computeoutput()

void MyCaffe.common.CudaDnn< T >.lrn_computeoutput ( int  nCount,
long  hBottomData,
long  hScaleData,
long  hTopData 

Computes the output used to calculate the LRN cross channel forward pass in Cuda.

nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
hScaleDataSpecifies a handle to the scale data in GPU memory.
fNegativeBetaSpecifies the negative beta value.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 10159 of file CudaDnn.cs.

◆ lrn_fillscale()

void MyCaffe.common.CudaDnn< T >.lrn_fillscale ( int  nCount,
long  hBottomData,
int  nNum,
int  nChannels,
int  nHeight,
int  nWidth,
int  nSize,
long  hScaleData 

Performs the fill scale operation used to calculate the LRN cross channel forward pass in Cuda.

nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
nNumSpecifies the number of input items.
nChannelsSpecifies the number of channels per input item.
nHeightSpecifies the height of each input item.
nWidthSpecifies the width of each input item.
fAlphaOverSizeSpecifies the alpha value over the size.
fKSpecifies the k value.
hScaleDataSpecifies a handle to the scale data in GPU memory.

Definition at line 10143 of file CudaDnn.cs.
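As a rough illustration of how lrn_fillscale() and lrn_computeoutput() fit together, the sketch below follows the usual Caffe cross-channel LRN convention: the scale is k plus alpha/size times the sum of squares over a channel window, and the output is the input multiplied by the scale raised to the negative beta. The window placement, the function names and the single-position layout are assumptions made for illustration, not details confirmed by this reference.

```python
def lrn_fill_scale(x, size, alpha_over_size, k):
    """x holds the channel values at one (n, h, w) position (assumed layout)."""
    pre_pad = (size - 1) // 2          # assumed Caffe-style window placement
    post_pad = size - pre_pad - 1
    channels = len(x)
    scale = []
    for c in range(channels):
        lo = max(0, c - pre_pad)
        hi = min(channels - 1, c + post_pad)
        sq_sum = sum(v * v for v in x[lo:hi + 1])
        scale.append(k + alpha_over_size * sq_sum)
    return scale

def lrn_compute_output(x, scale, negative_beta):
    """top[i] = bottom[i] * scale[i] ** negative_beta."""
    return [v * (s ** negative_beta) for v, s in zip(x, scale)]

scale = lrn_fill_scale([1.0, 2.0, 3.0], size=3, alpha_over_size=0.5, k=1.0)
top = lrn_compute_output([1.0, 2.0, 3.0], scale, negative_beta=-1.0)
```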

◆ LRNCrossChannelBackward()

void MyCaffe.common.CudaDnn< T >.LRNCrossChannelBackward ( long  hCuDnn,
long  hNormDesc,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
long  hBottomDiffDesc,
long  hBottomDiff 

Perform LRN cross channel backward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4384 of file CudaDnn.cs.

◆ LRNCrossChannelForward()

void MyCaffe.common.CudaDnn< T >.LRNCrossChannelForward ( long  hCuDnn,
long  hNormDesc,
long  hBottomDesc,
long  hBottomData,
long  hTopDesc,
long  hTopData 

Perform LRN cross channel forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4361 of file CudaDnn.cs.

◆ lstm_bwd()

void MyCaffe.common.CudaDnn< T >.lstm_bwd ( int  t,
int  nN,
int  nH,
int  nI,
double  dfClippingThreshold,
long  hWeight_h,
long  hClipData,
int  nClipOffset,
long  hTopDiff,
int  nTopOffset,
long  hCellData,
long  hCellDiff,
int  nCellOffset,
long  hPreGateDiff,
int  nPreGateOffset,
long  hGateData,
long  hGateDiff,
int  nGateOffset,
long  hCT1Data,
int  nCT1Offset,
long  hDHT1Diff,
int  nDHT1Offset,
long  hDCT1Diff,
int  nDCT1Offset,
long  hHtoHData,
long  hContextDiff = 0,
long  hWeight_c = 0 

Performs the simple LSTM backward pass in Cuda.

See LSTM with Working Memory by Pulver, et al., 2016

tSpecifies the step within the sequence.
nNSpecifies the batch size.
nHSpecifies the number of hidden units.
nISpecifies the input size.
hWeight_hSpecifies a handle to the GPU memory holding the 'h' weights.
hClipDataSpecifies a handle to the GPU memory holding the clip data.
nClipOffsetSpecifies the clip offset for this step within the sequence.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top diff memory.
hCellDataSpecifies a handle to the GPU memory holding the 'c_t' data.
hCellDiffSpecifies a handle to the GPU memory holding the 'c_t' gradients.
nCellOffsetSpecifies the c_t offset for this step within the sequence.
hPreGateDiffSpecifies a handle to the GPU memory holding the pre-gate gradients.
nPreGateOffsetSpecifies the pre-gate offset for this step within the sequence.
hGateDataSpecifies a handle to the GPU memory holding the gate data.
hGateDiffSpecifies a handle to the GPU memory holding the gate gradients.
nGateOffsetSpecifies the gate data offset for this step within the sequence.
hCT1DataSpecifies a handle to the GPU memory holding the CT1 data.
nCT1OffsetSpecifies the CT1 offset for this step within the sequence.
hDHT1DiffSpecifies a handle to the GPU DHT1 gradients.
nDHT1OffsetSpecifies the DHT1 offset for this step within the sequence.
hDCT1DiffSpecifies a handle to the DCT1 gradients.
nDCT1OffsetSpecifies the DCT1 offset for this step within the sequence.
hHtoHDataSpecifies a handle to the GPU memory holding the H to H data.
hContextDiffOptionally, specifies the handle to the GPU memory holding the context diff, or 0 when not used.
hWeight_cOptionally, specifies the handle to the GPU memory holding the 'c' weights, or 0 when not used.

Definition at line 10413 of file CudaDnn.cs.

◆ lstm_fwd()

void MyCaffe.common.CudaDnn< T >.lstm_fwd ( int  t,
int  nN,
int  nH,
int  nI,
long  hWeight_h,
long  hWeight_i,
long  hClipData,
int  nClipOffset,
long  hTopData,
int  nTopOffset,
long  hCellData,
int  nCellOffset,
long  hPreGateData,
int  nPreGateOffset,
long  hGateData,
int  nGateOffset,
long  hHT1Data,
int  nHT1Offset,
long  hCT1Data,
int  nCT1Offset,
long  hHtoGateData,
long  hContext = 0,
long  hWeight_c = 0,
long  hCtoGetData = 0 

Performs the simple LSTM forward pass in Cuda.

See LSTM with Working Memory by Pulver, et al., 2016

tSpecifies the step within the sequence.
nNSpecifies the batch size.
nHSpecifies the number of hidden units.
nISpecifies the input size.
hWeight_hSpecifies a handle to the GPU memory holding the 'h' weights.
hWeight_iSpecifies a handle to the GPU memory holding the 'i' weights.
hClipDataSpecifies a handle to the GPU memory holding the clip data.
nClipOffsetSpecifies the clip offset for this step within the sequence.
hTopDataSpecifies a handle to the top data in GPU memory.
nTopOffsetSpecifies an offset into the top data memory.
hCellDataSpecifies a handle to the GPU memory holding the 'c_t' data.
nCellOffsetSpecifies the c_t offset for this step within the sequence.
hPreGateDataSpecifies a handle to the GPU memory holding the pre-gate data.
nPreGateOffsetSpecifies the pre-gate offset for this step within the sequence.
hGateDataSpecifies a handle to the GPU memory holding the gate data.
nGateOffsetSpecifies the gate data offset for this step within the sequence.
hHT1DataSpecifies a handle to the GPU memory holding the HT1 data.
nHT1OffsetSpecifies the HT1 offset for this step within the sequence.
hCT1DataSpecifies a handle to the GPU memory holding the CT1 data.
nCT1OffsetSpecifies the CT1 offset for this step within the sequence.
hHtoGateDataSpecifies a handle to the GPU memory holding the H to Gate data.
hContextOptionally, specifies the attention context, or 0 when not used.
hWeight_cOptionally, specifies the attention context weights, or 0 when not used.
hCtoGetDataOptionally, specifies the attention context to gate data, or 0 when not used.

Definition at line 10372 of file CudaDnn.cs.

◆ lstm_unit_bwd()

void MyCaffe.common.CudaDnn< T >.lstm_unit_bwd ( int  nCount,
int  nHiddenDim,
int  nXCount,
long  hC_prev,
long  hX_acts,
long  hC,
long  hH,
long  hCont,
long  hC_diff,
long  hH_diff,
long  hC_prev_diff,
long  hX_acts_diff,
long  hX_diff 

Performs the simple LSTM backward pass in Cuda for a given LSTM unit.

See LSTM with Working Memory by Pulver, et al., 2016

hC_prev_diffNEEDS REVIEW
hX_acts_diffNEEDS REVIEW

Definition at line 10463 of file CudaDnn.cs.

◆ lstm_unit_fwd()

void MyCaffe.common.CudaDnn< T >.lstm_unit_fwd ( int  nCount,
int  nHiddenDim,
int  nXCount,
long  hX,
long  hX_acts,
long  hC_prev,
long  hCont,
long  hC,
long  hH 

Performs the simple LSTM forward pass in Cuda for a given LSTM unit.

See LSTM with Working Memory by Pulver, et al., 2016


Definition at line 10436 of file CudaDnn.cs.
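The parameters of lstm_unit_fwd() are not documented here, so the following is only a sketch of what a Caffe-style LSTM unit forward pass typically computes for one sample. The gate ordering (input, forget, output, candidate), the use of a single continuation flag per sample, and the function names are all assumptions borrowed from the original Caffe LSTM unit layer, not confirmed behavior of this method.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_unit_fwd_ref(x, c_prev, cont, hidden_dim):
    """x holds 4 * hidden_dim pre-activations laid out as [i, f, o, g] (assumed)."""
    c, h = [], []
    for d in range(hidden_dim):
        i = sigmoid(x[d])                        # input gate
        f = sigmoid(x[hidden_dim + d])           # forget gate
        o = sigmoid(x[2 * hidden_dim + d])       # output gate
        g = math.tanh(x[3 * hidden_dim + d])     # candidate value
        c_d = cont * f * c_prev[d] + i * g       # cont = 0 resets the cell state
        c.append(c_d)
        h.append(o * math.tanh(c_d))
    return c, h

c_run, h_run = lstm_unit_fwd_ref([0.0] * 8, [1.0, -1.0], cont=1.0, hidden_dim=2)
c_reset, h_reset = lstm_unit_fwd_ref([0.0] * 8, [1.0, -1.0], cont=0.0, hidden_dim=2)
```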

◆ mask() [1/3]

void MyCaffe.common.CudaDnn< T >.mask ( int  n,
int  nMaskDim,
double  fSearch,
double  fReplace,
long  hX,
long  hMask,
long  hY 

Masks the data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination.

nSpecifies the number of items.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7048 of file CudaDnn.cs.
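One plausible reading of the mask() description is sketched below on the CPU: wherever the mask holds 'fSearch', the destination receives 'fReplace', and everywhere else it receives the source value. The assumption that a mask of nMaskDim items is tiled across the n source items (index i modulo nMaskDim) is a guess made for illustration, as is the helper name mask_ref.

```python
def mask_ref(x, mask, search, replace):
    """Hypothetical CPU reference for mask(); the mask is assumed to tile over x."""
    mask_dim = len(mask)
    return [replace if mask[i % mask_dim] == search else v
            for i, v in enumerate(x)]

# Replace every position whose mask entry is 0 with a large negative value,
# as is commonly done before a softmax.
y = mask_ref([1.0, 2.0, 3.0, 4.0], mask=[1.0, 0.0], search=0.0, replace=-1e9)
```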

◆ mask() [2/3]

void MyCaffe.common.CudaDnn< T >.mask ( int  n,
int  nMaskDim,
float  fSearch,
float  fReplace,
long  hX,
long  hMask,
long  hY 

Masks the data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination.

nSpecifies the number of items.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7063 of file CudaDnn.cs.

◆ mask() [3/3]

void MyCaffe.common.CudaDnn< T >.mask ( int  n,
int  nMaskDim,
long  hX,
long  hMask,
long  hY 

Masks the data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination.

nSpecifies the number of items.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7030 of file CudaDnn.cs.

◆ mask_batch() [1/3]

void MyCaffe.common.CudaDnn< T >.mask_batch ( int  n,
int  nBatch,
int  nMaskDim,
double  fSearch,
double  fReplace,
long  hX,
long  hMask,
long  hY 

Masks the batch of data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination.

nSpecifies the number of items.
nBatchSpecifies the batch size.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7098 of file CudaDnn.cs.

◆ mask_batch() [2/3]

void MyCaffe.common.CudaDnn< T >.mask_batch ( int  n,
int  nBatch,
int  nMaskDim,
float  fSearch,
float  fReplace,
long  hX,
long  hMask,
long  hY 

Masks the batch of data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination.

nSpecifies the number of items.
nBatchSpecifies the batch size.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7114 of file CudaDnn.cs.

◆ mask_batch() [3/3]

void MyCaffe.common.CudaDnn< T >.mask_batch ( int  n,
int  nBatch,
int  nMaskDim,
long  hX,
long  hMask,
long  hY 

Masks the batch of data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination.

nSpecifies the number of items.
nBatchSpecifies the batch size.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7079 of file CudaDnn.cs.

◆ math_bwd()

void MyCaffe.common.CudaDnn< T >.math_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
MATH_FUNCTION  function 

Performs a Math function backward pass in Cuda.

nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
functionSpecifies the mathematical function to use.

Definition at line 8966 of file CudaDnn.cs.

◆ math_fwd()

void MyCaffe.common.CudaDnn< T >.math_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
MATH_FUNCTION  function 

Performs a Math function forward pass in Cuda.

Calculation $ Y[i] = function(X[i]) $

nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
functionSpecifies the mathematical function to use.

Definition at line 8949 of file CudaDnn.cs.

◆ matmul()

void MyCaffe.common.CudaDnn< T >.matmul ( uint  nOuterCount,
int  m,
int  n,
int  k,
long  hA,
long  hB,
long  hC,
double  dfScale = 1.0,
bool  bTransA = false,
bool  bTransB = false 

Perform matmul operation hC = matmul(hA, hB), where hA, hB and hC are all in row-major format.

nOuterCountSpecifies the outer count (e.g. batch * channels)
mSpecifies the m dimension, which is the number of rows in the mxk matrix A and the mxn matrix C.
hASpecifies the handle to GPU memory holding the mxk matrix A (in row-major format)
hBSpecifies the handle to GPU memory holding the kxn matrix B (in row-major format)
hCSpecifies the handle to GPU memory holding the mxn matrix C (in row-major format) where the result is placed.
dfScaleSpecifies the scale value applied to matrix B in hB (default = 1.0)
bTransASpecifies to transpose matrix A (default = false).
bTransBSpecifies to transpose matrix B (default = false).
See also
How to transpose a matrix in CUDA/cublas

Definition at line 6695 of file CudaDnn.cs.
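The row-major layout and the role of nOuterCount can be illustrated with a plain CPU reference. This is a sketch under stated assumptions: each of the nOuterCount matrices is stored back to back, a transposed operand is stored with its dimensions swapped, and the scale is applied to the product (which is equivalent to scaling B). The name matmul_ref is hypothetical.

```python
def matmul_ref(outer, m, n, k, a, b, scale=1.0, trans_a=False, trans_b=False):
    """Batched row-major C = scale * op(A) x op(B) over flat lists."""
    c = [0.0] * (outer * m * n)
    for o in range(outer):
        a_off, b_off, c_off = o * m * k, o * k * n, o * m * n
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for p in range(k):
                    # A transposed operand is assumed stored as k x m (or n x k).
                    a_ip = a[a_off + (p * m + i if trans_a else i * k + p)]
                    b_pj = b[b_off + (j * k + p if trans_b else p * n + j)]
                    acc += a_ip * b_pj
                c[c_off + i * n + j] = scale * acc
    return c

a = [1.0, 2.0, 3.0, 4.0]                 # [[1, 2], [3, 4]]
b = [5.0, 6.0, 7.0, 8.0]                 # [[5, 6], [7, 8]]
c_plain = matmul_ref(1, 2, 2, 2, a, b)
c_scaled = matmul_ref(1, 2, 2, 2, a, b, scale=0.5)
c_trans_b = matmul_ref(1, 2, 2, 2, a, [5.0, 7.0, 6.0, 8.0], trans_b=True)
```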

◆ matrix_meancenter_by_column()

void MyCaffe.common.CudaDnn< T >.matrix_meancenter_by_column ( int  nWidth,
int  nHeight,
long  hA,
long  hB,
long  hY,
bool  bNormalize = false 

Mean center the data by columns, where each column is summed and then subtracted from each column value.

nWidthNumber of columns in the matrix (dimension D)
nHeightNumber of rows in the matrix (dimension N)
hAInput data matrix - N x D matrix (N rows, D columns)
hBColumn sums vector - D x 1 vector containing the sum of each column.
hYOutput data matrix - N x D matrix (N rows, D columns) containing mean centering of the input data matrix.
bNormalizeWhen true, each data item is divided by N to normalize each row item by column.

Definition at line 10725 of file CudaDnn.cs.

◆ max() [1/2]

void MyCaffe.common.CudaDnn< T >.max ( int  n,
long  hA,
long  hB,
long  hY 

Calculates the max of A and B and places the result in Y. This max is only computed on a per item basis, so the shape of Y = the shape of A and B and Y(0) contains the max of A(0) and B(0), etc.

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7669 of file CudaDnn.cs.

◆ max() [2/2]

double MyCaffe.common.CudaDnn< T >.max ( int  n,
long  hA,
out long  lPos,
int  nAOff = 0,
long  hWork = 0 

Finds the maximum value of A.

This function uses NVIDIA's Thrust.

nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
lPosReturns the position of the maximum value.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A (default = 0).
hWorkOptionally, specifies the handle to GPU memory in the size of A, which when specified is used in the extended version of max val. The extended version does not use thrust, and does not calculate 'lPos', which is always returned as -1 when using the extended version. (default = 0, use non extended version)
The maximum value is returned as type double.

Definition at line 7724 of file CudaDnn.cs.

◆ max_bwd() [1/2]

void MyCaffe.common.CudaDnn< T >.max_bwd ( int  n,
long  hAdata,
long  hBdata,
long  hYdiff,
long  hAdiff,
long  hBdiff 

Propagates the Y diff back to whichever of A or B holds the max: each item of the Y diff is placed in the A diff when A's data has the max, or in the B diff when B's data has the max.

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hAdataSpecifies a handle to the data vector A in GPU memory.
hBdataSpecifies a handle to the data vector B in GPU memory.
hYdiffSpecifies a handle to the diff vector Y in GPU memory.
hAdiffSpecifies a handle to the mutable diff vector A in GPU memory.
hBdiffSpecifies a handle to the mutable diff vector B in GPU memory.

Definition at line 7686 of file CudaDnn.cs.
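The per-item max() overload and its max_bwd() counterpart can be sketched on the CPU as follows. The tie-breaking rule (A receives the gradient when A and B are equal) and the zeroing of the non-selected diff are assumptions for illustration, and the function names are hypothetical.

```python
def max_fwd_ref(a, b):
    """Y[i] = max(A[i], B[i])."""
    return [x if x >= y else y for x, y in zip(a, b)]

def max_bwd_ref(a_data, b_data, y_diff):
    """Route each Y diff item to A or B, whichever supplied the max (ties go to A)."""
    a_diff = [g if x >= y else 0.0 for x, y, g in zip(a_data, b_data, y_diff)]
    b_diff = [g if x < y else 0.0 for x, y, g in zip(a_data, b_data, y_diff)]
    return a_diff, b_diff

y = max_fwd_ref([1.0, 5.0, 2.0], [3.0, 4.0, 2.0])
a_diff, b_diff = max_bwd_ref([1.0, 5.0, 2.0], [3.0, 4.0, 2.0], [0.1, 0.2, 0.3])
```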

◆ max_bwd() [2/2]

void MyCaffe.common.CudaDnn< T >.max_bwd ( int  nCount,
long  hTopDiff,
int  nIdx,
long  hMask,
long  hBottomDiff 

Performs a max backward pass in Cuda.

nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nIdxSpecifies the blob index used to test the mask.
hMaskSpecifies a handle to the mask data in GPU.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9758 of file CudaDnn.cs.

◆ max_fwd()

void MyCaffe.common.CudaDnn< T >.max_fwd ( int  nCount,
long  hBottomDataA,
long  hBottomDataB,
int  nIdx,
long  hTopData,
long  hMask 

Performs a max forward pass in Cuda.

Calculation: $ Y[i] = max(A[i], B[i]) $

nCountSpecifies the number of items.
hBottomDataASpecifies a handle to the Bottom A data in GPU memory.
hBottomDataBSpecifies a handle to the Bottom B data in GPU memory.
nIdxSpecifies the blob index used to set the mask.
hTopDataSpecifies a handle to the Top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU.

Definition at line 9742 of file CudaDnn.cs.

◆ mean_error_loss_bwd()

void MyCaffe.common.CudaDnn< T >.mean_error_loss_bwd ( int  nCount,
long  hPredicted,
long  hTarget,
long  hBottomDiff,

Performs a Mean Error Loss backward pass in Cuda.

When propagate_down[1] == true, the gradient is set to +1 when the predicted value is greater than the target, -1 when it is less than the target, and 0 when the two are equal.

See also
Mean Absolute Error (MAE) derivative
nCountSpecifies the number of items.
hPredictedSpecifies a handle to the predicted data in GPU memory.
hTargetSpecifies a handle to the target data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
merrSpecifies the type of mean error to run.

Definition at line 8991 of file CudaDnn.cs.
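The sign rule described above for the mean absolute error case can be written as a short CPU reference. This sketch covers only the raw +1 / -1 / 0 gradient; any scaling by the item count or handling of other mean error types is not shown, and the name mae_grad_ref is hypothetical.

```python
def mae_grad_ref(predicted, target):
    """Gradient of |predicted - target| with respect to predicted, per item."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [float(sign(p - t)) for p, t in zip(predicted, target)]

grad = mae_grad_ref([2.0, 1.0, 3.0], [1.0, 1.0, 5.0])
```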

◆ min() [1/2]

void MyCaffe.common.CudaDnn< T >.min ( int  n,
long  hA,
long  hB,
long  hY 

Calculates the min of A and B and places the result in Y. This min is only computed on a per item basis, so the shape of Y = the shape of A and B and Y(0) contains the min of A(0) and B(0), etc.

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7702 of file CudaDnn.cs.

◆ min() [2/2]

double MyCaffe.common.CudaDnn< T >.min ( int  n,
long  hA,
out long  lPos,
int  nAOff = 0,
long  hWork = 0 

Finds the minimum value of A.

This function uses NVIDIA's Thrust.

nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
lPosReturns the position of the minimum value.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A (default = 0).
hWorkOptionally, specifies the handle to GPU memory in the size of A, which when specified is used in the extended version of min val. The extended version does not use thrust, and does not calculate 'lPos', which is always returned as -1 when using the extended version. (default = 0, use non extended version)
The minimum value is returned as type double.

Definition at line 7772 of file CudaDnn.cs.

◆ min_bwd()

void MyCaffe.common.CudaDnn< T >.min_bwd ( int  nCount,
long  hTopDiff,
int  nIdx,
long  hMask,
long  hBottomDiff 

Performs a min backward pass in Cuda.

nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nIdxSpecifies the blob index used to test the mask.
hMaskSpecifies a handle to the mask data in GPU.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9794 of file CudaDnn.cs.

◆ min_fwd()

void MyCaffe.common.CudaDnn< T >.min_fwd ( int  nCount,
long  hBottomDataA,
long  hBottomDataB,
int  nIdx,
long  hTopData,
long  hMask 

Performs a min forward pass in Cuda.

Calculation: $ Y[i] = min(A[i], B[i]) $

nCountSpecifies the number of items.
hBottomDataASpecifies a handle to the Bottom A data in GPU memory.
hBottomDataBSpecifies a handle to the Bottom B data in GPU memory.
nIdxSpecifies the blob index used to set the mask.
hTopDataSpecifies a handle to the Top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU.

Definition at line 9778 of file CudaDnn.cs.

◆ minmax() [1/2]

Tuple< double, double, double, double > MyCaffe.common.CudaDnn< T >.minmax ( int  n,
long  hA,
long  hWork1,
long  hWork2,
bool  bDetectNans = false,
int  nAOff = 0 

Finds the minimum and maximum values within A.

nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
hWork1Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
hWork2Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
bDetectNansOptionally, specifies whether or not to detect Nans.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
A four element tuple is returned where the first item contains the minimum, the second item contains the maximum, the third contains the number of NaN values and the fourth contains the number of Infinity values.
When calling this function with hA = 0, the function instead returns the required sizes of hWork1 and hWork2 (in items, not bytes) in the first two tuple items, with the remaining two items set to 0.

Definition at line 7818 of file CudaDnn.cs.
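The shape of the four element tuple returned by minmax() can be illustrated with a CPU reference. This sketch assumes that NaN and Infinity values are counted but excluded from the min and max themselves, which this reference does not state explicitly. The workspace-size query (hA = 0) is not modeled, and minmax_ref is a hypothetical name.

```python
import math

def minmax_ref(a):
    """Return (min, max, NaN count, Infinity count) for a list of floats."""
    finite = [v for v in a if math.isfinite(v)]   # assumed: non-finite values skipped
    nan_count = sum(1 for v in a if math.isnan(v))
    inf_count = sum(1 for v in a if math.isinf(v))
    return (min(finite, default=0.0), max(finite, default=0.0),
            float(nan_count), float(inf_count))

result = minmax_ref([3.0, float("nan"), -1.0, float("inf"), 2.0])
```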

◆ minmax() [2/2]

void MyCaffe.common.CudaDnn< T >.minmax ( int  n,
long  hA,
long  hWork1,
long  hWork2,
int  nK,
long  hMin,
long  hMax,
bool  bNonZeroOnly 

Finds up to 'nK' minimum and maximum values within A.

nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
hWork1Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
hWork2Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
nKSpecifies the number of min and max values to find.
hMinSpecifies a handle to host memory allocated with AllocHostBuffer in the length 'nK' where the min values are placed.
hMaxSpecifies a handle to host memory allocated with AllocHostBuffer in the length 'nK' where the max values are placed.
bNonZeroOnlySpecifies whether or not to exclude zero from the min and max calculations.

Definition at line 7843 of file CudaDnn.cs.

◆ mish_bwd()

void MyCaffe.common.CudaDnn< T >.mish_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
double  dfThreshold,
int  nMethod = 0 

Performs a Mish backward pass in Cuda.

Computes the mish gradient $ f'(x) = \frac{ e^x (4 e^x x + 4 x + 6 e^x + 4 e^{2x} + e^{3x} + 4) }{ (2 e^x + e^{2x} + 2)^2 } $. Note, see Wolfram Alpha with 'derivative of x * tanh(ln(1 + e^x))'.

See also
Mish: A Self Regularized Non-Monotonic Neural Activation Function by Diganta Misra, 2019.
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
dfThresholdSpecifies the threshold value.
nMethodOptionally, specifies to run the new implementation when > 0.

Definition at line 9035 of file CudaDnn.cs.

◆ mish_fwd()

void MyCaffe.common.CudaDnn< T >.mish_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
double  dfThreshold 

Performs a Mish forward pass in Cuda.

Computes the mish non-linearity $ f(x) = x * tanh(ln( 1 + e^x )) $.

See also
Mish: A Self Regularized Non-Monotonic Neural Activation Function by Diganta Misra, 2019.
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
dfThresholdSpecifies the threshold value.

Definition at line 9011 of file CudaDnn.cs.
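The Mish non-linearity and the closed-form gradient quoted for mish_bwd() can be cross-checked numerically. The sketch below evaluates both on the CPU and compares the closed form against a central finite difference; it ignores the dfThreshold and nMethod parameters, whose exact effect is not described here, and all function names are hypothetical.

```python
import math

def mish(x):
    """f(x) = x * tanh(ln(1 + e^x))."""
    return x * math.tanh(math.log1p(math.exp(x)))

def mish_grad(x):
    """Closed-form derivative from the mish_bwd() description."""
    ex, e2x, e3x = math.exp(x), math.exp(2.0 * x), math.exp(3.0 * x)
    omega = 4.0 * ex * x + 4.0 * x + 6.0 * ex + 4.0 * e2x + e3x + 4.0
    delta = 2.0 * ex + e2x + 2.0
    return ex * omega / (delta * delta)

def numeric_grad(f, x, h=1e-5):
    """Central finite difference, used only to sanity-check the closed form."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

samples = [-2.0, -0.5, 0.0, 1.0, 3.0]
errors = [abs(mish_grad(x) - numeric_grad(mish, x)) for x in samples]
```

At x = 0 the closed form gives 15 / 25 = 0.6, which matches tanh(ln 2).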

◆ mul()

void MyCaffe.common.CudaDnn< T >.mul ( int  n,
long  hA,
long  hB,
long  hY,
int  nAOff = 0,
int  nBOff = 0,
int  nYOff = 0 

Multiplies each element of A with the corresponding element of B and places the result in Y.

Y = A * B (element by element)

nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
nBOffOptionally, specifies an offset (in items, not bytes) into the memory of B.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 7334 of file CudaDnn.cs.
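The item offsets accepted by mul() can be pictured with a CPU reference that works on flat lists. This is a sketch only; mul_ref is a hypothetical name, and the offsets are counted in items, as the parameter descriptions state.

```python
def mul_ref(n, a, b, y, a_off=0, b_off=0, y_off=0):
    """y[y_off + i] = a[a_off + i] * b[b_off + i] for i in [0, n)."""
    for i in range(n):
        y[y_off + i] = a[a_off + i] * b[b_off + i]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
y = [0.0, 0.0, 0.0, 0.0]
# Multiply the last two items of A with the first two items of B,
# writing into the middle of Y.
mul_ref(2, a, b, y, a_off=2, b_off=0, y_off=1)
```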

◆ mul_scalar() [1/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
double  fAlpha,
long  hY 

Multiply each element of Y by a scalar.

Y = Y * alpha

nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type double.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7374 of file CudaDnn.cs.

◆ mul_scalar() [2/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
float  fAlpha,
long  hY 

Multiply each element of Y by a scalar.

Y = Y * alpha

nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type float.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7388 of file CudaDnn.cs.

◆ mul_scalar() [3/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
long  hY 

Multiply each element of Y by a scalar.

Y = Y * alpha

nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7402 of file CudaDnn.cs.

◆ mulbsx()

void MyCaffe.common.CudaDnn< T >.mulbsx ( int  n,
long  hA,
int  nAOff,
long  hX,
int  nXOff,
int  nC,
int  nSpatialDim,
bool  bTranspose,
long  hB,
int  nBOff 

Multiply a matrix with a vector.

nSpecifies the number of items.
hASpecifies the matrix to multiply.
nAOffSpecifies the offset to apply to the GPU memory of hA.
hXSpecifies the vector to multiply.
nXOffSpecifies the offset to apply to the GPU memory of hX.
nCSpecifies the number of channels.
nSpatialDimSpecifies the spatial dimension.
bTransposeSpecifies whether or not to transpose the matrix.
hBSpecifies the output matrix.
nBOffSpecifies the offset to apply to the GPU memory of hB.

Definition at line 6650 of file CudaDnn.cs.

◆ NcclAllReduce()

void MyCaffe.common.CudaDnn< T >.NcclAllReduce ( long  hNccl,
long  hStream,
long  hX,
int  nCount,
double  dfScale = 1.0 

Performs a reduction on all NCCL instances as specified by the reduction operation.

See Fast Multi-GPU collectives with NCCL.

hNcclSpecifies a handle to an NCCL instance.
hStreamSpecifies a handle to the stream to use for synchronization.
hXSpecifies a handle to the GPU data to reduce with the other instances of NCCL.
nCountSpecifies the number of items (not bytes) in the data.
opSpecifies the reduction operation to perform.
dfScaleOptionally, specifies a scaling to be applied to the final reduction.

Definition at line 3442 of file CudaDnn.cs.
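The effect of an all-reduce can be simulated on the CPU by treating each NCCL instance as a separate buffer. The sketch below assumes a sum reduction followed by the optional scaling, with every participant ending up with the same result; the reduction operation actually used is not stated in the signature above, so this is an illustration of the general collective rather than of this method's exact behavior.

```python
def all_reduce_sum(buffers, scale=1.0):
    """Sum the buffers item by item, scale, and write the result to every buffer."""
    count = len(buffers[0])
    reduced = [scale * sum(buf[i] for buf in buffers) for i in range(count)]
    for buf in buffers:
        buf[:] = reduced

gpu0 = [1.0, 2.0]
gpu1 = [3.0, 4.0]
# A scale of 1 / (number of participants) turns the sum into an average.
all_reduce_sum([gpu0, gpu1], scale=0.5)
```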

◆ NcclBroadcast()

void MyCaffe.common.CudaDnn< T >.NcclBroadcast ( long  hNccl,
long  hStream,
long  hX,
int  nCount 

Broadcasts a block of GPU data to all NCCL instances.

See Fast Multi-GPU collectives with NCCL.

hNcclSpecifies a handle to an NCCL instance.
hStreamSpecifies a handle to the stream to use for synchronization.
hXSpecifies a handle to the GPU data to be broadcast (or received).
nCountSpecifies the number of items (not bytes) in the data.

Definition at line 3421 of file CudaDnn.cs.

◆ NcclInitializeMultiProcess()

void MyCaffe.common.CudaDnn< T >.NcclInitializeMultiProcess ( long  hNccl)

Initializes a set of NCCL instances for use in different processes.

See Fast Multi-GPU collectives with NCCL.

hNcclSpecifies the handle of NCCL to initialize.

Definition at line 3403 of file CudaDnn.cs.

◆ NcclInitializeSingleProcess()

void MyCaffe.common.CudaDnn< T >.NcclInitializeSingleProcess ( params long[]  rghNccl)

Initializes a set of NCCL instances for use in a single process.

See Fast Multi-GPU collectives with NCCL.

rghNcclSpecifies the array of NCCL handles that will be working together.

Definition at line 3370 of file CudaDnn.cs.

◆ nesterov_update()

void MyCaffe.common.CudaDnn< T >.nesterov_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,

Perform the Nesterov update.

See Lecture 6c The momentum method by Hinton, et al., 2012, and Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent by Botev, et al., 2016

nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fMomentumSpecifies the momentum value.
fLocalRateSpecifies the local learning rate.

Definition at line 10223 of file CudaDnn.cs.
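A Nesterov step in the style of the original Caffe solver can be sketched on the CPU as shown below. Whether this method uses exactly the same update rule is an assumption: the sketch updates the history as momentum * h + local_rate * g and then overwrites the diff with (1 + momentum) * h_new - momentum * h_old. The name nesterov_update_ref is hypothetical.

```python
def nesterov_update_ref(diff, history, momentum, local_rate):
    """In-place Caffe-style Nesterov update (assumed convention)."""
    for i in range(len(diff)):
        h_old = history[i]
        h_new = momentum * h_old + local_rate * diff[i]
        history[i] = h_new
        # The diff becomes the step that the solver later subtracts from the weights.
        diff[i] = (1.0 + momentum) * h_new - momentum * h_old

diff = [1.0]
history = [0.5]
nesterov_update_ref(diff, history, momentum=0.9, local_rate=0.1)
```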

◆ nllloss_bwd()

void MyCaffe.common.CudaDnn< T >.nllloss_bwd ( int  nCount,
long  hTopData,
long  hLabel,
long  hBottomDiff,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 

Performs NLL Loss backward pass in Cuda.

nCountSpecifies the number of items.
hTopDataSpecifies a handle to the top data in GPU memory.
hLabelSpecifies a handle to the label data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hCountsSpecifies a handle to the counts in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.

Definition at line 9707 of file CudaDnn.cs.

◆ nllloss_fwd()

void MyCaffe.common.CudaDnn< T >.nllloss_fwd ( int  nCount,
long  hProbData,
long  hLabel,
long  hLossData,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 

Performs NLL Loss forward pass in Cuda.

nCountSpecifies the number of items.
hProbDataSpecifies a handle to the probability data in GPU memory.
hLabelSpecifies a handle to the label data in GPU memory.
hLossDataSpecifies a handle to the loss data in GPU memory.
hCountsSpecifies a handle to the counts in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.

Definition at line 9673 of file CudaDnn.cs.
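The indexing implied by nOuterNum, nDim and nInnerNum can be sketched on the CPU as follows. This is a minimal sketch under two assumptions that the text above does not confirm: that nDim equals the number of classes times nInnerNum, and that the input already holds log-probabilities, so that the per-item loss is simply the negated entry at the label position. All Python names are hypothetical.

```python
def nllloss_fwd(log_prob, labels, outer_num, dim, inner_num, ignore_label=None):
    """CPU sketch of a negative log-likelihood forward pass.

    Assumes dim == num_classes * inner_num and that log_prob already
    contains log-probabilities (both are assumptions of this sketch).
    Returns per-item losses and per-item counts (1 where a loss was taken).
    """
    loss = [0.0] * (outer_num * inner_num)
    counts = [0.0] * (outer_num * inner_num)
    for n in range(outer_num):
        for s in range(inner_num):
            idx = n * inner_num + s
            label = int(labels[idx])
            if ignore_label is not None and label == ignore_label:
                continue  # ignored items contribute neither loss nor count
            loss[idx] = -log_prob[n * dim + label * inner_num + s]
            counts[idx] = 1.0
    return loss, counts


# Two items, three classes, one spatial position per item.
log_prob = [-0.1, -2.0, -3.0, -1.5, -0.3, -2.5]
loss, counts = nllloss_fwd(log_prob, [0, 2], outer_num=2, dim=3, inner_num=1,
                           ignore_label=2)
```

Summing `loss` and dividing by the sum of `counts` would give a normalized loss that skips the ignored label.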

◆ permute()

void MyCaffe.common.CudaDnn< T >.permute ( int  nCount,
long  hBottom,
bool  bFwd,
long  hPermuteOrder,
long  hOldSteps,
long  hNewSteps,
int  nNumAxes,
long  hTop 

Permutes the input data and places the reordered data in the output.

nCountSpecifies the number of items.
hBottomSpecifies the input data.
bFwdSpecifies whether this is a forward (true) or backward (false) operation.
hPermuteOrderSpecifies the permutation order values in GPU memory.
hOldStepsSpecifies the old step values in GPU memory.
hNewStepsSpecifies the new step values in GPU memory.
nNumAxesSpecifies the number of axes.
hTopSpecifies the output data.

Definition at line 10082 of file CudaDnn.cs.
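The roles of the permute order and the old and new step values can be illustrated with a CPU sketch. It assumes that the step values are the row-major strides of each axis in the old and new layouts, as in the permute kernel of the Caffe SSD branch; that the Cuda kernel behind permute uses the same scheme is an assumption, and the Python names are hypothetical.

```python
def permute(count, bottom, forward, permute_order, old_steps, new_steps,
            num_axes, top):
    """CPU sketch of an axis permutation driven by stride tables.

    old_steps / new_steps are assumed to be the row-major strides of the
    old and new layouts. When forward is True, top is filled from bottom;
    otherwise bottom is filled from top (the inverse mapping).
    """
    for index in range(count):
        temp_idx = index
        old_idx = 0
        for i in range(num_axes):
            order = permute_order[i]
            old_idx += (temp_idx // new_steps[i]) * old_steps[order]
            temp_idx %= new_steps[i]
        if forward:
            top[index] = bottom[old_idx]
        else:
            bottom[old_idx] = top[index]


# Transpose a 2x3 matrix: old shape (2, 3), new shape (3, 2), order (1, 0).
bottom = [0, 1, 2, 3, 4, 5]
top = [0] * 6
permute(6, bottom, True, [1, 0], [3, 1], [2, 1], 2, top)
```

Running the same call with `forward=False` maps `top` back into `bottom`, which is how a backward pass can route gradients to their original positions.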

◆ pooling_bwd()

void MyCaffe.common.CudaDnn< T >.pooling_bwd ( POOLING_METHOD  method,
int  nCount,
long  hTopDiff,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hBottomDiff,
long  hMask,
long  hTopMask 

Performs the backward pass for pooling using Cuda.

methodSpecifies the pooling method.
nCountSpecifies the number of items in the bottom data.
hTopDiffSpecifies a handle to the top diff in GPU memory.
numSpecifies the number of inputs.
nChannelsSpecifies the number of channels per input.
nHeightSpecifies the height of each input.
nWidthSpecifies the width of each input.
nPooledHeightSpecifies the height of the pooled data.
nPooledWidthSpecifies the width of the pooled data.
nKernelHSpecifies the height of the pooling kernel.
nKernelWSpecifies the width of the pooling kernel.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
hTopMaskSpecifies a handle to the top mask data in GPU memory.

Definition at line 8839 of file CudaDnn.cs.

◆ pooling_fwd()

void MyCaffe.common.CudaDnn< T >.pooling_fwd ( POOLING_METHOD  method,
int  nCount,
long  hBottomData,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hTopData,
long  hMask,
long  hTopMask 

Performs the forward pass for pooling using Cuda.

methodSpecifies the pooling method.
nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
numSpecifies the number of inputs.
nChannelsSpecifies the number of channels per input.
nHeightSpecifies the height of each input.
nWidthSpecifies the width of each input.
nPooledHeightSpecifies the height of the pooled data.
nPooledWidthSpecifies the width of the pooled data.
nKernelHSpecifies the height of the pooling kernel.
nKernelWSpecifies the width of the pooling kernel.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
hTopDataSpecifies a handle to the top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
hTopMaskSpecifies a handle to the top mask data in GPU memory.

Definition at line 8810 of file CudaDnn.cs.
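To show how the height, width, kernel, stride and pad parameters relate to the pooled size and to the mask, here is a CPU sketch of max pooling over a single channel. It assumes Caffe-style ceil rounding for the pooled size and a mask that stores the flat index of each maximum; neither detail is stated in the text above, and the Python names are hypothetical.

```python
import math


def pooled_size(size, kernel, stride, pad):
    """Pooled extent along one axis, assuming Caffe-style ceil rounding."""
    pooled = int(math.ceil((size + 2 * pad - kernel) / stride)) + 1
    # With padding, drop a last window that would start outside the input.
    if pad > 0 and (pooled - 1) * stride >= size + pad:
        pooled -= 1
    return pooled


def max_pool_fwd(bottom, height, width, kernel_h, kernel_w,
                 stride_h, stride_w, pad_h, pad_w):
    """Max pooling over one row-major (height x width) channel.

    Returns the pooled values and a mask with the flat input index of
    each maximum (an assumed mask layout).
    """
    pooled_h = pooled_size(height, kernel_h, stride_h, pad_h)
    pooled_w = pooled_size(width, kernel_w, stride_w, pad_w)
    top, mask = [], []
    for ph in range(pooled_h):
        for pw in range(pooled_w):
            hstart = ph * stride_h - pad_h
            wstart = pw * stride_w - pad_w
            hend = min(hstart + kernel_h, height)
            wend = min(wstart + kernel_w, width)
            hstart, wstart = max(hstart, 0), max(wstart, 0)
            best, best_idx = -float("inf"), -1
            for h in range(hstart, hend):
                for w in range(wstart, wend):
                    idx = h * width + w
                    if bottom[idx] > best:
                        best, best_idx = bottom[idx], idx
            top.append(best)
            mask.append(best_idx)
    return top, mask


# 4x4 input, 2x2 kernel, stride 2, no padding.
top, mask = max_pool_fwd(list(range(16)), 4, 4, 2, 2, 2, 2, 0, 0)
```

A backward pass for max pooling can use such a mask to route each top diff value to the single input position that produced the maximum.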

◆ PoolingBackward()

void MyCaffe.common.CudaDnn< T >.PoolingBackward ( long  hCuDnn,
long  hPoolingDesc,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
long  hBottomDiffDesc,
long  hBottomDiff 

Perform a pooling backward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
hPoolingDescSpecifies a handle to the pooling descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4116 of file CudaDnn.cs.

◆ PoolingForward()

void MyCaffe.common.CudaDnn< T >.PoolingForward ( long  hCuDnn,
long  hPoolingDesc,
long  hBottomDesc,
long  hBottomData,
long  hTopDesc,
long  hTopData 

Perform a pooling forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
hPoolingDescSpecifies a handle to the pooling descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4093 of file CudaDnn.cs.

◆ powx() [1/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
double  fAlpha,
long  hY,
int  nAOff = 0,
int  nYOff = 0 

Calculates A raised to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar in type 'double'.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 7524 of file CudaDnn.cs.

◆ powx() [2/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
float  fAlpha,
long  hY,
int  nAOff = 0,
int  nYOff = 0 

Calculates A raised to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar in type 'float'.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 7541 of file CudaDnn.cs.

◆ powx() [3/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
long  hY,
int  nAOff = 0,
int  nYOff = 0 

Calculates A raised to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar in type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 7558 of file CudaDnn.cs.
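The effect of the optional offsets in the powx overloads can be sketched on the CPU. The sketch assumes the offsets are counted in items (not bytes) and applied to the start of each vector; the Python lists are hypothetical stand-ins for the memory behind hA and hY.

```python
def powx(n, a, alpha, y, a_off=0, y_off=0):
    """CPU sketch of y[y_off + i] = a[a_off + i] ** alpha for i in [0, n)."""
    for i in range(n):
        y[y_off + i] = a[a_off + i] ** alpha


a = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 0.0, 0.0, 0.0]
# Square two items, reading from a[1:3] and writing to y[2:4].
powx(2, a, 2.0, y, a_off=1, y_off=2)
```

Items outside the window selected by `n` and the offsets are left untouched.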

◆ prelu_bwd()

void MyCaffe.common.CudaDnn< T >.prelu_bwd ( int  nCount,
int  nChannels,
int  nDim,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff,
long  hSlopeData,
int  nDivFactor 

Performs a Parametric Rectified Linear Unit (PReLU) backward pass in Cuda.

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
nCountSpecifies the number of items.
nChannelsSpecifies the channels per input.
nDimSpecifies the dimension of each input.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hSlopeDataSpecifies a handle to the slope data in GPU memory.
nDivFactorSpecifies the div factor applied to the channels.

Definition at line 9585 of file CudaDnn.cs.

◆ prelu_bwd_param()

void MyCaffe.common.CudaDnn< T >.prelu_bwd_param ( int  nCDim,
int  nNum,
int  nTopOffset,
long  hTopDiff,
long  hBottomData,
long  hBackBuffDiff 

Performs a Parametric Rectified Linear Unit (PReLU) backward param pass in Cuda.

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBackBuffDiffSpecifies a handle to the back buffer diff in GPU memory.

Definition at line 9562 of file CudaDnn.cs.

◆ prelu_fwd()

void MyCaffe.common.CudaDnn< T >.prelu_fwd ( int  nCount,
int  nChannels,
int  nDim,
long  hBottomData,
long  hTopData,
long  hSlopeData,
int  nDivFactor 

Performs a Parametric Rectified Linear Unit (PReLU) forward pass in Cuda.

Calculation: $ f(x) = (x > 0) ? x : x * slopeData $

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
nCountSpecifies the number of items.
nChannelsSpecifies the channels per input.
nDimSpecifies the dimension of each input.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hSlopeDataSpecifies a handle to the slope data in GPU memory.
nDivFactorSpecifies the div factor applied to the channels.

Definition at line 9540 of file CudaDnn.cs.
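The calculation above leaves open which slope value applies to which item. The following CPU sketch assumes the channel lookup used by the original Caffe PReLU kernel, c = ((i / nDim) mod nChannels) / nDivFactor, where a div factor equal to the channel count makes all channels share one slope. That the Cuda kernel uses the same lookup is an assumption, and the Python names are hypothetical.

```python
def prelu_fwd(count, channels, dim, bottom, slope, div_factor):
    """CPU sketch of a PReLU forward pass with a per-channel slope.

    Assumed slope lookup (Caffe-style):
        c = ((i // dim) % channels) // div_factor
    """
    top = [0.0] * count
    for i in range(count):
        c = ((i // dim) % channels) // div_factor
        x = bottom[i]
        top[i] = x if x > 0 else x * slope[c]
    return top


bottom = [1.0, -1.0, -2.0, 3.0]
# Two channels of two items each, one slope per channel.
per_channel = prelu_fwd(4, 2, 2, bottom, [0.1, 0.5], div_factor=1)
# Same data with a single slope shared by both channels.
shared = prelu_fwd(4, 2, 2, bottom, [0.25], div_factor=2)
```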

◆ relu_bwd()

void MyCaffe.common.CudaDnn< T >.relu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,

Performs a Rectified Linear Unit (ReLU) backward pass in Cuda.

See also
Rectifier, and
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
fNegativeSlopeSpecifies the negative slope.

Definition at line 9404 of file CudaDnn.cs.

◆ relu_fwd()

void MyCaffe.common.CudaDnn< T >.relu_fwd ( int  nCount,
long  hBottomData,
long  hTopData,

Performs a Rectified Linear Unit (ReLU) forward pass in Cuda.

Calculation: $ f(x) = (x > 0) ? x : x * negativeSlope $

See also
Rectifier, and
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
fNegativeSlopeSpecifies the negative slope.

Definition at line 9383 of file CudaDnn.cs.
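The forward calculation and its matching gradient can be sketched on the CPU as follows. The backward rule shown here (pass the top diff through where the data is positive, otherwise scale it by the negative slope) is the standard leaky ReLU derivative and is assumed, not confirmed, to match relu_bwd. For non-negative slopes the sign of the top data equals the sign of the bottom data, so the sketch uses the top data as relu_bwd does. The Python names are hypothetical.

```python
def relu_fwd(bottom, negative_slope=0.0):
    """f(x) = x if x > 0 else x * negative_slope, applied per item."""
    return [x if x > 0 else x * negative_slope for x in bottom]


def relu_bwd(top_diff, top_data, negative_slope=0.0):
    """Assumed gradient: top_diff where data > 0, else top_diff * slope."""
    return [d * (1.0 if t > 0 else negative_slope)
            for d, t in zip(top_diff, top_data)]


top = relu_fwd([-2.0, 0.0, 3.0], negative_slope=0.1)
bottom_diff = relu_bwd([1.0, 1.0, 1.0], top, negative_slope=0.1)
```

With the default slope of 0 the forward pass reduces to the plain rectifier max(x, 0).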

◆ ReLUBackward()

void MyCaffe.common.CudaDnn< T >.ReLUBackward ( long  hCuDnn,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
long  hBottomDiffDesc,
long  hBottomDiff 

Perform a ReLU backward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4598 of file CudaDnn.cs.

◆ ReLUForward()

void MyCaffe.common.CudaDnn< T >.ReLUForward ( long  hCuDnn,
long  hBottomDataDesc,
long  hBottomData,
long  hTopDataDesc,
long  hTopData 

Perform a ReLU forward pass.

See Rectifier Nonlinearities Improve Neural Network Acoustic Models by Maas, A. L., Hannun, A. Y., and Ng, A. Y. (2013), In ICML Workshop on Deep Learning for Audio, Speech, and Language Processing.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4576 of file CudaDnn.cs.

◆ ReportMemory()

void MyCaffe.common.CudaDnn< T >.ReportMemory ( Log  log,
string  strLocation 

Report the memory use on the current GPU managed by the CudaDnn object.

logSpecifies the output log.
strLocationSpecifies the location of the memory test.

Definition at line 11236 of file CudaDnn.cs.

◆ ResetDevice()

void MyCaffe.common.CudaDnn< T >.ResetDevice ( )

Reset the current device.

IMPORTANT: This function will delete all memory and state information on the current device, which may cause other CudaDnn instances using the same device, to fail. For that reason, it is recommended to only call this function when testing.

Definition at line 2079 of file CudaDnn.cs.

◆ ResetGhostMemory()

void MyCaffe.common.CudaDnn< T >.ResetGhostMemory ( )

Resets the ghost memory by enabling it if this instance was configured to use ghost memory.

Definition at line 1783 of file CudaDnn.cs.

◆ rmsprop_update()

void MyCaffe.common.CudaDnn< T >.rmsprop_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,

Performs the RMSProp update.

See Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude by Tieleman and Hinton, 2012, and RMSProp and equilibrated adaptive learning rates for non-convex optimization by Dauphin, et al., 2015.

nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fRmsDecaySpecifies the decay value used by the Solver. MeanSquare(t) = 'rms_decay' * MeanSquare(t-1) + (1 - 'rms_decay') * SquareGradient(t).
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 10334 of file CudaDnn.cs.
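The MeanSquare recurrence given for fRmsDecay can be turned into a small CPU sketch. The history update follows that recurrence directly; the final step, which divides the gradient by the root of the running mean square plus fDelta and scales it by the local rate, follows the original Caffe kernel and is an assumption here. The Python names are hypothetical.

```python
import math


def rmsprop_update(grad, history, rms_decay, delta, local_rate):
    """CPU sketch of an RMSProp step; updates grad and history in place.

    history[i] = rms_decay * history[i] + (1 - rms_decay) * grad[i]^2
    grad[i]    = local_rate * grad[i] / (sqrt(history[i]) + delta)
    The second line is assumed (Caffe-style), not confirmed by the text.
    """
    for i in range(len(grad)):
        g = grad[i]
        h = rms_decay * history[i] + (1.0 - rms_decay) * g * g
        history[i] = h
        grad[i] = local_rate * g / (math.sqrt(h) + delta)


grad = [2.0]
history = [0.0]
rmsprop_update(grad, history, rms_decay=0.75, delta=0.0, local_rate=0.1)
```

In practice a small positive delta keeps the division stable when the running mean square is close to zero.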

◆ rng_bernoulli() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
double  fNonZeroProb,
long  hY 

Fills Y with random numbers drawn from a Bernoulli distribution.

See Bernoulli Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non-zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8631 of file CudaDnn.cs.

◆ rng_bernoulli() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
float  fNonZeroProb,
long  hY 

Fills Y with random numbers drawn from a Bernoulli distribution.

See Bernoulli Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non-zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8645 of file CudaDnn.cs.

◆ rng_bernoulli() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
long  hY 

Fills Y with random numbers drawn from a Bernoulli distribution.

See Bernoulli Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non-zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8659 of file CudaDnn.cs.

◆ rng_gaussian() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
double  fMu,
double  fSigma,
long  hY 

Fills Y with random numbers drawn from a Gaussian distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of 'double'.
fSigmaSpecifies the standard deviation of the distribution with a type of 'double'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8578 of file CudaDnn.cs.

◆ rng_gaussian() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
float  fMu,
float  fSigma,
long  hY 

Fills Y with random numbers drawn from a Gaussian distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of 'float'.
fSigmaSpecifies the standard deviation of the distribution with a type of 'float'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8593 of file CudaDnn.cs.

◆ rng_gaussian() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
long  hY 

Fills Y with random numbers drawn from a Gaussian distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of 'T'.
fSigmaSpecifies the standard deviation of the distribution with a type of 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8608 of file CudaDnn.cs.

◆ rng_setseed()

void MyCaffe.common.CudaDnn< T >.rng_setseed ( long  lSeed)

Sets the random number generator seed used by random number operations.

This function uses NVIDIA's cuRand.

lSeedSpecifies the random number generator seed.

Definition at line 8506 of file CudaDnn.cs.

◆ rng_uniform() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
double  fMin,
double  fMax,
long  hY 

Fills Y with random numbers drawn from a uniform distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of 'double'.
fMaxSpecifies the maximum value of the distribution with a type of 'double'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8524 of file CudaDnn.cs.

◆ rng_uniform() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
float  fMin,
float  fMax,
long  hY 

Fills Y with random numbers drawn from a uniform distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of 'float'.
fMaxSpecifies the maximum value of the distribution with a type of 'float'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8539 of file CudaDnn.cs.

◆ rng_uniform() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
long  hY 

Fills Y with random numbers drawn from a uniform distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of 'T'.
fMaxSpecifies the maximum value of the distribution with a type of 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8554 of file CudaDnn.cs.

◆ Rnn8Backward()

void MyCaffe.common.CudaDnn< T >.Rnn8Backward ( long  hCuDnn,
long  hRnn,
long  hY,
long  hdY,
long  hX,
long  hdX,
long  hhX,
long  hdhY,
long  hdhX,
long  hcX,
long  hdcY,
long  hdcX,
long  hWt,
long  hdWt,
long  hWork,
long  hReserved 

Calculate the backward pass through the RNN8 for both data and weights.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
hYSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Outputs) containing the outputs from the forward.
hdYSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Outputs) containing the inbound gradients for Y.
hXSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Inputs) containing the inputs.
hdXSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Outputs) where the outbound, calculated gradients for X are placed.
hhXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the hidden inputs.
hdhYSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the inbound gradients for hidden.
hdhXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) where the outbound, calculated gradients for hidden are placed.
hcXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the hidden cell inputs.
hdcYSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the inbound gradients for the cell hidden.
hdcXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) where the outbound, calculated gradients for cell hidden are placed.
hWtSpecifies a handle to the GPU memory of size szWt calculated with GetRnn8MemorySizes, containing the weights.
hdWtSpecifies a handle to the GPU memory of size szWt calculated with GetRnn8MemorySizes, where the weight gradients are placed.
hWorkSpecifies a handle to the GPU memory of size szWork calculated with GetRnn8MemorySizes, used as temporary work data.
hReservedSpecifies a handle to the GPU memory of size szReserved calculated with GetRnn8MemorySizes, used as temporary reserve data.

Definition at line 5300 of file CudaDnn.cs.

◆ Rnn8Forward()

void MyCaffe.common.CudaDnn< T >.Rnn8Forward ( long  hCuDnn,
long  hRnn,
long  hX,
long  hY,
long  hhX,
long  hhY,
long  hcX,
long  hcY,
long  hWts,
long  hWork,
long  hReserved 

Calculate the forward pass through the RNN8.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
hXSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Inputs) containing the inputs.
hYSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Outputs) where the outputs are placed.
hhXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the hidden inputs.
hhYSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) where the hidden outputs are placed.
hcXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the hidden cell inputs.
hcYSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) where the hidden cell outputs are placed.
hWtsSpecifies a handle to the GPU memory of size szWt calculated with GetRnn8MemorySizes, containing the weights.
hWorkSpecifies a handle to the GPU memory of size szWork calculated with GetRnn8MemorySizes, used as temporary work data.
hReservedSpecifies a handle to the GPU memory of size szReserved calculated with GetRnn8MemorySizes, used as temporary reserve data.

Definition at line 5273 of file CudaDnn.cs.
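The shapes listed above determine how many items each buffer must hold before Rnn8Forward is called. The sketch below simply multiplies out those shapes. It assumes a single-layer, unidirectional configuration, because the text only gives (BatchSize, Hidden) for the hidden and cell buffers. The weight, work and reserve sizes are deliberately left out since they come from GetRnn8MemorySizes. The Python names are hypothetical.

```python
def rnn8_item_counts(seq_len, batch_size, inputs, hidden, outputs):
    """Item counts (not bytes) implied by the documented buffer shapes.

    Assumes one layer and one direction; the weight, work and reserve
    sizes are not computed here because they come from GetRnn8MemorySizes.
    """
    return {
        "hX": seq_len * batch_size * inputs,    # (SeqLen, BatchSize, Inputs)
        "hY": seq_len * batch_size * outputs,   # (SeqLen, BatchSize, Outputs)
        "hhX": batch_size * hidden,             # (BatchSize, Hidden)
        "hhY": batch_size * hidden,
        "hcX": batch_size * hidden,
        "hcY": batch_size * hidden,
    }


counts = rnn8_item_counts(seq_len=10, batch_size=4, inputs=8, hidden=16,
                          outputs=16)
```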

◆ RnnBackwardData()

void MyCaffe.common.CudaDnn< T >.RnnBackwardData ( long  hCuDnn,
long  hRnnDesc,
long  hYDesc,
long  hYData,
long  hYDiff,
long  hHyDesc,
long  hHyDiff,
long  hCyDesc,
long  hCyDiff,
long  hWtDesc,
long  hWtData,
long  hHxDesc,
long  hHxData,
long  hCxDesc,
long  hCxData,
long  hXDesc,
long  hXDiff,
long  hdHxDesc,
long  hHxDiff,
long  hdCxDesc,
long  hCxDiff,
long  hWorkspace,
ulong  nWsCount,
long  hReserved,
ulong  nResCount 

Run the RNN backward pass through the data.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc.
hYDescSpecifies a handle to the output data descriptor.
hYDataSpecifies a handle to the output GPU data.
hYDiffSpecifies a handle to the output GPU gradients.
hHyDescSpecifies a handle to the output hidden descriptor.
hHyDiffSpecifies a handle to the output hidden gradients.
hCyDescSpecifies a handle to the output cont descriptor.
hCyDiffSpecifies a handle to the output cont gradients.
hWtDescSpecifies a handle to the weight descriptor.
hWtDataSpecifies a handle to the weight data.
hHxDescSpecifies a handle to the hidden data descriptor.
hHxDataSpecifies a handle to the hidden GPU data.
hCxDescSpecifies a handle to the cont data descriptor.
hCxDataSpecifies a handle to the cont GPU data.
hXDescSpecifies a handle to the input data descriptor.
hXDiffSpecifies a handle to the input GPU gradients.
hdHxDescSpecifies a handle to the input hidden descriptor for the gradients.
hHxDiffSpecifies a handle to the input hidden GPU gradients.
hdCxDescSpecifies a handle to the input cont descriptor of the gradients.
hCxDiffSpecifies a handle to the input cont GPU gradients.
hWorkspaceSpecifies a handle to the workspace GPU memory.
nWsCountSpecifies the number of items within the workspace.
hReservedSpecifies a handle to the reserved GPU memory.
nResCountSpecifies the number of items within the reserved memory.

Definition at line 4981 of file CudaDnn.cs.

◆ RnnBackwardWeights()

void MyCaffe.common.CudaDnn< T >.RnnBackwardWeights ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc,
long  hXData,
long  hHxDesc,
long  hHxData,
long  hYDesc,
long  hYData,
long  hWorkspace,
ulong  nWsCount,
long  hWtDesc,
long  hWtDiff,
long  hReserved,
ulong  nResCount 

Run the RNN backward pass on the weights.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc.
hXDescSpecifies a handle to the input data descriptor.
hXDataSpecifies a handle to the input GPU data.
hHxDescSpecifies a handle to the hidden data descriptor.
hHxDataSpecifies a handle to the hidden GPU data.
hYDescSpecifies a handle to the output data descriptor.
hYDataSpecifies a handle to the output GPU data.
hWorkspaceSpecifies a handle to the workspace GPU memory.
nWsCountSpecifies the number of items within the workspace.
hWtDescSpecifies a handle to the weight descriptor.
hWtDiffSpecifies a handle to the weight gradients.
hReservedSpecifies a handle to the reserved GPU memory.
nResCountSpecifies the number of items within the reserved memory.

Definition at line 5080 of file CudaDnn.cs.

◆ RnnForward()

void MyCaffe.common.CudaDnn< T >.RnnForward ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc,
long  hXData,
long  hHxDesc,
long  hHxData,
long  hCxDesc,
long  hCxData,
long  hWtDesc,
long  hWtData,
long  hYDesc,
long  hYData,
long  hHyDesc,
long  hHyData,
long  hCyDesc,
long  hCyData,
long  hWorkspace,
ulong  nWsCount,
long  hReserved,
ulong  nResCount,
bool  bTraining 

Run the RNN through a forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc.
hXDescSpecifies a handle to the input data descriptor.
hXDataSpecifies a handle to the input GPU data.
hHxDescSpecifies a handle to the hidden data descriptor.
hHxDataSpecifies a handle to the hidden GPU data.
hCxDescSpecifies a handle to the cont data descriptor.
hCxDataSpecifies a handle to the cont GPU data.
hWtDescSpecifies a handle to the weight descriptor.
hWtDataSpecifies a handle to the weight data.
hYDescSpecifies a handle to the output data descriptor.
hYDataSpecifies a handle to the output GPU data.
hHyDescSpecifies a handle to the output hidden descriptor.
hHyDataSpecifies a handle to the output hidden data.
hCyDescSpecifies a handle to the output cont descriptor.
hCyDataSpecifies a handle to the output cont data.
hWorkspaceSpecifies a handle to the workspace GPU memory.
nWsCountSpecifies the number of items within the workspace.
hReservedSpecifies a handle to the reserved GPU memory.
nResCountSpecifies the number of items within the reserved memory.
bTrainingSpecifies whether the forward pass is run during training or not.

Definition at line 4881 of file CudaDnn.cs.

◆ RunExtension()

T[] MyCaffe.common.CudaDnn< T >.RunExtension ( long  hExtension,
long  lfnIdx,
T[]  rgParam 

Run a function on the extension specified.

hExtensionSpecifies the handle to the extension created with CreateExtension.
lfnIdxSpecifies the extension function to run.
rgParamSpecifies the parameters to pass to the extension.
Returns the values produced by the extension.

Definition at line 3489 of file CudaDnn.cs.

◆ RunMemoryTest()

T[] MyCaffe.common.CudaDnn< T >.RunMemoryTest ( long  h,
ulong  ulBlockStartOffset,
ulong  ulBlockCount,
bool  bVerbose,
bool  bWrite,
bool  bReadWrite,
bool  bRead 

The RunMemoryTest method runs the memory test from the block start offset through the block count on the memory previously allocated using CreateMemoryTest.

hSpecifies the handle to the memory test data.
typeSpecifies the type of memory test to run.
ulBlockStartOffsetSpecifies the block start offset (offset into the total blocks returned by CreateMemoryTest).
ulBlockCountSpecifies the number of blocks to test.
bVerboseWhen disabled, the memory test is just run once and the number of errors is returned. When enabled, the memory test is run twice and the erroring addresses are returned along with the error count.
The format of the array returned is as follows:
rg[0] - specifies the starting memory address used for this memory test run.
rg[1] - specifies the number of addresses over which the test was run (specified in 1 byte increments).
rg[2] - specifies the number of errors found.
rg[3, ...] - specifies the erroring addresses (specified in 1-bit increments).
bWriteSpecifies to perform a write test.
bReadWriteSpecifies to perform a read/write test.
bReadSpecifies to perform a read test.

Definition at line 3123 of file CudaDnn.cs.
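The layout of the returned array described above can be unpacked with a small helper. This is a sketch only: it takes the array as a plain Python list standing in for the T[] result and follows the rg[0] to rg[3, ...] layout exactly as documented. The helper name and dictionary keys are hypothetical.

```python
def parse_memory_test_result(rg):
    """Unpack a RunMemoryTest-style result array into named fields.

    Layout taken from the description above:
      rg[0]   starting memory address of the run
      rg[1]   number of addresses tested
      rg[2]   number of errors found
      rg[3:]  erroring addresses (present in verbose runs)
    """
    if len(rg) < 3:
        raise ValueError("expected at least 3 entries in the result array")
    return {
        "start_address": int(rg[0]),
        "address_count": int(rg[1]),
        "error_count": int(rg[2]),
        "error_addresses": [int(a) for a in rg[3:]],
    }


# Hypothetical result values, for illustration only.
result = parse_memory_test_result([4096.0, 1024.0, 2.0, 4100.0, 4352.0])
```

In a non-verbose run the list of erroring addresses would be empty while the error count is still reported.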

◆ RunPCA()

bool MyCaffe.common.CudaDnn< T >.RunPCA ( long  hPCA,
int  nSteps,
out int  nCurrentK,
out int  nCurrentIteration 

Runs a number of steps of the iterative PCA algorithm.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

hPCASpecifies a handle to the PCA instance to use.
nStepsSpecifies the number of steps to run.
nCurrentKReturns the current component value.
nCurrentIterationReturns the current iteration.
Returns a value indicating whether the maximum number of iterations, as specified in CreatePCA, has been run.

Definition at line 5417 of file CudaDnn.cs.

◆ scal() [1/3]

void MyCaffe.common.CudaDnn< T >.scal ( int  n,
double  fAlpha,
long  hX,
int  nXOff = 0 

Scales the data in X by a scaling factor.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X.
fAlphaSpecifies the scaling factor to apply to vector X, where the scaling factor is of type 'double'.
hXSpecifies a handle to the vector X in GPU memory.
nXOffSpecifies an offset (in items, not bytes) into the memory of X.

Definition at line 6767 of file CudaDnn.cs.

◆ scal() [2/3]

void MyCaffe.common.CudaDnn< T >.scal ( int  n,
float  fAlpha,
long  hX,
int  nXOff = 0 

Scales the data in X by a scaling factor.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X.
fAlphaSpecifies the scaling factor to apply to vector X, where the scaling factor is of type 'float'.
hXSpecifies a handle to the vector X in GPU memory.
nXOffSpecifies an offset (in items, not bytes) into the memory of X.

Definition at line 6782 of file CudaDnn.cs.

◆ scal() [3/3]

void MyCaffe.common.CudaDnn< T >.scal ( int  n,
long  hX,
int  nXOff = 0 

Scales the data in X by a scaling factor.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vector X.
fAlphaSpecifies the scaling factor to apply to vector X, where the scaling factor is of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
nXOffSpecifies an offset (in items, not bytes) into the memory of X.

Definition at line 6797 of file CudaDnn.cs.

◆ scale() [1/3]

void MyCaffe.common.CudaDnn< T >.scale ( int  n,
double  fAlpha,
long  hX,
long  hY 

Scales the values in X and places them in Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vectors X and Y.
fAlphaSpecifies the scale value, of type double.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6925 of file CudaDnn.cs.

◆ scale() [2/3]

void MyCaffe.common.CudaDnn< T >.scale ( int  n,
float  fAlpha,
long  hX,
long  hY 

Scales the values in X and places them in Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vectors X and Y.
fAlphaSpecifies the scale value, of type float.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6940 of file CudaDnn.cs.

◆ scale() [3/3]

void MyCaffe.common.CudaDnn< T >.scale ( int  n,
T  fAlpha,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 

Scales the values in X and places them in Y.

This function uses NVIDIA's cuBlas.

nSpecifies the number of items (not bytes) in the vectors X and Y.
fAlphaSpecifies the scale value, of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6957 of file CudaDnn.cs.
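
The difference between scal (in place, on X) and scale (from X into Y) can be illustrated with a small CPU reference. This is only a sketch in plain Python, not the cuBLAS call used by CudaDnn; the names scal_ref and scale_ref are hypothetical, and Python lists stand in for the GPU memory behind the handles hX and hY.

```python
def scal_ref(n, alpha, x, x_off=0):
    """In place: x[x_off + i] *= alpha for i in 0..n-1."""
    for i in range(n):
        x[x_off + i] *= alpha


def scale_ref(n, alpha, x, y, x_off=0, y_off=0):
    """Out of place: y[y_off + i] = alpha * x[x_off + i]; x is left unchanged."""
    for i in range(n):
        y[y_off + i] = alpha * x[x_off + i]


x = [1.0, 2.0, 3.0, 4.0]
y = [0.0] * 4
scale_ref(2, 10.0, x, y, x_off=1, y_off=2)  # reads x[1..2], writes y[2..3]
scal_ref(2, 0.5, x, x_off=2)                # rescales x[2..3] in place
```

Offsets are counted in items, matching the "items, not bytes" wording of the parameter descriptions.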

◆ scale_fwd()

void MyCaffe.common.CudaDnn< T >.scale_fwd ( int  nCount,
long  hX,
long  hScaleData,
int  nScaleDim,
int  nInnerDim,
long  hY,
long  hBiasData = 0 

Performs a scale forward pass in Cuda.

Calculation: $ y_i = \begin{cases} x_i * scaleData[(i / nInnerDim) \mod nScaleDim], & \text{if } hBiasData = 0\\ x_i * scaleData[(i / nInnerDim) \mod nScaleDim] + biasData[(i / nInnerDim) \mod nScaleDim], & \text{otherwise} \end{cases} $

nCountSpecifies the number of items.
hXSpecifies the input data X in GPU memory.
hYSpecifies the output data Y in GPU memory.
hBiasDataOptionally, specifies the bias data in GPU memory.

Definition at line 9983 of file CudaDnn.cs.
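
The indexing in the scale_fwd calculation can be checked with a CPU reference. The following is a minimal sketch in plain Python, assuming integer division for i / nInnerDim; scale_fwd_ref is a hypothetical name, and bias=None plays the role of hBiasData = 0.

```python
def scale_fwd_ref(count, x, scale, scale_dim, inner_dim, bias=None):
    """y[i] = x[i] * scale[s] (+ bias[s]), with s = (i // inner_dim) % scale_dim."""
    y = [0.0] * count
    for i in range(count):
        s = (i // inner_dim) % scale_dim
        y[i] = x[i] * scale[s]
        if bias is not None:
            y[i] += bias[s]
    return y


# Two scale entries, each covering three consecutive inner items.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
no_bias = scale_fwd_ref(6, x, [2.0, 10.0], 2, 3)
with_bias = scale_fwd_ref(6, x, [2.0, 10.0], 2, 3, bias=[0.5, -1.0])
```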

◆ scale_to_range()

void MyCaffe.common.CudaDnn< T >.scale_to_range ( int  n,
long  hX,
long  hY,
double  fMin,
double  fMax 

Scales the values in X and places the result in Y (can also run inline where X = Y).

nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
fMinSpecifies the minimum of the new range.
fMaxSpecifies the maximum of the new range.

Definition at line 6973 of file CudaDnn.cs.

◆ serf_bwd()

void MyCaffe.common.CudaDnn< T >.serf_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
double  dfThreshold 

Performs a Serf backward pass in Cuda.

Computes the serf gradient $ f(x)' = \text{erf}\left(\log \left(e^x+1\right)\right)+\frac{2 x e^{x-\log^2\left(e^x+1\right)}}{\sqrt{\pi } \left(e^x+1\right)} $

See also
Serf: Towards better training of deep neural networks using log-Softplus ERror activation Function by Sayan Nag and Mayukh Bhattacharyya, 2021.
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
dfThresholdSpecifies the threshold value.

Definition at line 9267 of file CudaDnn.cs.

◆ serf_fwd()

void MyCaffe.common.CudaDnn< T >.serf_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
double  dfThreshold 

Performs a Serf forward pass in Cuda.

Computes the serf non-linearity $ f(x) = x \, \text{erf}(\ln( 1 + \exp(x) )) $.

See also
Serf: Towards better training of deep neural networks using log-Softplus ERror activation Function by Sayan Nag and Mayukh Bhattacharyya, 2021.
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
dfThresholdSpecifies the threshold value.

Definition at line 9245 of file CudaDnn.cs.
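
The Serf forward and gradient formulas given for serf_fwd and serf_bwd can be cross-checked numerically. This is a scalar sketch in plain Python, not the Cuda kernel; the function names are hypothetical, and the dfThreshold parameter is not modeled because its effect is not described on this page.

```python
import math


def softplus(x):
    """ln(1 + e^x)."""
    return math.log1p(math.exp(x))


def serf(x):
    """f(x) = x * erf(ln(1 + e^x))."""
    return x * math.erf(softplus(x))


def serf_grad(x):
    """f'(x) = erf(sp) + 2 x e^(x - sp^2) / (sqrt(pi) (e^x + 1)), sp = ln(1 + e^x)."""
    sp = softplus(x)
    return math.erf(sp) + 2.0 * x * math.exp(x - sp * sp) / (
        math.sqrt(math.pi) * (math.exp(x) + 1.0)
    )


def central_difference(f, x, h=1e-5):
    """Numerical derivative used only to sanity-check serf_grad."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

Comparing serf_grad(x) with central_difference(serf, x) at a few points is a quick way to confirm that the two formulas agree.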

◆ set() [1/3]

void MyCaffe.common.CudaDnn< T >.set ( int  nCount,
long  hHandle,
double  fVal,
int  nIdx = -1 

Set the values of GPU memory to a specified value of type double.

nCountSpecifies the number of items to set.
hHandleSpecifies a handle to the memory on the GPU.
fValSpecifies the value to set.
nIdxWhen -1, all values in the GPU memory are set to the fVal value, otherwise, only the value at the index nIdx is set to the value.

Definition at line 5897 of file CudaDnn.cs.

◆ set() [2/3]

void MyCaffe.common.CudaDnn< T >.set ( int  nCount,
long  hHandle,
float  fVal,
int  nIdx = -1 

Set the values of GPU memory to a specified value of type float.

nCountSpecifies the number of items to set.
hHandleSpecifies a handle to the memory on the GPU.
fValSpecifies the value to set.
nIdxWhen -1, all values in the GPU memory are set to the fVal value, otherwise, only the value at the index nIdx is set to the value.

Definition at line 5909 of file CudaDnn.cs.

◆ set() [3/3]

void MyCaffe.common.CudaDnn< T >.set ( int  nCount,
long  hHandle,
T  fVal,
int  nIdx = -1,
int  nXOff = 0 

Set the values of GPU memory to a specified value of type 'T'.

nCountSpecifies the number of items to set.
hHandleSpecifies a handle to the memory on the GPU.
fValSpecifies the value to set.
nIdxWhen -1, all values in the GPU memory are set to the fVal value, otherwise, only the value at the index nIdx is set to the value.
nXOffOptionally specifies an offset into the GPU memory where the set starts.

Definition at line 5922 of file CudaDnn.cs.
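
The two modes selected by nIdx can be sketched as follows. This is a plain Python illustration with a hypothetical name set_ref; a list stands in for the GPU memory behind hHandle, and the nXOff offset of the third overload is left out.

```python
def set_ref(count, mem, val, idx=-1):
    """idx == -1: fill the first `count` items with val; otherwise set only mem[idx]."""
    if idx == -1:
        for i in range(count):
            mem[i] = val
    else:
        mem[idx] = val


a = [0.0] * 4
set_ref(4, a, 1.5)         # every item becomes 1.5
b = [0.0] * 4
set_ref(4, b, 9.0, idx=2)  # only item 2 changes
```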

◆ set_bounds()

void MyCaffe.common.CudaDnn< T >.set_bounds ( int  n,
double  dfMin,
double  dfMax,
long  hX 

Set the bounds of all items within the data to a set range of values.

nSpecifies the number of items.
dfMinSpecifies the minimum value.
dfMaxSpecifies the maximum value.
hXSpecifies a handle to the GPU data to be bound.

Definition at line 6732 of file CudaDnn.cs.
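
One reading of set_bounds is an element-wise clamp into [dfMin, dfMax]. The sketch below assumes that interpretation, which this page does not spell out; set_bounds_ref is a hypothetical name and a Python list stands in for the GPU data behind hX.

```python
def set_bounds_ref(n, lo, hi, x):
    """Clamp the first n items of x into the closed range [lo, hi], in place."""
    for i in range(n):
        x[i] = min(max(x[i], lo), hi)


data = [-3.0, -0.5, 0.25, 7.0]
set_bounds_ref(len(data), -1.0, 1.0, data)
```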

◆ SetConvolutionDesc()

void MyCaffe.common.CudaDnn< T >.SetConvolutionDesc ( long  hHandle,
int  hPad,
int  wPad,
int  hStride,
int  wStride,
int  hDilation,
int  wDilation,
bool  bUseTensorCores,
bool  bHalf = false 

Set the values of a convolution descriptor.

hHandleSpecifies the handle to the convolution descriptor.
hPadSpecifies the pad applied to the height.
wPadSpecifies the pad applied to the width.
hStrideSpecifies the stride of the height.
wStrideSpecifies the stride of the width.
hDilationSpecifies the dilation of the height (default = 1).
wDilationSpecifies the dilation of the width (default = 1).
bUseTensorCoresOptionally, specifies whether or not to use the Tensor Cores (if available).
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3785 of file CudaDnn.cs.

◆ SetDefaultCudaPath()

static void MyCaffe.common.CudaDnn< T >.SetDefaultCudaPath ( string  strPath)

Used to optionally set the default path to the Low-Level Cuda Dnn DLL file.

strPathSpecifies the file path to the Low-Level Cuda Dnn DLL file to use.

Definition at line 1890 of file CudaDnn.cs.

◆ SetDeviceID()

void MyCaffe.common.CudaDnn< T >.SetDeviceID ( int  nDeviceID = -1,
DEVINIT  flags = DEVINIT.NONE,
long?  lSeed = null 

Set the device ID used by the current instance of CudaDnn.

nDeviceIDSpecifies the zero-based device (GPU) id. When -1, the device ID is set to the device ID used to create the instance of CudaDnn.
flagsOptionally, specifies the initialization flags.
lSeedOptionally, specifies the random number generator seed.

Definition at line 1960 of file CudaDnn.cs.

◆ SetDropoutDesc()

void MyCaffe.common.CudaDnn< T >.SetDropoutDesc ( long  hCuDnn,
long  hDropoutDesc,
double  dfDropout,
long  hStates,
long  lSeed 

Set the dropout descriptor values.

hCuDnnSpecifies a handle to the instance of cuDnn.
hDropoutDescSpecifies a handle to the dropout descriptor.
dfDropoutSpecifies the dropout probability (0.5 = 50%).
hStatesSpecifies a handle to the state data in GPU memory.
lSeedSpecifies the random number-generator seed.

Definition at line 4237 of file CudaDnn.cs.

◆ SetFilterDesc()

void MyCaffe.common.CudaDnn< T >.SetFilterDesc ( long  hHandle,
int  n,
int  c,
int  h,
int  w,
bool  bHalf = false 

Sets the values of a filter descriptor.

hHandleSpecifies the handle to the filter descriptor.
nSpecifies the number of items.
cSpecifies the number of channels in each item.
hSpecifies the height of each item.
wSpecifies the width of each item.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3735 of file CudaDnn.cs.

◆ SetFilterNdDesc()

void MyCaffe.common.CudaDnn< T >.SetFilterNdDesc ( long  hHandle,
int[]  rgDim,
bool  bHalf = false 

Sets the values of a filter descriptor.

hHandleSpecifies the handle to the filter descriptor.
rgDimSpecifies the dimensions of the data.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3700 of file CudaDnn.cs.

◆ SetHostMemory()

void MyCaffe.common.CudaDnn< T >.SetHostMemory ( long  hMem,
T[]  rgSrc 

Copies an array of type 'T' into a block of already allocated host memory.

hMemSpecifies the handle to the host memory.
rgSrcSpecifies the array of type 'T' to copy.

Definition at line 2995 of file CudaDnn.cs.

◆ SetLRNDesc()

void MyCaffe.common.CudaDnn< T >.SetLRNDesc ( long  hHandle,
uint  nSize,
double  fAlpha,
double  fBeta,
double  fK 

Set the LRN descriptor values.

hHandleSpecifies a handle to an LRN descriptor.
nSizeSpecifies the normalization window width. Default = 5.
fAlphaSpecifies the alpha variance. Caffe default = 1.0; cuDnn default = 1e-4.
fBetaSpecifies the beta power parameter. Caffe and cuDnn default = 0.75.
fKSpecifies the normalization 'k' parameter. Caffe default = 1.0; cuDnn default = 2.0.

Definition at line 4342 of file CudaDnn.cs.

◆ SetMemory() [1/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
double[]  rgSrc,
long  hStream = 0 

Copies an array of double into a block of already allocated GPU memory.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of double to copy.
hStreamOptionally specifies the stream to use for the copy operation.

Definition at line 2757 of file CudaDnn.cs.

◆ SetMemory() [2/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
float[]  rgSrc,
long  hStream = 0 

Copies an array of float into a block of already allocated GPU memory.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of float to copy.
hStreamOptionally specifies the stream to use for the copy operation.

Definition at line 2769 of file CudaDnn.cs.

◆ SetMemory() [3/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
List< double >  rg 

Copies a list of doubles into a block of already allocated GPU memory.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

hMemSpecifies the handle to the GPU memory.
rgSpecifies the list of doubles to copy.

Definition at line 2734 of file CudaDnn.cs.

◆ SetMemory() [4/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
List< float >  rg 

Copies a list of float into a block of already allocated GPU memory.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

hMemSpecifies the handle to the GPU memory.
rgSpecifies the list of float to copy.

Definition at line 2745 of file CudaDnn.cs.

◆ SetMemory() [5/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
T[]  rgSrc,
long  hStream = 0,
int  nCount = -1 

Copies an array of type 'T' into a block of already allocated GPU memory.

hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of type 'T' to copy.
hStreamOptionally specifies the stream to use for the copy operation.
nCountOptionally, specifies a count of items to copy.

Definition at line 2781 of file CudaDnn.cs.

◆ SetMemoryAt() [1/3]

void MyCaffe.common.CudaDnn< T >.SetMemoryAt ( long  hMem,
double[]  rgSrc,
int  nOffset 

Copies an array of double into a block of already allocated GPU memory starting at a specific offset.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of double to copy.
nOffsetSpecifies the offset within the GPU memory where the copy is to start.

Definition at line 2860 of file CudaDnn.cs.

◆ SetMemoryAt() [2/3]

void MyCaffe.common.CudaDnn< T >.SetMemoryAt ( long  hMem,
float[]  rgSrc,
int  nOffset 

Copies an array of float into a block of already allocated GPU memory starting at a specific offset.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of float to copy.
nOffsetSpecifies the offset within the GPU memory where the copy is to start.

Definition at line 2872 of file CudaDnn.cs.

◆ SetMemoryAt() [3/3]

void MyCaffe.common.CudaDnn< T >.SetMemoryAt ( long  hMem,
T[]  rgSrc,
int  nOffset 

Copies an array of type 'T' into a block of already allocated GPU memory starting at a specific offset.

hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of type 'T' to copy.
nOffsetSpecifies the offset within the GPU memory where the copy is to start.

Definition at line 2883 of file CudaDnn.cs.
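
The effect of SetMemoryAt on the destination block can be pictured with a list-based sketch. It assumes nOffset is counted in items and that the source must fit inside the allocated block; both points are assumptions, and set_memory_at_ref is a hypothetical name rather than part of the API.

```python
def set_memory_at_ref(mem, src, offset):
    """Copy src into mem starting at item `offset`, leaving all other items untouched."""
    if offset < 0 or offset + len(src) > len(mem):
        # Assumed behavior for this sketch: reject copies that overrun the block.
        raise ValueError("source does not fit in the destination at this offset")
    mem[offset:offset + len(src)] = src


block = [0.0] * 6
set_memory_at_ref(block, [1.0, 2.0, 3.0], 2)
```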

◆ SetPixel()

T[] MyCaffe.common.CudaDnn< T >.SetPixel ( long  hMem,
int  nCount,
bool  bReturnOriginal,
int  nOffset,
params Tuple< int, T >[]  rgPixel 

Sets pixel values, where each pixel is defined by an (index, value) tuple.

hMemSpecifies the memory where the values are set.
nCountSpecifies the number of allocated items in the memory.
bReturnOriginalSpecifies whether or not to return the original values (before setting).
nOffsetSpecifies the offset of where the first pixel data starts.
rgPixelSpecifies the pixel values.
When 'bReturnOriginal' is True, the original values (before setting) are returned.

Definition at line 2933 of file CudaDnn.cs.

◆ SetPoolingDesc()

void MyCaffe.common.CudaDnn< T >.SetPoolingDesc ( long  hHandle,
PoolingMethod  method,
int  h,
int  w,
int  hPad,
int  wPad,
int  hStride,
int  wStride 

Set the values of a pooling descriptor.

hHandleSpecifies the handle to the pooling descriptor.
methodSpecifies the pooling method to use.
hSpecifies the pooling area height.
wSpecifies the pooling area width.
hPadSpecifies the height padding.
wPadSpecifies the width padding.
hStrideSpecifies the height stride.
wStrideSpecifies the width stride.

Definition at line 4074 of file CudaDnn.cs.

◆ SetRandomSeed()

void MyCaffe.common.CudaDnn< T >.SetRandomSeed ( long  lSeed)

Set the random number generator seed.

lSeedSpecifies the seed to set.

Definition at line 1990 of file CudaDnn.cs.

◆ SetRnn8()

void MyCaffe.common.CudaDnn< T >.SetRnn8 ( long  hCuDnn,
long  hRnn,
bool  bTraining,
RNN_MODE  cellMode,
RNN_BIAS_MODE  biasMode,
int  nSequenceLen,
int  nBatchSize,
int  nInputs,
int  nHidden,
int  nOutputs,
int  nProjection,
int  nNumLayers,
float  fDropout,
ulong  lSeed,
bool  bBidirectional = false 

Set the RNN8 parameters.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
bTrainingSpecifies true for training and false for inference.
layoutSpecifies the data layout ordering.
cellModeSpecifies the cell mode (RELU, TANH, LSTM or GRU).
biasModeSpecifies the bias mode (default = RNN_DOUBLE_BIAS).
nSequenceLenSpecifies the sequence length.
nBatchSizeSpecifies the batch size.
nInputsSpecifies the number of inputs. X input is of size (SeqLen, BatchSize, Inputs)
nHiddenSpecifies the number of hidden. H and C are of size (BatchSize, Hidden)
nOutputsSpecifies the number of outputs. Y output is of size (SeqLen, BatchSize, Outputs)
nProjectionSpecifies the projection size.
nNumLayersSpecifies the number of layers.
fDropoutSpecifies the dropout ratio.
lSeedSpecifies the dropout seed.
bBidirectionalSpecifies unidirectional (false) or bidirectional (true), (default = false)

Definition at line 5205 of file CudaDnn.cs.

◆ SetRnnDataDesc()

void MyCaffe.common.CudaDnn< T >.SetRnnDataDesc ( long  hRnnDataDesc,
int  nMaxSeqLen,
int  nBatchSize,
int  nVectorSize,
bool  bBidirectional = false,
int[]  rgSeqLen = null 

Sets the RNN Data Descriptor values.

hRnnDataDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
layoutSpecifies the input data layout (either SEQUENCE major or BATCH major).
nMaxSeqLenSpecifies the maximum sequence length.
nBatchSizeSpecifies the batch count.
nVectorSizeSpecifies the input vector count.
bBidirectionalSpecifies whether the Rnn is bidirectional or not (default = false).
rgSeqLenSpecifies the sequence lengths - currently this should be null which sets all sequence lengths to nMaxSeqLen.

Definition at line 4692 of file CudaDnn.cs.

◆ SetRnnDesc()

void MyCaffe.common.CudaDnn< T >.SetRnnDesc ( long  hCuDnn,
long  hRnnDesc,
int  nHiddenCount,
int  nNumLayers,
long  hDropoutDesc,
RNN_MODE  mode,
bool  bUseTensorCores,

Sets the RNN Descriptor values.

hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
nHiddenCountSpecifies the hidden input (typically the input) count.
nNumLayersSpecifies the number of layers.
hDropoutDescSpecifies the handle to the dropout descriptor (or 0 to ignore). The dropout descriptor is only used with two or more layers.
modeSpecifies the RNN_MODE (LSTM, RNN_RELU, RNN_TANH) to use.
bUseTensorCoresOptionally, specifies whether or not to use the Tensor Cores (if available).
directionOptionally, specifies the direction of the RNN; Unidirectional or BiDirectional.

Definition at line 4770 of file CudaDnn.cs.

◆ SetTensorDesc() [1/2]

void MyCaffe.common.CudaDnn< T >.SetTensorDesc ( long  hHandle,
int  n,
int  c,
int  h,
int  w,
bool  bHalf = false 

Sets the values of a tensor descriptor.

hHandleSpecifies the handle to the tensor descriptor.
nSpecifies the number of items.
cSpecifies the number of channels in each item.
hSpecifies the height of each item.
wSpecifies the width of each item.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3599 of file CudaDnn.cs.

◆ SetTensorDesc() [2/2]

void MyCaffe.common.CudaDnn< T >.SetTensorDesc ( long  hHandle,
int  n,
int  c,
int  h,
int  w,
int  nStride,
int  cStride,
int  hStride,
int  wStride,
bool  bHalf = false 

Sets the values of a tensor descriptor.

hHandleSpecifies the handle to the tensor descriptor.
nSpecifies the number of items.
cSpecifies the number of channels in each item.
hSpecifies the height of each item.
wSpecifies the width of each item.
nStrideSpecifies the stride between two images.
cStrideSpecifies the stride between two channels.
hStrideSpecifies the stride between two rows.
wStrideSpecifies the stride between two columns.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3620 of file CudaDnn.cs.
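
For the strided overload, the four stride arguments describe how an (n, c, h, w) index maps to a flat item offset. The sketch below computes the strides of a fully packed NCHW layout and the resulting offset; the helper names are hypothetical, and whether the first overload uses exactly these packed strides is an assumption.

```python
def packed_nchw_strides(n, c, h, w):
    """Strides (nStride, cStride, hStride, wStride) for a contiguous NCHW tensor."""
    return (c * h * w, h * w, w, 1)


def flat_offset(index, strides):
    """Item offset of a 4-D index (n, c, h, w) under the given strides."""
    return sum(i * s for i, s in zip(index, strides))


strides = packed_nchw_strides(2, 3, 4, 5)
last = flat_offset((1, 2, 3, 4), strides)  # last item of a 2x3x4x5 tensor
```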

◆ SetTensorNdDesc()

void MyCaffe.common.CudaDnn< T >.SetTensorNdDesc ( long  hHandle,
int[]  rgDim,
int[]  rgStride,
bool  bHalf = false 

Sets the values of a tensor descriptor.

hHandleSpecifies the handle to the tensor descriptor.
rgDimSpecifies the dimensions of the data.
rgStrideSpecifies the stride of the data.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3551 of file CudaDnn.cs.

◆ SetupSSD()

void MyCaffe.common.CudaDnn< T >.SetupSSD ( long  hSSD,
int  nNum,
int  nNumPriors,
int  nNumGt 

Setup the SSD GPU support.

hSSDSpecifies the handle to the SSD instance.
nNumSpecifies the number of items.
nNumPriorsSpecifies the number of priors.
nNumGtSpecifies the number of ground truths.

Definition at line 5625 of file CudaDnn.cs.

◆ sgd_update()

void MyCaffe.common.CudaDnn< T >.sgd_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,

Performs the Stochastic Gradient Descent (SGD) update.

See Stochastic Gradient Descent.

nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fMomentumSpecifies the momentum value.
fLocalRateSpecifies the local learning rate.

Definition at line 10203 of file CudaDnn.cs.
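
This page does not state the update rule, so the following sketch assumes the momentum SGD update used by the original Caffe solver: the history accumulates the momentum-weighted step, and the diff is overwritten with the step to apply. sgd_update_ref is a hypothetical name and the lists stand in for the GPU memory behind hNetParamsDiff and hHistoryData.

```python
def sgd_update_ref(count, diff, history, momentum, local_rate):
    """Assumed rule: history = momentum * history + local_rate * diff; diff = history."""
    for i in range(count):
        history[i] = momentum * history[i] + local_rate * diff[i]
        diff[i] = history[i]


diff = [1.0, -2.0]
history = [0.5, 0.0]
sgd_update_ref(2, diff, history, momentum=0.9, local_rate=0.1)
```

Under this assumption the caller subtracts the resulting diff from the weights in a separate step.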

◆ sigmoid_bwd()

void MyCaffe.common.CudaDnn< T >.sigmoid_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff 

Performs a Sigmoid backward pass in Cuda.

See also
Sigmoid Function.
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9341 of file CudaDnn.cs.

◆ sigmoid_cross_entropy_bwd()

void MyCaffe.common.CudaDnn< T >.sigmoid_cross_entropy_bwd ( int  nCount,
int  nIgnoreLabel,
long  hTarget,
long  hBottomDiff 

Performs a sigmoid cross entropy backward pass in Cuda when an ignore label is specified.

nCountSpecifies the number of items.
nIgnoreLabelSpecifies the label to ignore.
hTargetSpecifies a handle to the target data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10571 of file CudaDnn.cs.

◆ sigmoid_cross_entropy_fwd()

void MyCaffe.common.CudaDnn< T >.sigmoid_cross_entropy_fwd ( int  nCount,
long  hInput,
long  hTarget,
long  hLoss,
bool  bHasIgnoreLabel,
int  nIgnoreLabel,
long  hCountData 

Performs a sigmoid cross entropy forward pass in Cuda.

nCountSpecifies the number of items.
hInputSpecifies a handle to the input data in GPU memory.
hTargetSpecifies a handle to the target data in GPU memory.
hLossSpecifies a handle to the loss data in GPU memory.
bHasIgnoreLabelSpecifies whether or not an ignore label is used.
nIgnoreLabelSpecifies the ignore label, which is used when bHasIgnoreLabel is true.
hCountDataSpecifies a handle to the count data in GPU memory.

Definition at line 10556 of file CudaDnn.cs.
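
As an illustration of what a sigmoid cross entropy forward pass produces per item, the sketch below uses the standard numerically stable form of the loss and zeroes both loss and count for ignored targets. The exact values and sign convention stored by the Cuda kernel are not given on this page, so treat this as an assumption; the function name is hypothetical.

```python
import math


def sigmoid_cross_entropy_fwd_ref(inp, target, has_ignore_label=False, ignore_label=-1):
    """Per-item loss max(x, 0) - x*t + ln(1 + e^-|x|) and a 0/1 count per item."""
    loss, counts = [], []
    for x, t in zip(inp, target):
        if has_ignore_label and int(t) == ignore_label:
            loss.append(0.0)    # ignored items contribute nothing
            counts.append(0.0)
        else:
            loss.append(max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x))))
            counts.append(1.0)
    return loss, counts


loss, counts = sigmoid_cross_entropy_fwd_ref(
    [0.0, 3.0, -2.0], [1.0, -1.0, 0.0], has_ignore_label=True, ignore_label=-1
)
```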

◆ sigmoid_fwd()

void MyCaffe.common.CudaDnn< T >.sigmoid_fwd ( int  nCount,
long  hBottomData,
long  hTopData 

Performs a Sigmoid forward pass in Cuda.

Calculation: $ f(x) = 1.0 / (1.0 + e^{-x}) $

See also
Sigmoid Function.
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9323 of file CudaDnn.cs.
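
The sigmoid forward calculation, and a backward pass consistent with the sigmoid_bwd signature (which takes the top data but no bottom data), can be sketched on the CPU as follows. The backward formula $ y (1 - y) $ is the standard sigmoid derivative written in terms of the output; that the kernel uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math


def sigmoid_fwd_ref(bottom):
    """top[i] = 1 / (1 + e^-bottom[i])."""
    return [1.0 / (1.0 + math.exp(-x)) for x in bottom]


def sigmoid_bwd_ref(top_diff, top_data):
    """bottom_diff[i] = top_diff[i] * y * (1 - y), where y is the forward output."""
    return [d * y * (1.0 - y) for d, y in zip(top_diff, top_data)]


top = sigmoid_fwd_ref([0.0, 2.0])
bottom_diff = sigmoid_bwd_ref([1.0, 1.0], top)
```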

◆ SigmoidBackward()

void MyCaffe.common.CudaDnn< T >.SigmoidBackward ( long  hCuDnn,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
long  hBottomDiffDesc,
long  hBottomDiff 

Perform a Sigmoid backward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4553 of file CudaDnn.cs.

◆ SigmoidForward()

void MyCaffe.common.CudaDnn< T >.SigmoidForward ( long  hCuDnn,
long  hBottomDataDesc,
long  hBottomData,
long  hTopDataDesc,
long  hTopData 

Perform a Sigmoid forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4531 of file CudaDnn.cs.

◆ sign()

void MyCaffe.common.CudaDnn< T >.sign ( int  n,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 

Computes the sign of each element of X and places the result in Y.

nSpecifies the number of items (not bytes) in the vectors X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffSpecifies an offset (in items, not bytes) into the memory of X.
nYOffSpecifies an offset (in items, not bytes) into the memory of Y.

Definition at line 7574 of file CudaDnn.cs.
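
A CPU reference for the element-wise sign is shown below. It is a sketch only: sign_ref is a hypothetical name, and mapping zero to 0 is an assumption, since the page does not say how zero is handled.

```python
def sign_ref(n, x, y, x_off=0, y_off=0):
    """y[y_off + i] = -1, 0 or +1 according to the sign of x[x_off + i]."""
    for i in range(n):
        v = x[x_off + i]
        y[y_off + i] = float((v > 0) - (v < 0))


x = [-2.5, 0.0, 4.0, -0.1]
y = [9.0] * 4
sign_ref(3, x, y, x_off=1, y_off=0)  # processes x[1..3] into y[0..2]
```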

◆ silu_bwd()

void MyCaffe.common.CudaDnn< T >.silu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData 

Performs the Sigmoid-weighted Linear Unit (SiLU) activation backward pass in Cuda.

Computes the SiLU non-linearity $ y = x * sigmoid(x) $ $ y' = sigmoid(x) * (1 + x * (1 - sigmoid(x))) $

See also
Brief Review - SiLU: Sigmoid-weighted Linear Unit by Sik-Ho Tsang, 2022, Medium.
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.

Definition at line 9140 of file CudaDnn.cs.

◆ silu_fwd()

void MyCaffe.common.CudaDnn< T >.silu_fwd ( int  nCount,
long  hBottomData,
long  hTopData 

Performs the Sigmoid-weighted Linear Unit (SiLU) activation forward pass in Cuda.

Computes the SiLU non-linearity $ y = x * sigmoid(x) $ $ y' = sigmoid(x) * (1 + x * (1 - sigmoid(x))) $

See also
Brief Review - SiLU: Sigmoid-weighted Linear Unit by Sik-Ho Tsang, 2022, Medium.
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9118 of file CudaDnn.cs.
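
The SiLU function and its derivative can be verified against each other with a short CPU sketch. The function names are hypothetical, and the backward helper assumes the usual chain rule, bottom diff = top diff times $ y' $ evaluated at the bottom data.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def silu_fwd_ref(bottom):
    """y = x * sigmoid(x)."""
    return [x * sigmoid(x) for x in bottom]


def silu_bwd_ref(top_diff, bottom):
    """bottom_diff = top_diff * sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
    out = []
    for d, x in zip(top_diff, bottom):
        s = sigmoid(x)
        out.append(d * s * (1.0 + x * (1.0 - s)))
    return out


def numeric_grad(x, h=1e-5):
    """Central difference of the forward function, used only as a check."""
    return (silu_fwd_ref([x + h])[0] - silu_fwd_ref([x - h])[0]) / (2.0 * h)
```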

◆ slice_bwd()

void MyCaffe.common.CudaDnn< T >.slice_bwd ( int  nCount,
long  hTopDiff,
int  nNumSlices,
int  nSliceSize,
int  nBottomSliceAxis,
int  nTopSliceAxis,
int  nOffsetSliceAxis,
long  hBottomDiff 

Performs a slice backward pass in Cuda.

nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nNumSlicesSpecifies the number of slices.
nSliceSizeSpecifies the slice size.
nBottomSliceAxisSpecifies the bottom axis to concatenate.
nOffsetSliceAxisNEEDS REVIEW
hBottomDiffSpecifies a handle to the Bottom diff in GPU memory.

Definition at line 9907 of file CudaDnn.cs.

◆ slice_fwd()

void MyCaffe.common.CudaDnn< T >.slice_fwd ( int  nCount,
long  hBottomData,
int  nNumSlices,
int  nSliceSize,
int  nBottomSliceAxis,
int  nTopSliceAxis,
int  nOffsetSliceAxis,
long  hTopData 

Performs a slice forward pass in Cuda.

nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
nNumSlicesSpecifies the number of slices.
nSliceSizeSpecifies the slice size.
nBottomSliceAxisNEEDS REVIEW
nOffsetSliceAxisNEEDS REVIEW
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9888 of file CudaDnn.cs.

◆ smoothl1_bwd()

void MyCaffe.common.CudaDnn< T >.smoothl1_bwd ( int  nCount,
long  hX,
long  hY 

Performs the backward operation for the SmoothL1 loss.

Calculation: $ f'(x) = \begin{cases} x, & \text{if } |x| < 1\\ \text{sign}(x), & \text{otherwise} \end{cases} $

nCountSpecifies the number of items.
hXSpecifies the input data X in GPU memory.
hYSpecifies the output data Y in GPU memory.

Definition at line 10063 of file CudaDnn.cs.

◆ smoothl1_fwd()

void MyCaffe.common.CudaDnn< T >.smoothl1_fwd ( int  nCount,
long  hX,
long  hY 

Performs the forward operation for the SmoothL1 loss.

Calculation: $ f(x) = \begin{cases} 0.5 * x^2, & \text{if } |x| < 1\\ |x| - 0.5, & \text{otherwise} \end{cases} $

nCountSpecifies the number of items.
hXSpecifies the input data X in GPU memory.
hYSpecifies the output data Y in GPU memory.

Definition at line 10044 of file CudaDnn.cs.
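
The two SmoothL1 calculations (forward and backward) translate directly into an element-wise CPU reference. This is a sketch with hypothetical function names; lists stand in for the GPU memory behind hX and hY.

```python
def smoothl1_fwd_ref(x):
    """f(x) = 0.5 * x^2 if |x| < 1, otherwise |x| - 0.5."""
    return [0.5 * v * v if abs(v) < 1.0 else abs(v) - 0.5 for v in x]


def smoothl1_bwd_ref(x):
    """f'(x) = x if |x| < 1, otherwise sign(x)."""
    return [v if abs(v) < 1.0 else (1.0 if v > 0 else -1.0) for v in x]


fwd = smoothl1_fwd_ref([0.5, -2.0, 1.0])
bwd = smoothl1_bwd_ref([0.5, -2.0, 1.0])
```

At |x| = 1 both branches of the forward function give 0.5, so the loss is continuous where the quadratic and linear parts meet.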

◆ softmax_cross_entropy_bwd()

void MyCaffe.common.CudaDnn< T >.softmax_cross_entropy_bwd ( int  nCount,
int  nIgnoreLabel,
long  hTarget,
long  hBottomDiff 

Performs a softmax cross entropy backward pass in Cuda when an ignore label is specified.

nCountSpecifies the number of items.
nIgnoreLabelSpecifies the label to ignore.
hTargetSpecifies a handle to the target data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10624 of file CudaDnn.cs.

◆ softmax_cross_entropy_fwd()

void MyCaffe.common.CudaDnn< T >.softmax_cross_entropy_fwd ( int  nCount,
long  hProbData,
long  hLabel,
long  hLossDiff,
long  hLossData,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 

Performs a softmax cross entropy forward pass in Cuda.

nCountSpecifies the number of items.
hProbDataSpecifies a handle to the probability data in GPU memory.
hLabelSpecifies a handle to the label data in GPU memory.
hLossDiffSpecifies a handle to the loss diff in GPU memory that is filled with 1's at each 'active' location where loss data is placed.
hLossDataSpecifies a handle to the loss data in GPU memory.
hCountsSpecifies a handle to the counts in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.

This forward pass is a helper to perform a part of the NLLLoss portion of the SoftmaxCrossEntropyLoss.

Definition at line 10595 of file CudaDnn.cs.

◆ SoftmaxBackward()

void MyCaffe.common.CudaDnn< T >.SoftmaxBackward ( long  hCuDnn,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDiffDesc,
long  hBottomDiff 

Perform a Softmax backward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
algSpecifies the SoftmaxAlgorithm to use (FAST, ACCURATE or LOG).
modeSpecifies the SoftmaxMode to use (INSTANCE across NxCHW, or CHANNEL across NCxHW)
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4640 of file CudaDnn.cs.

◆ SoftmaxForward()

void MyCaffe.common.CudaDnn< T >.SoftmaxForward ( long  hCuDnn,
long  hBottomDataDesc,
long  hBottomData,
long  hTopDataDesc,
long  hTopData 

Perform a Softmax forward pass.

hCuDnnSpecifies a handle to the instance of cuDnn.
algSpecifies the SoftmaxAlgorithm to use (FAST, ACCURATE or LOG).
modeSpecifies the SoftmaxMode to use (INSTANCE across NxCHW, or CHANNEL across NCxHW)
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4618 of file CudaDnn.cs.
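
The softmax algorithms named in the parameter list differ mainly in numerical handling. As a hedged illustration (plain Python over one vector, not the cuDNN call), the sketch below shows the max-subtracted form generally associated with the ACCURATE algorithm and a log-softmax corresponding to LOG; the function names are hypothetical.

```python
import math


def softmax_accurate_ref(v):
    """Softmax with the maximum subtracted first, which avoids overflow in exp."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def log_softmax_ref(v):
    """log(softmax(v)) computed through the log-sum-exp of the shifted inputs."""
    m = max(v)
    lse = m + math.log(sum(math.exp(x - m) for x in v))
    return [x - lse for x in v]


probs = softmax_accurate_ref([1000.0, 1000.0])  # exp(1000) alone would overflow
```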

◆ softmaxloss_bwd()

void MyCaffe.common.CudaDnn< T >.softmaxloss_bwd ( int  nCount,
long  hTopData,
long  hLabel,
long  hBottomDiff,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 

Performs Softmax Loss backward pass in Cuda.

nCount  Specifies the number of items.
hTopData  Specifies a handle to the top data in GPU memory.
hLabel  Specifies a handle to the label data in GPU memory.
hBottomDiff  Specifies a handle to the bottom diff in GPU memory.
hCounts  Specifies a handle to the counts in GPU memory.
nIgnoreLabel  Optionally, specifies a label to ignore.

Definition at line 9639 of file CudaDnn.cs.

◆ softmaxloss_fwd()

void MyCaffe.common.CudaDnn< T >.softmaxloss_fwd ( int  nCount,
long  hProbData,
long  hLabel,
long  hLossData,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 

Performs Softmax Loss forward pass in Cuda.

nCount  Specifies the number of items.
hProbData  Specifies a handle to the probability data in GPU memory.
hLabel  Specifies a handle to the label data in GPU memory.
hLossData  Specifies a handle to the loss data in GPU memory.
hCounts  Specifies a handle to the counts in GPU memory.
nIgnoreLabel  Optionally, specifies a label to ignore.

Definition at line 9605 of file CudaDnn.cs.
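
The roles of nOuterNum, nDim and nInnerNum are not described above. The sketch below shows one plausible reading, assuming the layout used by the original Caffe softmax loss layer: probabilities stored as [outer][channel][inner] with dim = channels * inner, and labels stored as [outer][inner]. The function name, the clamp value and the returned tuple are assumptions for illustration, not the MyCaffe implementation.

```python
import math

def softmax_loss_fwd_ref(prob, label, outer_num, dim, inner_num, ignore_label=None):
    """Per-position negative log-likelihood under an assumed Caffe-style layout.

    Returns (loss, counts), each of length outer_num * inner_num.
    """
    n = outer_num * inner_num
    loss = [0.0] * n
    counts = [0.0] * n
    for idx in range(n):
        o, s = divmod(idx, inner_num)         # outer index and spatial index
        lbl = int(label[idx])
        if ignore_label is not None and lbl == ignore_label:
            continue                          # ignored positions contribute nothing
        p = prob[o * dim + lbl * inner_num + s]
        loss[idx] = -math.log(max(p, 1e-38))  # clamp to avoid log(0)
        counts[idx] = 1.0
    return loss, counts
```

Under this reading, the final scalar loss would be the sum of `loss` divided by a normalizer such as the sum of `counts`.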

◆ softplus_bwd()

void MyCaffe.common.CudaDnn< T >.softplus_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData 

Performs the backward pass of the Softplus function, a smooth approximation of the ReLU function.

Computes the SoftPlus non-linearity $ y = \log(1 + e^x) $, whose derivative is $ y' = \mathrm{sigmoid}(x) $.

See also
Softplus function - Smooth approximation of the ReLU function by neuralthreds, 2021, Medium.
nCount  Specifies the number of items.
hTopDiff  Specifies a handle to the top diff in GPU memory.
hTopData  Specifies a handle to the top data in GPU memory.
hBottomDiff  Specifies a handle to the bottom diff in GPU memory.
hBottomData  Specifies a handle to the bottom data in GPU memory.

Definition at line 9183 of file CudaDnn.cs.

◆ softplus_fwd()

void MyCaffe.common.CudaDnn< T >.softplus_fwd ( int  nCount,
long  hBottomData,
long  hTopData 

Performs the forward pass of the Softplus function, a smooth approximation of the ReLU function.

Computes the SoftPlus non-linearity $ y = \log(1 + e^x) $, whose derivative is $ y' = \mathrm{sigmoid}(x) $.

See also
Softplus function - Smooth approximation of the ReLU function by neuralthreds, 2021, Medium.
nCount  Specifies the number of items in the bottom and top data.
hBottomData  Specifies a handle to the bottom data in GPU memory.
hTopData  Specifies a handle to the top data in GPU memory.

Definition at line 9161 of file CudaDnn.cs.
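
The Softplus formulas above can be checked on the CPU with a small sketch like the following. The function names are hypothetical and the rearranged forward expression is a standard overflow-safe form of log(1 + e^x), not necessarily what the CUDA kernel uses.

```python
import math

def _sigmoid(v):
    """Overflow-safe logistic sigmoid."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def softplus_fwd_ref(bottom_data):
    """y = log(1 + e^x), rewritten as max(x, 0) + log1p(e^-|x|) to stay finite."""
    return [max(v, 0.0) + math.log1p(math.exp(-abs(v))) for v in bottom_data]

def softplus_bwd_ref(top_diff, bottom_data):
    """dx = dy * sigmoid(x), since d/dx log(1 + e^x) = sigmoid(x)."""
    return [dy * _sigmoid(v) for dy, v in zip(top_diff, bottom_data)]
```

A finite-difference comparison of `softplus_fwd_ref` against `softplus_bwd_ref` is a quick way to confirm the gradient before comparing with GPU output.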

◆ sort()

void MyCaffe.common.CudaDnn< T >.sort ( int  nCount,
long  hY 

Sort the data in the GPU memory specified.

nCount  Specifies the total number of items in the memory.
hY  Specifies the handle to the GPU memory of data to sort.

Definition at line 6212 of file CudaDnn.cs.

◆ sqrt()

void MyCaffe.common.CudaDnn< T >.sqrt ( int  n,
long  hX,
long  hY 

Computes the square root of each element of X and places the result in Y.

n  Specifies the number of items (not bytes) in the vectors X and Y.
hX  Specifies a handle to the vector X in GPU memory.
hY  Specifies a handle to the vector Y in GPU memory.

Definition at line 7624 of file CudaDnn.cs.

◆ sqrt_scale()

void MyCaffe.common.CudaDnn< T >.sqrt_scale ( int  nCount,
long  hX,
long  hY 

Scales each element to the signed square root of its magnitude: y = sqrt(abs(x)) * sign(x).

nCount  Specifies the number of elements.
hX  Specifies a handle to the vector X in GPU memory.
hY  Specifies a handle to the vector Y in GPU memory.

Definition at line 7638 of file CudaDnn.cs.
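
The formula y = sqrt(abs(x)) * sign(x) can be illustrated with a short CPU sketch. The name `sqrt_scale_ref` is hypothetical, and `math.copysign` is used here as an equivalent way to reattach the sign.

```python
import math

def sqrt_scale_ref(x):
    """Signed square root: sqrt(|x|) carrying the sign of x; zero maps to zero."""
    return [math.copysign(math.sqrt(abs(v)), v) for v in x]
```

For example, an input of 4.0 maps to 2.0 and an input of -9.0 maps to -3.0.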

◆ SsdEncodeConfPrediction()

void MyCaffe.common.CudaDnn< T >.SsdEncodeConfPrediction ( long  hSSD,
int  nConfPredCount,
long  hConfPred,
int  nConfGtCount,
long  hConfGt 

Encodes the SSD data into the confidence prediction and confidence ground truths.

hSSD  Specifies the handle to the SSD instance.
nConfPredCount  Specifies the number of confidence prediction items.
hConfPred  Specifies the confidence prediction data in GPU memory.
nConfGtCount  Specifies the number of confidence ground truth items.
hConfGt  Specifies the confidence ground truth data in GPU memory.

Definition at line 5810 of file CudaDnn.cs.

◆ SsdEncodeLocPrediction()

void MyCaffe.common.CudaDnn< T >.SsdEncodeLocPrediction ( long  hSSD,
int  nLocPredCount,
long  hLocPred,
int  nLocGtCount,
long  hLocGt 

Encodes the SSD data into the location prediction and location ground truths.

hSSD  Specifies the handle to the SSD instance.
nLocPredCount  Specifies the number of location prediction items.
hLocPred  Specifies the location prediction data in GPU memory.
nLocGtCount  Specifies the number of location ground truth items.
hLocGt  Specifies the location ground truth data in GPU memory.

Definition at line 5794 of file CudaDnn.cs.

◆ SsdMultiBoxLossForward()

int MyCaffe.common.CudaDnn< T >.SsdMultiBoxLossForward ( long  hSSD,
int  nLocDataCount,
long  hLocGpuData,
int  nConfDataCount,
long  hConfGpuData,
int  nPriorDataCount,
long  hPriorGpuData,
int  nGtDataCount,
long  hGtGpuData,
out List< DictionaryMap< List< int > > >  rgAllMatchIndices,
out List< List< int > >  rgrgAllNegIndices,
out int  nNumNegs 

Performs the SSD MultiBoxLoss forward operation.

hSSD  Specifies the handle to the SSD instance.
nLocDataCount  Specifies the number of location data items.
hLocGpuData  Specifies the handle to the location data in GPU memory.
nConfDataCount  Specifies the number of confidence data items.
hConfGpuData  Specifies the handle to the confidence data in GPU memory.
nPriorDataCount  Specifies the number of prior box data items.
hPriorGpuData  Specifies the prior box data in GPU memory.
nGtDataCount  Specifies the number of ground truth items.
hGtGpuData  Specifies the ground truth data in GPU memory.
rgAllMatchIndices  Returns all match indices found.
rgrgAllNegIndices  Returns all negative indices found.
nNumNegs  Returns the number of negatives.
The number of matches is returned.

Definition at line 5661 of file CudaDnn.cs.

◆ sub()

void MyCaffe.common.CudaDnn< T >.sub ( int  n,
long  hA,
long  hB,
long  hY,
int  nAOff = 0,
int  nBOff = 0,
int  nYOff = 0,
int  nB = 0 

Subtracts B from A and places the result in Y.

Y = A - B

n  Specifies the number of items (not bytes) in the vectors A, B and Y.
hA  Specifies a handle to the vector A in GPU memory.
hB  Specifies a handle to the vector B in GPU memory.
hY  Specifies a handle to the vector Y in GPU memory.
nAOff  Optionally, specifies an offset (in items, not bytes) into the memory of A.
nBOff  Optionally, specifies an offset (in items, not bytes) into the memory of B.
nYOff  Optionally, specifies an offset (in items, not bytes) into the memory of Y.
nB  Optionally, specifies a number of 'B' items to subtract (default = 0 which causes ALL items in B to be subtracted). When 'nB' > 0, it must be a factor of 'n' and causes that number of B items to be subtracted as a block from A.

Definition at line 7312 of file CudaDnn.cs.
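
The block behavior of nB is easiest to see in a CPU sketch. The version below assumes that "subtracted as a block" means the first nB items of B are reused cyclically across A, that is, Y[i] = A[i] - B[i % nB]. This reading, the function name `sub_ref`, and returning a new list instead of writing at nYOff are all assumptions for illustration.

```python
def sub_ref(n, a, b, a_off=0, b_off=0, nb=0):
    """Y = A - B over n items; when nb > 0, B is treated as a repeating block of nb items."""
    if nb > 0 and n % nb != 0:
        raise ValueError("nb must be a factor of n")
    y = []
    for i in range(n):
        j = i % nb if nb > 0 else i   # assumed cyclic indexing into B
        y.append(a[a_off + i] - b[b_off + j])
    return y
```

With nb left at 0 this reduces to the plain element-wise difference described by Y = A - B.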

◆ sub_and_dot()

void MyCaffe.common.CudaDnn< T >.sub_and_dot ( int  n,
int  nN,
int  nInnerNum,
long  hA,
long  hB,
long  hY,
int  nAOff,
int  nBOff,
int  nYOff 

Subtracts every nInnerNum element of B from A and performs a dot product on the result.

Y[i] = (A[i] - B[inInnerNum]) * (A[i] - B[inInnerNum])

n  Specifies the number of items (not bytes) in the vectors A, B and Y.
nN  Specifies the inner count.
nInnerNum  Specifies the dimension.
hA  Specifies a handle to the vector A in GPU memory.
hB  Specifies a handle to the vector B in GPU memory.
hY  Specifies a handle to the vector Y in GPU memory.
nAOff  Optionally, specifies an offset (in items, not bytes) into the memory of A.
nBOff  Optionally, specifies an offset (in items, not bytes) into the memory of B.
nYOff  Optionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 7357 of file CudaDnn.cs.

◆ sum()

void MyCaffe.common.CudaDnn< T >.sum ( int  nCount,
int  nOuterNum,
int  nInnerNum,
long  hX,
long  hY 

Calculates the sum of inner values of X and places the result in Y.

nCount  Specifies the number of elements in X.
nOuterNum  Specifies the number of outer items within X.
nInnerNum  Specifies the dimension of items to sum in X.
hX  Specifies a handle to the vector X in GPU memory.
hY  Specifies a handle to the vector Y in GPU memory.

Definition at line 8491 of file CudaDnn.cs.
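
A CPU sketch of the inner sum follows. It assumes X is laid out as nOuterNum contiguous rows of nInnerNum items (so nCount = nOuterNum * nInnerNum) and that Y receives one total per row; the name `sum_ref` and this layout are assumptions, not taken from the source.

```python
def sum_ref(n_count, outer_num, inner_num, x):
    """Return one sum per outer item, assuming a row-major [outer][inner] layout."""
    if outer_num * inner_num != n_count:
        raise ValueError("n_count must equal outer_num * inner_num")
    return [sum(x[o * inner_num:(o + 1) * inner_num]) for o in range(outer_num)]
```

For instance, six values split into two rows of three produce two totals.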

◆ sumsq()

double MyCaffe.common.CudaDnn< T >.sumsq ( int  n,
long  hW,
long  hA,
int  nAOff = 0 

Calculates the sum of squares of A.

n  Specifies the number of items (not bytes) in the vectors A and W.
hW  Specifies a handle to workspace data in GPU memory.
hA  Specifies a handle to the vector A in GPU memory.
nAOff  Specifies an offset (in items, not bytes) into the memory of A.
The sum of squares of A is returned as type double.

Definition at line 7878 of file CudaDnn.cs.

◆ sumsqdiff()

double MyCaffe.common.CudaDnn< T >.sumsqdiff ( int  n,
long  hW,
long  hA,
long  hB,
int  nAOff = 0,
int  nBOff = 0 

Calculates the sum of squares of differences between A and B.

n  Specifies the number of items (not bytes) in the vectors A, B and W.
hW  Specifies a handle to workspace data in GPU memory.
hA  Specifies a handle to the vector A in GPU memory.
hB  Specifies a handle to the vector B in GPU memory.
nAOff  Specifies an offset (in items, not bytes) into the memory of A.
nBOff  Specifies an offset (in items, not bytes) into the memory of B.
The sum of squared differences between A and B is returned as type double.

Definition at line 7902 of file CudaDnn.cs.
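
The two reductions, sumsq and sumsqdiff, can be written on the CPU as the sketch below. The function names are hypothetical, and the workspace handle hW has no counterpart here because a host-side loop needs no scratch memory.

```python
def sumsq_ref(n, a, a_off=0):
    """Sum of squares of n items of A, starting at item offset a_off."""
    return sum(a[a_off + i] ** 2 for i in range(n))

def sumsqdiff_ref(n, a, b, a_off=0, b_off=0):
    """Sum of squared differences between n items of A and B."""
    return sum((a[a_off + i] - b[b_off + i]) ** 2 for i in range(n))
```

Note that `sumsqdiff_ref(n, a, b)` with `b` all zeros equals `sumsq_ref(n, a)`, which makes a simple cross-check.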

◆ swish_bwd()

void MyCaffe.common.CudaDnn< T >.swish_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hSigmoidOutputData,
long  hBottomDiff,
double  dfBeta 

Performs a Swish backward pass in Cuda.

See also
Searching for Activation Functions by Prajit Ramachandran, Barret Zoph, Quoc V. Le, 2017.
nCount  Specifies the number of items.
hTopDiff  Specifies a handle to the top diff in GPU memory.
hTopData  Specifies a handle to the top data in GPU memory.
hSigmoidOutputData  Specifies a handle to the sigmoid output data in GPU memory.
hBottomDiff  Specifies a handle to the bottom diff in GPU memory.
dfBeta  Specifies the 'beta' value applied to the output.

Definition at line 9361 of file CudaDnn.cs.
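
The inputs to this call suggest how the gradient is formed. Assuming the Swish definition from the referenced paper, y = x * sigmoid(beta * x), and writing s = sigmoid(beta * x), the derivative is dy/dx = s + beta * x * s * (1 - s) = beta * y + s * (1 - beta * y), which needs only the top data y and the sigmoid output s. The sketch below implements that expression on the CPU; the function names are hypothetical and the actual kernel may differ.

```python
import math

def swish_fwd_ref(x, beta):
    """Return (y, s) with s = sigmoid(beta * x) and y = x * s (assumed Swish definition)."""
    s = [1.0 / (1.0 + math.exp(-beta * v)) for v in x]
    y = [v * sv for v, sv in zip(x, s)]
    return y, s

def swish_bwd_ref(top_diff, top_data, sigmoid_out, beta):
    """dx = dy * (beta * y + s * (1 - beta * y))."""
    return [dy * (beta * y + s * (1.0 - beta * y))
            for dy, y, s in zip(top_diff, top_data, sigmoid_out)]
```

Because the backward expression reuses y and s from the forward pass, the original input x does not have to be kept for the gradient.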

◆ SynchronizeDevice()

void MyCaffe.common.CudaDnn< T >.SynchronizeDevice ( )

Synchronize the operations on the current device.

Definition at line 2093 of file CudaDnn.cs.

◆ SynchronizeStream()

void MyCaffe.common.CudaDnn< T >.SynchronizeStream ( long  h = 0)

Synchronize a stream on the current GPU, waiting for its operations to complete.

h  Specifies the handle to the stream.

Definition at line 3239 of file CudaDnn.cs.

◆ SynchronizeThread()

void MyCaffe.common.CudaDnn< T >.SynchronizeThread ( )

Synchronize all kernel threads on the current GPU.

Definition at line 3250 of file CudaDnn.cs.

◆ tanh_bwd()

void MyCaffe.common.CudaDnn< T >.tanh_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff 

Performs a TanH backward pass in Cuda.

See also
Hyperbolic Function.
nCount  Specifies the number of items.
hTopDiff  Specifies a handle to the top diff in GPU memory.
hTopData  Specifies a handle to the top data in GPU memory.
hBottomDiff  Specifies a handle to the bottom diff in GPU memory.

Definition at line 9304 of file CudaDnn.cs.

◆ tanh_fwd()

void MyCaffe.common.CudaDnn< T >.tanh_fwd ( int  nCount,
long  hBottomData,
long  hTopData 

Performs a TanH forward pass in Cuda.

Calculation: $ f(x) = \tanh(x) $

See also
Hyperbolic Function.
nCount  Specifies the number of items in the bottom and top data.
hBottomData  Specifies a handle to the bottom data in GPU memory.
hTopData  Specifies a handle to the top data in GPU memory.

Definition at line 9286 of file CudaDnn.cs.
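
For reference, the forward calculation f(x) = tanh(x) and its gradient can be sketched on the CPU as follows. The backward form uses the identity d/dx tanh(x) = 1 - tanh(x)^2, which is why a tanh backward pass can work from the top data alone; the function names are hypothetical.

```python
import math

def tanh_fwd_ref(bottom_data):
    """y = tanh(x) for each item."""
    return [math.tanh(v) for v in bottom_data]

def tanh_bwd_ref(top_diff, top_data):
    """dx = dy * (1 - y^2), where y is the forward output."""
    return [dy * (1.0 - y * y) for dy, y in zip(top_diff, top_data)]
```

A finite-difference check on `tanh_fwd_ref` is a simple way to validate the backward result.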

◆ TanhBackward()

void MyCaffe.common.CudaDnn< T >.TanhBackward ( long  hCuDnn,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
long  hBottomDiffDesc,
long  hBottomDiff 

Perform a Tanh backward pass.

hCuDnn  Specifies a handle to the instance of cuDnn.
fAlpha  Specifies a scaling factor applied to the result.
hTopDataDesc  Specifies a handle to the top data tensor descriptor.
hTopData  Specifies a handle to the top data in GPU memory.
hTopDiffDesc  Specifies a handle to the top diff tensor descriptor.
hTopDiff  Specifies a handle to the top diff in GPU memory.
hBottomDataDesc  Specifies a handle to the bottom data tensor descriptor.
hBottomData  Specifies a handle to the bottom data in GPU memory.
fBeta  Specifies a scaling factor applied to the prior destination value.
hBottomDiffDesc  Specifies a handle to the bottom diff tensor descriptor.
hBottomDiff  Specifies a handle to the bottom diff in GPU memory.

Definition at line 4473 of file CudaDnn.cs.

◆ TanhForward()

void MyCaffe.common.CudaDnn< T >.TanhForward ( long  hCuDnn,
long  hBottomDataDesc,
long  hBottomData,
long  hTopDataDesc,
long  hTopData 

Perform a Tanh forward pass.

hCuDnn  Specifies a handle to the instance of cuDnn.
fAlpha  Specifies a scaling factor applied to the result.
hBottomDataDesc  Specifies a handle to the bottom data tensor descriptor.
hBottomData  Specifies a handle to the bottom data in GPU memory.
fBeta  Specifies a scaling factor applied to the prior destination value.
hTopDataDesc  Specifies a handle to the top data tensor descriptor.
hTopData  Specifies a handle to the top data in GPU memory.

Definition at line 4451 of file CudaDnn.cs.

◆ threshold_fwd()

void MyCaffe.common.CudaDnn< T >.threshold_fwd ( int  nCount,
double  dfThreshold,
long  hX,
long  hY 

Performs a threshold pass in Cuda.

Calculation: $ Y[i] = (X[i] > threshold) ? 1 : 0 $

nCount  Specifies the number of items.
dfThreshold  Specifies the threshold value.
hX  Specifies the input data X in GPU memory.
hY  Specifies the output data Y in GPU memory.

Definition at line 10001 of file CudaDnn.cs.
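
The calculation above translates directly into a CPU sketch. The name `threshold_fwd_ref` is hypothetical; the comparison is strict, so a value equal to the threshold maps to 0.

```python
def threshold_fwd_ref(x, threshold):
    """Y[i] = 1 if X[i] > threshold, otherwise 0."""
    return [1.0 if v > threshold else 0.0 for v in x]
```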

◆ tile_bwd()

void MyCaffe.common.CudaDnn< T >.tile_bwd ( int  nCount,
long  hTopDiff,
int  nTileSize,
int  nTiles,
int  nBottomTileAxis,
long  hBottomDiff 

Performs a tile backward pass in Cuda.

nCount  Specifies the number of items.
hTopDiff  Specifies a handle to the top diff in GPU memory.
nTileSize  Specifies the size of each tile.
nTiles  Specifies the number of tiles.
hBottomDiff  Specifies a handle to the bottom diff in GPU memory.

Definition at line 9941 of file CudaDnn.cs.

◆ tile_fwd()

void MyCaffe.common.CudaDnn< T >.tile_fwd ( int  nCount,
long  hBottomData,
int  nInnerDim,
int  nTiles,
int  nBottomTileAxis,
long  hTopData 

Performs a tile forward pass in Cuda.

nCount  Specifies the number of items.
hBottomData  Specifies a handle to the bottom data in GPU memory.
nTiles  Specifies the number of tiles.
hTopData  Specifies a handle to the top data in GPU memory.

Definition at line 9924 of file CudaDnn.cs.

◆ transpose()

void MyCaffe.common.CudaDnn< T >.transpose ( int  n,
long  hX,
long  hY,
long  hXCounts,
long  hYCounts,
long  hMapping,
int  nNumAxes,
long  hBuffer 

Perform a transpose on X producing Y, similar to the numpy.transpose operation.

n  Specifies the number of items in both hX and hY (must be the same).
hX  Specifies a handle to the input data in GPU memory.
hY  Specifies a handle to the output data in GPU memory.
hXCounts  Specifies a handle to the input counts in GPU memory.
hYCounts  Specifies a handle to the output counts in GPU memory.
hMapping  Specifies a handle to the mappings of each axis.
nNumAxes  Specifies the number of axes.
hBuffer  Specifies a handle to a buffer that should hold n * nNumAxes items.

Definition at line 7862 of file CudaDnn.cs.

◆ transposeHW()

void MyCaffe.common.CudaDnn< T >.transposeHW ( int  n,
int  c,
int  h,
int  w,
long  hSrc,
long  hDst 

Transposes n*c matrices along the height and width dimensions. All matrices are in row-major format.

n  Specifies the number of items (e.g. batches).
c  Specifies the number of channels.
h  Specifies the height.
w  Specifies the width.
hSrc  Specifies a handle to GPU memory of shape (n,c,h,w).
hDst  Specifies a handle to GPU memory of shape (n,c,w,h).

Definition at line 6716 of file CudaDnn.cs.
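
The shape change from (n,c,h,w) to (n,c,w,h) can be illustrated with a CPU sketch over flat row-major lists. The name `transpose_hw_ref` is hypothetical, and the function returns a new list instead of writing into a destination handle.

```python
def transpose_hw_ref(n, c, h, w, src):
    """Transpose each of the n*c row-major h-by-w matrices into a w-by-h matrix."""
    dst = [0] * (n * c * h * w)
    for m in range(n * c):
        base = m * h * w                  # start of the m-th matrix
        for i in range(h):
            for j in range(w):
                dst[base + j * h + i] = src[base + i * w + j]
    return dst
```

Applying the function twice, with h and w swapped on the second call, should return the original data.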

◆ unpooling_bwd()

void MyCaffe.common.CudaDnn< T >.unpooling_bwd ( POOLING_METHOD  method,
int  nCount,
long  hTopDiff,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hBottomDiff,
long  hMask 

Performs the backward pass for unpooling using Cuda.

method  Specifies the pooling method.
nCount  Specifies the number of items in the bottom data.
hTopDiff  Specifies a handle to the top diff in GPU memory.
num  Specifies the number of inputs.
nChannels  Specifies the number of channels per input.
nHeight  Specifies the height of each input.
nWidth  Specifies the width of each input.
nPooledHeight  Specifies the height of the pooled data.
nPooledWidth  Specifies the width of the pooled data.
nKernelH  Specifies the height of the pooling kernel.
nKernelW  Specifies the width of the pooling kernel.
nStrideH  Specifies the stride along the height.
nStrideW  Specifies the stride along the width.
nPadH  Specifies the pad applied to the height.
nPadW  Specifies the pad applied to the width.
hBottomDiff  Specifies a handle to the bottom diff in GPU memory.
hMask  Specifies a handle to the mask data in GPU memory.

Definition at line 8895 of file CudaDnn.cs.

◆ unpooling_fwd()

void MyCaffe.common.CudaDnn< T >.unpooling_fwd ( POOLING_METHOD  method,
int  nCount,
long  hBottomData,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hTopData,
long  hMask 

Performs the forward pass for unpooling using Cuda.

method  Specifies the pooling method.
nCount  Specifies the number of items in the bottom data.
hBottomData  Specifies a handle to the bottom data in GPU memory.
num  Specifies the number of inputs.
nChannels  Specifies the number of channels per input.
nHeight  Specifies the height of each input.
nWidth  Specifies the width of each input.
nPooledHeight  Specifies the height of the pooled data.
nPooledWidth  Specifies the width of the pooled data.
nKernelH  Specifies the height of the pooling kernel.
nKernelW  Specifies the width of the pooling kernel.
nStrideH  Specifies the stride along the height.
nStrideW  Specifies the stride along the width.
nPadH  Specifies the pad applied to the height.
nPadW  Specifies the pad applied to the width.
hTopData  Specifies a handle to the top data in GPU memory.
hMask  Specifies a handle to the mask data in GPU memory.

Definition at line 8867 of file CudaDnn.cs.

◆ width()

void MyCaffe.common.CudaDnn< T >.width ( int  n,
long  hMean,
long  hMin,
long  hMax,
double  dfAlpha,
long  hWidth 

Calculates the width values.

n  Specifies the number of items.
hMean  Specifies a handle to the mean values in GPU memory.
hMin  Specifies a handle to the min values in GPU memory.
hMax  Specifies a handle to the max values in GPU memory.
dfAlpha  Specifies the alpha value.
hWidth  Specifies the GPU memory where the width values are placed.

Definition at line 7925 of file CudaDnn.cs.

Property Documentation

◆ BaseSize

ulong MyCaffe.common.CudaDnn< T >.BaseSize

Returns the base data type size (e.g. float = 4, double = 8).

Definition at line 2428 of file CudaDnn.cs.

◆ DefaultPath

string MyCaffe.common.CudaDnn< T >.DefaultPath

Specifies the default path used to load the Low-Level Cuda DNN Dll file.

Definition at line 1931 of file CudaDnn.cs.

◆ KernelHandle

long MyCaffe.common.CudaDnn< T >.KernelHandle

Returns the Low-Level kernel handle used for this instance. Each Low-Level kernel maintains its own set of look-up tables for memory, streams, cuDnn constructs, etc.

Definition at line 1811 of file CudaDnn.cs.

◆ OriginalDeviceID

int MyCaffe.common.CudaDnn< T >.OriginalDeviceID

Returns the original device ID used to create the instance of CudaDnn.

Definition at line 2004 of file CudaDnn.cs.

◆ Path

string MyCaffe.common.CudaDnn< T >.Path

Specifies the file path used to load the Low-Level Cuda DNN Dll file.

Definition at line 1923 of file CudaDnn.cs.

◆ TotalMemoryUsed

ulong MyCaffe.common.CudaDnn< T >.TotalMemoryUsed

Returns the total amount of GPU memory used by this instance.

Definition at line 1794 of file CudaDnn.cs.

◆ TotalMemoryUsedAsText

string MyCaffe.common.CudaDnn< T >.TotalMemoryUsedAsText

Returns the total amount of GPU memory used by this instance, formatted as text.

Definition at line 1802 of file CudaDnn.cs.

The documentation for this class was generated from the following file: