using System.Collections.Generic;
m_blobBias = new Blob<T>(cuda, log);
m_blobBias.Reshape(rgShape);
private void fillBias(Blob<T> b)
{
    for (int i = 0; i < b.height; i++)
    {
        for (int j = i + 1; j < b.width; j++)
        {
            rgBiasData[i * b.width + j] = 0;
        }
    }
}
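The loop above zeroes every entry above the diagonal of the bias blob, producing the causal mask that keeps each position from attending to later positions. A minimal Python sketch of the same fill (MyCaffe is C#; the function name here is illustrative, not part of the MyCaffe API):

```python
def fill_causal_bias(height, width):
    """Build a height x width mask with the upper triangle (j > i) zeroed,
    mirroring the fillBias loop: ones on and below the diagonal."""
    bias = [[1.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(i + 1, width):
            bias[i][j] = 0.0  # position i may not attend to future position j
    return bias

# For a 4x4 blob the result is a lower-triangular mask.
mask = fill_causal_bias(4, 4)
```

Because the mask depends only on the blob's height and width, it can be filled once at setup and reused across every forward pass.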
base.ReInitializeParameters(target);
m_colInternalBottom.Clear();
m_colInternalBottom.Add(bottom);
m_colInternalTop.Clear();
m_colInternalTop.Add(top);
m_colInternalBottom.Clear();
for (int i = 0; i < rgBottom.Count; i++)
{
    m_colInternalBottom.Add(rgBottom[i]);
}
m_colInternalTop.Clear();
m_colInternalTop.Add(top);
addInternal(new List<Blob<T>> { blobX, blobX, blobX, m_blobBias }, colTop[0]);
m_mh_att.LayerSetUp(m_colInternalBottom, m_colInternalTop);
addInternal(new List<Blob<T>> { blobX, blobX, blobX, m_blobBias }, colTop[0]);
m_mh_att.Reshape(m_colInternalBottom, m_colInternalTop);
addInternal(new List<Blob<T>> { blobX, blobX, blobX, m_blobBias }, colTop[0]);
m_mh_att.Forward(m_colInternalBottom, m_colInternalTop);
if (rgbPropagateDown[0])
{
    List<bool> rgbPropagate = new List<bool>() { true, true };
    addInternal(new List<Blob<T>> { blobX, blobX, blobX, m_blobBias }, colTop[0]);
    m_mh_att.Backward(m_colInternalTop, rgbPropagate, m_colInternalBottom);
}
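Each of LayerSetUp, Reshape, Forward, and Backward follows the same pattern: rebind the internal bottom collection to { x, x, x, bias } and delegate to the inner multi-head attention layer, since self-attention uses the same blob as query, key, and value. A conceptual Python sketch of that delegation (class and member names are illustrative, not the MyCaffe API):

```python
class CausalSelfAttentionSketch:
    """Conceptual sketch of a causal self-attention layer that delegates to
    an inner multi-head attention, passing the same input as query, key,
    and value plus a causal bias mask."""

    def __init__(self, mha, bias):
        self.mha = mha            # inner multi-head attention callable
        self.bias = bias          # causal mask, built once at setup
        self.internal_bottom = []
        self.internal_top = []

    def _add_internal(self, bottoms, top):
        # Mirrors addInternal: rebuild the internal collections per call.
        self.internal_bottom = list(bottoms)
        self.internal_top = [top]

    def forward(self, x):
        top = object()
        # Self-attention: query, key, and value are all the same blob x.
        self._add_internal([x, x, x, self.bias], top)
        return self.mha(*self.internal_bottom)
```

Rebuilding the internal collections on every call keeps the inner layer's view of its bottoms consistent even when the outer blobs are re-bound between passes.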
The Log class provides general output in text form.
The BlobCollection contains a list of Blobs.
void Add(Blob< T > b)
Add a new Blob to the collection.
int Count
Returns the number of items in the collection.
void Clear(bool bDispose=false)
Remove all items from the collection.
The Blob is the main holder of data that moves through the Layers of the Net.
void SetData(T[] rgData, int nCount=-1, bool bSetCount=true)
Sets a number of items within the Blob's data.
int height
DEPRECATED; legacy shape accessor height: use shape(2) instead.
T[] mutable_cpu_data
Get data from the GPU and bring it over to the host, or Set data from the Host and send it over to th...
int width
DEPRECATED; legacy shape accessor width: use shape(3) instead.
string Name
Get/set the name of the Blob.
The CudaDnn object is the main interface to the Low-Level Cuda C++ DLL.
An interface for the units of computation which can be composed into a Net.
Log m_log
Specifies the Log for output.
LayerParameter m_param
Specifies the LayerParameter describing the Layer.
void convert(BlobCollection< T > col)
Convert a collection of blobs from / to half size.
abstract void LayerSetUp(BlobCollection< T > colBottom, BlobCollection< T > colTop)
Performs Layer specific setup. Derived layers should override this function as well as the Reshape fu...
bool shareLayerBlob(Blob< T > b, List< int > rgMinShape)
Attempts to share a Layer Blob if another parameter Blob with the same name and acceptable size is fo...
void Backward(BlobCollection< T > colTop, List< bool > rgbPropagateDown, BlobCollection< T > colBottom)
Given the top Blob error gradients, compute the bottom Blob error gradients.
virtual bool ReInitializeParameters(WEIGHT_TARGET target)
Re-initialize the parameters of the layer.
double Forward(BlobCollection< T > colBottom, BlobCollection< T > colTop)
Given the bottom (input) Blobs, this function computes the top (output) Blobs and the loss.
float convertF(T df)
Converts a generic to a float value.
abstract void Reshape(BlobCollection< T > colBottom, BlobCollection< T > colTop)
Adjust the shapes of top blobs and internal buffers to accommodate the shapes of the bottom blobs.
BlobCollection< T > m_colInternalBlobs
Specifies internal blobs used by the layer.
CudaDnn< T > m_cuda
Specifies the CudaDnn connection to Cuda.
LayerParameter.LayerType m_type
Specifies the Layer type.
BlobCollection< T > blobs
Returns the collection of learnable parameter Blobs for the Layer.
BlobCollection< T > internal_blobs
Returns the collection of internal Blobs used by the Layer.
The CausalSelfAttention provides a vanilla multi-head self-attention layer with projection at the end...
CausalSelfAttentionLayer2(CudaDnn< T > cuda, Log log, LayerParameter p)
The CausalSelfAttention constructor.
override int ExactNumTopBlobs
Returns the exact number of required top (output) Blobs: attn
override void backward(BlobCollection< T > colTop, List< bool > rgbPropagateDown, BlobCollection< T > colBottom)
Computes the loss error gradient w.r.t. the outputs.
override void Reshape(BlobCollection< T > colBottom, BlobCollection< T > colTop)
Reshape the bottom (input) and top (output) blobs.
override void LayerSetUp(BlobCollection< T > colBottom, BlobCollection< T > colTop)
Setup the layer.
override bool ReInitializeParameters(WEIGHT_TARGET target)
Re-initialize the parameters of the layer.
override void forward(BlobCollection< T > colBottom, BlobCollection< T > colTop)
The forward computation.
override void dispose()
Releases all GPU and host resources used by the Layer.
override int ExactNumBottomBlobs
Returns the exact number of required bottom (input) Blobs: input
override void setup_internal_blobs(BlobCollection< T > col)
Derived layers should add all internal blobs to the 'col' provided.
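The causal masking that distinguishes this layer from plain multi-head attention can be summarized in a single-head sketch: attention scores above the diagonal are excluded before the softmax, so each token attends only to itself and earlier tokens. A self-contained Python illustration (this is a conceptual model, not MyCaffe's GPU kernel):

```python
import math

def causal_attention(q, k, v):
    """Single-head causal attention over T positions of dimension d:
    row i of the output is a softmax-weighted average of v[0..i]."""
    T, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(T):
        # Scores against positions 0..i only (the causal mask).
        scores = [scale * sum(q[i][c] * k[j][c] for c in range(d))
                  for j in range(i + 1)]
        m = max(scores)                      # subtract max for stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        out.append([sum(w[j] * v[j][c] for j in range(i + 1))
                    for c in range(d)])
    return out
```

Note that the first output row always equals v[0]: with only one unmasked position, the softmax weight is 1.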
The MultiheadAttention provides a vanilla multi-head layer.
Specifies the base parameter for all layers.
string name
Specifies the name of this LayerParameter.
MultiheadAttentionParameter multihead_attention_param
Returns the parameter set when initialized with LayerType.MULTIHEAD_ATTENTION
CausalSelfAttentionParameter causal_self_attention_param
Returns the parameter set when initialized with LayerType.CAUSAL_SELF_ATTENTION
LayerType
Specifies the layer type.
The MyCaffe.basecode contains all generic types used throughout MyCaffe.
The MyCaffe.common namespace contains common MyCaffe classes.
WEIGHT_TARGET
Defines the type of weight to target in re-initializations.
The MyCaffe.fillers namespace contains all fillers including the Filler class.
The MyCaffe.layers.gpt namespace contains all GPT related layers.
The MyCaffe.param namespace contains parameters used to create models.
The MyCaffe namespace contains the main body of MyCaffe code that closely tracks the C++ Caffe open-...