tensornn.activation
This file contains the activation functions of TensorNN. Activation functions modify their input to create non-linearity in the network. This allows your network to handle more complex problems. They are very similar to a layer.
Classes
Activation | Base activation class.
ELU | Exponential linear unit is similar to ReLU, but negative inputs follow a smooth exponential curve instead of being set to 0.
LeakyReLU | Leaky ReLU is extremely similar to ReLU.
LecunTanh | The LeCun Tanh function is a scaled version of the tanh function, such that LecunTanh(1) = 1 and LecunTanh(-1) = -1.
NewtonsSerpentine | NOTE: THIS IS NOT A GOOD CANDIDATE.
NoActivation | Linear activation function, doesn't change anything.
ReLU | The rectified linear unit activation function is one of the simplest activation functions.
Sigmoid | The sigmoid function's output is always between 0 and 1.
Softmax | The softmax activation function is most commonly used in the output layer.
Swish | The swish activation function is the output of the sigmoid function multiplied by x.
Tanh | The tanh function is similar to the sigmoid function, but it is always between -1 and 1.
- class tensornn.activation.Activation
Bases: ABC, TensorNNObject
Base activation class. All activation classes should inherit from this.
- class tensornn.activation.ELU(a: float = 1)
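Bases: Activation
Exponential linear unit is similar to ReLU, but negative inputs follow a smooth exponential curve that levels off at -A instead of being set to 0.
Formula: if x>=0, x; if x<0, A*((e^x)-1) | constants: A, e (Euler’s number, 2.718…)
Ex, A=1:
[12.319, -91.3, 0.132] -> [12.319, -1, 0.132]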
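The example values can be reproduced with a few lines of plain Python: non-negative inputs pass through unchanged and negative inputs follow A*((e^x)-1). This is a standalone sketch, not TensorNN's own implementation; the function name `elu` and the list-based interface are assumptions made for illustration.

```python
import math


def elu(values, a=1.0):
    """Element-wise ELU: x for x >= 0, a * (e^x - 1) for x < 0."""
    # math.expm1(x) computes e^x - 1 accurately for small |x|.
    return [x if x >= 0 else a * math.expm1(x) for x in values]


print(elu([12.319, -91.3, 0.132]))  # [12.319, -1.0, 0.132]
```

For very negative inputs e^x is far below floating-point precision relative to 1, so the result rounds to exactly -A.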
- class tensornn.activation.LeakyReLU(a: float = 0.1)
Bases: Activation
Leaky ReLU is extremely similar to ReLU, except that negative inputs are scaled by a small leak factor A instead of being set to 0. ReLU is LeakyReLU with A = 0.
Formula: if x>=0, x; if x<0, Ax | constants: A (leak)
Ex, A=0.1:
[12.319, -91.3, 0.132] -> [12.319, -9.13, 0.132]
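A minimal sketch of the formula in plain Python may help show how the leak factor behaves at its extremes. The function name `leaky_relu` and the list-based interface are illustrative assumptions, not part of TensorNN.

```python
def leaky_relu(values, a=0.1):
    """Element-wise Leaky ReLU: x for x >= 0, a * x for x < 0."""
    return [x if x >= 0 else a * x for x in values]


inputs = [12.319, -91.3, 0.132]
print(leaky_relu(inputs))         # negative entry becomes roughly -9.13
print(leaky_relu(inputs, a=0.0))  # negatives are zeroed, same as ReLU
print(leaky_relu(inputs, a=1.0))  # every input passes through unchanged
```

With A = 0 the function reduces to ReLU, and with A = 1 it is the identity, so useful leak values sit between the two, usually close to 0.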
- class tensornn.activation.LecunTanh
Bases: Activation
The LeCun Tanh function is a scaled version of the tanh function, such that LecunTanh(1) = 1 and LecunTanh(-1) = -1.
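No formula is listed for this class. The scaled tanh usually attributed to LeCun is 1.7159 * tanh(2x/3), which satisfies the property above to about four decimal places. The sketch below assumes those constants; whether TensorNN uses exactly these values is not confirmed here, and the name `lecun_tanh` is hypothetical.

```python
import math


def lecun_tanh(values):
    """Scaled tanh, assuming the commonly cited constants 1.7159 and 2/3."""
    return [1.7159 * math.tanh(2.0 * x / 3.0) for x in values]


# Both results are within about 1e-5 of +1 and -1 respectively.
print(lecun_tanh([1.0, -1.0]))
```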
- class tensornn.activation.NewtonsSerpentine(a: float = 1, b: float = 1)
Bases: Activation
NOTE: THIS IS NOT A GOOD CANDIDATE. Beyond |x| = A, larger inputs produce outputs closer to 0, which means being large doesn’t give importance. Do not use unless you want to have some fun ;)
Formula: (A*B*x)/(x^2+A^2) | constants: A, B
Ex, A=1, B=1:
[12.319, -91.3, 0.132] -> [0.08064402, -0.01095159, 0.12973942]
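To see why large inputs lose importance, the formula can be evaluated directly. This is a plain-Python sketch with an assumed function name `newtons_serpentine`, not the library's implementation.

```python
def newtons_serpentine(values, a=1.0, b=1.0):
    """Element-wise Newton's serpentine: (a * b * x) / (x^2 + a^2)."""
    return [(a * b * x) / (x * x + a * a) for x in values]


# Matches the documented example to the printed precision.
print(newtons_serpentine([12.319, -91.3, 0.132]))

# The curve peaks at x = a with value b / 2, then falls back toward 0.
print(newtons_serpentine([1.0, 10.0, 100.0]))
```

Because the output shrinks as |x| grows past A, two very different inputs (for example 0.132 and 12.319 above) can map to similar outputs, which is the reason for the warning.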
- class tensornn.activation.NoActivation
Bases: Activation
Linear activation function, doesn’t change anything. Use this if you don’t want an activation.
- class tensornn.activation.ReLU
Bases: Activation
The rectified linear unit activation function is one of the simplest activation functions. It is a piecewise function.
Formula: if x>=0, x; if x<0, 0
Ex:
[12.319, -91.3, 0.132] -> [12.319, 0, 0.132]
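The piecewise rule amounts to taking the larger of 0 and x. A minimal plain-Python sketch follows; the name `relu` and the list-based interface are assumptions for illustration.

```python
def relu(values):
    """Element-wise ReLU: x for x >= 0, otherwise 0."""
    return [max(0.0, x) for x in values]


print(relu([12.319, -91.3, 0.132]))  # [12.319, 0.0, 0.132]
```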
- class tensornn.activation.Sigmoid
Bases: Activation
The sigmoid function’s output is always between 0 and 1.
Formula: 1 / (1+e^(-x)) | constants: e (Euler’s number, 2.718…)
Ex:
[12.319, -91.3, 0.132] -> [9.99995534e-01, 2.23312895e-40, 5.32952167e-01]
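The formula can be checked against the example with a short plain-Python sketch. The name `sigmoid` and the two-branch form are choices made here, not taken from TensorNN: the branch for negative x is algebraically the same expression, rewritten so that `math.exp` never receives a large positive argument.

```python
import math


def sigmoid(values):
    """Element-wise logistic sigmoid, 1 / (1 + e^(-x))."""
    result = []
    for x in values:
        if x >= 0:
            result.append(1.0 / (1.0 + math.exp(-x)))
        else:
            # Same value as above, but exp(x) stays small for negative x.
            z = math.exp(x)
            result.append(z / (1.0 + z))
    return result


print(sigmoid([12.319, -91.3, 0.132]))
```

Every output lies between 0 and 1, and an input of 0 maps to exactly 0.5.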
- class tensornn.activation.Softmax
Bases: Activation
The softmax activation function is most commonly used in the output layer. If you are using this activation function, you should be using tnn.CategoricalCrossEntropy as your loss function. This is because the softmax function always generates a probability distribution with all values between 0 and 1, and for these types of values, tnn.CategoricalCrossEntropy is the best loss function to use.
The goal of softmax is to convert the predicted values of the network into percentages that add up to 1. Ex. it converts [-1.42, 3.312, 0.192] to [0.00837, 0.94970, 0.04194], which is much easier to understand.
When coming up with a way to write this, a big problem is negative numbers, since we can’t have negative numbers in our final output. So how do we get rid of them? Do we clip them to 0? Do we square them? Do we use absolute value? Though all these methods seem nice, they take away from the value of negative numbers. If we clip to 0, then every negative number becomes just 0, and squaring or using absolute value will result in the opposite of what we want (a large negative number turns into a large positive number). So the most effective way is to use exponentiation. Through exponentiation, negative numbers become small positive numbers while positive numbers become large.
But exponentiation raises a new problem: super large numbers, which can cause overflow. Fortunately there is a simple solution. We can convert all the values into non-positive values prior to exponentiation by subtracting the maximum value of our output from each value. This way our values before exponentiation will range from -inf to 0, and our values after exponentiation will range from 0 (e^-inf) to 1 (e^0).
Finally, to come up with the percentages, we figure out what fraction of the final sum each value makes up. So we divide each value by the total sum.
All steps/TLDR:
Starting values (from previous example): [-1.42, 3.312, 0.192]
Subtract the largest value so that no value is positive: 3.312 is the max, so subtract it from all values, [-4.732, 0, -3.120]
Exponentiation, raise e to each value (e^x): [0.0088088, 1, 0.04415717]
Come up with percentages, divide each number by the sum: the sum is 1.0529660, so we divide each value by it, [0.00836574, 0.94969828, 0.04193599]
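The steps above can be sketched in plain Python as follows. This is an illustration of the described procedure for a single list of values, not TensorNN's own code; the function name `softmax` is an assumption.

```python
import math


def softmax(values):
    """Softmax of one list of values, following the steps described above."""
    # Step 1: subtract the maximum so every value is <= 0 (prevents overflow).
    largest = max(values)
    shifted = [v - largest for v in values]
    # Step 2: exponentiate; every result now lies in (0, 1].
    exps = [math.exp(v) for v in shifted]
    # Step 3: divide by the sum so the outputs add up to 1.
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([-1.42, 3.312, 0.192]))
# Large inputs are safe because of the max subtraction.
print(softmax([1000.0, 1001.0]))
```

Subtracting the maximum does not change the result, because the common factor e^(-max) cancels between numerator and denominator.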
- class tensornn.activation.Swish
Bases: Activation
The swish activation function is the output of the sigmoid function multiplied by x.
Formula: x / (1+e^(-x)) | constants: e (Euler’s number, 2.718…)
Ex:
[12.319, -91.3, 0.132] -> [1.23189450e+01, -2.03884673e-38, 7.03496861e-02]
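As a quick check of the formula, swish can be written as x times the sigmoid of x. The sketch below is plain Python with assumed names (`swish` and the private helper `_sigmoid`), not the library's implementation.

```python
import math


def _sigmoid(x):
    """Logistic sigmoid of a single number, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(values):
    """Element-wise swish: x * sigmoid(x), equivalently x / (1 + e^(-x))."""
    return [x * _sigmoid(x) for x in values]


print(swish([12.319, -91.3, 0.132]))
```

Large positive inputs pass through almost unchanged, while large negative inputs are pushed toward 0 from below.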
- class tensornn.activation.Tanh
Bases: Activation
The tanh function is similar to the sigmoid function, but it is always between -1 and 1.