tinygrad/tinygrad/pull/15463/changes
This commit adds weakint to the promotion lattice, the object used for type promotion.
How does it work?
The promotion lattice is a hashmap used to build a DAG of dtype nodes. A helper finds the least common ancestor (LCA) of two dtypes.
Why would the user create a weak int instead of defining their dtype up front?
Tensor([1.0, 2.0]).half() + 3 could cast the Tensor's float16 to float32 if 3 defaulted to an int32 instead of a weakint.
Are there any ML frameworks where all dtypes must be defined or they will error?
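A minimal sketch of how such a lattice and LCA lookup might work. The dtype names, edges, and helper names here are hypothetical, chosen for illustration; tinygrad's actual tables and helpers differ.

```python
# Sketch of a promotion lattice: each dtype maps to the dtypes it can
# promote to, forming a DAG. (Hypothetical edges, not tinygrad's real table.)
promo = {
    "bool":    ["int32"],
    "weakint": ["int32", "float16"],  # a weak int can fold into either branch
    "int32":   ["float32"],
    "float16": ["float32"],
    "float32": [],
}

def ancestors(d):
    """All dtypes reachable upward from d, including d itself."""
    seen, stack = set(), [d]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(promo[cur])
    return seen

def promote(a, b):
    """Least common ancestor: the lowest dtype both a and b can promote to."""
    common = ancestors(a) & ancestors(b)
    # the lowest common dtype is the one that can still reach the most others
    return max(common, key=lambda d: len(ancestors(d)))

print(promote("float16", "weakint"))  # float16: the weak int defers to the tensor
print(promote("float16", "int32"))    # float32: a concrete int32 forces an upcast
```

This reproduces the motivating example: adding a weak 3 to a float16 tensor stays float16, while a concrete int32 would force a float32 upcast.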
tinygrad/tinygrad/pull/15356/changes
Refactoring of shared logic (bitwise_not() and elementwise.py) that touched Tinygrad's inverse/min/max functions. These functions are interesting because they are dtype-dependent and also require a particular property.
There is no MIN op in Tinygrad. MIN is just inverse(MAX(inverse(X), inverse(Y))). Inverse must therefore be an involution, and it must also reverse ordering, or MAX of the inverted values would not pick out the minimum.
bool, float, and int are stored differently in memory, so inversion must be handled differently for each (e.g. a bitwise NOT works for inverting an int, but will scramble a float).
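A scalar-level sketch of the trick in plain Python (not tinygrad's tensor ops), showing both why bitwise NOT works as the integer inverse and how applying the same bit pattern trick to a float goes wrong:

```python
import struct

def min_via_max(x, y, inv):
    # MIN expressed as inverse(MAX(inverse(x), inverse(y))). This requires
    # inv to be an order-reversing involution on the dtype in question.
    return inv(max(inv(x), inv(y)))

# For two's-complement ints, bitwise NOT qualifies: ~~x == x, and
# x < y implies ~x > ~y.
print(min_via_max(3, 7, lambda v: ~v))  # 3

# Flipping all the bits of a float32 is still an involution, but it does
# not reverse the ordering of positive floats, so MIN comes out wrong.
def bitnot_f32(x):
    (b,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", b ^ 0xFFFFFFFF))
    return y

print(min_via_max(1.0, 2.0, bitnot_f32))  # 2.0, not the minimum
```

This is why the refactor has to keep the inverse dtype-dependent: the involution that reverses ordering for ints scrambles floats.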
tinygrad/tinygrad/pull/15416/changes
I found that the sign() function in elementwise.py contained a hack similar to the one removed in the commit I read yesterday. It could be safely removed, which also had the (pleasant?) side effect of no longer casting boolean input to integer output.
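As a scalar-level illustration (plain Python, not the actual tensor code) of the dtype-preserving behavior after the hack's removal:

```python
def sign(x):
    # Returns -1, 0, or 1 in the same type as the input, rather than
    # unconditionally casting (bool in -> bool out, float in -> float out).
    return type(x)((x > 0) - (x < 0))

print(sign(-2.5))  # -1.0 (float stays float)
print(sign(True))  # True (bool stays bool, no cast to int)
```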
tinygrad/tinygrad/pull/15367/changes
To prevent a runtime error from being raised when the user called .gradient(some_tensor) after writing another_tensor.copysign(some_tensor), there was a hack in copysign() that added other.sign()*0 to link other to the rest of the graph, so that .backward could reach other from the root.
Mathematically, if a tensor is not connected to the root by the backward pass, then even if its gradient isn't materialized, its derivative with respect to the root (the loss) is 0. So the materialize_grads parameter was removed, and gradient simply returns 0 for anything that doesn't exist in the graph.
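A toy reverse-mode sketch of that idea (hypothetical Node/gradient names, not tinygrad's implementation): propagate gradients to everything reachable from the loss, then hand back 0 for anything that was never reached.

```python
class Node:
    def __init__(self, val, parents=()):
        # parents: sequence of (parent_node, local_gradient) pairs
        self.val, self.parents = val, parents

def gradient(loss, *wrt):
    grads = {loss: 1.0}
    stack = [loss]
    while stack:  # naive DFS; adequate for this tree-shaped example
        n = stack.pop()
        for p, local in n.parents:
            grads[p] = grads.get(p, 0.0) + grads[n] * local
            stack.append(p)
    # tensors unreachable from the loss get 0 instead of an error
    return [grads.get(t, 0.0) for t in wrt]

x = Node(3.0)
y = Node(6.0, parents=[(x, 2.0)])  # y = 2*x, so dy/dx = 2
z = Node(1.0)                       # never used to compute y
dx, dz = gradient(y, x, z)
print(dx, dz)  # 2.0 0.0
```

The disconnected z gets 0 without any special linking hack, which is exactly what makes the other.sign()*0 trick unnecessary.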