In lecture 16, around minute 27, the professor talks about minimizing the squared error (least squares) using calculus, by taking partial derivatives.
I can't understand why we set, for example, d=0 and proceed. What relation does this have to gradient descent?
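To make the question concrete, here is a minimal Python sketch of the two approaches as I understand them. It assumes the lecture's problem is fitting a line C + D·t to a few points by minimizing E(C, D) = Σ(C + D·t − b)². The data points and all function names are hypothetical, chosen only for illustration. `closed_form` sets both partial derivatives of E to zero and solves the resulting 2×2 linear system directly, while `gradient_descent` repeatedly steps against those same partial derivatives until they are (nearly) zero.

```python
def sse_gradient(c, d, ts, bs):
    """Partial derivatives of E(C, D) = sum((C + D*t - b)^2) w.r.t. C and D."""
    g_c = sum(2 * (c + d * t - b) for t, b in zip(ts, bs))
    g_d = sum(2 * t * (c + d * t - b) for t, b in zip(ts, bs))
    return g_c, g_d


def closed_form(ts, bs):
    """Set dE/dC = 0 and dE/dD = 0, then solve the 2x2 system with Cramer's rule.

    The two equations are:
        n*C  + st*D  = sb
        st*C + stt*D = stb
    """
    n = len(ts)
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sb = sum(bs)
    stb = sum(t * b for t, b in zip(ts, bs))
    det = n * stt - st * st
    c = (sb * stt - st * stb) / det
    d = (n * stb - st * sb) / det
    return c, d


def gradient_descent(ts, bs, lr=0.02, steps=5000):
    """Start at (0, 0) and repeatedly move against the gradient of E.

    lr and steps are illustrative values that happen to converge for this data.
    """
    c, d = 0.0, 0.0
    for _ in range(steps):
        g_c, g_d = sse_gradient(c, d, ts, bs)
        c -= lr * g_c
        d -= lr * g_d
    return c, d


# Hypothetical data points (t, b): (1, 1), (2, 2), (3, 2)
ts = [1.0, 2.0, 3.0]
bs = [1.0, 2.0, 2.0]
print(closed_form(ts, bs))
print(gradient_descent(ts, bs))
```

If my understanding is right, both calls should return essentially the same C and D (for this data, C = 2/3 and D = 1/2): setting the derivatives to zero jumps straight to the point where gradient descent would eventually stop, because at that point the gradient is zero.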

