Nov 6, 2019
Thank you for the great article! I have a simple question. We train two matrices, w1 and w2. I would expect both of them to contain information about the relationships between words, but when we compute the word vectors, only w1 is used. Why can we discard w2? Also, why don't we use the transpose of the input matrix as the output matrix? Thank you.
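To make the question concrete, here is a minimal NumPy sketch of the setup it refers to. It assumes a skip-gram-style network with a one-hot input, a linear hidden layer and a full softmax output. The sizes `V` and `N` and the helper `forward` are hypothetical illustrations, not code from the article. The sketch shows that the "word vector" is simply a row of w1. It also shows that w1.T has the same shape as w2, so the tied-weights variant the question asks about is at least dimensionally possible.

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 5, 3  # hypothetical vocabulary size and embedding dimension

w1 = rng.normal(size=(V, N))  # input (embedding) matrix
w2 = rng.normal(size=(N, V))  # output (context) matrix

def forward(word_idx, w1, w2):
    """Hypothetical forward pass for one center word (one-hot input)."""
    x = np.zeros(w1.shape[0])
    x[word_idx] = 1.0
    h = x @ w1                 # hidden layer: selects row word_idx of w1
    u = h @ w2                 # one score per vocabulary word
    e = np.exp(u - u.max())    # numerically stable softmax
    return h, e / e.sum()

h, y = forward(2, w1, w2)
print(np.allclose(h, w1[2]))   # the "word vector" is row 2 of w1

# Tied-weights variant from the question: reuse w1.T as the output matrix.
h_tied, y_tied = forward(2, w1, w1.T)
```

The tied call only demonstrates that the shapes line up. It does not say whether such a model trains as well as one with separate w1 and w2.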