saltzero
Retrained Model W/ Tuned Architecture
From your documentation it looks like you tuned the model's architecture after it stopped improving. Did you do some form of transfer learning with the existing stalled model to bootstrap the new architecture, or did you start it from scratch?
I upgraded the network architecture a few times. My memory isn't perfectly clear, but I'm fairly certain that most times I bootstrapped it based on the previous games played with the old architecture, and one time I started from scratch.
(Also, I'm guessing that, architecture-wise, a CNN would still be better than an FC network despite what I wrote in the readme, since CNNs seem to work fine in chess.)
Alright, thanks. Seems like you're doing what I've seen most people do.
Additionally, I have played around with the interop between C++ and Python. I've found a bit of a performance gain if you use PyObject_CallObject and manage passing the data in and reading the results out yourself, since that avoids the string encoding/decoding. Thought you might be interested. Here's a snippet of my usage below:
void predict(vector&lt;vector&lt;float&gt;&gt; &states) {
    // Pack the batch into one contiguous buffer for NumPy.
    float *x = new float[states.size() * stateSize];
    for (size_t i = 0; i < states.size(); i++)
    {
        std::copy(states[i].begin(), states[i].end(), x + (i * stateSize));
    }
    Python::mut.lock();
    PyObject *queryArgs = PyTuple_New(1);
    npy_intp dimsX[2] = {(npy_intp)states.size(), (npy_intp)stateSize};
    // The array borrows x; keep x alive until the call has returned.
    PyArrayObject *pX = (PyArrayObject *)PyArray_SimpleNewFromData(2, dimsX, NPY_FLOAT, (void *)x);
    PyTuple_SET_ITEM(queryArgs, 0, (PyObject *)pX); // steals the reference to pX
    PyObject *queryResult = PyObject_CallObject(pPredictFunc, queryArgs);
    PyArrayObject *pPolicies = (PyArrayObject *)PyList_GetItem(queryResult, 0); // borrowed ref
    PyArrayObject *pValues = (PyArrayObject *)PyList_GetItem(queryResult, 1);   // borrowed ref
    auto policiesOut = (float *)PyArray_DATA(pPolicies);
    auto valuesOut = (float *)PyArray_DATA(pValues);
    // Do something with the output, then clean up and release the lock.
    Py_DECREF(queryResult);
    Py_DECREF(queryArgs);
    Python::mut.unlock();
    delete[] x;
}
A few other questions if you have the time:
- Is there a reason you didn't include the drawn global boards in the state, like you did for curr/other? That would bump your state size from 189 to 198 but give a more complete representation of the game state.
- What was the final optimal architecture you settled on? Is it the one included in the v0.0.3 release?
Thanks, that's quite interesting!
- You're right, I should probably include drawn global boards in the state. That just didn't cross my mind when I was constructing it.
- Yes, that should be the final architecture (for now).
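For what it's worth, the extension would just mean appending one more 9-entry plane to the state vector. A minimal sketch below; the layout of the existing 189 entries (two 81-entry cell planes, won-global planes for each player, and a 9-entry playable-board mask) is my assumption, not saltzero's actual encoding:

```cpp
#include <array>
#include <vector>

// Hypothetical per-global-board status; not saltzero's actual representation.
enum class BoardStatus { Open, CurrWon, OtherWon, Drawn };

// Assumed layout: 81 curr cells + 81 other cells + 9 curr-won globals
// + 9 other-won globals + 9 playable-board mask = 189 entries,
// plus the new 9-entry drawn-global plane = 198 entries.
std::vector<float> encodeState(const std::array<float, 81> &currCells,
                               const std::array<float, 81> &otherCells,
                               const std::array<BoardStatus, 9> &globals,
                               const std::array<float, 9> &playableMask) {
    std::vector<float> state;
    state.reserve(198);
    state.insert(state.end(), currCells.begin(), currCells.end());       // 81
    state.insert(state.end(), otherCells.begin(), otherCells.end());     // 162
    for (auto s : globals)
        state.push_back(s == BoardStatus::CurrWon ? 1.0f : 0.0f);        // 171
    for (auto s : globals)
        state.push_back(s == BoardStatus::OtherWon ? 1.0f : 0.0f);       // 180
    state.insert(state.end(), playableMask.begin(), playableMask.end()); // 189
    // New plane: drawn global boards, previously omitted from the state.
    for (auto s : globals)
        state.push_back(s == BoardStatus::Drawn ? 1.0f : 0.0f);          // 198
    return state;
}
```

Whatever the real ordering is, the key point is that a drawn global board is distinguishable from an open one, instead of both encoding as all zeros.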
About the C++-to-Python interop: I was actually thinking about doing a rewrite with PyTorch (potentially with CNNs instead), since it has a much better C++ API, and then the whole thing could be done in C++ for maximum efficiency.
Yeah, in my experience there's still some performance left on the table even with the more direct CPython interop. I think moving to a completely C++-based solution would be the way to go.
I've been training my own UTTT model in a project of mine, and I just did some testing against your v0.0.3 model. I limited both models to 200 playouts, as opposed to the 400 ms time limit in your bot.cpp.
Mine as cross (w/l/d): 40/0/0
Mine as nought (w/l/d): 20/10/10
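For anyone reproducing the comparison: capping by playouts rather than by wall-clock time just swaps the stopping condition of the search loop, which makes results comparable across machines of different speeds. A rough sketch, with runPlayout as a hypothetical stand-in for one real MCTS iteration (not bot.cpp's actual code):

```cpp
#include <chrono>

// Hypothetical stand-in for one MCTS iteration (selection/expansion/backup).
static int playoutsDone = 0;
void runPlayout() { ++playoutsDone; }

// Fixed-playout budget: deterministic search effort, hardware-independent.
int searchFixedPlayouts(int budget) {
    int n = 0;
    while (n < budget) { runPlayout(); ++n; }
    return n;
}

// Wall-clock budget (like the 400 ms limit): effort depends on machine speed.
int searchTimeLimited(std::chrono::milliseconds limit) {
    int n = 0;
    auto start = std::chrono::steady_clock::now();
    while (std::chrono::steady_clock::now() - start < limit) { runPlayout(); ++n; }
    return n;
}
```

With a time budget, a faster machine (or a lighter network) silently gets more playouts per move, so a fixed 200-playout cap on both sides is the fairer head-to-head setup.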
Shoot me an email at [email protected] if you're interested in getting access to the model. Would love to chat further about what you've learned.
Neat! I do have another model which is trained slightly more than the last released model, I'll publish that soon as well.
I've just uploaded the last model; it was trained back in about February, but I never got around to uploading it until now. I'll probably start working on the PyTorch version soon.