Can I use this to show the pixels which contribute to the text?
Currently I use https://github.com/lluisgomez/text_extraction, from the paper "Multi-script text extraction from natural scenes" (Gomez & Karatzas, ICDAR 2013), to determine which pixels contribute to the text.
It gives results such as:

Can I use this code to do the same thing but with better accuracy? If so, how can I do this?
I do not wish to have the bounding boxes of text, I wish to know which pixels contribute to the text.
This was quite difficult to compile on Windows, so I gave up and compiled it on Ubuntu; machine learning tooling seems a lot easier to set up on Linux. The answer to my question now seems to be yes, but I have to change the demo code. Correct me if I'm wrong: your code seems to have two phases. It first finds potential character segments, then works out which ones are really text using a neural network and previously learned data. This seems to be how all the "text detection in the wild" style codebases work. With some adjustment of the code I think I can get the same pixel output.
Your analysis of the code is correct (it is two passes). In fact the "basic" idea is quite similar to the one in the text_extraction repository. However, this code does not aim to produce well-segmented characters. Even if you manage to get the pixel output, the results will be much noisier than with the other project. The reason is that here the pixel output is not used for OCR; it is used only to create a bounding box, which is used to crop the original image and pass it to the CNN.
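For illustration only, here is a minimal sketch of the kind of change discussed above. It uses OpenCV's MSER detector as a stand-in for the first-stage region extraction (the actual repository uses its own extremal-region channels plus a CNN classifier, so names and stages here are assumptions): instead of discarding each region's pixel list after computing its bounding box, the pixels are painted into a mask.

```cpp
// Rough sketch, NOT the repository's actual pipeline: collect the pixels of
// candidate regions into a mask instead of keeping only bounding boxes.
// MSER is used here as a stand-in for the first-stage region detector.
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main(int argc, char** argv)
{
    cv::Mat img = cv::imread(argv[1], cv::IMREAD_GRAYSCALE);
    if (img.empty()) return 1;

    // Stage 1 stand-in: extract candidate character regions as pixel lists.
    cv::Ptr<cv::MSER> mser = cv::MSER::create();
    std::vector<std::vector<cv::Point>> regions;
    std::vector<cv::Rect> boxes;
    mser->detectRegions(img, regions, boxes);

    // Instead of discarding the pixel lists once the boxes are computed,
    // accumulate them into a binary mask. In the real code this is where
    // one would keep only the regions the second-stage classifier accepts.
    cv::Mat mask = cv::Mat::zeros(img.size(), CV_8UC1);
    for (const auto& region : regions)
        for (const cv::Point& p : region)
            mask.at<uchar>(p) = 255;

    cv::imwrite("text_pixels_mask.png", mask);
    return 0;
}
```

As noted above, a mask built this way will contain many false-positive regions unless the second-stage classifier's decisions are used to filter the region list first.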