Kernel freeze at tf.keras.Sequential.fit()
What I did?
Link to Colab: https://colab.research.google.com/drive/1g6BFapSuG0-WCQzxlrDsPKCcmaGemB9f?usp=sharing
Please request access using the email connected to your GitHub account and I'll accept it. The notebook is related to my graduation project and I don't want the work to go fully public yet.
I created a custom layer with a quantum circuit in quantum_circuit() to represent an 8x8 image: 4 readout qubits with two H gates, connected to 16 qubits by ZZ**(param) gates for each of the 4 readouts (an 8x8 extension of what can be found in the MNIST Classification example).
The image is divided into four 4x4 pieces, each connected to a single readout qubit.
The data is represented similarly to what can be found in the example (X gate if normalized_color > 0.5).
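A rough sketch of the encoding described above, in plain Python without Cirq. The helper names and the row-major ordering of the four 4x4 blocks are assumptions made for illustration; the notebook's quantum_circuit() may lay things out differently.

```python
# Hypothetical sketch: helper names and the block ordering are assumptions,
# not taken from the notebook.
THRESHOLD = 0.5


def binarize(image, threshold=THRESHOLD):
    """Return a 0/1 grid; 1 marks pixels that would receive an X gate."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def readout_index(row, col, block=4, blocks_per_row=2):
    """Map a pixel to the readout qubit of its 4x4 block (row-major blocks)."""
    return (row // block) * blocks_per_row + (col // block)


# Synthetic 8x8 "image" with values in [0, 1), used only for demonstration.
image = [[((r * 8 + c) % 10) / 10 for c in range(8)] for r in range(8)]
bits = binarize(image)

# Group pixel coordinates by the readout qubit they would connect to.
groups = {}
for r in range(8):
    for c in range(8):
        groups.setdefault(readout_index(r, c), []).append((r, c))

print({k: len(v) for k, v in groups.items()})  # {0: 16, 1: 16, 2: 16, 3: 16}
```

Each readout ends up with 16 data pixels, which matches the "connected to 16 qubits" description.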
I attached a softmax layer directly to the quantum one for classification using a tf.keras.Sequential model, since I want to extend it further - up to all 10 digits.
qnn_model = tf.keras.Sequential([
    tf.keras.Input(shape=(), dtype=tf.string, name='q_input'),
    tfq.layers.PQC(model_circuit, model_readout, name='quantum'),
    tf.keras.layers.Dense(2, activation=tf.keras.activations.softmax, name='softmax'),
])
Model: "sequential"
Layer (type)       Output Shape    Param #
quantum (PQC)      (None, 4)       64
softmax (Dense)    (None, 2)       10
Total params: 74
Trainable params: 74
Non-trainable params: 0
I compiled the model and I tried to fit it.
What was expected to happen?
The model should start to iterate over given number of epochs.
What happened?
"Epoch 1/10" is displayed, but nothing else happens.
- The Colab kernel restarts, yielding the log that can be found in the Attachments section.
- Using a local WSL2 environment I encountered something I would call 'a kernel freeze'. The cell was trying to run, but nothing was happening - no CPU or RAM usage. The operation could not be interrupted - only a kernel restart worked.
tensorflow 2.3.1
tensorflow-quantum 0.4.0
for both:
- Google Colab
- Windows Subsystem for Linux 2 (Ubuntu 20.04.1 LTS; Windows 10 Pro, build 20270)
No GPU involved.
What I found out?
When I try to run the notebook with compressed_image_size = 4 everything works as intended. I've checked my quantum_circuit() and it seems to be working as intended for the 8x8 version - it generates a circuit with the desired architecture.
When I tried to trace down the error I found out that:
yields the correct epoch, but the tf.data.Iterator data_iterator has AttributeErrors like
AttributeError: 'OwnedIterator' object has no attribute '_self_unconditional_checkpoint_dependencies'
AttributeError: 'OwnedIterator' object has no attribute '_self_name_based_restores'
AttributeError("'OwnedIterator' object has no attribute '_self_unconditional_checkpoint_dependencies'")
AttributeError("'OwnedIterator' object has no attribute '_self_unconditional_dependency_names'")
AttributeError("'OwnedIterator' object has no attribute '_self_update_uid'")
I'm not sure if this is relevant.
Dec 15, 2020, 10:41:32 AM | WARNING | WARNING:root:kernel b6193863-8d44-476f-b8cc-eadbe7129967 restarted
Dec 15, 2020, 10:41:32 AM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.133076: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.133022: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1b91640 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.131837: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2199995000 Hz
Dec 15, 2020, 10:40:56 AM | WARNING | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.125112: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.124271: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (0071d832075f): /proc/driver/nvidia/version does not exist
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.123595: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.109400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
Dec 15, 2020, 10:40:53 AM | WARNING | 2020-12-15 09:40:53.250994: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Dec 15, 2020, 10:37:53 AM | WARNING | WARNING:root:kernel b6193863-8d44-476f-b8cc-eadbe7129967 restarted
Dec 15, 2020, 10:37:53 AM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.601416: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.601370: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20c3640 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.600345: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2199995000 Hz
Dec 15, 2020, 10:36:24 AM | WARNING | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.593357: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.592695: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (0071d832075f): /proc/driver/nvidia/version does not exist
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.592632: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.531111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
Dec 15, 2020, 10:36:20 AM | WARNING | 2020-12-15 09:36:20.926549: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Dec 15, 2020, 10:36:01 AM | INFO | Adapting to protocol v5.1 for kernel b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:42 AM | INFO | Adapting to protocol v5.1 for kernel b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:41 AM | INFO | Kernel started: b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:13 AM | INFO | Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Dec 15, 2020, 10:33:13 AM | INFO |
Dec 15, 2020, 10:33:13 AM | INFO | The Jupyter Notebook is running at:
Dec 15, 2020, 10:33:13 AM | INFO | 0 active kernels
Dec 15, 2020, 10:33:13 AM | INFO | Serving notebooks from local directory: /
Dec 15, 2020, 10:33:13 AM | INFO | google.colab serverextension initialized.
Dec 15, 2020, 10:33:13 AM | INFO | Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
Dec 15, 2020, 10:33:13 AM | WARNING | Config option `delete_to_trash` not recognized by `ColabFileContentsManager`.
There is really a lot going on in the code. Do you have any ideas where I could place my breakpoints and focus? Is there any easier way to trace the source of this bug?
I've just sent a request (using my @google.com email). Will be able to look more closely into things once you can share the notebook with me. I will be sure to not share any details of the code here and just focus on the bug itself.
Thanks for the interest!
I've just given you editor permissions. If you have any questions or concerns, feel free to ask.
No problem. So at first glance I think you've solved your own problem in your comment on the side there.
The compressed_image_size is too big with a value of 8. Quick review on quantum circuit simulation: simulating n qubits takes memory proportional to 2^n. So looking at your code:
=> compressed_image_shape = (8,8)
Then in the line: qubits = cirq.GridQubit.rect(*compressed_image_shape)
=> len(qubits) == 64
Working that out really quick gives us a state vector with 2^64 complex amplitudes. With each amplitude taking 64 bits (8 bytes), that means you requested about 147 exabytes of RAM. A bit too much :). In general, simulations cap out around 30 qubits; with some serious hardware you might be able to push things up to 35-40.
My guess is that the malloc call didn't fail gracefully at that size, which is a bug we should probably look into. Does this help clear things up?
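The arithmetic above can be checked with a few lines of plain Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes 8 bytes per amplitude (complex64) with no simulator overhead.

```python
# Hypothetical helper; assumes a dense state vector of complex64 amplitudes.
BYTES_PER_AMPLITUDE = 8  # two 32-bit floats (real and imaginary parts)


def state_vector_bytes(n_qubits, bytes_per_amplitude=BYTES_PER_AMPLITUDE):
    """Memory needed to hold all 2**n_qubits amplitudes, in bytes."""
    return (2 ** n_qubits) * bytes_per_amplitude


for n in (16, 30, 64):
    size = state_vector_bytes(n)
    print(f"{n} qubits: {size} bytes ({size / 1e18:.3g} EB)")
```

For 16 qubits (the 4x4 case that works) this is 524288 bytes, for 30 qubits it is 8 GiB, and for 64 qubits it is 2^67 bytes, roughly 147.6 exabytes.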
Yeah. This totally explains the behavior. This was the first thing that came to my mind, but I couldn't find any errors related to hardware, so I assumed everything was correct.
Nevertheless some error message would be really helpful here. It shouldn't pass silently :)
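Until the library reports this itself, a user-side pre-flight check could fail loudly before fit() is ever called. The sketch below is not part of TFQ or Cirq; the exception class, function name, and the 8 GiB budget are all assumptions chosen for illustration.

```python
# Hypothetical user-side guard; nothing here is a TFQ or Cirq API.
class TooManyQubitsError(ValueError):
    """Raised when a dense state vector would not fit in the memory budget."""


def check_simulable(n_qubits, memory_budget_bytes, bytes_per_amplitude=8):
    """Return the required bytes, or raise if they exceed the budget."""
    required = (2 ** n_qubits) * bytes_per_amplitude
    if required > memory_budget_bytes:
        raise TooManyQubitsError(
            f"{n_qubits} qubits need {required} bytes for the state vector, "
            f"but the budget is {memory_budget_bytes} bytes"
        )
    return required


budget = 8 * 2**30  # assumed 8 GiB budget, for illustration only
print(check_simulable(16, budget))  # 524288

try:
    check_simulable(64, budget)
except TooManyQubitsError as err:
    print(f"refusing to simulate: {err}")
```

In the notebook this could be called with len(qubits) right after the qubits are created, so the 8x8 configuration stops with a readable message instead of a silent kernel restart.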
I wanted to contribute and add the error handling, but I got lost in the codebase... 🤯
Anyway... I finished and published the thesis. It even got highlighted by IEEE, and there is a follow-up paper presented at CORES'21 going public soon.
I hope you enjoy it. In case of questions or anything, please contact me via the GitHub email :)
That's awesome! Always happy to see more publications making use of TFQ!
Any updates on this issue @rafalpotempa or can it be closed?