fastpat
Error on Parse Grant
Hi,
Testing your pipeline.
- fetch_grant.py (I trimmed the file list in meta down to two files): this seems to work; it fetched and expanded the data into the ~/data directory.
- parse_grant.py is giving me an error:
AttributeError: Can't get attribute 'parse_file_opts' on <module 'mp_main' from '/Users/xxxxxxx/Desktop/patents-master/parse_grant.py'>
Any guidance to resolve?
Andy
Thanks for the feedback! It looks like this is the multiprocessing issue discussed here: https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
From the comments there, it seems like this occurs on Windows when running with IPython/Jupyter. How are you running the script? If you are doing it through IPython or Jupyter, I would try running it directly with pure python.
Let me know how that goes!
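For readers hitting the same traceback: the AttributeError is raised while a pool worker unpickles its task. pickle stores a function by reference (module name plus function name), so the worker has to find that name again in its own copy of the module. The following is a minimal sketch that reproduces the message without starting any processes. The module `demo_mod` is invented for the illustration, and the body of `parse_file_opts` is a placeholder, not the repo's code.

```python
import pickle
import sys
import types

# Hypothetical stand-in module; it plays the role of the module a worker re-imports.
demo_mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo_mod


def parse_file_opts(path):
    # Placeholder body; the real function in the repo does the parsing.
    return path.upper()


# Register the function as if it had been defined at the top level of demo_mod.
parse_file_opts.__module__ = "demo_mod"
parse_file_opts.__qualname__ = "parse_file_opts"
demo_mod.parse_file_opts = parse_file_opts

# Pickling records only "demo_mod" and "parse_file_opts", not the code itself.
payload = pickle.dumps(parse_file_opts)
assert pickle.loads(payload) is parse_file_opts

# Simulate a worker whose copy of the module never defined the function.
del demo_mod.parse_file_opts
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__, error)
```

The takeaway is that the lookup fails on the receiving side, so the function has to exist by that name in whatever module the worker process loads.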
Hi,
I am on a Mac running Python 3.9.7, and I am running the script directly from a terminal window.
I have cd'd into the directory, and ls shows the base Python code you created along with the directories that were created. The data directory has the grant XML files I was able to download with fetch_grant. Directory listing from ls:
LICENSE fetch_maint.py load_data.py parse_maint.py
README.md fetch_tmapply.py meta parse_tmapply.py
data firm_assign.py parse_apply.py parsed
fetch_apply.py firm_cites.py parse_assign.py requirements.txt
fetch_assign.py firm_cluster.py parse_compu.py tools
fetch_grant.py firm_merge.py parse_grant.py
I have executed python parse_grant.py
and alternatively python3 parse_grant
Here is the output to the terminal. I terminated it with Ctrl-C. Hope this helps.
Andy
(base) andreashegedus@Andys-iMac patents-master % python parse_grant.py
Process SpawnPoolWorker-1:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module 'mp_main' from '/Users/andreashegedus/Desktop/patents-master/parse_grant.py'>
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module 'mp_main' from '/Users/andreashegedus/Desktop/patents-master/parse_grant.py'>
^CProcess SpawnPoolWorker-12:
Process SpawnPoolWorker-7:
Process SpawnPoolWorker-6:
Process SpawnPoolWorker-11:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-9:
Process SpawnPoolWorker-2:
Process SpawnPoolWorker-3:
Process SpawnPoolWorker-10:
Process SpawnPoolWorker-5:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
KeyboardInterrupt
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
Traceback (most recent call last):
KeyboardInterrupt
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target( File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
Traceback (most recent call last):
KeyboardInterrupt
KeyboardInterrupt
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
KeyboardInterrupt
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
KeyboardInterrupt
KeyboardInterrupt
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in enter
return self._semlock.enter()
Traceback (most recent call last):
File "/Users/andreashegedus/Desktop/patents-master/parse_grant.py", line 365, in
Thanks for the info. It seems like this is a multiprocessing bug that kinda shows up in some random subset of platforms and Python versions and configurations. I'm actually pretty close to releasing a new version of this that uses a more structured interface. It also runs things through modules, rather than through top-level scripts, so it might actually solve this issue for you.
If you're willing to test it out, just switch to the library branch of this repo and move your downloaded grant XML files from data to data/raw. After installing the requirements.txt packages, you should be able to run
./patcmd parse grant --datadir data
and hopefully it'll work.
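One detail that may explain why the error depends on the platform: the worker names in the traceback (SpawnPoolWorker-N) point to the spawn start method, which has been the default on macOS since Python 3.8, while Linux has historically defaulted to fork. Below is a small sketch for collecting this information for a bug report. The function name `describe_mp_environment` is made up for this example.

```python
import multiprocessing as mp
import platform


def describe_mp_environment():
    """Collect the facts that usually matter for multiprocessing bug reports."""
    return {
        "system": platform.system(),  # e.g. "Darwin" on macOS
        "python": platform.python_version(),
        # Note: calling get_start_method() fixes the default for this process.
        "start_method": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in describe_mp_environment().items():
        print(f"{key}: {value}")
```

If the report shows spawn, each worker starts a fresh interpreter and must be able to import the worker function by name from the module it loads.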
Hi Douglas,
I have downloaded the library branch of the code, and it is in a directory called patents-library. I created a directory called data and then one called raw inside it, and copied in the XML files I had previously downloaded. From a terminal (on macOS) I cd'd to the patents-library directory and issued the command:
./patcmd parse grant --datadir data
I get this result:
(base) @.*** patents-library % ./patcmd parse grant --datadir data
/Users/andreashegedus/opt/anaconda3/lib/python3.9/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /Users/andreashegedus/Desktop/patents-library/patents/tools/simcore.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Creating directory data/parsed/grant
Creating directory data/tables
Concat: grant/grant Table "grant/grant" not found
Concat: grant/ipc Table "grant/ipc" not found
Concat: grant/cite Table "grant/cite" not found
(base) @.*** patents-library % ./patcmd parse grant --datadir data
Concat: grant/grant Table "grant/grant" not found
Concat: grant/ipc Table "grant/ipc" not found
Concat: grant/cite Table "grant/cite" not found
(base) @.*** patents-library %
So it reports some missing tables.
What would you like me to try next?
Andy
Regards,
Andy Hegedus Founder AGH Analytics, LLC
1561 Ralston Ave Burlingame, CA 94010
@.*** M 650.619.1365
linkedin.com/in/andyhegedus https://www.linkedin.com/in/andyhegedus?lipi=urn%3Ali%3Apage%3Ad_flagship3_profile_view_base_contact_details%3Bd9eKdQVUTFe5KogRBVC%2BDg%3D%3D
On May 30, 2022, at 2:07 AM, Douglas Hanley @.***> wrote:
Thanks for the info. It seems like this is a multiprocessing bug that kinda shows up in some random subset of platforms and python versions and configurations. I'm actually pretty close to releasing a new version of this that uses a more structured interface. It also runs things through modules, rather than through top-level scripts, so it might actually solve this issue for you.
If you're willing to test it out, just switch to the library branch of this repo and move your downloaded grant XML files from data to data/raw. After installing the requirements.txt packages, you should be able to run
./patcmd parse grant --datadir data and hopefully it'll work.
— Reply to this email directly, view it on GitHub https://github.com/iamlemec/patents/issues/5#issuecomment-1140899673, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKS4KBUSOYGP3UD5YEX5OZDVMSALLANCNFSM5WJJDPJQ. You are receiving this because you authored the thread.
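The 'Table "grant/..." not found' lines may simply mean that the parser found no input files where it looked, although that is a guess rather than something confirmed in this thread. A later message here reports that ./patcmd fetch grant leaves its files in data/raw/grant, so one quick check is to compare what sits directly in data/raw with what sits in data/raw/grant. The helper below is a hypothetical diagnostic, not part of the repo, and the .xml and .zip suffixes are assumptions about the file types involved.

```python
from pathlib import Path


def list_raw_files(datadir, suffixes=(".xml", ".zip")):
    """Map each candidate input directory to the data files found in it."""
    raw = Path(datadir) / "raw"
    candidates = [raw, raw / "grant"]
    found = {}
    for directory in candidates:
        if directory.is_dir():
            found[str(directory)] = sorted(
                p.name
                for p in directory.iterdir()
                if p.is_file() and p.suffix.lower() in suffixes
            )
        else:
            found[str(directory)] = None  # directory does not exist
    return found


if __name__ == "__main__":
    for directory, files in list_raw_files("data").items():
        print(directory, "->", "missing" if files is None else files)
```

If the files show up under data/raw but data/raw/grant is missing, moving them into data/raw/grant would be a cheap thing to try before digging further.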
Hi Douglas,
A bit more testing, starting more or less from scratch. I ran the following:
0. Set up the environment with export PATENTS_DATADIR=data
This worked fine.
- Fetch the grant data with
./patcmd fetch grant
I modified grant_files.txt to have only two weeks of data, ran the command, and it worked. There is now a directory data/raw/grant with 8 files, including the base zip files.
- Parse the grant data with
./patcmd parse grant
I ran this and ran into trouble. It looks like the same multiprocessing issue.
(base) @.*** patents-library % ./patcmd parse grant
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module 'patents.parse.grant' from '/Users/andreashegedus/Desktop/patents-library/patents/parse/grant.py'>
Process SpawnPoolWorker-2:
Traceback (most recent call last):
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module 'patents.parse.grant' from '/Users/andreashegedus/Desktop/patents-library/patents/parse/grant.py'>
Stopped here.
- Cluster firm names with
./patcmd firms cluster --sources grant
- Process citations with
./patcmd firms cites
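Since the same AttributeError now names patents.parse.grant rather than the top-level script, the problem seems to follow parse_file_opts itself and not how the program is launched. Without seeing the source this is speculation, but one common cause is a worker function that only comes into existence at run time (for example, one declared global from inside another function), so a freshly spawned worker that merely imports the module cannot find it by name. A pattern that generally works under spawn is to keep the worker at module top level and bind options with functools.partial. In this sketch `parse_file`, `parse_all`, and the `opts` dictionary are hypothetical stand-ins, not the repo's API.

```python
import multiprocessing as mp
import sys
from functools import partial


def parse_file(path, opts):
    """Hypothetical per-file worker, defined at module top level so that
    spawned workers can import it by name."""
    return (path, opts.get("overwrite", False))


def parse_all(paths, opts, processes=2, start_method="spawn"):
    """Parse files in a pool; options are bound with partial, not a closure."""
    if not paths:
        return []
    worker = partial(parse_file, opts=opts)
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes) as pool:
        return pool.map(worker, paths)


if __name__ == "__main__":
    # The guard keeps spawned workers from re-running this block on import.
    xml_paths = [arg for arg in sys.argv[1:] if arg.endswith(".xml")]
    print(parse_all(xml_paths, {"overwrite": True}))
```

Passing the start method explicitly through get_context also makes it easy to compare spawn and fork behaviour on the same machine, which could help narrow down where the failure comes from.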
Regards,
Andy Hegedus Founder AGH Analytics, LLC