Error on Parse Grant

Open andyhegedus opened this issue 2 years ago • 5 comments

Hi,

Testing your pipeline.

  1. fetch_grant.py (I trimmed the file in meta to list only two files): this seems to work. It fetched the data and expanded it into the ~/data directory.
  2. parse_grant.py gives me an error:

AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/xxxxxxx/Desktop/patents-master/parse_grant.py'>

Any guidance to resolve?

Andy

andyhegedus avatar May 18 '22 18:05 andyhegedus

Thanks for the feedback! It looks like this is the multiprocessing issues discussed here: https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror

From the comments there, it seems like this occurs on Windows when running with IPython/Jupyter. How are you running the script? If you are doing it through IPython or Jupyter, I would try running it directly with pure python.

Let me know how that goes!

iamlemec avatar May 28 '22 22:05 iamlemec

Hi,

I am on a Mac running Python 3.9.7, and I am running the script directly from a terminal window.

I have cd'd to the directory, and ls shows the base Python code you created along with the directories that were created. The data directory has the grant XML files I was able to download with fetch_grant. Directory listing from ls:

LICENSE          fetch_maint.py    load_data.py     parse_maint.py
README.md        fetch_tmapply.py  meta             parse_tmapply.py
data             firm_assign.py    parse_apply.py   parsed
fetch_apply.py   firm_cites.py     parse_assign.py  requirements.txt
fetch_assign.py  firm_cluster.py   parse_compu.py   tools
fetch_grant.py   firm_merge.py     parse_grant.py

I have executed python parse_grant.py and, alternatively, python3 parse_grant.

Here is the output to the terminal. I terminated it with Control-C. Hope this is of help.

Andy

(base) andreashegedus@Andys-iMac patents-master % python parse_grant.py
Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/andreashegedus/Desktop/patents-master/parse_grant.py'>
Process SpawnPoolWorker-4:
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/andreashegedus/Desktop/patents-master/parse_grant.py'>
^CProcess SpawnPoolWorker-12:
Process SpawnPoolWorker-7:
Process SpawnPoolWorker-6:
Process SpawnPoolWorker-11:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-9:
Process SpawnPoolWorker-2:
Process SpawnPoolWorker-3:
Process SpawnPoolWorker-10:
Process SpawnPoolWorker-5:
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
KeyboardInterrupt
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
KeyboardInterrupt
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
KeyboardInterrupt
KeyboardInterrupt
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
  File "/Users/andreashegedus/Desktop/patents-master/parse_grant.py", line 365, in <module>
    pool.map(parse_file_opts, file_list, chunksize=1)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 765, in get
    self.wait(timeout)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 762, in wait
    self._event.wait(timeout)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/threading.py", line 574, in wait
    signaled = self._cond.wait(timeout)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/threading.py", line 312, in wait
    waiter.acquire()
KeyboardInterrupt

andyhegedus avatar May 29 '22 23:05 andyhegedus

Thanks for the info. It seems like this is a multiprocessing bug that kinda shows up in some random subset of platforms and python versions and configurations. I'm actually pretty close to releasing a new version of this that uses a more structured interface. It also runs things through modules, rather than through top-level scripts, so it might actually solve this issue for you.

If you're willing to test it out, just switch to the library branch of this repo and move your downloaded grant XML files from data to data/raw. After installing the requirements.txt packages, you should be able to run

./patcmd parse grant --datadir data

and hopefully it'll work.
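
For context on the error itself: the worker names in the traceback (SpawnPoolWorker) point to the spawn start method, which is the default on macOS since Python 3.8. Under spawn, each worker re-imports the main script as __mp_main__ and skips anything under the if __name__ == "__main__" guard, so a worker function that only exists inside that guard is one common way to hit "Can't get attribute". Whether that is what happens in parse_grant.py is an assumption. The sketch below only illustrates a spawn-safe layout; parse_file, parse_all, and the overwrite option are hypothetical names, not this repo's API.

```python
import multiprocessing as mp
import sys
from functools import partial


def parse_file(path, overwrite=False):
    """Hypothetical stand-in for the real per-file parser."""
    return (path, overwrite)


def parse_all(paths, overwrite=False, workers=2):
    # partial() around a module-level function is pickled by reference,
    # so a spawned worker can re-import the module and look it up by name.
    job = partial(parse_file, overwrite=overwrite)
    with mp.get_context("spawn").Pool(workers) as pool:
        return pool.map(job, paths, chunksize=1)


if __name__ == "__main__":
    # Workers re-import this file as __mp_main__ and skip this block,
    # so nothing the workers need should be defined only in here.
    paths = sys.argv[1:]
    if paths:
        print(parse_all(paths))
```

Saved as a script (say sketch.py, a made-up name), this would be run as python sketch.py a.xml b.xml. Forcing the spawn context makes the behavior the same on Linux as on macOS, which can help when trying to reproduce the problem elsewhere.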

iamlemec avatar May 30 '22 09:05 iamlemec

Hi Douglas,

I have downloaded the library branch of the code, and it is in a directory called patents-library. I created a directory called data and then one called raw inside it, and copied in the XML files I had previously downloaded. From a terminal (on macOS) I cd'd to the patents-library directory and issued the command ./patcmd parse grant --datadir data. I get this result:

(base) @.*** patents-library % ./patcmd parse grant --datadir data
/Users/andreashegedus/opt/anaconda3/lib/python3.9/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /Users/andreashegedus/Desktop/patents-library/patents/tools/simcore.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)
Creating directory data/parsed/grant
Creating directory data/tables
Concat: grant/grant
Table "grant/grant" not found
Concat: grant/ipc
Table "grant/ipc" not found
Concat: grant/cite
Table "grant/cite" not found
(base) @.*** patents-library % ./patcmd parse grant --datadir data
Concat: grant/grant
Table "grant/grant" not found
Concat: grant/ipc
Table "grant/ipc" not found
Concat: grant/cite
Table "grant/cite" not found
(base) @.*** patents-library %

The errors report some missing tables.

What would you like me to try next?

Andy

andyhegedus avatar May 30 '22 21:05 andyhegedus

Hi Douglas,

A bit more testing, this time starting from scratch. I ran the following steps:

  0. Set up the environment with export PATENTS_DATADIR=data. This worked fine.
  1. Fetch the grant data with ./patcmd fetch grant. I modified grant_files.txt to have only two weeks of data, ran the command, and it worked. There is now a directory data/raw/grant with 8 files, including the base zip files.
  2. Parse the grant data with ./patcmd parse grant. I ran this and ran into trouble. It looks like the same multiprocessing issue.

(base) @.*** patents-library % ./patcmd parse grant
Process SpawnPoolWorker-4:
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module 'patents.parse.grant' from '/Users/andreashegedus/Desktop/patents-library/patents/parse/grant.py'>
Process SpawnPoolWorker-2:
Traceback (most recent call last):
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/andreashegedus/opt/anaconda3/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'parse_file_opts' on <module 'patents.parse.grant' from '/Users/andreashegedus/Desktop/patents-library/patents/parse/grant.py'>

I stopped here and did not run the remaining steps:

  3. Cluster firm names with ./patcmd firms cluster --sources grant
  4. Process citations with ./patcmd firms cites
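
One way to narrow this down: pickle sends a function to the workers by reference (module name plus qualified name), and the worker then imports that module and looks the name up. The sketch below performs the same lookup in a fresh interpreter, using the module and attribute names from the error message. can_resolve is a hypothetical helper written for this thread, not part of the patents package, and it assumes it is run from the patents-library directory.

```python
import importlib


def can_resolve(module_name, qualname):
    """Return True if module_name imports and exposes qualname, as unpickling needs."""
    try:
        obj = importlib.import_module(module_name)
        for part in qualname.split("."):
            obj = getattr(obj, part)
    except (ImportError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    # Names copied from the AttributeError above.
    print(can_resolve("patents.parse.grant", "parse_file_opts"))
```

If this prints False, the name is not defined at module level when the module is imported fresh, which would match what the spawned workers see. If it prints True, the cause is probably elsewhere.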

andyhegedus avatar May 30 '22 22:05 andyhegedus