
Extracting code from a document into a .py file

Open · sergei-mironov opened this issue 3 years ago · 3 comments

Hi. Is there a way to extract all Python code from a document into a separate Python file in PythonTeX? AFAIK, Pweave has an execution mode called ptangle which does exactly that.

The reason I am asking is as follows: I recently realized that it is really hard to copy and paste the Python code from the code boxes of PDF documents produced by PythonTeX. In my case, the document contains line numbers, and I use Evince for viewing PDFs (I am not sure whether the problem is viewer-dependent). My planned workaround is to provide users with plain-text code files.

sergei-mironov, Mar 04 '21 13:03

Note that I tried to parse the *.pytxcode file with the pipeline sed -n '/^=>PYTHONTEX:SETTINGS/q;p' "$1" | sed 's/^=>PYTHONTEX.*//g', but my sed skills are not enough to deal with indentation like this:

...
build_rref:RRef = evaluate(stage_build)
print(build_rref)
=>PYTHONTEX#py#stdout#default#3#block#####367#
      print(mklens(build_rref).fetch_ref.rref)   # This snippet happened to be inside an itemized list in the original document
      print(mklens(build_rref).fetch_ref.url.val)
=>PYTHONTEX#py#stdout#default#4#block#####385#
from subprocess import run, PIPE
print(run([mklens(build_rref).bin.syspath], stdout=PIPE).stdout.decode('utf-8'))
=>PYTHONTEX:SETTINGS#
...

sergei-mironov, Mar 04 '21 13:03
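
For reference, the indentation part of the problem has a standard-library answer: textwrap.dedent removes the whitespace prefix shared by every non-blank line. A minimal sketch, assuming the lines of one block (here, the two indented lines from the listing above, without the trailing comment) have already been collected into a string:

```python
import textwrap

# Lines of one block as they appear in the .pytxcode file; the indentation
# comes from the surrounding itemized list in the .tex source
block = (
    "      print(mklens(build_rref).fetch_ref.rref)\n"
    "      print(mklens(build_rref).fetch_ref.url.val)\n"
)

# dedent() strips the longest whitespace prefix common to all non-blank lines
print(textwrap.dedent(block), end="")
```

Lines consisting only of whitespace are ignored when computing the common prefix and are normalized to a bare newline, so blank lines inside a block survive.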

You could try something like this on the *.pytxcode file:

import collections
import pathlib
import sys

sources = collections.defaultdict(list)

pytxcode = sys.argv[1]
with open(pytxcode, encoding='utf8') as f:
    in_source = False
    source_name = None
    for line in f:
        if line.startswith('=>PYTHONTEX#'):
            in_source = True
            source_name = f'source_{line.split("#")[1]}_{line.split("#")[2]}.py'
        elif line.startswith('=>PYTHONTEX') or line.startswith('=>DEPYTHONTEX'):
            in_source = False
        elif in_source:
            sources[source_name].append(line)

source_path = pathlib.Path('pythontex_sources')
if not source_path.is_dir():
    source_path.mkdir()
for source, lines in sources.items():
    with open(source_path / source, 'w', encoding='utf8') as f:
        f.write(''.join(lines))

Usage: python ./extract_source.py ./test.pytxcode

This feature has been on my list of features to add to PythonTeX and Codebraid for a while, so hopefully I'll have time to add built-in support at some point.

gpoore, Mar 04 '21 22:03
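
The script above names each output file after the first two #-separated fields of a =>PYTHONTEX# header line. In the header from the earlier listing these are py and stdout; reading them as the code type and session name is an assumption about the .pytxcode format. A small sketch of that mapping, using a hypothetical helper header_to_filename:

```python
def header_to_filename(header: str) -> str:
    """Map a '=>PYTHONTEX#...' header line to the output file name the script uses."""
    fields = header.rstrip("\n").split("#")
    # fields[0] is the '=>PYTHONTEX' marker; fields[1] and fields[2] pick the file
    return f"source_{fields[1]}_{fields[2]}.py"


header = "=>PYTHONTEX#py#stdout#default#3#block#####367#\n"
print(header_to_filename(header))  # source_py_stdout.py
```

All blocks that share these two fields are appended to the same list, so they end up concatenated in one file in document order.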


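Thank you, this code helped. Here is an updated version, which also strips the leading indentation of each block and takes an optional output path as a second argument: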

#!/usr/bin/env python3

import collections
import pathlib
import sys

sources = collections.defaultdict(list)

pytxcode = sys.argv[1]
dstsource = sys.argv[2] if len(sys.argv)==3 else "pythontex_sources"
with open(pytxcode, encoding='utf8') as f:
  in_source = False
  spaces_to_trim = None
  source_name = None
  for line in f:
    if line.startswith('=>PYTHONTEX#'):
      in_source = True
      spaces_to_trim = None
      source_name = f'source_{line.split("#")[1]}_{line.split("#")[2]}.py'
    elif line.startswith('=>PYTHONTEX') or line.startswith('=>DEPYTHONTEX'):
      in_source = False
    elif in_source:
      if not line.strip():
        # Blank line: keep it as a bare newline (slicing it would drop the newline)
        sources[source_name].append('\n')
        continue
      if spaces_to_trim is None:
        # Detect the number of leading spaces to trim using the first non-blank line
        spaces_to_trim = len(line) - len(line.lstrip(' '))
      if line[:spaces_to_trim].strip():
        print(f"Can't find {spaces_to_trim} spaces at the beginning of line '{line.rstrip()}'", file=sys.stderr)
      else:
        line = line[spaces_to_trim:]
      sources[source_name].append(line)

if len(sources) == 1:
  with open(dstsource, 'w', encoding='utf8') as f:
    f.write(''.join(next(iter(sources.values()))))
else:
  source_path = pathlib.Path(dstsource)
  if not source_path.is_dir():
    source_path.mkdir()
  for source, lines in sources.items():
    with open(source_path / source, 'w', encoding='utf8') as f:
      f.write(''.join(lines))

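Usage: python ./extract_source.py ./test.pytxcode [output]

Here output is the destination file if the document has a single source, or the destination directory otherwise (default: pythontex_sources).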

sergei-mironov, Mar 05 '21 09:03