ford
Refactor parsing library and split into separate project
The current state of the parsing library is... not so great. It's complicated and confusing, and it makes [ford/sourceform.py] far too large. However, the functionality it implements is very useful and could support the development of static analysis tools, linters, etc. We should therefore look at refactoring this part of the code, simplifying it and reducing the dependencies between its parts. The output should also be something more standard than the mess of Python objects it produces now (e.g., JSON, YAML, XML, or some other easily parsed serialisation format). This would be a major project, but an extremely worthwhile one: the code would become simpler, more maintainable, and more widely usable.
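To make the "more standard output" idea concrete, here is a rough sketch of what a flat, serialisable representation could look like. Everything in it is hypothetical: the `Entity` class, its fields, and the `to_json` helper are made up for illustration and don't correspond to anything currently in ford.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Entity:
    """Hypothetical plain-data record for one parsed program unit."""

    kind: str  # e.g. "module", "subroutine", "function"
    name: str
    doc: str = ""  # documentation comment text, if any
    children: List["Entity"] = field(default_factory=list)


def to_json(entity: Entity) -> str:
    """Serialise an entity tree; asdict() recurses into nested dataclasses."""
    return json.dumps(asdict(entity), indent=2, sort_keys=True)


# Example tree: a module containing a single subroutine.
module = Entity(
    kind="module",
    name="mesh_mod",
    doc="Mesh utilities.",
    children=[Entity(kind="subroutine", name="refine")],
)
serialised = to_json(module)
print(serialised)
```

Because the records are plain data, other tools (linters, static analysers) could consume the JSON without importing any of ford's classes.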
Some things to consider doing:
- Change ford/reader.py to keep better track of locations in code, so these can be used in docs
- Get rid of inheritance hierarchy in ford/sourceform.py? (Not sure it helps much...)
- Change the IDs and output layout to be based on slugs or hashes?
- Consider relationship between parsing documentation comments and parsing code (i.e., static analysis doesn't need or want the former)
- Think about some more general way of parsing documentation comments
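As a starting point for discussing the location-tracking and slug/hash items above, here is a minimal sketch. The names `SourceLocation`, `slugify`, and `entity_id` are hypothetical, and the `::` separator in the qualified name is just an assumed convention, not something ford uses today.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    """Hypothetical record of where an entity was found in the source."""

    path: str
    line: int
    column: int = 1


def slugify(text: str) -> str:
    """Lower-case the text and collapse non-alphanumeric runs into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def entity_id(kind: str, qualified_name: str) -> str:
    """Build a readable ID: a slug plus a short hash of the qualified name.

    The name is lower-cased before hashing because Fortran identifiers are
    case-insensitive. The hash separates names whose slugs would collide.
    """
    slug = slugify(f"{kind} {qualified_name}")
    digest = hashlib.sha1(qualified_name.lower().encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}"


location = SourceLocation(path="src/mesh.f90", line=42)
print(entity_id("subroutine", "mesh_mod::refine"), location)
```

The slug keeps output file names readable, while the hash suffix keeps `mesh_mod::refine` and `mesh::mod_refine` apart even though both slugify to the same string.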
Would Protocol Buffers be a good choice? Fast and powerful, but not human readable. However, I like the ability to define schemas!
I'll point out that there is also a parser in hansec/fortran-language-server. Is it possible that there is useful communication which should be happening between these projects?
Yes, that's a nice tool. It would be good to see if there's a common parsing library we can work on. There's also the LLVM flang project, which is written in C++ but might also be a good foundation.