__init__.py

Markdown to PO files extractor according to mdpo specification.

mdpo.md2po.__init__.markdown_to_pofile(glob_or_content, ignore=[], msgstr='', po_filepath=None, save=False, mo_filepath=None, plaintext=False, wrapwidth=78, mark_not_found_as_obsolete=True, preserve_not_found=True, location=True, extensions=['collapse_whitespace', 'tables', 'strikethrough', 'tasklists', 'latex_math_spans', 'wikilinks'], po_encoding=None, md_encoding='utf-8', xheaders=False, include_codeblocks=False, ignore_msgids=[], command_aliases={}, metadata={}, events={}, debug=False, **kwargs)

Extracts all the msgids from a string of Markdown content or a group of files.

Parameters
  • glob_or_content (str) – Glob path to Markdown files or a string with valid Markdown content.

  • ignore (list) – Paths of files to ignore. Useful when a glob alone cannot express which files content should be extracted from. A filename or directory name can also be given without the full path.

  • msgstr (str) – Default message string for extracted msgids.

  • po_filepath (str) – Path of the file loaded as the polib.POFile instance into which the new msgids are dumped. It is also the source checked for strings that are no longer found, which are marked as obsolete when applicable (see the save and mark_not_found_as_obsolete optional parameters).

  • save (bool) – Save the new content to the pofile indicated by the po_filepath parameter. If enabled while po_filepath is None, a ValueError is raised.

  • mo_filepath (str) – The resulting pofile will be compiled to a mofile and saved at the path given in this parameter.

  • plaintext (bool) – If you pass True to this parameter, the content is extracted as plain text, without markup characters. With plaintext=False (the default), extracted msgids keep some markup characters that mark the location of `inline code`, **bold text**, *italic text* and `[links]`, which may be useful to you. Whether to use this mode (plaintext=False) depends on how you intend to use the library.

  • wrapwidth (int) – Wrap width for the po file indicated by the po_filepath parameter. Only useful when the -w option was passed to xgettext.

  • mark_not_found_as_obsolete (bool) – Strings extracted from Markdown that are not found in the provided pofile are marked as obsolete.

  • preserve_not_found (bool) – Strings extracted from Markdown that are not found in the provided pofile are not removed. Only takes effect if mark_not_found_as_obsolete is False.

  • location (bool) – Store references to the top-level blocks in which messages are found as #: reference comments in the PO file.

  • extensions (list) – md4c extensions used to parse Markdown content, formatted as a list of ‘pymd4c’ keyword arguments. All available extensions are listed in the pymd4c repository.

  • po_encoding (str) – Resulting pofile encoding.

  • md_encoding (str) – Markdown content encoding.

  • xheaders (bool) – Indicates whether the resulting pofile will include mdpo x-headers. These can only be included if the plaintext parameter is False.

  • include_codeblocks (bool) – Include all code blocks found in the resulting pofile. This is useful if you want to translate all your code blocks. Equivalent to adding the <!-- mdpo-include-codeblock --> command before each code block.

  • ignore_msgids (list) – List of msgids to exclude from extraction.

  • command_aliases (dict) – Mapping of aliases used to define custom mdpo command names in comments. The mdpo- prefix is optional when resolving command names. For example, if you want to use <!-- mdpo-on --> instead of <!-- mdpo-enable -->, you can pass either {"mdpo-on": "mdpo-enable"} or {"mdpo-on": "enable"} to this parameter.

  • metadata (dict) – Metadata to include in the produced PO file. If the file already contains metadata fields, they are updated while preserving the values of those already defined.

  • events (dict) –

    Preprocessing events executed during the parsing process. You can use these to customize the extraction process. Takes functions or lists of functions as values. If one of these functions returns False, that part of the parsing is skipped by md2po (usually an MD4C event). The available events are:

    • enter_block(self, block, details): Executed when the parsing of a Markdown block starts.

    • leave_block(self, block, details): Executed when the parsing of a Markdown block ends.

    • enter_span(self, span, details): Executed when the parsing of a Markdown span starts.

    • leave_span(self, span, details): Executed when the parsing of a Markdown span ends.

    • text(self, block, text): Executed when the parsing of text starts.

    • command(self, mdpo_command, comment, original_command): Executed when an mdpo HTML command is found.

    • msgid(self, msgid, msgstr, msgctxt, tcomment, flags): Executed when a msgid is going to be stored.

    • link_reference(self, target, href, title): Executed when a link reference is going to be stored.

    All self arguments are instances of the Md2Po parser. You can take advanced control of the parsing process by manipulating the state of the parser. For example, to prevent a certain msgid from being included, you can do:

    def msgid_event(self, msgid, *args):
        if msgid == 'foo':
            # Set a flag on the parser state so that this msgid is not included.
            self._disable_next_line = True

  • debug (bool) – Add events displaying all parsed elements in the extraction process.
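
The rules given for the events parameter (a value may be a function or a list of functions, and a False return value skips that part of the parsing) can be modelled with a small dispatcher. This is only an illustrative sketch, not mdpo's code: run_event and skip_foo are hypothetical names, and stopping at the first handler that returns False is an assumption the documentation above does not confirm.

```python
def run_event(events, name, *args):
    """Hypothetical dispatcher modelling the documented `events` rules.

    Returns False if a handler asked to skip this step, True otherwise.
    """
    handlers = events.get(name, [])
    if callable(handlers):
        # A single function is accepted as well as a list of functions.
        handlers = [handlers]
    for handler in handlers:
        # Only an explicit False means "skip"; None (no return) does not.
        if handler(*args) is False:
            return False  # assumption: stop at the first False
    return True


def skip_foo(parser, msgid, *args):
    # Returning False asks the caller not to store this msgid.
    if msgid == 'foo':
        return False


events = {'msgid': [skip_foo]}

# Simulate storing two msgids; the first positional argument stands in
# for the parser instance (`self` in the event signatures).
stored = [
    msgid for msgid in ('foo', 'bar')
    if run_event(events, 'msgid', None, msgid, '', None, None, [])
]
```

In this model only 'bar' ends up in stored, because skip_foo returns False for 'foo'. A handler that returns nothing never causes a skip, which is why the comparison uses `is False` rather than a truthiness test.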

Examples

>>> content = 'Some text with `inline code`'
>>> entries = markdown_to_pofile(content, plaintext=True)
>>> {e.msgid: e.msgstr for e in entries}
{'Some text with inline code': ''}
>>> entries = markdown_to_pofile(content)
>>> {e.msgid: e.msgstr for e in entries}
{'Some text with `inline code`': ''}
>>> entries = markdown_to_pofile(content, msgstr='Default message')
>>> {e.msgid: e.msgstr for e in entries}
{'Some text with `inline code`': 'Default message'}

Returns

polib.POFile: Pofile instance with new msgids included.

Raises

ValueError: when po_filepath is None and save is True.
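
The condition that raises ValueError can be pictured as a simple argument guard. The sketch below is a stand-alone model of that rule: check_save_arguments is a hypothetical helper, and the error message is invented for illustration, so neither should be read as mdpo's actual implementation.

```python
def check_save_arguments(save=False, po_filepath=None):
    """Hypothetical guard mirroring the documented rule:
    saving requires a destination pofile path."""
    if save and po_filepath is None:
        # Illustrative message, not the one mdpo emits.
        raise ValueError("'save' is True but 'po_filepath' is None")


# Caller-side handling: requesting a save without a path fails early.
try:
    check_save_arguments(save=True)
except ValueError as exc:
    error = str(exc)
else:
    error = None
```

Passing a path together with save=True, or leaving save disabled, passes the guard without raising.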