Discussion:
[O] [PATCH] Add new keyword :coding for #+include directive
Pierre Téchoueyres
2018-04-16 19:52:23 UTC
Permalink
Hello org's developpers,

I want to propose the attached patch which allow you to specify an
optionnal `:coding' keyword to the `#+INCLUDE:' directive.

This allow you to specify something like
#+begin_example
,#+INCLUDE: "myfile.cmd" src cmd :coding "cp850-dos"
#+end_example
Which allow you to have different encoding for your various sources
files.

This allow me to include localised Microsoft Windows batch sources
inside my utf-8-unix org-files.

Thanks in advance for your remarks.

Pierre
Pierre Téchoueyres
2018-04-16 21:30:14 UTC
Permalink
Hello org's developpers,
Sorry, I forgot the patch.
so here is the whole mail + patch:

I want to propose the attached patch which allow you to specify an
optionnal `:coding' keyword to the `#+INCLUDE:' directive.

This allow you to specify something like
#+begin_example
,#+INCLUDE: "myfile.cmd" src cmd :coding "cp850-dos"
#+end_example
Which allow you to have different encoding for your various sources
files.

This allow me to include localised Microsoft Windows batch sources
inside my utf-8-unix org-files.

Thanks in advance for your remarks.

Pierre
Nicolas Goaziou
2018-04-17 08:36:35 UTC
Permalink
Hello,
Post by Pierre Téchoueyres
I want to propose the attached patch which allow you to specify an
optionnal `:coding' keyword to the `#+INCLUDE:' directive.
Thank you.
Post by Pierre Téchoueyres
This allow you to specify something like
#+begin_example
,#+INCLUDE: "myfile.cmd" src cmd :coding "cp850-dos"
#+end_example
The quotes are not necessary. AFAICT, coding systems do not contain
spaces.
Post by Pierre Téchoueyres
Which allow you to have different encoding for your various sources
files.
This allow me to include localised Microsoft Windows batch sources
inside my utf-8-unix org-files.
Is it really an Org problem? E.g., couldn't you put a coding: cookie in
your ".cmd" file? IMO, the coding system depends on the includee, not
the includer.

WDYT?

Regards,
--
Nicolas Goaziou
Pierre Téchoueyres
2018-04-18 18:40:32 UTC
Permalink
Hello,
Post by Nicolas Goaziou
Hello,
...
Post by Pierre Téchoueyres
This allow you to specify something like
#+begin_example
,#+INCLUDE: "myfile.cmd" src cmd :coding "cp850-dos"
#+end_example
The quotes are not necessary. AFAICT, coding systems do not contain
spaces.
Ok, I've reworked the rexexp to suppress them. New patch attached
Pierre Téchoueyres
2018-04-20 23:08:07 UTC
Permalink
***@free.fr (Pierre Téchoueyres) writes:

Hello,
...
Post by Nicolas Goaziou
...
Is it really an Org problem? E.g., couldn't you put a coding: cookie in
your ".cmd" file? IMO, the coding system depends on the includee, not
the includer.
I tend to aggree with you that TRTDT is to put cookies or something
a) This seem to not work as expected (see exemples joinned).
b) Sometimes you can't modify the included file (ex: remote file access).
...
For a I can try to jump into the rabbit hole and find a solution (but
for now I'm lost)
I did my homework : found a fix for a) and rebased previous patch on master.

Regards,
Nicolas Goaziou
2018-04-23 10:27:45 UTC
Permalink
Hello,
Post by Pierre Téchoueyres
I did my homework : found a fix for a) and rebased previous patch on master.
Thank you. Some comments follow.
Post by Pierre Téchoueyres
* lisp/ox.el (org-export-expand-include-keyword): Add new keyword
`:coding' for specify the file encoding whith the `#+include:'
directive.
#+include: "./myfile" :coding cp850-dos
when your org-file is encoded in utf-8 for example.
Nitpick: Org file
Post by Pierre Téchoueyres
** New features
+*** New keyword for ~#+include:~ directive
+Add ~:coding "codign-system"~ keyword to allow include of files from
=~:coding coding-system=
Post by Pierre Téchoueyres
+different codign system than the main org-file.
Org file.
Post by Pierre Téchoueyres
+#+begin_example
+,#+INCLUDE: "myfile.cmd" src cmd :coding cp850-dos
+#+end_example
Note that we are freezing Org 9.2 right now. This will go into master
once 9.2 is out.
Post by Pierre Téchoueyres
+ (coding
+ (intern (or (and (string-match
+ ":coding +\\<\\([a-z0-9\\-]+\\)\\>" value)
+ (prog1 (match-string 1 value)
+ (setq value (replace-match "" nil nil value))))
+ (symbol-name coding-system-for-read))))
(coding
(or (and (string-match ":coding +\\([a-z0-9\\-]+\\)" value)
(prog1 (intern (match-string 1 value))
(setq value (replace-match "" nil nil value))))
coding-system-for-read))

In Lisp, symbol constituents are all words plus $&*+-_<>

We might want to update the regexp accordingly.
Post by Pierre Téchoueyres
+Subject: [PATCH] Correctly convert encoding of included files
+
+* lisp/ox.el (org-export--prepare-file-contents): convert temporary
+ buffer encoding to org buffer's encoding.
Org buffer's encoding.

Could you send an updated patch?

Regards,
--
Nicolas Goaziou 0x80A93738
Pierre Téchoueyres
2018-04-23 19:44:29 UTC
Permalink
Hello Nicolas,
I think I've corrected all points. You'll find new versions attached.

Would you mind consider to include the patch for the detection of
encoding with the #+include keyword in 9.2 release ?
Nicolas Goaziou
2018-04-24 21:59:57 UTC
Permalink
Hello,
Post by Pierre Téchoueyres
I think I've corrected all points. You'll find new versions attached.
Thank you.
Post by Pierre Téchoueyres
Would you mind consider to include the patch for the detection of
encoding with the #+include keyword in 9.2 release ?
This patch is still missing some small parts for proper integration,
namely documentation, and, if possible, a couple of tests. Besides, 9.2
branch is supposedly frozen.

Granted, it doesn't seem too harmful, but is there any strong reason to
integrate it in Org 9.2 (assuming documentation is ready)?
Post by Pierre Téchoueyres
+ (coding
+ (intern (or (and (string-match
+ ":coding[[:space:]]+\\_<\\(\\(?:\\sw\\|\\$\\|&\\|\\*\\|\\+\\|-\\|_\\|<\\|>\\)+\\)\\_>" value)
+ (prog1 (match-string 1 value)
+ (setq value (replace-match "" nil nil value))))
+ (symbol-name coding-system-for-read))))
I suggested a refactoring that you didn't integrate: it seems wasteful
to call `intern' on the return value of `symbol-name'.

Besides, my suggestion about the regexp was wrong. We shouldn't make the
syntax foolproof. I think

":coding +\\(\\S-+\\)"

is enough actually. Sorry about sending you in the wrong track.

Regards,
--
Nicolas Goaziou
Pierre Téchoueyres
2018-04-24 22:57:09 UTC
Permalink
Hello,
Post by Nicolas Goaziou
Hello,
Post by Pierre Téchoueyres
I think I've corrected all points. You'll find new versions attached.
Thank you.
Post by Pierre Téchoueyres
Would you mind consider to include the patch for the detection of
encoding with the #+include keyword in 9.2 release ?
This patch is still missing some small parts for proper integration,
namely documentation, and, if possible, a couple of tests. Besides, 9.2
branch is supposedly frozen.
I argree for the documentation and tests (but I have to admit I don't
know how to add them).
Post by Nicolas Goaziou
Granted, it doesn't seem too harmful, but is there any strong reason to
integrate it in Org 9.2 (assuming documentation is ready)?
I think I wasn't clear enough : I had hope you will only include the
part which correct the decoding of include keyword, not the whole two
patchs. I think the former is simply a bug fixes.
Post by Nicolas Goaziou
Post by Pierre Téchoueyres
+ (coding
+ (intern (or (and (string-match
+
":coding[[:space:]]+\\_<\\(\\(?:\\sw\\|\\$\\|&\\|\\*\\|\\+\\|-\\|_\\|<\\|>\\)+\\)\\_>"
value)
+ (prog1 (match-string 1 value)
+ (setq value (replace-match "" nil nil value))))
+ (symbol-name coding-system-for-read))))
I suggested a refactoring that you didn't integrate: it seems wasteful
to call `intern' on the return value of `symbol-name'.
Besides, my suggestion about the regexp was wrong. We shouldn't make the
syntax foolproof. I think
":coding +\\(\\S-+\\)"
is enough actually. Sorry about sending you in the wrong track.
Regards,
Here is a new amended patch.
Pierre Téchoueyres
2018-05-04 22:41:37 UTC
Permalink
Hello Nicolas,
Did you have time to review the patches ?
Post by Nicolas Goaziou
Hello,
Post by Nicolas Goaziou
Hello,
Post by Pierre Téchoueyres
I think I've corrected all points. You'll find new versions attached.
Thank you.
Post by Pierre Téchoueyres
Would you mind consider to include the patch for the detection of
encoding with the #+include keyword in 9.2 release ?
This patch is still missing some small parts for proper integration,
namely documentation, and, if possible, a couple of tests. Besides, 9.2
branch is supposedly frozen.
I argree for the documentation and tests (but I have to admit I don't
know how to add them).
Post by Nicolas Goaziou
Granted, it doesn't seem too harmful, but is there any strong reason to
integrate it in Org 9.2 (assuming documentation is ready)?
I think I wasn't clear enough : I had hope you will only include the
part which correct the decoding of include keyword, not the whole two
patchs. I think the former is simply a bug fixes.
Post by Nicolas Goaziou
Post by Pierre Téchoueyres
+ (coding
+ (intern (or (and (string-match
+
":coding[[:space:]]+\\_<\\(\\(?:\\sw\\|\\$\\|&\\|\\*\\|\\+\\|-\\|_\\|<\\|>\\)+\\)\\_>"
value)
+ (prog1 (match-string 1 value)
+ (setq value (replace-match "" nil nil value))))
+ (symbol-name coding-system-for-read))))
I suggested a refactoring that you didn't integrate: it seems wasteful
to call `intern' on the return value of `symbol-name'.
Besides, my suggestion about the regexp was wrong. We shouldn't make the
syntax foolproof. I think
":coding +\\(\\S-+\\)"
is enough actually. Sorry about sending you in the wrong track.
Regards,
Here is a new amended patch.
Regards,
Pierre Téchoueyres
2018-05-14 23:44:08 UTC
Permalink
Hello,
And sorry for the delay.
Post by Nicolas Goaziou
Hello,
Post by Pierre Téchoueyres
Hello Nicolas,
Did you have time to review the patches ?
Sorry for the delay, I have been sidetracked.
I admit I don't fully understand your bugfix patch, i.e., "[PATCH]
Correctly convert encoding of included files".
(with-temp-buffer
- (insert-file-contents file)
+ (let ((org-buffer-coding-system buffer-file-coding-system))
+ (insert-file-contents file)
+ (unless (eq org-buffer-coding-system buffer-file-coding-system)
+ (set-buffer-file-coding-system org-buffer-coding-system)))
You pretend `org-buffer-coding-system' is storing coding-system from the
Org buffer, but the let-binding happens from within `with-temp-buffer'.
So the coding system comes from the temporary buffer instead.
Yes, that's true, but the coding system of the temporary buffer is
inherited of the one of the org file (At least it's what I've found in
my tests). But my be should I add a comment to explain that.
Post by Nicolas Goaziou
Also, `insert-file-contents' is not supposed to change coding system, or
is it? So when would the `unless' be triggered?
For the tests I've made and the examples I had sent, it seem to change
the coding system.
Post by Nicolas Goaziou
Could you explain a bit the issue you are fixing here?
Thank you.
Regards,
I hoped the example Ive had sent was clear enough.

I'm using org on an Windows machine where the default encoding seem to
be iso-latin-1-dos.
But I edit my org files in utf-8 encoding and of course I want to
include some files as source files for reference. Thoses files are
sometime encoded wit an different coding system (espetially cmd files
with comments in french : cp850).

I use principally those coding ystems :
- cmd : cp850
- org : utf-8 or iso8859-15
- sql : window1252-dos

But when I include thoses files their content isn't correctly inserted,
especially the accents.
Hope I'me clearer now.


I reattach the examples, but note that the cmd.txt or sh.txt extensions
are only there to avoid my mail to be wiped out.

Regards,

Continue reading on narkive:
Search results for '[O] [PATCH] Add new keyword :coding for #+include directive' (Questions and Answers)
6
replies
c language based project titles plz give me?
started 2006-10-19 04:08:28 UTC
programming & design
Loading...