Hi Eric,
any variable named "stdin" is treated specially, in that, rather than
using its value to replace strings of $stdin in the text of the awk code,
the value of the stdin variable is saved into the file processed by awk.
This allows awk to operate over Org-mode references.
If babel code block supported a pipe or an actual stdin header argument,
that would be the ideal way to add this behavior, but currently nothing of
that nature exists.
Please let me know if this misses part of your suggestion, or more
generally what else may be advisable before we add this to the core.
Could this be implemented for sh as well?
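For sh, what I have in mind is essentially this manual emulation of the
described behavior (the temporary file and sample line are mine, just for
illustration):

#+begin_src sh
# Emulate the special "stdin" variable by hand: write the referenced
# data to a temporary file, then let the tool process that file.
tmp=$(mktemp)
printf 'Date;Amount;Account\n28-05-2010;-6.806,25;999-1974050-30\n' > "$tmp"
awk -F';' 'NR > 1 { print $2 }' "$tmp"
# -> -6.806,25
rm -f "$tmp"
#+end_src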
Unfortunately this simple hack for ob-awk does not address the need you link
to below -- which I am aware of and which is on my list of larger
longer-term Babel development items. I think that a future piping
implementation will be the ultimate solution to the issues you address.
Glad to hear you understand my wish. It's not always easy to express myself
in a very clear way, English not being my mother tongue, especially when
trying to tackle difficult subjects.
Such an implementation -- allowing data to flow between concurrently
executing blocks using POSIX pipes -- will require more sophisticated
process interaction and possibly some form of multi-threaded elisp
execution.
Just for the sake of clarity, I don't need concurrent or multi-threaded
execution of any kind.
My goal is twofold:
1. to cut a shell script into small parts, and explain what every part does,
with a runnable example (=C-c C-v C-e=).
2. to tangle the executable script out of the Babel document, by concatenating
all its parts (=C-c C-v C-t=).
A rather "dumb" example follows. I've made it as _minimal_ and as _complete_
as possible, so as to _express my point_, for further reference.
* Abstract
This script "americanizes" a European CSV file.
* Sample data
The following is a sample CSV file:
#+results: sample-csv
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example
* Script
What the script must do is:
** Load the data
Read the raw contents of the input file.
#+srcname: load-data
#+begin_src sh :var data=sample-csv :results output :exports both
echo "$data"
#+end_src
#+results: load-data
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example
** Convert the date to American format
Convert the date to =MM/DD/YYYY= format, expanding two-digit years to =20YY=.
#+srcname: convert-date
#+begin_src sh :var data=load-data :results output :exports both
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{2});/\2\/\1\/20\3;/'
#+end_src
#+results: convert-date
#+begin_example
Date;Amount;Account
05/28/2010;-6.806,25;999-1974050-30
06/04/2009;420,00;999-1500974-23
02/24/2009;-54,93;999-1974050-30
#+end_example
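The date rewrite can be checked in isolation from the shell; this is just the
first sed call applied to one sample line (GNU sed's =-r= syntax):

#+begin_src sh
# Swap the leading DD-MM-YYYY date to MM/DD/YYYY.
printf '28-05-2010;-6.806,25;999-1974050-30\n' |
  sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/'
# -> 05/28/2010;-6.806,25;999-1974050-30
#+end_src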
** Convert the separators
Apply the following operations in order to "americanize" the CSV file received
from the bank:
- remove the dot used as thousands separator (=.= -> nothing)
- replace the comma used as decimal separator by a dot (=,= -> =.=)
- replace any remaining commas by a dot (=,= -> =.=)
- replace the semicolon used as field separator by a comma (=;= -> =,=)
#+srcname: convert-separators
#+begin_src sh :var data=convert-date :results output :exports both
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
#+end_src
#+results: convert-separators
#+begin_example
Date,Amount,Account
05/28/2010,-6806.25,999-1974050-30
06/04/2009,420.00,999-1500974-23
02/24/2009,-54.93,999-1974050-30
#+end_example
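Likewise, the separator chain can be exercised on one line from the shell:

#+begin_src sh
# Strip the thousands dot, turn the decimal comma into a dot,
# then turn the field-separating semicolons into commas.
printf '05/28/2010;-6.806,25;999-1974050-30\n' |
  sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |
  sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |
  sed -r 's/,/./g' |
  sed -r 's/;/,/g'
# -> 05/28/2010,-6806.25,999-1974050-30
#+end_src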
* Full code
The script is then:
#+begin_src sh :tangle americanize-csv.sh :noweb yes
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format
# Usage: americanize-csv FILE.CSV
cat "$1" |\
<<convert-date>> |\
<<convert-separators>>
exit 0
# americanize-csv.sh ends here
#+end_src
As you can see, the tangled script is not executable anymore, as I've been
forced to put an =echo "$data"= command at the top of every separate code
block.
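The underlying problem is that =echo= never reads its standard input, so the
first pipe stage silently throws the piped data away:

#+begin_src sh
# echo ignores whatever is piped into it; only its arguments are printed.
printf 'piped data\n' | echo "only the argument survives"
# -> only the argument survives
#+end_src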
#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format
# Usage: americanize-csv FILE.CSV
cat "$1" |\
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{2});/\2\/\1\/20\3;/' |\
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
exit 0
# americanize-csv.sh ends here
#+end_src
If I had the possibility to play with =stdin=, I could "hide" that first line
and assume that all the code I'm writing will be executed against what's read
on =stdin=, both in the Org buffer and in the stand-alone shell script. Right?
#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format
# Usage: americanize-csv FILE.CSV
cat "$1" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{2});/\2\/\1\/20\3;/' |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
exit 0
# americanize-csv.sh ends here
#+end_src
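With =stdin= available, the whole chain can indeed be driven by a pipe; here
is an equivalent pipeline fed one sample line directly from the shell (sed
expressions slightly adjusted so four-digit years are left alone):

#+begin_src sh
# Feed one European sample line straight through the chain on stdin.
printf '28-05-2010;-6.806,25;999-1974050-30\n' |
  sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' |
  sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |
  sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |
  sed -r 's/;/,/g'
# -> 05/28/2010,-6806.25,999-1974050-30
#+end_src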
* Conclusions
As you can see, I did not really mean any concurrent execution. I simply want
to be able to execute parts of the code in situ, in the Org buffer, so as to
document (and test) what I'm writing, and then to assemble all the parts into
one single script file, by means of literate programming.
Best regards,
Seb
--
Sébastien Vauban