Hi Eric,
any variable named "stdin" is treated specially, in that, rather than
using its value to replace strings of $stdin in the text of the awk code,
the value of the stdin variable is saved into the file processed by awk.
This allows awk to operate over Org-mode references.
If babel code block supported a pipe or an actual stdin header argument,
that would be the ideal way to add this behavior, but currently nothing of
that nature exists.
Please let me know if this misses part of your suggestion, or more
generally what else may be advisable before we add this to the core.
Could this be implemented for sh as well?
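For sh, what I have in mind is essentially this manual emulation of the
described behavior (the temporary file and sample line are mine, just for
illustration):

#+begin_src sh
# Emulate the special "stdin" variable by hand: write the referenced
# data to a temporary file, then let the tool process that file.
tmp=$(mktemp)
printf 'Date;Amount;Account\n28-05-2010;-6.806,25;999-1974050-30\n' > "$tmp"
awk -F';' 'NR > 1 { print $2 }' "$tmp"
# -> -6.806,25
rm -f "$tmp"
#+end_src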
Unfortunately this simple hack for ob-awk does not address the need you link
to below -- which I am aware of and which is on my list of larger
longer-term Babel development items. I think that a future piping
implementation will be the ultimate solution to the issues you address.
Glad to hear you understand my wish. It's not always easy to express myself
in a very clear way, English not being my mother tongue, especially when
trying to tackle difficult subjects.
Such an implementation -- allowing data to flow between concurrently
executing blocks using POSIX pipes -- will require more sophisticated
process interaction and possibly some form of multi-threaded elisp
execution.
Just for the sake of clarity, I don't need concurrent or multi-threaded
execution of any kind.
My goal is twofold:
1. to cut a shell script into small parts, and explain what every part does,
with a runnable example (=C-c C-v C-e=).
2. to tangle the executable script out of the Babel document, by concatenating
all its parts (=C-c C-v C-t=).
A rather "dumb" example follows. I've made it as _minimal_ and as _complete_
as possible, so as to _express my point_, for further reference.
* Abstract
This script "americanizes" a European CSV file.
* Sample data
The following is a sample CSV file:
#+results: sample-csv
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example
* Script
What the script must do is:
** Load the data
Read the raw contents of the input file.
#+srcname: load-data
#+begin_src sh :var data=sample-csv :results output :exports both
echo "$data"
#+end_src
#+results: load-data
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example
** Convert the date to American format
Convert the date to =MM/DD/YYYY= format, expanding two-digit years to =20YY=.
#+srcname: convert-date
#+begin_src sh :var data=load-data :results output :exports both
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{2});/\2\/\1\/20\3;/'
#+end_src
#+results: convert-date
#+begin_example
Date;Amount;Account
05/28/2010;-6.806,25;999-1974050-30
06/04/2009;420,00;999-1500974-23
02/24/2009;-54,93;999-1974050-30
#+end_example
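The date rewrite can be checked in isolation from the shell; this is just the
first sed call applied to one sample line (GNU sed's =-r= syntax):

#+begin_src sh
# Swap the leading DD-MM-YYYY date to MM/DD/YYYY.
printf '28-05-2010;-6.806,25;999-1974050-30\n' |
  sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/'
# -> 05/28/2010;-6.806,25;999-1974050-30
#+end_src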
** Convert the separators
Apply the following operations in order to "americanize" the CSV file received
from the bank:
- remove the dot used as thousands separator (=.= -> nothing)
- replace the comma used as decimal separator by a dot (=,= -> =.=)
- replace any remaining commas by a dot (=,= -> =.=)
- replace the semicolon used as field separator by a comma (=;= -> =,=)
#+srcname: convert-separators
#+begin_src sh :var data=convert-date :results output :exports both
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
#+end_src
#+results: convert-separators
#+begin_example
Date,Amount,Account
05/28/2010,-6806.25,999-1974050-30
06/04/2009,420.00,999-1500974-23
02/24/2009,-54.93,999-1974050-30
#+end_example
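Likewise, the separator chain can be exercised on one line from the shell:

#+begin_src sh
# Strip the thousands dot, turn the decimal comma into a dot,
# then turn the field-separating semicolons into commas.
printf '05/28/2010;-6.806,25;999-1974050-30\n' |
  sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |
  sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |
  sed -r 's/,/./g' |
  sed -r 's/;/,/g'
# -> 05/28/2010,-6806.25,999-1974050-30
#+end_src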
* Full code
The script is then:
#+begin_src sh :tangle americanize-csv.sh :noweb yes
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format
# Usage: americanize-csv FILE.CSV
cat "$1" |\
<<convert-date>> |\
<<convert-separators>>
exit 0
# americanize-csv.sh ends here
#+end_src
As you can see, the tangled script is not executable anymore, as I've been
forced to put an =echo "$data"= command at the top of every separate code
block.
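The underlying problem is that =echo= never reads its standard input, so the
first pipe stage silently throws the piped data away:

#+begin_src sh
# echo ignores whatever is piped into it; only its arguments are printed.
printf 'piped data\n' | echo "only the argument survives"
# -> only the argument survives
#+end_src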
#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format
# Usage: americanize-csv FILE.CSV
cat "$1" |\
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{2});/\2\/\1\/20\3;/' |\
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
exit 0
# americanize-csv.sh ends here
#+end_src
If I had the possibility to play with =stdin=, I could "hide" that first line
and assume that all the code I'm writing will be executed against what's read
on =stdin=, both in the Org buffer and in the stand-alone shell script. Right?
#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format
# Usage: americanize-csv FILE.CSV
cat "$1" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{2});/\2\/\1\/20\3;/' |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
exit 0
# americanize-csv.sh ends here
#+end_src
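With =stdin= available, the whole chain can indeed be driven by a pipe; here
is an equivalent pipeline fed one sample line directly from the shell (sed
expressions slightly adjusted so four-digit years are left alone):

#+begin_src sh
# Feed one European sample line straight through the chain on stdin.
printf '28-05-2010;-6.806,25;999-1974050-30\n' |
  sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' |
  sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |
  sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |
  sed -r 's/;/,/g'
# -> 05/28/2010,-6806.25,999-1974050-30
#+end_src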
* Conclusions
As you can see, I did not really mean any concurrent execution. I simply want
to be able to execute parts of the code in situ, in the Org buffer, so as to
document (and test) what I'm writing, and then to assemble all the parts into
one single script file, by means of literate programming.
Best regards,
Seb
--
Sébastien Vauban